<body>

Tech Tips, Tricks & Trivia

by 'Anil' Radhakrishna
An architect's notes, experiments, discoveries and annotated bookmarks.


Screen scrape with jQuery, AJAX, JSONP & YQL

Ever since reading this excellent article about scraping content from a Wikipedia page using Yahoo! Query Language (YQL) as a proxy for cross-domain Ajax, I've been hooked on YQL. YQL helps get around the same-origin policy, which prevents a script loaded from one domain from reading or manipulating properties of a document from another domain. YQL has been around for about 2 years now, and last year Yahoo introduced the ability to execute the tables of data built through YQL using JavaScript.
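
To make the proxy idea a little more concrete, here is a minimal sketch of how a YQL request URL for scraping a page might be assembled. The endpoint `https://query.yahooapis.com/v1/public/yql`, the `html` table, and the example page and XPath are assumptions for illustration; none of them are spelled out in this post.

```javascript
// Assumed public YQL endpoint (not stated in the post itself).
var YQL_ENDPOINT = 'https://query.yahooapis.com/v1/public/yql';

// Build a request URL that asks YQL's `html` table to fetch a page
// and return the nodes matching an XPath expression, as JSON.
// Note: pageUrl and xpath must not contain double quotes in this sketch.
function buildYqlUrl(pageUrl, xpath) {
  var query = 'select * from html where url="' + pageUrl + '"' +
              ' and xpath="' + xpath + '"';
  return YQL_ENDPOINT +
    '?q=' + encodeURIComponent(query) +
    '&format=json';
}

// Hypothetical target page and XPath, for illustration only.
var exampleUrl = buildYqlUrl('http://en.wikipedia.org/wiki/Yahoo', '//h1');
```

The browser never talks to the target site directly: it only requests the YQL URL, and YQL fetches the page on its behalf.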

Ajax, jQuery, JSONP (JSON with Padding) & YQL make a heady combination - check Christian Heilmann's code samples.
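
As a rough sketch of how the pieces might fit together (this is not Christian Heilmann's code), the function below sends a YQL query through jQuery's `$.getJSON`, which switches to JSONP when the URL contains `callback=?`. The endpoint, the `html` table, the `scrapeWithYql` name and the example page are assumptions for illustration.

```javascript
// Fetch nodes from another site via YQL using JSONP.
// `$` is jQuery, passed in as a parameter so the function can be tried with a stub.
function scrapeWithYql($, pageUrl, xpath, onDone) {
  var query = 'select * from html where url="' + pageUrl + '"' +
              ' and xpath="' + xpath + '"';
  // Assumed public endpoint; "callback=?" makes jQuery issue a JSONP request.
  var url = 'https://query.yahooapis.com/v1/public/yql' +
            '?q=' + encodeURIComponent(query) +
            '&format=json&callback=?';
  $.getJSON(url, function (data) {
    // YQL wraps its answer in a `query` object; `results` is null when nothing matched.
    var results = data && data.query ? data.query.results : null;
    onDone(results);
  });
}

// Browser usage (runs only when jQuery is loaded on the page).
if (typeof jQuery !== 'undefined') {
  scrapeWithYql(jQuery, 'http://en.wikipedia.org/wiki/Yahoo', '//h1', function (results) {
    console.log(results);
  });
}
```

Because the response arrives as a script that calls a generated callback function, no server-side proxy is needed on your own domain.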

Some facts about YQL from around the Web (work in progress) -
* YQL is a hosted web service that can scrape HTML for you. It also runs the HTML through HTML Tidy and caches it for you.
* It only returns the body content of the HTML - so no styling (other than inline styles) will get through.
* ...it treats the info on the web as a virtual table that developers can manipulate in a standardized way, regardless of the API that data came from.
* YQL understands and supports data sources like RSS, Atom, JSON, XML, CSV, HTML, Flickr, Yahoo! Finance, Weather, and so on.
* ...makes client-side mashups possible without using server-side proxies.
* Usage limits:
  * Per application (identified by your Access Key): 100,000 calls per day
  * Per IP: /v1/public/*: 1,000 calls per hour; /v1/yql/*: 10,000 calls per hour
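
To put the usage limits above in perspective, the sketch below converts each one into an average spacing between calls and shows a simple gate that enforces such a spacing on the client. The helper names are hypothetical, and evenly spacing calls is stricter than the limits require, since the limits only cap the total per hour or day.

```javascript
var HOUR_MS = 60 * 60 * 1000;
var DAY_MS = 24 * HOUR_MS;

// Average spacing between calls that stays within `calls` per `windowMs`.
function minIntervalMs(calls, windowMs) {
  return windowMs / calls;
}

var spacing = {
  publicPerIp: minIntervalMs(1000, HOUR_MS),      // /v1/public/*: 3600 ms
  yqlPerIp: minIntervalMs(10000, HOUR_MS),        // /v1/yql/*: 360 ms
  perApplication: minIntervalMs(100000, DAY_MS)   // per Access Key: 864 ms
};

// Returns a function that says whether a call is allowed at time `nowMs`,
// letting one call through per `intervalMs`.
function createCallGate(intervalMs) {
  var lastCallMs = -Infinity;
  return function allow(nowMs) {
    if (nowMs - lastCallMs < intervalMs) {
      return false;
    }
    lastCallMs = nowMs;
    return true;
  };
}

// Usage: gate calls to the public endpoint, passing Date.now() as the clock.
var allowPublicCall = createCallGate(spacing.publicPerIp);
```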

Also see:
HOW TO prevent screen scraping 
Google Spreadsheets functions for scraping external data
