Screen scrape with jQuery, AJAX, JSONP & YQL
Since reading this excellent article about scraping content from a Wikipedia page using Yahoo! Query Language (YQL) as a proxy for cross-domain Ajax, I'm hooked to YQL. YQL helps in circumventing the same-origin policy that prevents a script loaded from one domain from getting or manipulating properties of a document from another domain. YQL has been around for about 2 years now & last year Yahoo introduced the capability to execute the tables of data built through YQL using JavaScript.
Ajax, jQuery, JSONP (JSON with Padding) & YQL make a heady combination - check Christian Heilmann's code samples.
Some facts about YQL from around the Web (work in progress) -
* YQL is a hosted web service that can scrape HTML for you. It also runs the HTML through HTML Tidy and caches it for you.
* It only returns the body content of the HTML - so no styling (other than inline styles) will get through.
* ...it treats the info on the web as a virtual table that developers can manipulate in a standardized way, regardless of the API that data came from.
* YQL understands and supports data sources like RSS, Atom, JSON, XML, CSV, HTML, Flickr, Yahoo! Finance, Weather, and so on.
* ...makes client-side mashups possible without using server-side proxies.
* Usage Limits:
Per application limit (identified by your Access Key): 100,000 calls per day
Per IP limits: /v1/public/*: 1,000 calls per hour; /v1/yql/*: 10,000 calls per hour
Also see:
HOW TO prevent screen scraping
Google Spreadsheets functions for scraping external data
Ajax, jQuery, JSONP (JSON with Padding) & YQL make a heady combination - check Christian Heilmann's code samples.
Some facts about YQL from around the Web (work in progress) -
* YQL is a hosted web service that can scrape HTML for you. It also runs the HTML through HTML Tidy and caches it for you.
* It only returns the body content of the HTML - so no styling (other than inline styles) will get through.
* ...it treats the info on the web as a virtual table that developers can manipulate in a standardized way, regardless of the API that data came from.
* YQL understands and supports data sources like RSS, Atom, JSON, XML, CSV, HTML, Flickr, Yahoo! Finance, Weather, and so on.
* ...makes client-side mashups possible without using server-side proxies.
* Usage Limits:
Per application limit (identified by your Access Key): 100,000 calls per day
Per IP limits: /v1/public/*: 1,000 calls per hour; /v1/yql/*: 10,000 calls per hour
Also see:
HOW TO prevent screen scraping
Google Spreadsheets functions for scraping external data
77
Comments
Post a Comment