Screen scrape with jQuery, AJAX, JSONP & YQL

Since reading this excellent article about scraping content from a Wikipedia page using Yahoo! Query Language (YQL) as a proxy for cross-domain Ajax, I've been hooked on YQL. YQL helps get around the same-origin policy, which prevents a script loaded from one domain from reading or manipulating properties of a document from another domain. YQL has been around for about two years now & last year Yahoo introduced the capability to execute the tables of data built through YQL using JavaScript.
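To make the same-origin rule concrete, here is a minimal sketch of the check a browser effectively applies: two URLs share an origin only when scheme, host and port all match. The function name `sameOrigin` and the example URLs are my own, chosen for illustration; it relies on the standard `URL` class.

```javascript
// Minimal sketch: compare the origins (scheme + host + port) of two URLs.
// `sameOrigin` is a hypothetical helper name, not part of any library.
function sameOrigin(a, b) {
  // URL.origin normalises default ports, so http://example.com and
  // http://example.com:80 compare as equal.
  return new URL(a).origin === new URL(b).origin;
}

// A page on example.com asking for a Wikipedia page is cross-origin,
// which is why a plain Ajax request for it is blocked.
console.log(sameOrigin("http://example.com/page", "http://en.wikipedia.org/wiki/YQL"));
```

A different scheme alone (http vs. https) is enough to make two URLs cross-origin, even when the host is the same.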

Ajax, jQuery, JSONP (JSON with Padding) & YQL make a heady combination - check out Christian Heilmann's code samples.
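The "padding" in JSONP is just a function call wrapped around the JSON, so the response can be loaded with a script tag and run as ordinary JavaScript. The sketch below only illustrates that wrapping by stripping it off a response string; in a real page the browser executes the script and calls your function for you. The helper name `unwrapJsonp`, the callback name `handleData` and the sample payload are all made up for this example.

```javascript
// Minimal sketch: peel the callback "padding" off a JSONP response body.
// `unwrapJsonp` is a hypothetical helper, used here only to show the format.
function unwrapJsonp(body, callbackName) {
  const text = body.trim();
  const prefix = callbackName + "(";
  // A JSONP body looks like: callbackName({...}) with an optional semicolon.
  if (!text.startsWith(prefix) || !/\)\s*;?$/.test(text)) {
    throw new Error("Not a JSONP response for callback " + callbackName);
  }
  const end = text.lastIndexOf(")");
  return JSON.parse(text.slice(prefix.length, end));
}

// Illustrative payload only; the real response shape depends on the service.
const data = unwrapJsonp('handleData({"query":{"count":1}});', "handleData");
console.log(data.query.count);
```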

Some facts about YQL from around the Web (work in progress):
* YQL is a hosted web service that can scrape HTML for you. It also runs the HTML through HTML Tidy and caches it for you.
* It only returns the body content of the HTML - so no styling (other than inline styles) will get through.
* ...it treats the info on the web as a virtual table that developers can manipulate in a standardized way, regardless of the API that data came from.
* YQL understands and supports data sources like RSS, Atom, JSON, XML, CSV, HTML, Flickr, Yahoo! Finance, Weather, and so on.
* ...makes client-side mashups possible without using server-side proxies.
* Usage limits:
  * Per application limit (identified by your Access Key): 100,000 calls per day
  * Per IP limits: /v1/public/*: 1,000 calls per hour; /v1/yql/*: 10,000 calls per hour
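Putting the facts above together, a scrape boils down to building one URL: a YQL statement against the `html` table, sent to the public endpoint with a JSONP callback. Here is a rough sketch under some assumptions: the host `query.yahooapis.com` is not stated in this post (only the `/v1/public/` path prefix is), and `buildYqlUrl`, the Wikipedia URL, the XPath and the callback name are placeholders of mine.

```javascript
// Assumption: the public endpoint host; only the /v1/public/ prefix
// appears in the usage limits listed above.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: build a JSONP request URL that asks YQL to fetch
// a page and return the nodes matching an XPath expression.
function buildYqlUrl(pageUrl, xpath, callbackName) {
  // No escaping is done here, so pageUrl must not contain double quotes
  // and xpath must not contain single quotes.
  const query = `select * from html where url="${pageUrl}" and xpath='${xpath}'`;
  const params = new URLSearchParams({
    q: query,               // the YQL statement
    format: "json",         // ask for JSON rather than XML
    callback: callbackName, // wrap the JSON in this function call (JSONP)
  });
  return `${YQL_ENDPOINT}?${params}`;
}

console.log(buildYqlUrl("http://en.wikipedia.org/wiki/YQL", "//p", "handleData"));
```

In a browser, you would load this URL through a script element (or let jQuery do that for you) and define a global `handleData` function to receive the result. Because such requests go to `/v1/public/*`, they count against the 1,000 calls per hour per IP limit.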


Also see:
HOW TO prevent screen scraping 
Google Spreadsheets functions for scraping external data