Posts

Showing posts from February, 2007

HOW TO prevent screen scraping

It is not difficult to screen scrape web pages & extract specific portions of a page using regular expressions. This custom C# GetImages method fetches all images from a specified web page URL. [BTW, it has a nice trick for making hyperlinks and image file paths work when the scraped page uses relative paths, which would otherwise never resolve to the original content: it uses the BASE tag to set the document's base URL. The BASE element must be placed inside the HEAD element.] Some of the strategies that this whitepaper [pdf] [Google's cached HTML version] discusses to prevent scraping include proactive measures such as posting a website policy that forbids scraping, limiting results, DRM and rendering, and reactive measures such as policing by monitoring physical and Internet (IP address) identities. While it alludes to rendering, one other possibility is to block access based on the referring page (the HTTP Referer header) & render content conditionally. A page with valuable information could be ...
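To make the scraping side concrete, here is a minimal sketch of the same idea in Python rather than C#. It is not the GetImages method referred to above; the function names, the regular expression and the example.com URLs are all my own assumptions for illustration. It pulls image paths out of a page with a regular expression, resolves relative paths against the page URL, and shows the BASE tag trick for a saved copy of the page.

```python
import re
from urllib.parse import urljoin

# Assumption: a simple pattern for quoted src attributes on <img> tags.
# A regex is enough for a sketch; it is not a full HTML parser.
IMG_SRC_RE = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def get_image_urls(html, page_url):
    """Return absolute URLs for every <img src=...> found in html."""
    # urljoin resolves relative paths against the URL of the scraped page
    # and leaves absolute URLs untouched.
    return [urljoin(page_url, src) for src in IMG_SRC_RE.findall(html)]


def add_base_tag(html, page_url):
    """Insert a BASE element right after the opening HEAD tag."""
    base = '<base href="{}">'.format(page_url)
    # A function is used as the replacement so that characters in the URL
    # are never interpreted as regex escapes.
    return re.sub(
        r'(<head\b[^>]*>)',
        lambda match: match.group(1) + base,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    sample = (
        '<html><head><title>t</title></head><body>'
        '<img src="/a.png"><img src="img/b.jpg"></body></html>'
    )
    for url in get_image_urls(sample, "http://example.com/dir/page.html"):
        print(url)
```

With the BASE element in place, a browser resolves the relative links in the saved copy against the original site, which is why the scraped page still shows its images.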

Inline Spell-check, Inline Search, Find As You Type

As a developer, I try to be browser agnostic and use all three popular browsers (IE, Firefox, Opera) on my development machine. One thing I like in Firefox 2.0 is the inline spell checking feature that kicks in when you type in a text area (like this blog post in Blogger) on any web page. I also find the Inline Search & Find As You Type features helpful. I learnt from the "Firefox Myths" article that these are now available for IE (5.5 and above) as well through a free Add-on.

HOW TO add a header or footer to a dynamically generated Word document

There was a question recently on my CodeProject article "Dynamically generate a MS Word document using HTML & CSS". The article describes how to generate a Word document programmatically without using any components, by exploiting the formatting features exposed through Office XML and CSS. The questioner wanted to know how to add a custom header and footer and show something like Page X of Y (total pages). Based on what I have tried so far, showing a header and footer using Office XML in MS Word is not as easy as it is in Excel. If you want to keep the code simple & don't need any fireworks, i.e. you will settle for page numbers in the header or footer, then it's a matter of adding a few lines to the original source code. The code can be viewed at this GitHub Gist. Update (1-Sep-2010): To add a custom header & footer, check this new post. To add a footer that shows the page number at the bottom right, here are the steps - 1) Add these classes...

VS.NET 2005 keyboard shortcuts

I love keyboard shortcuts, especially those which don't involve finger gymnastics. Over time I have compiled a list of my favorite shortcuts based on several online resources. One cool tip I learnt from Minh T. Nguyen's book "Visual Studio .NET Tips and Tricks" is to enable both the "Show ScreenTips on Toolbars" and "Show Shortcut Keys in Screen Tips" options in the dialog box that comes up with Tools > Customize, so that the keyboard shortcut associated with an icon is shown when you mouse over it. This is a great way to discover new keyboard shortcuts, and it also lets VS.NET remind you when you forget a shortcut & take the toolbar/menu route.

HOW TO block IFRAME based ads

The scourge of pop-up ads has been largely contained by the built-in pop-up blockers of some of the new browsers and browser toolbars. There is however a new epidemic - insensitive and intrusive iframe ads. With some cunning web page formatting, there are websites that trick visitors into clicking context sensitive ads camouflaged as genuine content. Luckily, there is a simple way of blocking these iframe intrusions. The solution is to add the hostname of the offending website generating the ad to the Windows HOSTS file (Windows\System32\drivers\etc\hosts) and map it to a dummy address. This tip applies to Windows-based computers. This technique doesn't work if your connection to the internet goes through a proxy server (e.g. at a school or many businesses). When you are behind a proxy, the proxy performs DNS lookups on your behalf, and the local HOSTS file is ignored.
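As a rough illustration of what those HOSTS entries look like, here is a small Python sketch that builds them. Everything in it is an assumption on my part: the function names, the example hostnames, and the choice of 127.0.0.1 (the loopback address) as the dummy address. It only works on the text of a hosts file and returns the updated text, so saving the result to the real file (which needs administrator rights) is left to you.

```python
from urllib.parse import urlparse

# Assumption: the loopback address serves as the "dummy address".
DUMMY_ADDRESS = "127.0.0.1"


def hostname_of(url_or_host):
    """Return the lower-cased hostname from a full URL or a bare hostname."""
    # urlparse only recognises the host part when the string contains "//".
    text = url_or_host if "//" in url_or_host else "//" + url_or_host
    return (urlparse(text).hostname or "").lower()


def block_hosts(hosts_text, offenders, address=DUMMY_ADDRESS):
    """Return hosts_text with one blocking entry per new offending host."""
    # Collect hostnames that already have an entry, ignoring comments.
    existing = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        existing.update(name.lower() for name in fields[1:])

    new_lines = []
    for offender in offenders:
        host = hostname_of(offender)
        if host and host not in existing:
            new_lines.append("{}\t{}".format(address, host))
            existing.add(host)

    if not new_lines:
        return hosts_text
    # Make sure the new entries start on a line of their own.
    separator = "" if hosts_text.endswith("\n") or not hosts_text else "\n"
    return hosts_text + separator + "\n".join(new_lines) + "\n"


if __name__ == "__main__":
    current = "127.0.0.1\tlocalhost\n"
    print(block_hosts(current, ["http://ads.example.com/banner.html"]))
```

Note that the HOSTS file matches whole hostnames only, so an entry for ads.example.com does not cover other subdomains of the same site.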