Spider in the Web soup

  • Web spiders (also called robots or crawlers) are programs that search engines use to "crawl" the Web, indexing pages on Web servers, cataloging that information, and making it available for searching.
  • A robots.txt file is a plain text file located in the Web server's root directory (e.g., https://example.com/robots.txt) that tells Web spiders which parts of the site they have permission to crawl.
  • Web robots are not required to respect robots.txt; compliance is voluntary, so well-behaved crawlers honor it while poorly behaved ones may ignore it.
  • To disallow all Web spiders from the entire site, put the following two lines in robots.txt (two worked examples follow this list):

        User-agent: *
        Disallow: /
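
    Rules can also target a specific crawler or restrict only part of a site. As an illustrative sketch (the BadBot name and the /private/ path are hypothetical, not from the original article), the following robots.txt shuts out one crawler completely while asking all other crawlers to skip a single directory:

        User-agent: BadBot
        Disallow: /

        User-agent: *
        Disallow: /private/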
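
    A crawler (or a site owner testing their rules) can check what a given robots.txt permits before fetching anything. Below is a minimal sketch using Python's standard-library urllib.robotparser; the rule lines, the ExampleBot name, and the example.com URLs are assumptions for illustration only.

        from urllib.robotparser import RobotFileParser

        # The same "disallow everything" rules shown above, as a list of lines.
        rules = [
            "User-agent: *",
            "Disallow: /",
        ]

        rp = RobotFileParser()
        rp.parse(rules)

        # can_fetch(useragent, url) reports whether the rules permit the fetch.
        print(rp.can_fetch("ExampleBot", "https://example.com/index.html"))  # False

        # To test a live site's rules instead, point the parser at its robots.txt:
        # rp.set_url("https://example.com/robots.txt")
        # rp.read()
        # print(rp.can_fetch("ExampleBot", "https://example.com/"))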


    More: How to Write a Robots.txt File
