Spider in the Web soup

  • Web Spiders (also called Robots or crawlers) are programs that WWW search engines use to "crawl" across the Internet, fetching pages from Web servers, indexing and cataloging their content, and making it available for searching.
  • A Robots.txt file is a plain text file placed in your Web server's root directory that sets out restrictions for Web Spiders, telling them which parts of the site they may crawl.
  • Web Robots are not required to respect Robots.txt files; compliance is voluntary, so well-behaved crawlers honor the rules while badly behaved ones simply ignore them. (A sketch of how a polite crawler checks Robots.txt appears at the end of this post.)
  • To disallow all Web Spiders, specify the following in the Robots.txt file:

        User-agent: *
        Disallow: /
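  • Robots.txt can also be more selective, blocking only parts of a site rather than all of it. A hypothetical example (the directory names below are made up for illustration):

        # Keep all spiders out of two directories
        User-agent: *
        Disallow: /cgi-bin/
        Disallow: /private/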


    More: How to Write a Robots.txt File
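    Since compliance is voluntary, a polite crawler must check Robots.txt itself before fetching pages. Python's standard library ships urllib.robotparser for exactly this; below is a minimal sketch (the site URL and user agent name are placeholders, not real endpoints):

        import urllib.robotparser

        # Fetch and parse the site's Robots.txt file
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url("https://example.com/robots.txt")  # placeholder site
        rp.read()

        # can_fetch(useragent, url) returns True if the given user agent
        # is allowed to crawl the given URL under the site's rules
        if rp.can_fetch("MyCrawler", "https://example.com/some/page.html"):
            print("Allowed to crawl")
        else:
            print("Disallowed by Robots.txt")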
