SEO

What is robots.txt?

A text file at the root of a website that tells search engine crawlers which pages or files they can or cannot request from the site.

The robots.txt file is a standard (the Robots Exclusion Protocol) that websites use to communicate with web crawlers. It specifies which areas of the site should not be crawled. Note that it controls crawling, not indexing: a disallowed page can still appear in search results if other sites link to it.

robots.txt syntax

User-agent: *
Disallow: /private/
Allow: /private/public-page.html

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml

Important directives

  • User-agent: Names the crawler the following rules apply to (* matches all crawlers)
  • Disallow: Tells the matched crawler not to access a path
  • Allow: Overrides a Disallow for a more specific path
  • Sitemap: Points search engines to your XML sitemap
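You can exercise these directives with Python's standard-library urllib.robotparser. The sketch below parses the example file from earlier in memory (no network fetch). One caveat: Python's parser applies Allow/Disallow rules in file order (first match wins), which can differ from Google's longest-path-wins precedence, so the Allow override case is left out here.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, as a list of lines.
rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Any crawler is blocked from /private/ but free to fetch the homepage;
# Googlebot additionally gets its own group and is blocked from /no-google/.
print(rp.can_fetch("*", "https://example.com/private/secret.html"))    # False
print(rp.can_fetch("*", "https://example.com/index.html"))             # True
print(rp.can_fetch("Googlebot", "https://example.com/no-google/page")) # False
```

In production you would call rp.set_url(...) and rp.read() to fetch the live file instead of parsing a string.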

robots.txt risks

A misconfigured robots.txt can accidentally block important pages from search engines. Common mistakes include:

  • Blocking the entire site with Disallow: /
  • Blocking CSS/JS files needed for rendering
  • Leaving development rules in production
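A minimal sketch of an automated sanity check for the mistakes above. The function name, the warning strings, and the asset-path heuristics are illustrative assumptions, not part of any real tool; a substring match like this will have false positives.

```python
def check_robots(text: str) -> list[str]:
    """Flag common robots.txt mistakes: a site-wide Disallow and
    Disallow rules that look like they block CSS/JS assets."""
    warnings = []
    agent = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            agent = value
        elif field == "disallow":
            if value == "/":
                warnings.append(f"blocks entire site for '{agent}'")
            # Crude heuristic for rendering assets (assumption, may misfire):
            elif any(hint in value.lower()
                     for hint in (".css", ".js", "/css", "/js", "/assets")):
                warnings.append(f"may block rendering assets: {value}")
    return warnings

print(check_robots("User-agent: *\nDisallow: /"))
# ["blocks entire site for '*'"]
```

Running a check like this in CI before each deploy catches the "development rules left in production" case before a crawler does.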

How VitalSentinel handles this

Your robots.txt is your website's revenue insurance against accidental deindexing. VitalSentinel's Robots.txt Monitoring checks the file on a schedule, snapshots every change, and alerts you within hours when a Disallow: / or a blocked CSS/JS path slips into production. You find out before Google notices, not weeks later when traffic has already collapsed.

Monitor your website performance

VitalSentinel tracks Core Web Vitals and performance metrics to help you stay ahead of issues.