Robots.txt Generator - Free Online Tool

Generate a robots.txt file for a public site, a blocked staging site, or a common WordPress-style setup.

Inputs: Preset, Site URL, User-Agent, Crawl Delay, Custom Disallow Paths, Custom Allow Paths, Include sitemap line

Summary: User Agents, Disallow Rules, Allow Rules, Sitemaps

Generated robots.txt

Upload this plain-text file to your site root.

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

About the Robots.txt Generator

The Robots.txt Generator builds a valid robots.txt file that tells search engine crawlers which parts of your site they may or may not request. It produces the standard directives — User-agent, Allow, Disallow, and Sitemap — for common scenarios such as a fully public production site, a staging or development environment that should be blocked entirely, or a site that needs to hide admin paths, search results, and faceted URLs while keeping the rest crawlable.
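As a rough illustration of what such a generator does, here is a minimal Python sketch that assembles one User-agent group from a few options. It is not this tool's actual implementation: the function name build_robots_txt and its parameters are hypothetical, and it covers only the four directives named above.

```python
from typing import Iterable, Optional


def build_robots_txt(
    user_agent: str = "*",
    disallow: Iterable[str] = (),
    allow: Iterable[str] = (),
    sitemap_url: Optional[str] = None,
) -> str:
    """Return robots.txt text for a single User-agent group (hypothetical helper)."""
    allow = list(allow)
    disallow = list(disallow)
    lines = [f"User-agent: {user_agent}"]
    if not allow and not disallow:
        # No rules given: spell out the default that everything may be crawled.
        lines.append("Allow: /")
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap_url:
        # The Sitemap line must carry an absolute URL.
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Public production site with a sitemap.
public = build_robots_txt(sitemap_url="https://example.com/sitemap.xml")
# Staging site blocked entirely.
staging = build_robots_txt(disallow=["/"])
print(public)
print(staging)
```

The public preset reproduces the three-line file shown under "Generated robots.txt", and the staging preset yields a group with a single Disallow: / rule.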

Robots.txt lives at the root of a domain (for example, https://example.com/robots.txt) and is the first file most well-behaved crawlers fetch. The generator lets you target all bots with User-agent: * or single out specific ones like Googlebot or Bingbot, then layer Disallow rules to exclude directories and Allow rules to carve out exceptions. It also helps you add a Sitemap line pointing to your XML sitemap, which speeds discovery of your indexable URLs.
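Because the file always sits at the root of the host, the robots.txt URL that governs any page can be derived from the page URL alone. The sketch below shows one way to do that with Python's standard library; the helper name robots_txt_url is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL on the same scheme and host as page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=7"))
print(robots_txt_url("http://shop.example.com/cart"))
```

The two calls print https://example.com/robots.txt and http://shop.example.com/robots.txt, which shows that a subdomain or a different protocol is governed by its own file.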

Use it when launching a new site to keep crawlers away from a staging copy (a global Disallow: /), when an existing site is leaking low-value URLs into search results, or when you simply want a clean, correctly formatted file rather than hand-editing syntax. It's also handy for documenting crawl policy in one place and for generating per-bot rules when you want to allow Google but slow down or block aggressive crawlers. Keep in mind that these rules only affect bots that choose to honor robots.txt, and that not every crawler supports Crawl-delay (Googlebot ignores it).

Important caveats: robots.txt controls crawling, not indexing. A disallowed URL that is linked from elsewhere can still appear in results without a snippet, so to keep a page out of the index, leave it crawlable and serve a noindex meta tag or X-Robots-Tag HTTP header; a crawler that is blocked from the page never sees the noindex. The file is also publicly visible, so never rely on it to hide sensitive paths; protect those with authentication instead. After generating, validate the rules against your live URLs and confirm that the Sitemap line and your canonical URL strategy stay consistent.
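One way to do that validation offline is Python's standard urllib.robotparser module. The sketch below is an assumption-laden example rather than a full checker: the rules and URLs are invented for illustration, and the standard-library parser applies the first matching rule and does not understand wildcards, so its verdicts can differ from Google's longest-match behavior when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; substitute the file you generated.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is needed.
parser.parse(rules.splitlines())

urls = [
    "https://example.com/",
    "https://example.com/admin/login",
    "https://example.com/search?q=x",
]
for url in urls:
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict}: {url}")

# site_maps() (Python 3.8+) returns the Sitemap URLs, or None if there are none.
print(parser.site_maps())
```

With these rules the home page is reported as allowed and the admin and search URLs as blocked.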

Frequently asked questions

Does Disallow in robots.txt remove a page from Google?

No. Disallow only asks crawlers not to fetch the URL; it does not prevent indexing. A blocked page can still be listed (often without a description) if other pages link to it. To remove a page from the index, allow crawling and add a noindex directive, or use the URL removal tools.

Where must robots.txt be placed?

It must live at the root of the host, served at /robots.txt over the same protocol and host you want it to govern. Each subdomain and protocol can have its own file, and crawlers will not look for it in subdirectories.

Should I block my staging site with robots.txt?

Blocking with Disallow: / helps, but the most reliable approach for staging is HTTP authentication or a site-wide noindex header, since robots.txt does not stop indexing of linked URLs and is publicly readable.

Can I include my sitemap in robots.txt?

Yes. Adding a Sitemap: line with the absolute URL of your XML sitemap helps crawlers discover your pages faster. You can list multiple Sitemap lines, and they are independent of any User-agent group.
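The site-wide noindex header mentioned in the staging answer can be added in many ways, depending on the server. As one hedged sketch, assuming the staging site happens to be a Python WSGI application, a small wrapper can append X-Robots-Tag: noindex to every response. The names add_noindex and staging_app are hypothetical; crawlers only see the header if the pages are not also disallowed in robots.txt.

```python
def add_noindex(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Append the header without mutating the caller's list.
            headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def staging_app(environ, start_response):
    # Hypothetical stand-in for the real staging site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging build\n"]


# Serve `application` with any WSGI server in place of the unwrapped app.
application = add_noindex(staging_app)
```

HTTP authentication remains the stronger option for staging, since it keeps both crawlers and people out rather than only asking search engines not to index.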
