Robots.txt Generator
About the Robots.txt Generator
The Robots.txt Generator builds a valid robots.txt file that tells search engine crawlers which parts of your site they may or may not request. It produces the standard directives — User-agent, Allow, Disallow, and Sitemap — for common scenarios such as a fully public production site, a staging or development environment that should be blocked entirely, or a site that needs to hide admin paths, search results, and faceted URLs while keeping the rest crawlable.
Robots.txt lives at the root of a host (for example, https://example.com/robots.txt) and is the first file most well-behaved crawlers fetch. The generator lets you target all bots with User-agent: * or single out specific ones like Googlebot or Bingbot, then layer Disallow rules to exclude directories and Allow rules to carve out exceptions. It also helps you add a Sitemap line pointing to your XML sitemap, which helps crawlers discover your indexable URLs.
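As a rough illustration of the output described above, the following Python sketch assembles User-agent groups, Allow and Disallow rules, and Sitemap lines into one file. It is not the generator's actual implementation: the `Group` class, the `build_robots_txt` function, and the example paths are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class Group:
    """One User-agent group (hypothetical structure for this sketch)."""
    user_agents: list[str]
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)


def build_robots_txt(groups: Sequence[Group], sitemaps: Sequence[str] = ()) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        for path in group.allow:
            lines.append(f"Allow: {path}")
        for path in group.disallow:
            lines.append(f"Disallow: {path}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        # Sitemap lines sit outside any group and use absolute URLs.
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Staging scenario: block everything for every bot.
staging = build_robots_txt([Group(["*"], disallow=["/"])])

# Production scenario: hide admin and search paths, keep one exception.
production = build_robots_txt(
    [Group(["*"], allow=["/search/help"], disallow=["/admin/", "/search"])],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(production)
```

The Allow lines are written before the Disallow lines in each group. Crawlers that follow the longest-match rule do not depend on this order, but some simpler parsers apply rules in file order, so putting exceptions first is the safer layout.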
Use it when launching a new site to keep crawlers off a staging copy (a global Disallow: /), when an existing site is exposing low-value URLs to crawlers, or when you simply want a clean, correctly formatted file rather than hand-editing syntax. It's also handy for documenting crawl policy in one place and for generating per-bot rules when you want to allow Google but block aggressive scrapers, bearing in mind that only bots that honor robots.txt will comply.
Important caveats: robots.txt controls crawling, not indexing. A disallowed URL that is linked from elsewhere can still appear in results without a snippet, so use a noindex meta tag or HTTP header to truly keep a page out of the index, and leave that page crawlable so the crawler can actually see the noindex. Remember that the file is publicly visible, so never rely on it to hide sensitive paths; protect those with authentication instead. After generating, validate the rules against your live URLs and confirm that the Sitemap line and your canonical URL strategy stay consistent.
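One way to run the validation step is with Python's standard `urllib.robotparser` module, as in the minimal sketch below. The rules, the URLs, and the `check_urls` helper are made up for illustration. The standard-library parser applies rules in file order and treats paths as plain prefixes, so its answers can differ from crawlers that use longest-match or wildcard matching; treat it as a first sanity check, not a substitute for a search engine's own testing tool.

```python
from urllib.robotparser import RobotFileParser

# Example rules (hypothetical). Allow comes first because this parser
# returns the first rule that matches, in file order.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def check_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> dict[str, bool]:
    """Map each URL to True if user_agent may fetch it under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(ROBOTS_TXT, [
    "https://example.com/",
    "https://example.com/admin/users",
    "https://example.com/admin/help/faq",
    "https://example.com/search?q=shoes",
])
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

A result of "blocked" here only means the URL will not be crawled by compliant bots. It says nothing about whether the URL is indexed.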
Frequently asked questions
- Does Disallow in robots.txt remove a page from Google?
- No. Disallow only asks crawlers not to fetch the URL; it does not prevent indexing. A blocked page can still be listed (often without a description) if other pages link to it. To remove a page from the index, allow crawling and add a noindex directive, or use the search engine's URL removal tool (for Google, the Removals tool in Search Console).
- Where must robots.txt be placed?
- It must live at the root of the host, served at /robots.txt over the same protocol and host you want it to govern. Each subdomain and protocol can have its own file, and crawlers will not look for it in subdirectories.
- Should I block my staging site with robots.txt?
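- Blocking with Disallow: / helps, but the most reliable approach for staging is HTTP authentication, since robots.txt does not stop indexing of linked URLs and is publicly readable. A site-wide noindex header is another option, but it only works if crawlers are allowed to fetch the pages and see it, so do not combine it with Disallow: /.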
- Can I include my sitemap in robots.txt?
- Yes. Adding a Sitemap: line with the absolute URL of your XML sitemap helps crawlers discover your pages faster. You can list multiple Sitemap lines, and they are independent of any User-agent group.
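Two of the answers above can be sketched in a few lines of Python: where the file for a given page must live, and how Sitemap lines are read regardless of where they appear in the file. The helper names `robots_url_for` and `sitemap_lines` are hypothetical, and `RobotFileParser.site_maps()` requires Python 3.8 or later.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The scheme and host (including any port) are kept; the path is
    always /robots.txt, never a subdirectory.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def sitemap_lines(robots_txt: str) -> list[str]:
    """Return every Sitemap URL, wherever it appears in the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


print(robots_url_for("https://blog.example.com/2024/post?x=1"))
print(sitemap_lines(
    "Sitemap: https://example.com/a.xml\n"
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Sitemap: https://example.com/b.xml\n"
))
```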