Everything you need to know about the Robots.txt file in WordPress
Search Engine Optimization (SEO) is an essential part of web development, especially for websites built on WordPress. One of the key elements for controlling how search engines interact with your website is the robots.txt file. This file lets you specify which pages of your site crawlers may or may not visit. In this article, we will explore how to create and edit the robots.txt file on your WordPress site.
What is the Robots.txt File and Where Is It Located?
The robots.txt file is simply a text file placed in the root of your website. It acts as a set of ground rules for search engines, telling them which areas of your site may be crawled and which may not. You can find this file through the file manager in your WordPress hosting control panel (or via FTP); note that if no physical file exists, WordPress serves a virtual robots.txt that it generates on the fly.
When Is It Useful to Implement a Robots.txt File?
A robots.txt file is useful when you want to block crawler access to certain pages or resources on your site. For example, if you have a folder containing development files that you don't want to appear in search results, you can use robots.txt to keep crawlers out of it.
How Can I View My Robots.txt?
To view the robots.txt file of your website, you simply need to add /robots.txt at the end of your website’s URL. For example, if your website is https://www.example.com, you can view your robots.txt file by visiting https://www.example.com/robots.txt in your web browser.
If you get a 404 error, you most likely don't have one; that isn't a problem in itself. A robots.txt file is necessary ONLY when you want to block crawling access to certain pages or sections of the site.
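If you want to script this same check, here is a minimal Python sketch using only the standard library (the domain is hypothetical): it fetches /robots.txt and treats a 404 as "no file present":
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_robots_txt(domain):
    """Return a site's robots.txt contents, or None if the site has none."""
    url = f"https://{domain}/robots.txt"
    try:
        with urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # 404 simply means there is no robots.txt
        raise

content = fetch_robots_txt("www.example.com")
print(content if content is not None else "No robots.txt file found")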
How to Create, Use, and Update a Robots.txt File
A robots.txt file is a text file located at the root of a website; it tells search engine crawlers which pages they may crawl and which they may not. It consists of commands and guidelines that well-behaved crawlers follow; compliance is voluntary, so treat it as guidance rather than access control.
Creating a Robots.txt File
To create a robots.txt file, follow these steps:
- Open a plain text editor, such as Notepad on Windows or TextEdit on macOS (in TextEdit, be sure to use plain-text format, not rich text).
- Create a new file and save it as “robots.txt”.
- Add the commands and guidelines you wish to the file.
Generic code:
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
This example blocks Googlebot from everything under /nogooglebot/, lets all other crawlers access the whole site, and points them to the sitemap.
Using a Robots.txt File
Once you have created a robots.txt file, you must upload it to the root of your website. Search engine crawlers will follow the instructions in the robots.txt file each time they crawl your site.
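For example, a well-behaved crawler typically downloads and parses the live file before fetching any page. Here is a minimal sketch using Python's standard urllib.robotparser module (the bot name and URLs are hypothetical):
from urllib.robotparser import RobotFileParser

# Download and parse the site's live robots.txt file.
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# Check a URL against the rules before fetching it.
if parser.can_fetch("MyBot", "https://www.example.com/some-page"):
    print("Allowed to crawl this page")
else:
    print("Blocked by robots.txt")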
Commands and Guidelines for Robots.txt
Robots.txt files use a set of commands and guidelines to indicate to crawlers which pages they can crawl and which they cannot.
Robots.txt Commands
The most common robots.txt commands are the following; a short sketch showing how a crawler interprets them appears after this list:
- User-agent: Indicates the type of crawler to which the command applies. For example, the command User-agent: Googlebot applies to Google’s crawler.
- Disallow: Tells the crawler not to crawl the specified URL or directory. For example, the command Disallow: /images/ tells the crawler not to crawl the /images/ directory.
- Allow: Tells the crawler that it may crawl the specified URL or directory; it is mainly used to carve out exceptions to a broader Disallow rule. For example, the command Allow: /admin/ explicitly permits crawling of the /admin/ directory.
- Sitemap: Tells the crawler the URL of a website’s sitemap. A sitemap is a file that provides a list of all the pages on a website.
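To make these commands concrete, here is a short sketch using Python's standard urllib.robotparser module, with hypothetical rules and URLs, showing how a compliant crawler decides what it may fetch:
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block Googlebot from /nogooglebot/, allow everyone else.
rules = """
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page"))     # False
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/nogooglebot/page"))  # True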
Robots.txt Guidelines
Robots.txt guidelines are directives that are not part of the robots.txt standard: some crawlers honor them, while others (including Google) ignore all three of the ones below. The most common guidelines are (see the sketch after this list):
- Crawl-delay: Tells the crawler how long to wait between requests. For example, Crawl-delay: 10 asks the crawler to wait 10 seconds between requests. Bing and Yandex honor this guideline; Google ignores it.
- Cache-control: Intended to tell the crawler how long to cache the content of a page. For example, Cache-control: max-age=300 would ask for 300 seconds. Major crawlers ignore this in robots.txt; caching is normally controlled with the HTTP Cache-Control response header instead.
- Host: Historically told the crawler the preferred hostname to use for the site. For example, Host: www.example.com. This was a Yandex-specific directive and is now deprecated.
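A polite crawler that does support Crawl-delay can read it and pause between requests. A minimal sketch, again with Python's urllib.robotparser and hypothetical rules:
import time
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the delay for the agent, or None if none is set.
delay = parser.crawl_delay("MyBot")
if delay:
    time.sleep(delay)  # wait 10 seconds before the next request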
Updating a Robots.txt File
If you modify your website, it is important to update your robots.txt file to reflect the changes. To update it, follow these steps (a small verification sketch follows the list):
- Open the robots.txt file with a plain text editor.
- Make the necessary changes.
- Save the changes and upload the file to the root of your website.
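After uploading, it is worth confirming that the site is actually serving the new version. A minimal Python sketch (the domain and local file path are hypothetical) that compares your local copy with the live file:
from urllib.request import urlopen

# Read the local copy you just edited.
with open("robots.txt", "rb") as f:
    local = f.read()

# Fetch the file the site actually serves.
with urlopen("https://www.example.com/robots.txt") as response:
    live = response.read()

print("Live file matches" if local == live else "Live file differs; check your upload")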
Plugins for Editing or Accessing the Robots.txt
There are several plugins in WordPress that allow you to create or modify the robots.txt file directly from the admin panel. Here I describe how you could do it with some popular plugins:
Yoast SEO
- Install and activate the Yoast SEO plugin from the WordPress dashboard.
- Go to “SEO” in the sidebar menu and then select “Tools”.
- Find and select the “File editor” option.
- Here you can view your current robots.txt file and make modifications, or create a new one if it doesn’t exist.
All in One SEO Pack
- Install and activate the All in One SEO Pack plugin.
- In the WordPress sidebar menu, go to “All in One SEO” and then to “File Editor”.
- Here you can edit your robots.txt file or create a new one.
Rank Math
- Install and activate the Rank Math plugin.
- Go to “Rank Math” in the sidebar menu and then select “Tools”.
- Find and select “File Editor”.
- This is where you can modify your robots.txt file.
WP Robots Txt
- Install and activate the WP Robots Txt plugin.
- Navigate to “Settings” and then to “Reading”.
- You will see a section where you can edit your robots.txt file.
Examples of Use and Common Rules in Robots.txt
- Block an images directory
User-agent: *
Disallow: /images/
This robots.txt file blocks crawling of everything under the /images/ directory (images stored elsewhere on the site are unaffected).
- Block a specific directory
User-agent: *
Disallow: /admin/
This robots.txt file blocks the /admin/ directory on the website.
- Allow a specific directory
User-agent: *
Allow: /admin/
This robots.txt file explicitly allows the /admin/ directory. Since crawling is allowed by default, a rule like this is mainly useful as an exception to a broader Disallow.
- Specify the hostname
User-agent: *
Host: www.example.com
This robots.txt file tells crawlers that support the deprecated Host directive (historically, Yandex) to treat www.example.com as the site's preferred hostname.
- Specify the crawl rate
User-agent: *
Crawl-delay: 10
This robots.txt file tells the crawler to wait 10 seconds between each request.
- Specify cache duration
User-agent: *
Cache-control: max-age=300
This asks crawlers that honor it to cache page content for 300 seconds; as noted above, the major crawlers ignore this directive.
- Block GPTBot
User-agent: GPTBot
Disallow: /
This robots.txt file blocks OpenAI's GPTBot crawler from the entire site.
- Block all pages containing the word “private”
User-agent: *
Disallow: /*private
This robots.txt file blocks all pages containing the word "private" in their URL (rules are prefix matches, so no trailing wildcard is needed after "private").
- Block all pages containing the “.pdf” extension
User-agent: *
Disallow: /*.pdf$
This robots.txt file blocks all URLs ending in ".pdf" (the $ anchors the pattern to the end of the URL).
- Allow all pages containing the word "blog"
User-agent: *
Allow: /*blog
- Allow all pages with the ".png" or ".jpg" extension
User-agent: *
Allow: /*.png$
Allow: /*.jpg$
A sketch showing how wildcard-aware crawlers match patterns like these follows these examples.
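Keep in mind that the * and $ wildcards used above are not part of the original robots.txt standard: major crawlers such as Googlebot and Bingbot support them, but some bots do not (Python's urllib.robotparser, for example, treats them as literal characters). As a rough sketch, wildcard-aware crawlers match these rules approximately like the following regular-expression translation:
import re

def rule_to_regex(rule):
    """Approximate wildcard matching: * matches any characters,
    $ anchors the pattern to the end of the URL path."""
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile(pattern)

pdf_rule = rule_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))    # True: ends in .pdf
print(bool(pdf_rule.match("/files/report.pdf?x")))  # False: $ anchors the end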
Tool to Verify the Correctness of Your Robots.txt File
- TechnicalSEO Robots.txt Validator: This online tool is easy to use and provides a detailed analysis of your robots.txt file.
- Google Search Console: This free Google tool lets you check various aspects of your website, including the robots.txt file. Sign in with your Google account, add your website, and open its robots.txt report.
- Ryte Free Tools: Offers a variety of free website-analysis tools, including a robots.txt checker.
How to View the Robots.txt File of Other Websites
The robots.txt file of a website is generally public and is placed in the root directory of the domain. You can view this file for any website by following the steps detailed below:
- Open Web Browser: Open your favorite web browser.
- Enter the Website URL: Type the URL of the website whose robots.txt file you want to view. Use only the bare domain, without any additional page or path.
- Add /robots.txt at the End of the URL: Once you are on the main domain (for example, https://www.example.com), add /robots.txt at the end of the URL. This should look something like: https://www.example.com/robots.txt.
- Press Enter: After adding /robots.txt, press Enter to load the page.
If the site has a robots.txt file, you should be able to see it. If you receive a 404 error, that generally means the site does not have a robots.txt file.
Common Mistake: Using the Robots.txt File to Control Indexing in Search Results
The robots.txt file is useful for guiding search engine crawlers on which parts of a website can be crawled and which cannot. However, a common mistake is to use this file with the intent of preventing certain pages from appearing in Google or other search engine results.
A Disallow rule in robots.txt prevents search engines from crawling a page, but it does not guarantee that the page will stay out of search results. If other sites link to the URL, the search engine can still index it and show it in results, even though it never crawls the page's actual content.
Therefore, it's better to use other methods, such as a noindex meta tag (<meta name="robots" content="noindex">), to control how your pages are indexed. Keep in mind that crawlers can only see the noindex directive if the page is not blocked in robots.txt.
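As a rough sketch (the URL is hypothetical, and a real check would use a proper HTML parser), you can test whether a page asks not to be indexed, either via the X-Robots-Tag HTTP header or a robots meta tag:
from urllib.request import urlopen

with urlopen("https://www.example.com/private-page") as response:
    header = response.headers.get("X-Robots-Tag") or ""
    body = response.read().decode("utf-8", errors="replace")

# Naive substring check; attribute order varies in real HTML.
has_noindex = "noindex" in header or 'content="noindex"' in body
print("Page requests noindex" if has_noindex else "No noindex directive found")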
Final Conclusions
In summary, the robots.txt file is a powerful but delicate tool that can significantly influence how search engines interact with your website. Using it correctly can improve crawling efficiency and help direct attention to the pages that really matter. However, an error in its configuration could result in indexing issues or the exposure of pages you’d prefer to keep private. Given its impact, it is crucial to understand its commands well, avoid common mistakes, and make use of verification tools to ensure everything is working as expected.
If you want to take your WordPress page’s SEO to the next level, feel free to consult our list of the 5 best SEO plugins for WordPress. In this article, you’ll find plugin options that will allow you not only to manage your robots.txt file but also optimize many other important aspects for your site’s positioning.
Frequently Asked Questions about robots.txt
How to know if a website has a robots.txt file?
To find out if a website has a robots.txt file, simply go to your browser and type the website URL followed by /robots.txt. For example: https://www.example.com/robots.txt. If a text file appears, it means the website has a robots.txt file.
Where to find the robots.txt file?
Normally, the robots.txt file is placed in the root directory of the website. You can access it by typing the website URL followed by /robots.txt in your browser’s address bar.
How can I verify if my robots.txt file is well-configured?
You can use free online tools to validate your robots.txt file’s settings. Some options are TechnicalSEO and Ryte.
What commands are important in a robots.txt file?
The most common and useful commands are User-agent, which specifies which robots the instruction is directed to, and Disallow, which indicates the paths that should not be crawled. Allow is another command that explicitly states what content can be crawled.
What mistakes are commonly made with the robots.txt file?
One of the most common mistakes is using the robots.txt file to try to keep pages out of Google's search results. Although it prevents crawling, it doesn't guarantee the page won't be indexed. If you want to prevent indexing, use the noindex meta tag instead.
How can I create or modify a robots.txt file in WordPress?
There are several plugins available for WordPress that allow you to easily create or modify a robots.txt file. Some of the most popular are Yoast SEO, Rank Math, and All in One SEO Pack.
How can I view my robots.txt file in WordPress?
You can usually view your robots.txt file by navigating to https://your-domain.com/robots.txt. If you are using an SEO plugin like Yoast, you can also view and edit the file from the WordPress admin panel.