Under Google's approach to indexing, results are ranked by the completeness and objectivity of a page's information and its relevance to the search query. If a site with illegal content gets indexed, or if the resource is built for spam, its pages will be excluded from the search engine's shared database. In this article we look at how to remove a site from a search engine's results.
Options for excluding a site from Google's index
As soon as the search robot (a program that collects information about new resources) crawls the site page by page, the site is indexed in accordance with Google's crawling policy. Below we explain how to hide your site, or individual parts of it, from search engines using robots.txt, a file that both directs crawlers and stops them.
To exclude an entire resource from the results, a plain-text file, the mentioned robots.txt, is placed in the root folder of the server hosting the site. Search engines read this file and act on the instructions it contains.
Keep in mind that Google may index a page even when visitors are denied access to it. An HTTP 401 (Unauthorized) or 403 (Forbidden) response applies only to visitors, not to the search engine's crawling programs.
To remove a site from search indexing, enter the following lines in the robots.txt file:
User-agent: Googlebot
Disallow: /
This tells the search engine that indexing any content on the site is forbidden. That is how you remove a site from Google so that it no longer keeps the resource in its list of discovered pages.
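The effect of these two lines can be verified with Python's standard-library robots.txt parser. This is a quick illustrative check, not part of any Google tooling; the host name example.com and the bot name SomeOtherBot are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Parse the same two rules shown above.
rules = [
    "User-agent: Googlebot",
    "Disallow: /",
]
rp = RobotFileParser()
rp.parse(rules)

# Googlebot is barred from every URL on the site...
print(rp.can_fetch("Googlebot", "http://example.com/any/page.html"))   # False
# ...while agents not named in the file fall back to the default (allowed).
print(rp.can_fetch("SomeOtherBot", "http://example.com/any/page.html"))  # True
```

Note that the rule applies only to the user agent it names; to block all crawlers you would use `User-agent: *` instead.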
Crawling rules for different protocols
If you need separate rules for individual communication protocols, for example different rules for http and https, these also need to be written in robots.txt, as in the following example.
For the http protocol (http://yourserver.com/robots.txt, substituting your own domain name), allow full indexing:
User-agent: * - applies to any search engine
Allow: / - allow full indexing
How to remove a site from the results entirely for the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: / - full ban on indexing
Urgent removal of a resource URL from Google search results
If you do not want to wait for re-indexing and need to hide the site as soon as possible, I recommend the service at http://services.google.com/urlconsole/controller. Before using it, a robots.txt file with the relevant instructions must already be present in the root directory of the site's server.
If for some reason the file in the root directory is not available for editing, simply create one in the folder containing the objects you need to hide from search engines. Once you do this and submit a request to the automatic URL removal service, Google will stop crawling the folders listed for removal in robots.txt.
Such invisibility lasts for three months. After this period, the directory withdrawn from the results will again be processed by Google's servers.
How to exclude part of a site from crawling
When a search bot reads robots.txt, it makes decisions based on its contents. Suppose you want to exclude an entire directory named anatom from the results. It is enough to write these instructions:
User-agent: Googlebot
Disallow: /anatom
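A Disallow value like this works as a path prefix, so it covers everything inside the directory. The check below, again with Python's standard-library parser and a placeholder domain, makes that behavior visible (the file names are invented for the example):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /anatom",
])

# The rule is a path prefix, so everything under /anatom is covered...
print(rp.can_fetch("Googlebot", "http://example.com/anatom/skull.html"))  # False
# ...while unrelated paths stay crawlable.
print(rp.can_fetch("Googlebot", "http://example.com/about.html"))         # True
```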
Or suppose you do not want any images of the .gif type to be indexed. To do that, add the following rule:
User-agent: Googlebot
Disallow: /*.gif$
Here is another example. Suppose you need to exclude dynamically generated pages from crawling; then add an entry like this:
User-agent: Googlebot
Disallow: /*?
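The `*` and `$` characters in these two rules are Googlebot extensions to the original robots.txt standard: `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the path. As an illustration only, here is a minimal sketch of that matching logic; the function name and the example paths are our own invention, not a real API.

```python
import re

def googlebot_rule_matches(pattern, path):
    """Rough sketch of Googlebot-style pattern matching:
    '*' matches any run of characters, and a trailing '$'
    anchors the pattern to the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*' in regex terms.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# The .gif rule blocks any path ending in .gif, thanks to the '$' anchor...
print(googlebot_rule_matches("/*.gif$", "/images/logo.gif"))   # True
print(googlebot_rule_matches("/*.gif$", "/images/logo.gifx"))  # False
# ...and the '?' rule blocks any path that contains a query string.
print(googlebot_rule_matches("/*?", "/catalog/item?id=7"))     # True
print(googlebot_rule_matches("/*?", "/catalog/item"))          # False
```

Note that Python's urllib.robotparser treats Disallow values as plain prefixes and does not interpret these wildcards, which is why a separate sketch is needed here.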
That, roughly, is how rules for search engines are written. That said, it is often more convenient to use the robots META tag for all of this, and webmasters more commonly rely on that standard to control how search engines treat their pages. But we will cover that in a following article.