Optimizing an e-commerce site's robots.txt file to make the most of the crawl budget: that, in a nutshell, is the work I want to present in this post.
When I do SEO for e-commerce, one activity gives me particular satisfaction: configuring the robots.txt file to optimize the crawl budget. It benefits both established shops with thousands of pages and new ones that have few pages but a still-small crawl budget.
One of the big problems e-commerce sites face is the automatic, uncontrolled generation of URLs by filters, wishlists, send-to-a-friend features, and so on. These are excellent tools for the user, but they have the flaw of generating a virtually infinite number of useless, duplicate pages.
The consequence is that search engines end up crawling thousands of pages that are useless for ranking purposes: they offer no added value to someone searching for a product, and serve only the visitor who actively chooses to use a filter or a wishlist.
The final outcome is a wasted crawl budget, which can prevent crawlers from reaching the important pages of the shop, such as products and categories; the result may be that only 1,000 pages get indexed out of a catalog of 5,000 products.
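As a sketch of the kind of rules involved (the parameter and path names below, such as `color`, `sort`, and `/wishlist/`, are hypothetical and depend on the shop's platform), a robots.txt along these lines tells crawlers to skip the auto-generated URLs:

```
User-agent: *
# Block faceted-navigation URLs generated by filter combinations
# (parameter names are examples; adapt to your platform)
Disallow: /*?color=
Disallow: /*?price=
Disallow: /*&sort=
# Block user-only features that only duplicate content
Disallow: /wishlist/
Disallow: /sendfriend/
```

Note that wildcard support (`*` in paths) follows Google's robots.txt conventions; not every crawler honors it.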
The attached image shows the response codes detected by Screaming Frog in a crawl that simulates Googlebot, before and after reworking the robots.txt file.
In the left part of the image we see a preponderance of red: hundreds of thousands of pages with no response code. These are the URLs generated automatically by the countless filter combinations, an enormous quantity compared to the small green portion that represents the real, relevant content (pages and images).
The pie chart makes the severity of the problem clear: 90% of the pages the crawler visited were useless for ranking purposes.
The second chart, dominated by green, shows the same site crawled after the robots.txt configuration, with a very different outcome: 99% of the crawled pages return a 200 (OK) status code.
The work done allowed us to eliminate almost all of the useless crawling, freeing the crawl budget for the pages that matter: products and categories.
It should be noted that a robots.txt file with some basic settings was already in place before the first crawl; we were not starting from zero!
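Before relying on new Disallow rules, it is worth checking them programmatically. A minimal sketch using Python's standard-library urllib.robotparser (the rules and URLs below are hypothetical examples, not the actual site's configuration):

```python
import urllib.robotparser

# Hypothetical rules mirroring the kind of configuration described above
rules = """\
User-agent: *
Disallow: /wishlist/
Disallow: /sendfriend/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# URLs generated by user-only features should be blocked...
print(rp.can_fetch("*", "https://shop.example/wishlist/add/123"))   # False
# ...while real product pages stay crawlable
print(rp.can_fetch("*", "https://shop.example/shoes/red-sneaker"))  # True
```

Note that urllib.robotparser does prefix matching only, so wildcard rules like `Disallow: /*?color=` should be tested with Google Search Console's robots.txt tester instead.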
Another interesting figure is the time taken to complete the crawl, performed on the same computer with the same connection:
about 84% less time for the same website!