How to reduce server load and OTAPI calls
Website traffic consists of real people opening a website and viewing its pages, as well as visits by automated robots that constantly crawl websites.
These can be search engine robots, such as Google or Yandex, or various systems that collect and analyze information. One example is Ahrefs, a service used in search engine promotion to analyze a site's external and internal ranking factors. Even if you do not use such systems yourself, that does not mean they will not scan your site: all sites are crawled to collect and analyze information.
Most of these robots bring the site neither harm nor benefit. However, such robots can cause significant financial damage to owners of websites that use an API to obtain information the owner pays for, as is the case with an OT API Key.
There are several ways to limit the activity of such robots, or bots. Let us consider two methods based on the User-Agent string that a bot sends.
- Disallow access to the site through robots.txt.
This method relies on the fact that a bot scanning the website downloads the robots.txt file and, depending on the instructions in it (allowed or forbidden), either continues its work or stops. There are two approaches:
- disallow access to everyone except certain bots;
- disallow access only to certain bots.
The first approach is the simplest one, disallowing access to everyone except Google and Yandex:
- This instruction disallows access to everyone.
User-agent: *
Disallow: /
- This instruction allows access for Google search engine bot.
User-agent: Googlebot
Allow: /
- This instruction allows access for Yandex search engine bot.
User-agent: Yandex
Allow: /
- Here is our robots.txt file if we want to block access to everyone except Google and Yandex:
User-agent: Googlebot
Allow: /
User-agent: Yandex
Allow: /
User-agent: *
Disallow: /
Under this scheme you can open access only to the bots you need. Don't forget that search engines also have separate image search bots (see the example below). For more detailed information, contact SEO specialists.
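For example, if you also want the site's images to remain in Google and Yandex image search, you can add groups for their image crawlers next to the ones above. Googlebot-Image and YandexImages are the user-agent tokens these search engines document for their image bots; verify them against the current documentation before relying on this sketch:
User-agent: Googlebot-Image
Allow: /
User-agent: YandexImages
Allow: /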
It may also be useful to use the directive:
Crawl-Delay: 10
It specifies the minimum interval, in seconds, between a bot's requests to the site. It can be applied both to the general block and to a specific bot (examples of both follow).
For example:
User-agent: Yandex
Crawl-Delay: 10
Allow: /
This means that you allow the Yandex bot to scan the site, but at no more than one request every 10 seconds.
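The same directive can be placed in the general block so that the limit applies to every bot not listed separately. Keep in mind that Crawl-Delay is honored only by some crawlers; Googlebot, for example, ignores it. A minimal sketch:
User-agent: *
Crawl-Delay: 10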
The second approach in robots.txt is the opposite: you allow access to everyone except certain bots.
For example:
User-agent: SemrushBot
Disallow: /
User-agent: *
Crawl-Delay: 10
Allow: /
In this case everyone is allowed to scan the site, at no more than one request every 10 seconds, except SemrushBot, which is not allowed at all. If you need to block several bots at once, each of them gets its own group, as shown below.
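A sketch that blocks a few of the bots from the blacklist given at the end of this article, while still allowing everyone else (several User-agent lines can also share one set of rules, but separate groups are the most unambiguous form):
User-agent: SemrushBot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: *
Crawl-Delay: 10
Allow: /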
- Disallow via .htaccess file
At its core, this method is similar to the previous one, but the decision to grant or deny access is made by your web server rather than by the bot itself.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|SolomonoBot|MJ12bot|Ezooms|AhrefsBot|DataForSeoBot|DotBot|Purebot|PetalBot|LinkpadBot|WebDataStats|Jooblebot|Baiduspider|BackupLand|NetcraftSurveyAgent|netEstate|rogerbot|exabot|gigabot|sitebot|Slurp) [NC]
RewriteRule .* - [F,L]
This construction denies access (returns 403 Forbidden) to all bots with the listed User-Agent strings: the [NC] flag makes the match case-insensitive, and the [F,L] flags forbid the request and stop further rule processing. You can make your own list.
Be careful: errors made while editing .htaccess can make your site inoperable. Also, not all web servers support instructions via .htaccess. You can get more detailed information from your system administrator or hosting support. If your server does not use .htaccess, a similar rule can usually be written in the server's own configuration, as sketched below.
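For example, on nginx, which does not read .htaccess files, a roughly equivalent rule can be placed in the server block of the site configuration. This is only a sketch under the assumption that you run nginx; adjust the list of bots and test the configuration (nginx -t) before reloading:
if ($http_user_agent ~* (SemrushBot|AhrefsBot|MJ12bot|PetalBot)) {
    return 403;
}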
Which option suits your site is up to you. Using robots.txt is more universal and easier, but it relies on the bot playing fair, and a bot may simply ignore these instructions.
How to determine who creates a heavy load on the site?
You need to analyze your web server logs regularly. Your system administrator or your hosting support will tell you where to find them.
There are different programs for analyzing logs, for example WebLog Expert, which has a 30-day trial period.
For example:
These are the logs of one of our clients for 7 days. As you can see, PetalBot made over 400,000 requests to the site; potentially, all of these are paid API calls.
SemrushBot is in second place with 150,000 requests.
The genuinely useful Google and Yandex bots come in third and fourth places with far more modest figures.
Such analysis should be carried out not only by User-Agent but also by the IP addresses from which the site was accessed. It is worth blocking access from a certain IP address if it generates an abnormally large number of requests over a long period of time and you do not know who it belongs to. However, be careful with this; it is better to entrust the operation to a specialist. An IP address that looks suspicious to you may, for example, belong to Google, and blocking it would cut that search engine off from your site and hurt your positions in Google search. If you do decide to block an address, an .htaccess rule in the same style as above can be used, as sketched below.
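The IP address below (203.0.113.45) is only a placeholder from the documentation range; replace it with the address you have identified in your logs:
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.45$
RewriteRule .* - [F,L]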
You can add the following bots to your blacklist without consequences for the site: SemrushBot, SolomonoBot, MJ12bot, Ezooms, AhrefsBot, DataForSeoBot, DotBot, Purebot, PetalBot, LinkpadBot, WebDataStats, Jooblebot, Baiduspider, BackupLand, NetcraftSurveyAgent, netEstate, rogerbot, exabot, gigabot, sitebot, Slurp.
Baiduspider is the bot of the Baidu search engine, and Slurp is how the Yahoo bot identifies itself.