Why is it necessary to block bot traffic? Bots aren't exclusively good or bad, which is why a blanket rules- or reputation-based blocking approach is rarely effective as a standalone defense. So how should you block bad bot traffic?
Because modern bots can mimic human behavior, conventional bot management solutions that analyze visitors' mouse movements and click patterns are often ineffective. These solutions typically rely on additional third-party code that bot operators can identify and circumvent, and that external code can also expose users to privacy risks.
Why blocking bot traffic is necessary
Why bot traffic can’t be stopped by a conventional SaaS Web application firewall (WAF)
Advertising networks often hide their advertising content behind an obfuscated layer that prevents it from being rendered until the advertisement has been requested or served. That same obfuscation lets bots request ads in advance, before they are needed, so bots can attack thousands of new domains every day, making it difficult for reputation- and rules-based solutions to keep up with bot innovation.
A bot operator can also evade detection by sending traffic to the server only when it is unlikely to be flagged, and can use domain-level activation to switch bot activity on only when it serves their purpose, such as during an attack.
The need for real-time detection of bad bot traffic
Because bots can begin sending malicious traffic the moment they first appear on a site, conventional bot solutions that rely on static rules struggle to block bad bot traffic accurately.
Bots are, by design, fast and effective. An average bot can send up to 100 potentially malicious requests per second before being blocked by conventional defenses, which have no way of predicting how many domains will be used for bot traffic over the course of a day. This calls for real-time defenses that can anticipate new domains and block bot traffic before the malicious activity is executed.
What you can do to block bot traffic
How you should counter the bot threat
To protect your website from bad bot traffic, use server-side bot defense to mitigate these threats at the source, before they reach your application. A bot defense should also let you identify and block bad bots without relying on reputation data alone, and without depending on external browser plugins whose purpose is not transparent. This requires a more advanced defense built on bot fingerprinting that analyzes traffic as it arrives at your servers.
Popular methods of blocking bot traffic
To block bot traffic, defenders must be able to anticipate bot activity, identifying and blocking bad bot traffic before it can cause damage. Bot defenses commonly use a number of methods to do this:
Using blacklists and whitelists
The most basic bot defenses rely on blacklists and whitelists (also called blocklists and allowlists) to block unwanted bot traffic. A blacklist contains domains or IPs known to send bad bot traffic, while a whitelist contains those known to be safe. When these lists back an Apache module or iptables rule, any request from a blacklisted IP is blocked, letting you shut out entire networks of bots.
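A minimal sketch of the list check itself, using Python's standard ipaddress module. The networks below are invented examples drawn from the IETF documentation ranges; a real deployment would load its lists from a maintained feed and enforce them in iptables or an Apache module rather than in application code.

```python
import ipaddress

# Hypothetical lists; real deployments load these from a threat feed.
BLOCKLIST = [ipaddress.ip_network("203.0.113.0/24")]   # known-bad network
ALLOWLIST = [ipaddress.ip_network("198.51.100.0/24")]  # known-good network

def is_blocked(ip: str) -> bool:
    """Allowlist wins over blocklist; otherwise block listed networks."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWLIST):
        return False
    return any(addr in net for net in BLOCKLIST)
```

Note that an IP on neither list is allowed by default; some deployments invert this and deny anything not explicitly allowlisted.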
Using reputation scores
Today's more sophisticated reputation-based solutions (such as Google's reCAPTCHA) analyze activity based on user behavior rather than relying solely on rules to determine whether a request is legitimate. They assign each request a score based on combinations of user and browser characteristics, allowing them to detect bad bot traffic even when it comes from IP addresses not on any blacklist.
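One common way to act on such a score is a simple three-way decision. This sketch assumes a reCAPTCHA-style score between 0.0 and 1.0 where higher means more likely human; the threshold values here are illustrative assumptions, not figures recommended by any vendor.

```python
# Assumed cut-offs for a score in [0.0, 1.0]; tune against your own traffic.
ALLOW_THRESHOLD = 0.7
CHALLENGE_THRESHOLD = 0.3

def decide(score: float) -> str:
    """Map a reputation score to an action."""
    if score >= ALLOW_THRESHOLD:
        return "allow"
    if score >= CHALLENGE_THRESHOLD:
        return "challenge"  # e.g. present an interactive CAPTCHA
    return "block"
```

The middle "challenge" band is what distinguishes score-based systems from binary rules: uncertain traffic gets a second test instead of an outright block.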
Using commercial solutions
Commercial content filters are used worldwide to block a wide variety of threats that might appear on your site, such as adult material, reverse engineering and pharming, but they don't necessarily work well against bad bot traffic: they often rely on reputation data alone, which bots can circumvent with easily generated domain names and random hidden subdomains.
Using geolocation filters
Geolocation-based solutions allow only users in specific geographic regions to access website content, for example restricting US users from viewing European content or vice versa. This lets you block most global botnets that don't have servers physically located in your region, but it offers no protection against bots operating from within the same region as your legitimate users.
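The filter itself reduces to a country-allowlist check once the client's IP has been resolved to a country. In this sketch the lookup table is an invented stand-in for a real GeoIP database query (e.g. a MaxMind lookup), and the IPs and mappings are hypothetical.

```python
ALLOWED_COUNTRIES = {"US", "CA"}  # assumed target-audience regions

# Invented IP -> ISO country code mapping; a real system queries a GeoIP DB.
FAKE_GEOIP = {
    "198.51.100.5": "US",
    "203.0.113.7": "RU",
}

def allowed_by_geo(ip: str) -> bool:
    """Allow only traffic resolving to an allowed country."""
    country = FAKE_GEOIP.get(ip)
    return country in ALLOWED_COUNTRIES
```

An IP that cannot be resolved falls through to "not allowed" here; whether to fail open or closed on unknown geolocation is a policy choice.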
Using limits on requests per ip
If your hosting provider allows you to cap the number of requests per second and/or per minute from a single IP, you can throttle unwanted bot traffic before it even reaches your website. Set the limit high enough that real users are never affected, but low enough to cut off the sustained request rates typical of bots.
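The underlying mechanism is usually a sliding-window (or token-bucket) counter per IP. Below is a minimal sliding-window sketch; the limit and window values are illustrative, and a production limiter would live in the hosting layer or a shared store such as Redis rather than in-process memory.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:  # evict hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject (or throttle) this request
        q.append(now)
        return True
```

The `now` parameter exists only to make the sketch testable; real callers would omit it.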
Using cloud-based blocking services
Cloud-based solutions let you block bad bot traffic without complex coding or server configuration, adding virtually no overhead while blocking bad bots out of the box. The downsides are that you must set up an account with the provider to use the service, and it only helps if the service is available in the countries where your target audience lives.
Using VAC (Virtual Application Content)
VAC is a virtual application approach that transforms applications into containers that can only perform specific actions and cannot act outside the container unless you explicitly allow it, restricting bot activity in a way the bots cannot even observe. It works well for both website content and apps/mobile apps, but has one major downside: it requires additional configuration to allow VAC containers through Apache or iptables rules before it can be used.
Using user agent filters
This method uses browser fingerprinting, which examines factors including the HTTP referrer, operating system, HTTP headers and the plugins installed in a client's browser to determine whether the activity is bot-generated. Fingerprinting can cause performance issues because it aggressively scans both inbound and outbound communications for evidence of malicious intent, and it can produce false positives if your database of bot fingerprints is not accurate.
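The simplest slice of this approach is matching the User-Agent header against known automation signatures. The patterns below are a few illustrative examples only; production fingerprint lists are much longer, regularly maintained, and combine the UA with many other signals.

```python
import re

# Illustrative signatures of common automation tools (not exhaustive).
BAD_UA_PATTERNS = [
    re.compile(r"curl|wget|python-requests", re.IGNORECASE),
    re.compile(r"headlesschrome", re.IGNORECASE),
]

def suspicious_user_agent(ua: str) -> bool:
    """Flag requests whose User-Agent matches a known automation signature."""
    if not ua:
        return True  # a missing User-Agent is itself a signal
    return any(p.search(ua) for p in BAD_UA_PATTERNS)
```

Sophisticated bots spoof mainstream browser UAs, which is exactly why this check is only one input among many in real fingerprinting systems.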
Using machine learning
This method is similar to using IP address restrictions, but it can be more accurate and maintainable. It monitors a website's traffic to identify behavior without predefined restrictions, then classifies visitors as bot or human based on data collected from many different websites over time, rather than relying on simplistic rules that bots easily trick. By gathering enough data about how both humans and bots behave, you can build an optimized knowledge base that serves as the foundation for detecting bad bots with deep learning algorithms, which look for patterns of malicious activity in server logs by comparing your site's current traffic to known samples of bot behavior. Keep in mind that machine learning models require constant maintenance: bots will always find new ways to disguise themselves, so the system must be updated regularly, just like any other software.
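The classify-from-labeled-samples idea can be illustrated with a toy nearest-centroid classifier over two invented traffic features: requests per minute and the fraction of requests hitting non-existent URLs. The sample data is fabricated for illustration; real systems use far richer features, much more data, and proper ML tooling.

```python
# Fabricated labeled samples: (requests/minute, fraction of 404 requests).
HUMAN_SAMPLES = [(4, 0.01), (6, 0.02), (3, 0.00)]
BOT_SAMPLES = [(300, 0.40), (450, 0.55), (250, 0.35)]

def _centroid(samples):
    """Mean point of a list of 2-D feature tuples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(2))

HUMAN_C = _centroid(HUMAN_SAMPLES)
BOT_C = _centroid(BOT_SAMPLES)

def classify(features):
    """Label a visitor by whichever class centroid is nearer."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return "bot" if dist2(features, BOT_C) < dist2(features, HUMAN_C) else "human"
```

The point of the sketch is the workflow, learn from labeled behavior, then score new traffic against it, not the (deliberately naive) model.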
Using CAPTCHAs
This method is often used by large companies such as Google, Facebook and Twitter to prevent bots designed to post spam from targeting human users on their platforms. The best approach is to create a test that humans can solve quickly but that even the most advanced bots cannot easily pass; make it too hard and you will drive legitimate traffic away from your site.
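The challenge/verify flow can be sketched with a trivial arithmetic question. A real CAPTCHA must be far harder for bots than this invented example; the sketch only shows the two halves of the mechanism, issuing a challenge and checking the answer.

```python
import random

def make_challenge(rng=random):
    """Return a (question, expected_answer) pair for a simple arithmetic test."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def verify(answer: str, expected: int) -> bool:
    """Check a submitted answer against the expected value."""
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False
```

In practice the expected answer is stored server-side against a session or token, never sent to the client alongside the question.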
Using web application firewalls (WAF)
This is the most advanced form of bot protection, suited to companies with large websites or apps that need precise protection. It should only be used by businesses that already have a firm grasp of website architecture, security policies and IT monitoring: a misconfiguration or a run of false positives could lock you out of your own website, which would be disastrous for any business. WAFs monitor traffic and server logs to detect malicious activity such as SQL injection, HTTP floods and bot traffic, and can also detect advanced threats that other methods may miss, such as DDoS (Distributed Denial of Service) attacks.
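At its core, one part of what a WAF does is pattern-match requests against attack signatures. The sketch below checks a query string against a few illustrative SQL-injection patterns; real rule sets (for example the OWASP ModSecurity Core Rule Set) are vastly more thorough, and naive pattern matching like this yields both false positives and false negatives.

```python
import re

# A few illustrative SQL-injection signatures (far from complete).
SQLI_PATTERNS = [
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"'\s*or\s+'1'\s*=\s*'1", re.IGNORECASE),
    re.compile(r";\s*drop\s+table", re.IGNORECASE),
]

def looks_like_sqli(query_string: str) -> bool:
    """Flag a request whose query string matches a known injection pattern."""
    return any(p.search(query_string) for p in SQLI_PATTERNS)
```

This also illustrates the false-positive risk mentioned above: a legitimate search for the phrase "union select" would be blocked by the first pattern.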
Frequently asked questions about blocking bot traffic
What’s the first step I should take to block bot traffic?
The first step you should take is to use server-side bot defense, mitigating these threats at the source before they reach your servers.
How do I block organic spam?
Organic spam is when people post links to their own sites and services on pages across the internet where they are not allowed. It can be blocked with a CAPTCHA, a test that verifies the visitor is human.
How can i block bot traffic from accessing my website?
To block bot traffic from getting to your website, use server-side bot defense.
What is the best way to block bot traffic from reaching my website or app?
The best way to block bot traffic from accessing your website or app is to use a Web Application Firewall (WAF).
The right approach to blocking bot traffic
Complex bot attacks require an intelligent approach to bot management, supported by a deeper understanding of bot intent and by fast, accurate data to mitigate threats in real time. Once you understand the threats and the intent behind bad bots, bot management steps in to block the bot traffic.
At Netacea we take a smarter approach to bot management. Our Intent Analytics™ engine, powered by machine learning, quickly and accurately distinguishes bots from humans to protect websites, mobile apps and APIs from automated threats while prioritising genuine users. Actionable intelligence with data-rich visualisations empowers businesses to make informed decisions about their traffic.
Talk to our team of cyber-security experts today to discover more about our pioneering approach to bot management, and how it can help you detect unwanted bot traffic and defend against it.