Threats Are Evolving, Your Bot Management Solution Should Too
By Gareth Kitson / 03rd Oct 2018
Effective Bot Detection Requires Machine Learning, but Also the Right Approach and Expertise
Existing ways of detecting bot attacks typically rely on a basic rules-based approach that sets thresholds. Companies commonly use logic such as the following:
We are detecting a very high number of connections, all from a particular location.
- Assumption A: a large number of connections from a specific location is a symptom of bot activity.
- Assumption B: there must be bot activity taking place and action should therefore be taken.
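To make this rules-based logic concrete, here is a minimal Python sketch of a symptom-based threshold rule. It is an illustration only, not any vendor's implementation: the class name, the 60-second sliding window and the 100-connection limit are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque

# Hypothetical values chosen purely for illustration.
WINDOW_SECONDS = 60.0
MAX_CONNECTIONS_PER_WINDOW = 100


class LocationThresholdRule:
    """Symptom-based rule: flag a location whose connection count within a
    sliding time window exceeds a fixed threshold."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_CONNECTIONS_PER_WINDOW):
        self.window = window
        self.limit = limit
        # location -> timestamps of connections still inside the window
        self._events = defaultdict(deque)

    def observe(self, location: str, timestamp: float) -> bool:
        """Record one connection and return True if the rule fires."""
        events = self._events[location]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        # Assumption A: many connections from one location means bot activity.
        # Assumption B: so the rule fires and action is taken.
        return len(events) > self.limit


rule = LocationThresholdRule()

# A burst of 150 connections in 1.5 seconds trips the rule, whether it comes
# from a bot or from real shoppers responding to an advertising campaign.
burst = [rule.observe("region-A", i * 0.01) for i in range(150)]
print(any(burst))  # True

# A bot that paces itself at one connection per second never exceeds
# 60 connections per window, so it is never flagged.
paced = [rule.observe("region-B", float(i)) for i in range(600)]
print(any(paced))  # False
```

The two usage cases mirror the weaknesses described below: the rule cannot tell a legitimate traffic spike from an attack, and an operator who has probed the threshold can simply stay under it.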
As these rules-based tools evolve, they look for more sophisticated ways to detect these symptoms, and for more complex combinations of symptoms to detect.
The problem with this approach is twofold:
- It assumes that all bots identified by these symptoms are bad (unless individually whitelisted), and it can’t tell users why that assessment was made. This can easily lead to large numbers of false positives. For example, a large spike in traffic could be a distributed bot attack, or it could be the response to a successful advertising campaign.
- It is easy for bot operators to game their attacks to bypass defences. Sophisticated bot operators typically begin with reconnaissance attacks that test a site’s defensive capabilities, and use what they learn to decide what form their full attack should take and at what rate it should run.
Effective Protection is Impossible Without Understanding Intent
At Netacea we approached this from a different angle. We didn’t see this as a challenge of blocking bots; from the start, we treated it as a traffic management issue. We wanted to know what types of users made up web traffic, which of them were human and which were bots, and, more importantly, what the bot traffic was up to, whether its intent was good or malicious.
Some of our early customers drove fundamental design decisions that have allowed us to differentiate ourselves from our competitors:
- They knew that an unknown amount of their bot activity drove revenue via affiliate and reseller sites. They had previously blocked all bot activity and saw a drop-off in conversion.
- They had no reliable way of whitelisting all these reseller sites.
This meant that we didn’t just need to understand whether traffic came from a bot; we also needed to understand the INTENT of the bot. Taking this approach addresses both of the limitations above. Firstly, customers can make much more informed decisions about how they handle bots, because they create policies based on intent rather than on symptoms. Grounding those policies in data about bot activity allows in-depth control and reduces false positives. Secondly, it is much harder for bot operators to hide, no matter how slow and stealthy their attack, because ultimately their intent differs from that of legitimate users.
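A policy keyed on intent rather than on symptoms might look something like the following Python sketch. It assumes an upstream model has already produced an assessment of whether a visitor is a bot, its likely intent and a confidence score; the intent labels, the Action values, the 0.8 confidence cut-off and the decide() function are all hypothetical. Returning a reason alongside each action reflects the point above about being able to explain why an assessment was made.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    BLOCK = "block"


@dataclass(frozen=True)
class Assessment:
    is_bot: bool
    intent: str        # hypothetical label produced by an upstream model
    confidence: float  # 0.0 to 1.0


# Hypothetical customer policy: actions are chosen per intent, not per symptom.
POLICY = {
    "affiliate_reseller": Action.ALLOW,   # drives revenue, so don't block it
    "search_indexing": Action.ALLOW,
    "price_scraping": Action.MONITOR,
    "credential_stuffing": Action.BLOCK,
}


def decide(a: Assessment, min_confidence: float = 0.8) -> Tuple[Action, str]:
    """Map an intent assessment to an action plus a human-readable reason."""
    if not a.is_bot:
        return Action.ALLOW, "classified as human"
    # Unrecognised intents are monitored rather than blocked by default.
    action = POLICY.get(a.intent, Action.MONITOR)
    # Only block when the model is confident, to limit false positives.
    if action is Action.BLOCK and a.confidence < min_confidence:
        return Action.MONITOR, (
            f"intent '{a.intent}' below confidence threshold {min_confidence}"
        )
    return action, f"bot with intent '{a.intent}' (confidence {a.confidence:.2f})"


print(decide(Assessment(True, "affiliate_reseller", 0.95)))
print(decide(Assessment(True, "credential_stuffing", 0.60)))
```

Defaulting unknown intents to monitoring means an unrecognised reseller bot is not cut off automatically; that is a design choice made for this sketch, not a recommendation.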
Introducing Machine Learning and Behavioural Analysis
Getting this level of information about bot activity, though, required much more sophisticated approaches. It wasn’t just a matter of matching system activity against rulesets. It required detailed analysis of all user behaviour, both bot and human, to distinguish legitimate from illegitimate intent. Billions of lines of data had to be evaluated, patterns identified, and intent defined and redefined as more activity took place. All of this had to happen in effectively real time, without adding to system latency.
Only cutting-edge machine learning technology could handle data processing this complex, so machine learning is a fundamental part of what makes Netacea a next-generation solution. But it is only the tool, not the real differentiator. Harnessing this toolset is what allowed Netacea to build the next-generation solution needed to address the limitations of other bot detection tools.
The Differentiators: Data Scientists and Real-Time Analysis
Firstly, we have a team of excellent data scientists working on the problems described above with increasingly sophisticated algorithms, including supervised and unsupervised learning and neural networks. Machine learning provides the toolset to help solve the problem, but not the expertise.
Secondly, Netacea’s cloud-based data processing engine allows much more complex analysis, and combining real-time with near-real-time bot analysis means Netacea can run as an ultra-low-latency system while still providing the depth of analysis that more complex machine learning algorithms need. It also lets us continually re-evaluate a visitor as their behaviour evolves. When you are detecting intent, this evolving assessment is essential, because intent only becomes clearer as more activity is observed. Some bots can be detected on first connection, but these are the less sophisticated bots, and their numbers will only decline over time.
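One simple way to picture continuous re-evaluation is a per-visitor score that is updated with every request. The Python sketch below uses a sequential naive-Bayes update of the log-odds of malicious intent. This is a standard textbook technique swapped in for illustration, not a description of Netacea’s algorithms, and the event names, likelihood ratios and 5% prior are all invented for the example.

```python
import math
from dataclasses import dataclass, field

# Hypothetical likelihood ratios: P(event | malicious) / P(event | legitimate).
# Illustrative numbers only; a real system would learn these from data.
LIKELIHOOD_RATIO = {
    "view_product": 0.9,
    "add_to_basket": 0.5,
    "login_failed": 4.0,
    "rapid_repeat_request": 3.0,
}


@dataclass
class VisitorAssessment:
    """Running estimate of one visitor's intent, updated as each event arrives."""
    prior: float = 0.05  # assumed prior probability of malicious intent
    log_odds: float = field(default=0.0, init=False)
    events_seen: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.log_odds = math.log(self.prior / (1 - self.prior))

    def update(self, event: str) -> float:
        # Sequential Bayesian update: add the log-likelihood ratio of the new
        # event, treating events as conditionally independent (the naive-Bayes
        # simplification). Unknown events are treated as uninformative.
        self.log_odds += math.log(LIKELIHOOD_RATIO.get(event, 1.0))
        self.events_seen += 1
        return self.probability

    @property
    def probability(self) -> float:
        """Current estimated probability that the visitor is malicious."""
        return 1 / (1 + math.exp(-self.log_odds))


visitor = VisitorAssessment()
for event in ["view_product", "login_failed", "login_failed",
              "rapid_repeat_request", "login_failed"]:
    p = visitor.update(event)
    print(f"after {visitor.events_seen} events: P(malicious) = {p:.2f}")
```

Each suspicious event moves the estimate further, so a visitor that looks ambiguous after one or two requests can look clearly malicious after five, which is the sense in which intent becomes clearer as more activity is seen.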
The takeaway is that machine learning is a game changer for bot detection, but it is a tool, not the solution. When evaluating a bot detection solution, consider not just whether the vendor uses machine learning but, more importantly, what they do with it and how it improves your ability to take control of the bot activity on your website.
If you would like to learn more about our use of machine learning or data science, please do get in touch and one of our experts will be happy to help answer your questions.