How Machine Learning has become essential in stopping spam and bots
Spam emails and malicious bots are equally unpopular with businesses and their customers. Thankfully, data scientists have developed machine learning algorithms to fight back against these ever-evolving threats.
Spam filters are essential. Without them we couldn’t cut through the noise of phishing scams and malware links to read our messages. Hatred of spam is well entrenched in society, just as bots (especially scalpers) have become the bane of so many lives in recent years.
There are several parallels between the email spam in our inboxes and malicious bots that crawl the web. Both are designed to cause harm to businesses and individuals. Both exploit systems designed to be useful. And both are rife; of the 300 billion emails sent daily, it’s estimated that at least half are spam. Likewise, at least half of all web traffic is automated, and over half of that consists of malicious bots.
Why are bots and spam emails so widespread?
Both spam emails and business logic attacks (for example, web scraping and credential stuffing) are usually automated using specialist software and scripts. This gives them the volume and velocity to score a few lucrative hits even if most of their activity fails.
As a result, the means to prevent both types of malicious activity have grown similar over the years. The most effective way to combat such prolific attack vectors is by using machines of our own – in the form of machine learning algorithms designed by data scientists to match the pace and extent of both bots and spam.
Traditional approaches to spam filters and bot management
Let’s first look at how email spam has been traditionally stopped. Email providers like Yahoo Mail, Windows Live Mail and Gmail have long used static rules and block lists to keep spam out of inboxes and in our junk folders. There are a few obvious signals that an email is spam:
- All caps in subject lines
- Very short body content
- Too many BCC recipients
- Sender is on a spam deny list/block list
But rules and lists can only do so much against an ever-evolving spam landscape. Spammers know what these signals are and how to bypass them, and the more malicious the attack, the more likely they are to be sophisticated enough to fly under the radar unnoticed.
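The static signals listed above can be sketched as a small rule-based filter. This is a minimal illustration rather than any provider's actual implementation: the `Email` class, the thresholds, the deny list entries and the two-signal cutoff are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical deny list and thresholds, chosen only for illustration.
DENY_LIST = {"spammer@example.com"}
MIN_BODY_CHARS = 20
MAX_BCC = 50


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    bcc: list = field(default_factory=list)


def spam_signals(email):
    """Return the names of the static rules that this email trips."""
    signals = []
    letters = [c for c in email.subject if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        signals.append("all_caps_subject")
    if len(email.body.strip()) < MIN_BODY_CHARS:
        signals.append("short_body")
    if len(email.bcc) > MAX_BCC:
        signals.append("too_many_bcc")
    if email.sender.lower() in DENY_LIST:
        signals.append("sender_on_deny_list")
    return signals


def is_spam(email, threshold=2):
    """Flag the email once it trips at least `threshold` rules (assumed cutoff)."""
    return len(spam_signals(email)) >= threshold


crude = Email("spammer@example.com", "WIN A FREE PRIZE NOW", "Click here")
evasive = Email(
    "new-address@example.net",
    "Win a free prize now",
    "Click here to claim your prize before it expires today",
)
print(is_spam(crude), is_spam(evasive))
```

The second message carries the same intent as the first but trips none of the rules, because the sender changed address, dropped the capitals and padded the body. That is the weakness of static signals.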
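Bot management has traditionally worked in a similar way, relying on static rules and deny lists of known bad IP addresses. However, subsequent generations of bots have evolved to mimic human behavior, emulating full web clients and rotating through thousands of IP addresses per attack, meaning the old ways of detecting bots have become ineffective.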
Machine learning: A modern approach to stopping spam and bots
Rather than relying on old-fashioned lists and signals, we now leverage data science to keep spam and bots at bay. Machine learning is particularly effective against both spam and bots because it can detect patterns in data, such as the intent of a message or request, even if the sender or originating IP address is not on a deny list/block list.
Machine learning is also massively scalable, with the algorithms getting more effective as they process more information. This makes it ideal for training on the deluge of spam and bots constantly barraging the internet.
Machine learning for spam filters
Machine learning is used by email providers to analyze the content of emails as well as where they are sent from. This is done using natural language processing to determine the intent of a message programmatically.
In natural language processing, a neural network is trained to embed the relationships between the roughly 170,000 words in the English vocabulary. This allows sentences to be analyzed computationally to evaluate properties such as intent and sentiment.
Modern spam filters use natural language processing to instantly grade how ‘spammy’ an email’s subject line and body are and filter out any messages that are classified as spam, moving them directly to our junk folders.
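To make the idea of grading how 'spammy' a message is more concrete, here is a minimal sketch of a learned text classifier. It deliberately swaps in a much simpler classical technique, a naive Bayes model over word counts, in place of the neural language models described above. The class name, the tokenizer and the six training messages are all assumptions invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy naive Bayes classifier; needs at least one example of each label."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        """Positive values mean the message looks more like spam than ham."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            log_prob = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                # Add-one (Laplace) smoothing avoids zero probabilities.
                log_prob += math.log((count + 1) / (total_words + len(vocab)))
            scores[label] = log_prob
        return scores["spam"] - scores["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


# Hypothetical training data, far smaller than anything used in practice.
spam_filter = NaiveBayesSpamFilter()
for message in ["Win a free prize now", "Claim your free money today", "Free prize click now"]:
    spam_filter.train(message, "spam")
for message in ["Meeting notes for the project", "Lunch tomorrow with the team", "Project update and notes"]:
    spam_filter.train(message, "ham")

print(spam_filter.is_spam("claim your free prize"))
print(spam_filter.is_spam("notes for the project meeting"))
```

Unlike a static rule, the score here comes entirely from the training examples, so the filter adapts as it sees more labelled mail. A production filter would train on vastly more data and use richer language models than word counts.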
Machine learning for bot management
At Netacea we use the same principle of natural language processing in our patent-pending Intent Pathways technology to detect bots based on the way they navigate through websites. Instead of looking at the relationship between words and sentences, Intent Pathways investigates the relationships between request paths; think of each web page as a word, and the order of the requests between these pages as sentences.
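The "pages as words, request sequences as sentences" analogy can be illustrated with one of the simplest language models, a bigram model over request paths. This sketch is not Netacea's Intent Pathways implementation, whose details are not described here. The class name, the example paths and the training sessions are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<start>"  # marker for the beginning of a session


class PathBigramModel:
    """Learns how often one page follows another in observed sessions."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.pages = set()

    def train(self, sessions):
        for session in sessions:
            previous = START
            for page in session:
                self.transitions[previous][page] += 1
                self.pages.add(page)
                previous = page

    def avg_log_likelihood(self, session):
        """Average log-probability per request; lower means more unusual."""
        if not session:
            raise ValueError("session must contain at least one request")
        vocab_size = len(self.pages) + 1  # +1 reserves mass for unseen pages
        previous = START
        total = 0.0
        for page in session:
            counts = self.transitions.get(previous, Counter())
            # Add-one smoothing so unseen transitions get a small probability.
            prob = (counts[page] + 1) / (sum(counts.values()) + vocab_size)
            total += math.log(prob)
            previous = page
        return total / len(session)


# Hypothetical sessions standing in for typical visitor journeys.
model = PathBigramModel()
model.train([
    ["/", "/products", "/products/shoe", "/basket", "/checkout"],
    ["/", "/search", "/products/shoe", "/basket"],
    ["/", "/products", "/products/hat"],
    ["/", "/products", "/products/shoe", "/basket", "/checkout"],
])

typical = ["/", "/products", "/products/shoe", "/basket"]
unusual = ["/checkout", "/checkout", "/checkout", "/checkout"]
print(model.avg_log_likelihood(typical) > model.avg_log_likelihood(unusual))
```

A session that hammers a single endpoint scores lower than one that follows a familiar journey, without any reference to its IP address. Real systems would learn from far richer features and much more traffic than this toy model.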
Intent Pathways is part of Netacea’s extensive machine learning suite, whose components combine to form the Intent Analytics® AI engine for bot detection.
Preventing spam from harming websites
What happens when bots use spam to attack businesses and users? Any website with user input fields, such as contact forms (most business sites), comment boxes (common on news and media sites) or ‘get a quote’ enquiry forms, is vulnerable to input spam carried out by automated bots.
The goal of these bots could be to simply waste the business’s time in chasing nonexistent leads and customers. Other bots may insert links to malware or scams as part of phishing or man-in-the-middle attacks targeting those who review inputted data. In some cases, sites might be vulnerable to injection of malicious code via publicly available forms (while this last vulnerability is usually patched, it doesn’t take much effort for a bot operator to try).
Spam bots used on the web function in exactly the same way as scalper bots: they are programmed to interact with the target website in a specific way, but rather than completing a purchase, they input data into forms.
Once again, it’s the job of fine-tuned machine learning algorithms to spot these malicious bots before they can inflict harm.