Blog | 24th Jan 2022 / 15:57

How Machine Learning has become essential in stopping spam and bots

Spam emails and malicious bots are equally unpopular with businesses and their customers. Thankfully data scientists have developed machine learning algorithms to fight back against these ever-evolving threats.
Alex McConnell Cybersecurity Content Specialist

Spam filters are essential. Without them we couldn’t cut through the noise of phishing scams and malware links to read our messages. Hatred of spam is well entrenched in society, just as bots (especially scalpers) have become the bane of so many lives in recent years.

There are several parallels between the email spam in our inboxes and malicious bots that crawl the web. Both are designed to cause harm to businesses and individuals. Both exploit systems designed to be useful. And both are rife; of the 300 billion emails sent daily, it’s estimated that at least half are spam. Likewise, at least half of all web traffic is automated, and over half of that consists of malicious bots.

Why are bots and spam emails so widespread?

Both spam emails and business logic attacks (for example, web scraping and credential stuffing) are usually automated using specialist software and scripts. This gives them the volume and velocity to score a few lucrative hits even if most of their activity fails.

As a result, the means to prevent both types of malicious activity have grown similar over the years. The most effective way to combat such prolific attack vectors is by using machines of our own – in the form of machine learning algorithms designed by data scientists to match the pace and extent of both bots and spam.

Traditional approaches to spam filters and bot management

Let’s first look at how email spam has been traditionally stopped. Email providers like Yahoo Mail, Windows Live Mail and Gmail have long used static rules and block lists to keep spam out of inboxes and in our junk folders. There are a few obvious signals that an email is spam:

  • All caps in subject lines
  • Very short body content
  • Too many BCC recipients
  • Sender is on a spam deny list/block list
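The signals listed above can be sketched as a static rule check. This is a minimal illustration only: the `Email` type, the thresholds and the deny list entry are all hypothetical, and no email provider's real rules are implied.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds and deny list, chosen only for illustration.
MIN_BODY_CHARS = 20
MAX_BCC_RECIPIENTS = 50
DENY_LIST = {"spammer@example.com"}


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    bcc: list[str] = field(default_factory=list)


def spam_signals(email: Email) -> list[str]:
    """Return the names of the static rules this email trips."""
    signals = []
    # Only letters count, so "50% OFF!!!" is still treated as all caps.
    letters = [c for c in email.subject if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        signals.append("all_caps_subject")
    if len(email.body.strip()) < MIN_BODY_CHARS:
        signals.append("short_body")
    if len(email.bcc) > MAX_BCC_RECIPIENTS:
        signals.append("too_many_bcc")
    if email.sender.lower() in DENY_LIST:
        signals.append("sender_on_deny_list")
    return signals
```

Every rule here is a fixed, visible threshold, so a sender who writes the subject in mixed case or pads the body passes straight through.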

But rules and lists can only do so much against an ever-evolving spam landscape. Spammers know what these signals are and how to bypass them, and the more malicious the attack, the more likely it is to be sophisticated enough to fly under the radar.

The same can be said for the ‘old-fashioned’ methods of blocking unwanted bots, which are now well known to bot operators. Traditional defenses against bots took a similar approach to early spam filters: they looked for obvious telltale signs, such as many requests coming from one IP address, and performed simple JavaScript-based tests to check that the request originated from a full browser and not just a scraping tool.
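The "many requests from one IP address" signal might look something like the sliding-window counter below. It is a rough sketch: the class name, the 60-second window and the 100-request limit are assumptions made for the example, not values from any real product.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # hypothetical window length
MAX_REQUESTS = 100    # hypothetical per-IP limit within the window


class IpRateRule:
    """Flag an IP address once it exceeds a request limit within a time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS):
        self.window = window
        self.limit = limit
        self._seen = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_suspicious(self, ip: str, timestamp: float) -> bool:
        recent = self._seen[ip]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.limit
```

Because the rule counts per address, a client that spreads its requests across many IP addresses never reaches the limit on any one of them.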

However, subsequent generations of bots have evolved to mimic human behavior, emulating full web clients and rotating through thousands of IP addresses per attack, meaning the old ways of detecting bots have become ineffective.

Machine learning: A modern approach to stopping spam and bots

Rather than relying on old-fashioned lists and signals, we now use data science to keep spam and bots at bay. Machine learning is particularly effective at detecting both spam and bots because it can find patterns in data, such as the intent of a message or request, even if the sender or originating IP address is not on a deny list/block list.

Machine learning is also massively scalable. The algorithms become more effective as they process more information, which makes them ideal for training on the deluge of spam and bots constantly barraging the internet.

Machine learning for spam filters

Machine learning is used by email providers to analyze the content of emails as well as where they are sent from. This is done using natural language processing to determine the intent of a message programmatically.

In natural language processing, a neural network is trained to embed the relationships between words; English alone has roughly 170,000 words in current use. This allows sentences to be analyzed computationally to evaluate metrics such as intent and sentiment.

Modern spam filters use natural language processing to instantly grade how ‘spammy’ an email’s subject line and body are and filter out any messages that are classified as spam, moving them directly to our junk folders.
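To make the idea of grading how ‘spammy’ a message is more concrete, here is a deliberately simple stand-in: a bag-of-words Naive Bayes classifier in plain Python. It is not the neural network approach described above, and the class and function names are invented for this sketch, but it shows the same shift from fixed rules to a score learned from labelled examples. It assumes at least one spam and one non-spam ("ham") message have been trained before scoring.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Laplace smoothing so unseen words do not zero out the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalise the two log scores into a probability of spam.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)
```

A filter built this way would move a message to the junk folder once `spam_probability` crosses a chosen threshold, and retraining on newly labelled mail shifts the scores without anyone rewriting rules.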

Machine learning for bot management

At Netacea we use the same principle of natural language processing in our patent-pending Intent Pathways technology to detect bots based on the way they navigate through websites. Instead of looking at the relationship between words and sentences, Intent Pathways investigates the relationships between request paths; think of each web page as a word, and the order of the requests between these pages as sentences.
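The "pages as words, sessions as sentences" analogy can be illustrated with a toy model. The sketch below is not Netacea's Intent Pathways implementation, whose details are not described here; it is a much simpler first-order (bigram) model, with a hypothetical class name and example paths, that learns which page tends to follow which and scores how typical a new sequence of requests looks.

```python
import math
from collections import Counter, defaultdict


class PathBigramModel:
    """First-order model of which page tends to follow which."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # page -> Counter of next pages
        self.pages = set()

    def train(self, sessions):
        for session in sessions:
            self.pages.update(session)
            for current, nxt in zip(session, session[1:]):
                self.transitions[current][nxt] += 1

    def avg_log_likelihood(self, session):
        """Mean log-probability of each transition; lower means more unusual."""
        pairs = list(zip(session, session[1:]))
        if not pairs:
            return 0.0
        vocab_size = len(self.pages) + 1  # +1 reserves mass for unseen pages
        total = 0.0
        for current, nxt in pairs:
            counts = self.transitions.get(current, Counter())
            total += math.log((counts[nxt] + 1) / (sum(counts.values()) + vocab_size))
        return total / len(pairs)


model = PathBigramModel()
model.train([
    ["/", "/category", "/product", "/basket", "/checkout"],
    ["/", "/search", "/product", "/basket", "/checkout"],
    ["/", "/category", "/product", "/product", "/basket"],
])
browsing = ["/", "/category", "/product", "/basket"]
hammering = ["/checkout", "/checkout", "/checkout", "/checkout"]
```

Here `model.avg_log_likelihood(browsing)` comes out higher than `model.avg_log_likelihood(hammering)`, because the second session repeats a transition the training sessions never contained. The signal comes from the order of requests rather than from the IP address or the client.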

Read a more detailed description of Intent Pathways, and how it saved one client £3 million.

Intent Pathways is part of Netacea’s extensive machine learning suite, whose models combine to form the Intent Analytics® AI engine for bot detection.

Preventing spam from harming websites

What happens when bots use spam to attack businesses and users? Any website with user input fields, such as contact forms (most business sites), comment boxes (common on news and media sites) or ‘get a quote’ enquiry forms, is vulnerable to input spam carried out by automated bots.

The goal of these bots could be to simply waste the business’s time in chasing nonexistent leads and customers. Other bots may insert links to malware or scams as part of phishing or man-in-the-middle attacks targeting those who review inputted data. In some cases, sites might be vulnerable to injection of malicious code via publicly available forms (while the latter threat is usually patched, it doesn’t take much effort for a bot operator to try).

Spam bots used on the web function in exactly the same way as scalper bots: they are programmed to interact with the target website in a specific way, but rather than completing a purchase, they input data into forms.

Once again, it’s the job of fine-tuned machine learning algorithms to spot these malicious bots before they can inflict harm.

Alex McConnell is a technical writer and cybersecurity content specialist at Netacea. He works closely with the threat research team to create insightful, accessible content on the latest trends within cybersecurity and bot management. Alex has a decade of experience creating content related to internet services, spanning web performance, online user experience and non-human traffic.