Data Sheet

Machine Learning for Account Takeover

10th Jul 2018

 


Automated Customer Training of the Machine Learning Algorithm

Web applications are a huge attack vector for any business, and when sophisticated threats to these applications are designed to look like the legitimate consumers they are meant to serve, traditional signature-based or rate-limiting detection methods simply don't work.

Gaining deep insight into critical visitor behaviours and intent, and then understanding what constitutes a threat to your environment, is key. Our machine learning has been developed to learn not only from your traffic but also from our customers' input.

From the very outset, you train the machine learning to understand your critical paths, vulnerabilities and top business priorities using a simple visual tool. This allows our machine learning to learn from your environment even before you are live, so that the risk rating is appropriate and aligned with your business goals as soon as the service goes live, and so that it can report on and deliver the critical information needed when making a mitigation decision.
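
As an illustration only (the field names, paths and weights below are assumptions, not Netacea's actual schema or API), the kind of configuration the visual tool captures before go-live might be sketched as follows:

```python
# Illustrative sketch only: field names and values are assumptions, not Netacea's schema.
# It shows the kind of information captured before the service goes live.
initial_training_config = {
    "critical_paths": [
        {"path": "/login", "priority": "high"},
        {"path": "/checkout", "priority": "high"},
        {"path": "/search", "priority": "medium"},
    ],
    "known_vulnerabilities": [
        {"path": "/api/password-reset", "concern": "credential stuffing"},
    ],
    "business_priorities": {
        "account_takeover": 1.0,  # highest weight: protecting customer accounts
        "cart_abuse": 0.8,
        "scraping": 0.6,
    },
}

def risk_weight(path: str, config: dict) -> float:
    """Map a requested path to a relative risk weight taken from the configuration."""
    weights = {"high": 1.0, "medium": 0.5, "low": 0.2}
    for entry in config["critical_paths"]:
        if path.startswith(entry["path"]):
            return weights[entry["priority"]]
    return 0.1  # default weight for paths not flagged as critical

print(risk_weight("/login", initial_training_config))  # -> 1.0
```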

Guided Machine Learning

Most businesses, even if they do not yet know that bots are hitting their website, have very clear policies on how they want bot visitors to be handled once they know what the bot payload actually is. For example, if you knew that bots were hitting your website faking the behaviour of well-known search engines, but were in fact competitive scrapers, you would probably already know what policies you want to put in place to deal with these obvious fakes. Once you set up the policies and key critical paths, the machine learning takes your input and builds a custom threat score for your actual environment. Everything is then automatic, and the initial learning set-up takes only a few minutes to complete. As we identify new bot threats, you can guide the machine learning at any time by feeding back how you want bots to be treated. You can get as granular as you like, or simply accept the default settings from your custom configuration list.
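
A minimal sketch of this idea, assuming invented classification names and actions (this is not Netacea's actual policy interface), might look like:

```python
from typing import Optional

# Hedged sketch: classification names, actions and the override mechanism are assumptions
# about how guided-learning feedback could be expressed, not a real API.
DEFAULT_POLICIES = {
    "fake_search_engine": "block",        # claims to be a known crawler but fails verification
    "competitive_scraper": "serve_cached",
    "verified_search_engine": "allow",
    "unknown_automation": "challenge",
}

def action_for(classification: str, overrides: Optional[dict] = None) -> str:
    """Return the mitigation action for a bot classification, preferring customer overrides."""
    policies = {**DEFAULT_POLICIES, **(overrides or {})}
    return policies.get(classification, "monitor")

# Guiding the model later is a matter of overriding a default, for example
# rate-limiting competitive scrapers instead of serving them cached content.
print(action_for("competitive_scraper", {"competitive_scraper": "rate_limit"}))  # -> rate_limit
```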

Classifications include:

Beyond The Black Box

Quickly and reliably determining which website visitors are human and which are not can be challenging as bots become more sophisticated. Many tools just display an overall threat score, with no data on how that score is determined, which makes relevant mitigation steps difficult to build into your existing infrastructure. Instead of this 'black box' approach, Netacea clearly shows each of the key metrics that make up our overall threat score. Customers can also use our guided learning to optimise our machine learning around the priorities that concern them most.
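
The sketch below illustrates the difference a transparent breakdown makes; the metric names and the simple averaging are assumptions for illustration, not Netacea's actual scoring model.

```python
# Illustrative sketch only: metric names and aggregation are assumptions used to show
# how a per-metric breakdown differs from a single opaque score.
from dataclasses import dataclass

@dataclass
class ThreatBreakdown:
    ip_reputation: float        # 0.0 (clean) to 1.0 (known bad)
    fingerprint_anomaly: float  # how far the device fingerprint deviates from real browsers
    behavioural_anomaly: float  # path and timing behaviour versus the learned baseline
    credential_stuffing: float  # login-specific signals, e.g. failure ratios

    def overall(self) -> float:
        """Aggregate the components into a single headline score."""
        parts = [self.ip_reputation, self.fingerprint_anomaly,
                 self.behavioural_anomaly, self.credential_stuffing]
        return sum(parts) / len(parts)

visitor = ThreatBreakdown(0.9, 0.7, 0.8, 0.95)
print(f"overall={visitor.overall():.2f}, credential_stuffing={visitor.credential_stuffing}")
# Because each component is exposed, mitigation can key off the relevant one
# (e.g. step-up authentication when credential_stuffing is high) rather than the headline score.
```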

Threat Feeds

The detailed threat feed API can be integrated in a variety of ways, from a simple Slack channel to a custom API integration that feeds a particular threat score into one or more micro-services. For example, the shopping cart abuse threat score can feed into the cart API, and the account takeover threat score can be applied to the access and control API used for single sign-on.
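
As a sketch of the integration pattern only, the endpoint URLs, JSON fields and webhook address below are placeholders rather than real Netacea endpoints:

```python
# Hedged sketch: URLs and field names are placeholders. It shows the general pattern of
# polling a threat feed and fanning scores out to Slack and to the owning micro-services.
import requests

THREAT_FEED_URL = "https://api.example.com/threat-feed"                  # placeholder endpoint
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"        # placeholder webhook
ROUTES = {
    "account_takeover": "https://sso.internal.example.com/threat-score",  # access and control API
    "cart_abuse": "https://cart.internal.example.com/threat-score",       # shopping cart API
}

def distribute_threat_scores() -> None:
    feed = requests.get(THREAT_FEED_URL, timeout=10).json()
    for item in feed.get("threats", []):
        threat_type, score = item["type"], item["score"]
        # Simple visibility: post every high score to a Slack channel.
        if score >= 0.8:
            requests.post(SLACK_WEBHOOK,
                          json={"text": f"{threat_type} score {score:.2f}"}, timeout=10)
        # Targeted integration: feed the score to the micro-service that can act on it.
        if threat_type in ROUTES:
            requests.post(ROUTES[threat_type], json=item, timeout=10)

if __name__ == "__main__":
    distribute_threat_scores()
```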

Business Priorities Settings

If not designed correctly, machine learning can demand large amounts of processing power, and examining the complex interactions of web visitors, together with IP history, digital fingerprints and the whole range of digital provenance data, simply takes time. By the time the threat has been assessed and sent to a WAF or other bot mitigation service, the total elapsed time can easily be several minutes - far too late to prevent the breach.

Netacea takes a radically different approach. Its cloud-based data processing engine allows much more complex analysis to be undertaken, and the combination of real-time and near-real-time bot analysis means that Netacea can be an ultra-low-latency system while still providing the depth of analysis needed to implement more complex machine learning algorithms.

Furthermore, all the data is analysed historically by the machine learning engine, which is then able to provide a rich and detailed profile of who is authentic versus who is fake, from browser emulators to obvious bad actors. The heavy processing needed to establish 'normal behaviour' versus 'abnormal behaviour' is all done out of line, without affecting your site's visitors in any way.
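
A minimal sketch of the out-of-line idea, assuming made-up sample data and a simple deviation test rather than Netacea's actual algorithm:

```python
# Illustrative sketch, not Netacea's algorithm: establish a behavioural baseline from
# historical request data out of line, then flag abnormal visitors against it.
from statistics import mean, pstdev

# Requests per minute observed for ordinary human sessions (assumed sample data).
historical_rpm = [4, 6, 5, 7, 5, 6, 4, 8, 5, 6]
baseline_mean, baseline_std = mean(historical_rpm), pstdev(historical_rpm)

def is_abnormal(requests_per_minute: float, z_threshold: float = 3.0) -> bool:
    """Flag a session whose request rate deviates strongly from the learned baseline."""
    z = (requests_per_minute - baseline_mean) / baseline_std
    return z > z_threshold

print(is_abnormal(5))    # typical human rate -> False
print(is_abnormal(120))  # browser emulator hammering the site -> True
```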

Once the machine learning understands your web estate, critical paths and your own risk criteria, which can be set using a simple visual tool, it can start to understand visitor flow in the background, examining all of the data without being in-line. Your threat appetite will also change according to the path visitors take through your website. For example, a large, sudden increase in inbound visitors to the shopping cart or login pages will naturally cause more concern than visits to a product page.
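
A minimal sketch of path-dependent threat appetite, assuming invented sensitivity values and thresholds rather than Netacea's actual model:

```python
# Minimal sketch with assumed values: the threat appetite (the anomaly score needed before a
# visitor is flagged) is tightened on sensitive paths such as login or cart pages and relaxed
# on ordinary product pages.
PATH_SENSITIVITY = {
    "/login": 0.9,
    "/cart": 0.8,
    "/product": 0.3,
}
BASE_THRESHOLD = 0.7  # anomaly score that would trigger a flag on an average path

def flag_visitor(path: str, anomaly_score: float) -> bool:
    """Flag a visitor when their anomaly score exceeds the path-adjusted threshold."""
    sensitivity = next((s for p, s in PATH_SENSITIVITY.items() if path.startswith(p)), 0.3)
    threshold = BASE_THRESHOLD * (1.0 - 0.5 * sensitivity)  # more sensitive path, lower threshold
    return anomaly_score >= threshold

print(flag_visitor("/product/123", 0.5))  # False: relaxed appetite on product pages
print(flag_visitor("/login", 0.5))        # True: the same anomaly on login is taken seriously
```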

We use a digital fingerprint to track and trace the bots so we can aggregate their behaviour by visitor class.
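
The sketch below shows the aggregation idea only; the event fields and visitor classes are assumptions, not Netacea's data model:

```python
# Sketch only: keying behaviour on a digital fingerprint lets requests from the same bot
# be aggregated per visitor class even when IP addresses rotate.
from collections import defaultdict

events = [
    {"fingerprint": "fp_a1", "visitor_class": "fake_search_engine", "path": "/product/1"},
    {"fingerprint": "fp_a1", "visitor_class": "fake_search_engine", "path": "/product/2"},
    {"fingerprint": "fp_b7", "visitor_class": "human", "path": "/login"},
]

by_class = defaultdict(lambda: defaultdict(int))
for event in events:
    # Count requests per fingerprint within each visitor class.
    by_class[event["visitor_class"]][event["fingerprint"]] += 1

for visitor_class, fingerprints in by_class.items():
    total = sum(fingerprints.values())
    print(f"{visitor_class}: {len(fingerprints)} fingerprint(s), {total} request(s)")
```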

The threat feed shows a detailed breakdown per behavioural attack type over time - for example, the Account Takeover threat feed - together with the overall aggregated threat score for all the threat components.


Critical Path Settings

All web applications have key paths and areas that visitors will use time and time again. The same is true for bad actors looking to exploit the application in some way.

Netacea’s Critical Path settings can be used to isolate your key visitor paths through your web estate. These critical paths could be your conversion path, login or search, or areas where key information is stored. They could also include A/B revision paths for serving alternate content or alternate logins.

You can also have Netacea choose your critical paths from the machine learning analysis of the top paths for incoming bots on your application. The two approaches can be combined so that you have weighted protection on your top critical paths as well as on any paths the bots are targeting that you may not be aware of. This methodology not only delivers insight into how legitimate and illegitimate visitors interact with the application, but also allows key areas to be monitored from the outset.
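
As a sketch of how the two sources of critical paths could be combined, with invented paths and weights rather than real analysis output:

```python
# Hedged sketch with invented paths: manually declared critical paths are merged with paths
# the analysis surfaces as heavily targeted by bots, so weighted monitoring covers both.
manual_critical_paths = {"/login": 1.0, "/checkout": 1.0, "/search": 0.6}
# Paths the (assumed) bot analysis reports as most targeted, with observed bot share.
discovered_bot_paths = {"/api/password-reset": 0.92, "/gift-card/balance": 0.85, "/search": 0.40}

def merged_watchlist(manual: dict, discovered: dict) -> dict:
    """Combine both sources, keeping the higher weight when a path appears in both."""
    combined = dict(manual)
    for path, bot_share in discovered.items():
        combined[path] = max(combined.get(path, 0.0), bot_share)
    return combined

watchlist = merged_watchlist(manual_critical_paths, discovered_bot_paths)
for path, weight in sorted(watchlist.items(), key=lambda kv: -kv[1]):
    print(f"{path}: {weight:.2f}")
```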