How Machine Learning Halts Data Breaches

There are four main types of data breaches that advances in machine learning can help thwart.

February 7, 2020

Although we hear a lot about major cybersecurity breaches in non-insurance organizations – Target, Experian, the IRS, etc. – there have been breaches in the insurance industry, too, albeit less publicized ones. Nationwide faces a $5 million fine over a breach back in 2012. Horizon Blue Cross Blue Shield is still the defendant in a class action suit over a 2013 breach that affected 800,000 of its insureds.

As hard as organizations try to secure their data and systems, hackers keep growing more sophisticated in their methods. This is why innovation in risk management and insurance is so important.

Can Machine Learning Improve Cybersecurity (and Vice Versa)?

The short answer is yes. Because machine learning models can process huge amounts of data, they can analyze historical cyberattacks, predict the types of attack likely to occur and help set up defenses against them.

Here is a very simple example:

An on-site employee decides to use his computer to visit some shopping sites during his lunch break. One of those sites contains elements that the monitoring system recognizes as a potential security threat. The security team is notified immediately. It is then possible to block that computer’s access to any data that could be useful to hackers until a full investigation can be completed.

This may be a rather far-fetched example because most organizations already limit personal use of their computers. But consider the Horizon breach – two laptops were stolen from a facility, and access was obtained. Or the case of Target, where a third-party contractor did not have appropriate security in place, and hackers were able to access the company’s systems through this third party. Machine learning can help reduce these threats by feeding a proper alert system, so that the affected devices or accounts can be shut off remotely.

Common Types of Data Breaches that ML Can Help Thwart

1. Spear Phishing

Employees receive emails in their company inboxes every day. Some of these, from unknown senders or from senders impersonating someone the employee trusts, can include malicious links.

There are now ML algorithms that can identify and classify language patterns – email subject lines, links, body content/communication patterns, phrases and even punctuation patterns. Anomalies can be flagged for security analysts to investigate, and, if the system is set up correctly, suspicious emails can be caught before they are opened. Some of these emails, for example, may be very poor translations from foreign languages. Poor translations are one signal a model can use to flag an email as possible spear phishing.
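To make the idea of classifying language patterns concrete, here is a minimal sketch of a word-level naive Bayes scorer. Everything in it is an assumption made for illustration: the six training emails are invented, the function names are hypothetical, and a production filter would learn from many thousands of labeled messages and far richer features.

```python
import math
import re
from collections import Counter

# Invented toy training set (an assumption, not real data).
TRAINING = [
    ("urgent verify your account now click link", "phish"),
    ("kindly do the needful and confirm password immediately", "phish"),
    ("your invoice is attached click here to verify payment", "phish"),
    ("meeting moved to 3pm see agenda attached", "clean"),
    ("quarterly claims report is ready for review", "clean"),
    ("lunch on friday to welcome the new underwriter", "clean"),
]

def tokenize(text):
    # Lowercased words plus runs of "!" or "?", so punctuation patterns count too.
    return re.findall(r"[a-z']+|[!?]+", text.lower())

def train(examples):
    word_counts = {"phish": Counter(), "clean": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["phish"]) | set(word_counts["clean"])
    return word_counts, doc_counts, vocab

def phish_log_odds(text, model):
    """Positive score: looks more like the phishing examples than the clean ones."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(c.values()) for label, c in word_counts.items()}
    score = math.log(doc_counts["phish"] / doc_counts["clean"])
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # tokens never seen in training carry no evidence
        # Laplace (add-one) smoothing avoids zero probabilities.
        p = (word_counts["phish"][tok] + 1) / (totals["phish"] + len(vocab))
        c = (word_counts["clean"][tok] + 1) / (totals["clean"] + len(vocab))
        score += math.log(p / c)
    return score

model = train(TRAINING)
suspicious = phish_log_odds("urgent!! click link to verify password", model)
routine = phish_log_odds("claims report ready for friday meeting", model)
```

In this sketch an email with a positive score would be queued for an analyst rather than blocked outright, which matches the flag-then-investigate flow described above.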

2. Ransomware

Almost everyone is familiar with this security threat. Users’ files are “kidnapped” and encrypted. Users must then pay up to get a decryption key that will unlock those files. Often, these files house critical client data, other proprietary information or system files that are necessary for business operations. The other type of ransomware attack simply locks a user’s computer and does not allow access until the demanded amount is paid.

Training a machine to identify potential ransomware requires some pretty deep learning. Data sets of historical ransomware files must be loaded, along with even larger sets of clean files, so the machine can learn to distinguish between the two. Again, so-called micro-behaviors (e.g., language patterns) are then classified as “dirty” or “clean,” and models are developed. A suspect file can then be checked against these models, and the necessary action taken before files are encrypted or computer access is locked.
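The dirty-versus-clean training step can be sketched with a nearest-centroid classifier, which is a deliberately simple stand-in for the deep models a real product would use. The three features (file writes per second, entropy of written bytes, share of files renamed) and all the sample values are hypothetical, chosen only to show the shape of the approach.

```python
import math
from statistics import mean

# Hypothetical behavior samples: (writes per second, entropy in bits per byte,
# fraction of touched files that were renamed). Values are invented.
CLEAN = [(2.0, 4.1, 0.00), (5.0, 5.0, 0.01), (1.0, 3.8, 0.00), (8.0, 5.5, 0.02)]
DIRTY = [(120.0, 7.9, 0.95), (90.0, 7.8, 0.90)]

def centroid(rows):
    # Column-wise average of a list of feature tuples.
    return tuple(mean(col) for col in zip(*rows))

def fit(clean, dirty):
    all_rows = clean + dirty
    lo = tuple(min(col) for col in zip(*all_rows))
    hi = tuple(max(col) for col in zip(*all_rows))

    def scale(row):
        # Min-max scaling so no single feature dominates the distance.
        return tuple((v - l) / (h - l) if h > l else 0.0
                     for v, l, h in zip(row, lo, hi))

    return {
        "scale": scale,
        "clean": centroid([scale(r) for r in clean]),
        "dirty": centroid([scale(r) for r in dirty]),
    }

def classify(sample, model):
    x = model["scale"](sample)
    d_clean = math.dist(x, model["clean"])
    d_dirty = math.dist(x, model["dirty"])
    return "dirty" if d_dirty < d_clean else "clean"

model = fit(CLEAN, DIRTY)
burst = classify((100.0, 7.7, 0.85), model)   # rapid, high-entropy rewrites
normal = classify((3.0, 4.5, 0.00), model)    # ordinary office activity
```

A process labeled "dirty" here would be the trigger for the protective action mentioned above, such as suspending it before more files are encrypted.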

3. Watering Hole

Employees, especially insurance agents who are out in the field, may have their favorite spots for coffee or lunch breaks. Or suppose a group of employees has favorite food joints from which they frequently order delivery or takeout. Whether they are using the Wi-Fi in that watering hole or visiting that business’s website to place an order, there is far less security, which makes it an ideal place for hackers to capture a user’s credentials and come in through that backdoor.

This is sometimes called “remote exploitation,” and it can include a situation like the one that occurred with Target, where a third party is used as the “door” to get in.

ML algorithms can be developed to track and analyze the paths that employees’ browsers follow to and through external websites, whether the devices are being used on- or off-site. Users can be redirected to malicious sites while they are “traveling” to a destination site, and this is what ML can detect.
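One simple way to picture this path analysis is to count which site-to-site hops have been seen in past browsing and flag any hop that has not. The sketch below is a frequency baseline standing in for a trained model, and the URLs, domain names and function names are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def hosts(chain):
    # Reduce each URL in a redirect chain to its hostname.
    return [urlsplit(url).hostname for url in chain]

def learn_transitions(history):
    # Count every host-to-host hop observed in past, presumed-safe browsing.
    seen = Counter()
    for chain in history:
        h = hosts(chain)
        seen.update(zip(h, h[1:]))
    return seen

def unusual_hops(chain, seen, min_count=1):
    # Return cross-site hops observed fewer than min_count times before.
    h = hosts(chain)
    return [(a, b) for a, b in zip(h, h[1:])
            if a != b and seen[(a, b)] < min_count]

# Invented history for a lunch-ordering site and its payment provider.
HISTORY = [
    ["https://lunch.example/menu", "https://lunch.example/order",
     "https://pay.example/checkout"],
    ["https://lunch.example/menu", "https://pay.example/checkout"],
]
seen = learn_transitions(HISTORY)

detour = unusual_hops(
    ["https://lunch.example/menu", "https://tracker.invalid/r?u=1",
     "https://pay.example/checkout"],
    seen,
)
usual = unusual_hops(HISTORY[1], seen)
```

Here `detour` lists the two hops that pass through the unfamiliar host, while `usual` is empty. A real system would weigh many more signals than hop counts, but the flag-the-unexpected-detour idea is the same.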

4. Webshell

A webshell is nothing more than a small piece of code. It is loaded onto a website’s server so that a hacker can get in and make changes to the server’s directories. The hacker then gains access to that system’s database. Most often, hackers look to take the banking and credit card information of customers/clients, and this type of attack occurs most often with e-commerce websites. However, medical practices and insurance companies are certainly at risk, too, because they house lots of personal data. When insureds set up automatic payments from their bank accounts, the activity is even more attractive to these hackers. Payments are simply routed somewhere else.

Machines can be trained to recognize normal patterns of behavior and to flag those that are not normal. Machines can also be used to identify webshells preemptively and then prevent them from exploiting a system.
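As a rough illustration of learning what is normal and flagging what is not, the sketch below builds a baseline from past activity and applies a z-score test. The metric (hourly file writes in a web server’s upload directory), the sample counts and the threshold of 3 are assumptions chosen for the example, not values from any particular product.

```python
from statistics import mean, stdev

def fit_baseline(counts):
    # Summarize normal behavior as a mean and a sample standard deviation.
    return mean(counts), stdev(counts)

def is_anomalous(value, baseline, threshold=3.0):
    # Flag values more than `threshold` standard deviations from the mean.
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Invented hourly counts of file writes in the upload directory.
NORMAL_HOURS = [3, 5, 4, 6, 5, 4, 3, 5]
baseline = fit_baseline(NORMAL_HOURS)

spike = is_anomalous(40, baseline)   # a sudden burst of new files
quiet = is_anomalous(6, baseline)    # within the usual range
```

A flagged hour would go to an analyst, who can check whether the new files include a script that should not be there.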

The Requirement? Machines and Humans Must Work Together

Will machines ultimately eliminate the need for in-house or contracted cybersecurity experts? Highly unlikely. At this point, machines cannot engage in the deeper investigations that analysts perform once they are aware of potential breaches or once aberrant behaviors have been detected. But innovation in risk management and insurance should certainly include machine learning. Humans simply cannot gather and analyze data as fast as machine algorithms can. Incorporating ML as a solid part of cybersecurity just makes sense.