Cybersecurity and Machine Learning

Ayush Bhat
4 min read · Jun 4, 2021

In this article, we will discuss how machine learning is being used in cybersecurity.

What is Cybersecurity?

According to CISA

Cybersecurity is the art of protecting networks, devices, and data from unauthorized access or criminal use and the practice of ensuring confidentiality, integrity, and availability of information.

What is Cybercrime?

According to Britannica

Cybercrime, also called computer crime, the use of a computer as an instrument to further illegal ends, such as committing fraud, trafficking intellectual property, stealing identities, or violating privacy.

What is Cyberattack?

According to Techopedia

A cyberattack is deliberate exploitation of computer systems, technology-dependent enterprises and networks. Cyberattacks use malicious code to alter computer code, logic or data, resulting in disruptive consequences that can compromise data and lead to cybercrimes, such as information and identity theft.

Below are some statistics that show how much damage the industry has suffered from various attacks.

Successful Cyberattacks in India

Below is a pie chart that shows the top attacks that occurred in 2019.

How is Machine Learning helping to reduce cyberattack risks?

In simple terms, machine learning uses algorithms behind the scenes to help the DevSecOps team find anomalies in the packets a server receives from a particular client. The features associated with cyberattacks are analyzed and used to build a model. The model is trained on historical data, meaning previous cyberattacks, and then estimates when the next attack is likely to happen.
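As a rough illustration of the "features" step, here is a minimal sketch that turns raw packet records into one feature row per client. The packet fields (`src`, `size`, `dst_port`) and the three features are assumptions chosen for the example, not a list taken from any particular tool.

```python
from collections import defaultdict


def client_features(packets):
    """Aggregate raw packet records into one feature row per client.

    Each packet is a dict with hypothetical keys: 'src' (client address),
    'size' (bytes) and 'dst_port'.
    """
    sizes = defaultdict(list)
    ports = defaultdict(set)
    for pkt in packets:
        sizes[pkt["src"]].append(pkt["size"])
        ports[pkt["src"]].add(pkt["dst_port"])

    features = {}
    for src, pkt_sizes in sizes.items():
        features[src] = {
            "packet_count": len(pkt_sizes),
            "mean_size": sum(pkt_sizes) / len(pkt_sizes),
            "distinct_ports": len(ports[src]),
        }
    return features


# Made-up traffic: one ordinary client and one that probes several ports.
packets = [
    {"src": "10.0.0.5", "size": 500, "dst_port": 443},
    {"src": "10.0.0.5", "size": 700, "dst_port": 443},
    {"src": "10.0.0.9", "size": 60, "dst_port": 22},
    {"src": "10.0.0.9", "size": 60, "dst_port": 23},
    {"src": "10.0.0.9", "size": 60, "dst_port": 80},
]
rows = client_features(packets)
```

Rows like these are what an anomaly detection model would be trained on. A client that sends many small packets to many different ports would stand out from the rest.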

Various Machine Learning Algorithms used in Cybersecurity to prevent attacks

  1. Isolation Forest
  2. Histogram-based outlier detection
  3. Cluster-based local outlier detection
  4. Angle-based outlier detection
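To give a feel for how one of these works, below is a simplified sketch of histogram-based outlier detection in plain Python. It assumes equal-width bins and treats each feature independently. The function name `hbos_scores` and the sample data are made up for illustration. In practice you would more likely reach for an existing library such as PyOD or scikit-learn than write this yourself.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Histogram-based outlier score: higher means more unusual.

    rows is a list of equal-length numeric feature vectors.
    """
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for f in range(n_features):
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        # Fall back to width 1.0 when the feature is constant.
        width = (hi - lo) / n_bins or 1.0

        def bin_of(v):
            # The maximum value would land in bin n_bins, so clamp it.
            return min(int((v - lo) / width), n_bins - 1)

        counts = [0] * n_bins
        for v in values:
            counts[bin_of(v)] += 1
        tallest = max(counts)
        for i, v in enumerate(values):
            density = counts[bin_of(v)] / tallest  # in (0, 1]
            scores[i] += math.log(1.0 / density)   # rare bin -> high score
    return scores


# Made-up rows of [mean packet size, requests per second]; the last is odd.
rows = [[500, 2], [520, 3], [510, 2], [490, 3], [505, 2], [9000, 40]]
scores = hbos_scores(rows)
most_unusual = scores.index(max(scores))
```

Points that fall into sparsely filled bins get a high score. Here the last row sits alone in its bin for both features, so it receives the highest score.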

When we train a machine learning model, there is always a chance of error. Accuracy often lands between 80% and 90%, because a model cannot predict everything correctly.

Now let's look at what a Confusion Matrix is.

It is a way of representing the performance of your model.

Below is a sample of how it looks.

You can see that there are some terms mentioned in the above table. Let's go through them one by one.

  1. True Positive

The model predicted the positive class and was right. In intrusion detection, "positive" usually means that an attack was predicted, so this is an attack the model caught.

  2. True Negative

The model predicted the negative class and was right: it predicted no attack, and none occurred.

  3. False Positive

The model predicted the positive class and was wrong: it predicted an attack, but none occurred. This is a false alarm.

  4. False Negative

The model predicted the negative class and was wrong: it predicted no attack, but one occurred. This is a missed attack.

To understand the above vocabulary, let's consider this example.

Suppose you are on a cybersecurity team. If the model predicts that an attack is going to happen, you will be on alert and take precautions. If the attack then happens, that is a True Positive. If no attack happens, that is a False Positive. If the model predicts that no attack will happen and none happens in that period, that is a True Negative. The last case is the False Negative: the model predicts that no attack is going to happen, but a cyberattack occurs. This is the case where our company or team suffers most, because nobody was expecting an attack. It is a major problem that we face when we try to predict attacks with machine learning models.
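The four counts can be tallied with a few lines of Python. This is a minimal sketch that assumes the usual intrusion detection convention, where label 1 (positive) means "attack" and label 0 (negative) means "no attack". The label lists are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels where 1 = attack, 0 = no attack."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1  # attack predicted, attack happened
        elif p == 0 and a == 0:
            counts["TN"] += 1  # no attack predicted, none happened
        elif p == 1 and a == 0:
            counts["FP"] += 1  # false alarm
        else:
            counts["FN"] += 1  # missed attack
    return counts


# Made-up labels for eight time windows.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```

In this toy data the model is right in 6 of 8 windows, but the single False Negative is the one that hurts: a real attack that nobody was warned about.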

Machine Learning Techniques for Intrusion Detection

An Intrusion Detection System (IDS) is software that monitors a single computer or a network of computers for malicious activity. In today's fast-moving world, traditional IDS struggle to handle complex cyberattacks. Here, machine learning techniques can deliver higher detection rates, lower false alarm rates, and reasonable computation costs.
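Detection rate and false alarm rate are commonly computed from the confusion matrix counts: the detection rate is the share of real attacks that were flagged, and the false alarm rate is the share of harmless events that were flagged anyway. The sketch below assumes that "positive" means "attack flagged". The numbers are hypothetical, not measured results.

```python
def detection_rate(tp, fn):
    """Share of real attacks that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(fp, tn):
    """Share of benign events that were flagged: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts for illustration only.
dr = detection_rate(tp=90, fn=10)       # 90 of 100 attacks caught
far = false_alarm_rate(fp=50, tn=950)   # 50 of 1000 benign events flagged
```

A good IDS pushes the first number up while keeping the second one down, since every false alarm costs the team time.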

Application of Machine Learning to the Snort System

In research conducted at the School of Computing, Media and the Arts, Teesside University, England, UK, the Snort system was tested and its performance was evaluated at a network speed of 10 Gbps. It was observed that Snort triggered a high rate of false positive alarms. To solve this problem, a Snort adaptive plug-in was developed. This plug-in was built using an optimized Support Vector Machine (SVM) with the firefly algorithm.
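The article does not say exactly how the study combined the two, so the following is only a generic sketch of the firefly algorithm, a swarm-based optimizer in which each "firefly" moves toward brighter (better) ones. A common use is tuning model settings such as SVM hyperparameters. The function `toy_validation_error` is a hypothetical stand-in for "train an SVM with these settings and return its validation error". All parameter names and default values here are assumptions.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     alpha=0.2, beta0=1.0, absorption=1.0, seed=0):
    """Minimize objective over a box using a basic firefly algorithm.

    bounds is a list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value) seen during the search.
    """
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best = min(range(n_fireflies), key=cost.__getitem__)
    best_pos, best_val = list(swarm[best]), cost[best]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost)
                    dist_sq = sum((a - b) ** 2
                                  for a, b in zip(swarm[i], swarm[j]))
                    # Attraction fades with distance.
                    attract = beta0 * math.exp(-absorption * dist_sq)
                    for d, (lo, hi) in enumerate(bounds):
                        step = (attract * (swarm[j][d] - swarm[i][d])
                                + alpha * (rng.random() - 0.5) * (hi - lo))
                        # Clamp so the firefly stays inside the search box.
                        swarm[i][d] = min(max(swarm[i][d] + step, lo), hi)
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_val:
                        best_pos, best_val = list(swarm[i]), cost[i]
        alpha *= 0.97  # shrink the random step so the swarm settles
    return best_pos, best_val


def toy_validation_error(params):
    """Hypothetical stand-in for an SVM's validation error."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Search over log10(C) and log10(gamma); the ranges are assumptions.
bounds = [(-3.0, 3.0), (-5.0, 1.0)]
best_params, best_error = firefly_minimize(toy_validation_error, bounds)
```

To use this on a real problem, you would replace `toy_validation_error` with a function that trains the classifier on the given settings and returns its error on held-out data.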

Malware classification

Malware is software designed to cause damage to a computer, server, client, or computer network.

Below is an excerpt from the abstract of the academic paper "Deep learning at the shallow end: Malware classification for non-domain experts":

Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification

Thank you, guys!
