Hybrid classification model with tuned weight for cyber attack detection: Big data perspective

https://doi.org/10.1016/j.advengsoft.2022.103408

Abstract

Cybercrime that exploits big data is growing at an unprecedented rate, posing a serious threat to the Internet sector and to data worldwide. Traditional ways of mitigating cyber risks are becoming inadequate because attackers employ increasingly complex attack and offensive methods, and because data-driven and intelligent competitors are growing in importance. This work introduces a new cyber attack detection (CAD) model for big data that comprises five stages: preprocessing, feature extraction, feature selection, detection, and mitigation. Preprocessing uses an improved class imbalance process. Three types of features are extracted: flow-based features, improved entropy-based features, and higher-order statistical features. Feature selection uses an improved independent component analysis (ICA). Detection uses a hybrid classifier (HC) that combines Long Short-Term Memory (LSTM) and Deep Maxout (DMO) networks, with the LSTM weights optimized by the Self-Enhanced Sea Gull Optimization (SE-SGO) model. Once an attack is detected, mitigation takes place via the proposed bait mitigation process. The suggested approach achieves a maximum accuracy of 0.94, which is 38%, 14.6%, 7.36%, 38.7%, and 10.5% better than the existing approaches HC + SGO, HC + SSOA, HC + DHOA, HC + DOX, and HC + FF, respectively.

Introduction

Data has grown tremendously in many applications over the last 15 years, ushering in the big data age. Big data has several unique characteristics that may be exploited for a variety of purposes [1,2], one of which is detecting threats or attacks. As Alvin Toffler put it, “as our technical capacities grow, so do the side effects and possible risks”, which precisely sums up the society we live in now. Initially, hacking was compared to the public defacing of objects [3-11]: hackers did it for amusement and publicity. These days, however, attacks are more deliberate and purposeful, and nation after nation accuses another of hacking [12,13]. Cyber attacks are on the rise, causing extensive harm to cyber infrastructure and loss of data [18]. A compromised IoT device may send incorrect data to cloud servers or permit unauthorised access to private corporate documents, economic forecasts, and business information, which could lead to equipment failure and financial loss [41].

Industrial espionage is also on the rise, with nation-states and competing companies attempting to obtain knowledge or erode a competitor's advantage to improve their own position [7,8]. This is evident in a variety of sectors, including health care, retail, education, finance, and government [9,10]. As a result of increased vulnerability and breaching capabilities, cyber security has emerged as a critical topic in computer science. Because it is difficult to safeguard every attack point, cyber security seeks to reduce the number of attack vectors to a minimum. Because an attacker has to succeed only once, protecting systems has become extremely difficult [11,14].

Attackers outnumber the people attempting to defend against them, because the abundance of available information can turn anyone into an attacker [15]. Cyber security mostly concentrates on how computing systems process their data and exchange it over the appropriate channels; any violations that occur are reported and sanctioned by law [40]. With this in mind, cyber security has evolved from a basic prevention-only approach to a more complex prevent, detect, and respond (PDR) paradigm, in which big data is projected to play a significant role [16,17]. The design and management of existing systems face several challenges, including lack of privacy, no fault tolerance, poor flexibility, difficulty in handling data volume, high complexity, and a high rate of incoming data. To overcome these issues, this study introduces a hybrid classification model with tuned weights for cyber attack detection.

The main contributions are:

  • 1. Introduces a CAD technique in which pre-processing uses an improved class imbalance model.

  • 2. Extracts flow-based features, improved entropy-based features, and higher-order statistical features.

  • 3. Selects the derived features via improved ICA and then classifies them with the HC (DMO and LSTM).

  • 4. Optimizes the LSTM weights with the SE-SGO technique.

  • 5. Deploys the proposed bait-oriented mitigation to recover from attacks.

Section 2 reviews existing CAD schemes and Section 3 gives an overview of the proposed CAD system. Sections 4 and 5 describe pre-processing with feature extraction, and feature selection, respectively. Sections 6 and 7 describe the classifiers and the proposed bait mitigation. The results are discussed in Section 8.

Section snippets

Related works

In 2019, Gifty et al. [18] focused on the privacy and security issues of handling large data for cyber-physical systems (CPS) and analysed current data privacy challenges. For hostile big data sets, they also provided a defense architecture for intrusion detection and examined its characteristics, failure, and reliability rates.

In 2021, Zhang et al. [19] offered a CAD approach for automated vehicles that relies on a safe estimate of transport states, with a navigation system as an example. Different “nonlinear vehicle dynamic

A short exposition on a CAD system

The proposed CAD scheme includes the following steps.

  • First, pre-processing is done using an improved class imbalance model.

  • Then, flow-based features, improved entropy-based features, and higher-order statistical features are derived.

  • Next, features are selected using the improved ICA (IICA).

  • Detection is then performed by the HC (LSTM and DMO).

  • The mean of the LSTM and DMO outcomes is taken as the detected outcome.

  • The LSTM weights are optimally chosen via the SE-SGO scheme.

Pre-processing

During pre-processing, an improved class imbalance model [26] is adopted, as demonstrated in Algorithm 1.
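The improved class imbalance model itself is not reproduced in this excerpt, so the following is only a minimal Python sketch of the general idea of class balancing, using plain random oversampling as a stand-in. The function name `random_oversample`, the toy data, and the choice of oversampling (rather than the authors' improved procedure) are all assumptions made for illustration.

```python
import numpy as np


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equally sized.

    Illustrative stand-in only; this is NOT the improved class imbalance model.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()  # size of the largest class

    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        if count < target:
            # Draw the missing rows with replacement from the same class.
            extra = rng.choice(idx, size=target - count, replace=True)
            idx = np.concatenate([idx, extra])
        keep.append(idx)

    order = np.concatenate(keep)
    return features[order], labels[order]


# Toy traffic table: four benign rows (label 0) and two attack rows (label 1).
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))
```

After balancing, both classes contribute the same number of rows, so a downstream classifier is not dominated by the majority class.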

Feature extraction

The three types of features are:

    • Flow-based features

    • Improved entropy-based features

    • Higher-order statistical features
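To make the entropy-based and higher-order statistical feature families more concrete, here is a small Python sketch. It uses ordinary Shannon entropy and the standard skewness and kurtosis moments; the paper's "improved" entropy variant is not given in this excerpt, so plain Shannon entropy is an assumed stand-in, and the function names and toy values are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def skewness(samples):
    """Third standardized moment (population form); 0.0 for constant input."""
    x = np.asarray(samples, dtype=float)
    std = x.std()
    if std == 0:
        return 0.0
    return float(((x - x.mean()) ** 3).mean() / std ** 3)


def kurtosis(samples):
    """Fourth standardized moment (population form, not excess kurtosis)."""
    x = np.asarray(samples, dtype=float)
    std = x.std()
    if std == 0:
        return 0.0
    return float(((x - x.mean()) ** 4).mean() / std ** 4)


# Hypothetical window of flows: source IPs and packet lengths.
src_ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"]
pkt_lengths = [1.0, 2.0, 3.0]

print(shannon_entropy(src_ips))
print(skewness(pkt_lengths), kurtosis(pkt_lengths))
```

A window dominated by a single source address yields low entropy, while traffic spread across many addresses yields high entropy, which is why entropy of header fields is commonly used as a DDoS indicator.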

Flow-based Features: These features take account of “source-destination IP addresses and ports as well as protocol types, in addition to the transactional features that include flow data like data lengths. For DDoS attacks, the features namely, Source IP address (srcip), Source port number (port),

Improved ICA

The improved ICA's steps are given below.

  • Step 1: Centre the observed signal y by subtracting its mean.

  • Step 2: Whiten the observed signal y.

  • Step 3: Compute the new value for the whitened signal w.

  • Step 4: Normalize w.

  • Step 5: Compute the Jaccard coefficient, which measures the similarity among samples, as in Eq. (5), where a and b denote the first and second sets of variables: $JC(a,b) = \frac{|a \cap b|}{|a \cup b|}$ (5)

  • Step 6: Check whether the algorithm has converged; if it has not, return to Step 4.

  • Step 7: Take the dot
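The centring, whitening, and Jaccard steps listed above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' improved ICA: whitening is done by eigendecomposition of the covariance matrix (one common choice), the iterative update of w is omitted because it is not specified in this excerpt, and the helper names and toy mixing matrix are hypothetical.

```python
import numpy as np


def center(y):
    """Step 1: subtract the per-signal mean (rows are signals, columns are samples)."""
    return y - y.mean(axis=1, keepdims=True)


def whiten(y_centered):
    """Step 2: linearly transform the signals so their covariance is the identity."""
    cov = np.cov(y_centered)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    whitening = np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitening @ y_centered


def jaccard(a, b):
    """Step 5: |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy example: three independent sources mixed by an assumed matrix.
rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(3, 500))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.6, 1.0]])
observed = mixing @ sources

z = whiten(center(observed))
print(np.round(np.cov(z), 3))
print(jaccard({1, 2, 3}, {2, 3, 4}))
```

The printed covariance of the whitened signals should be close to the identity matrix, which is the property the later ICA iterations rely on.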

Hybrid classifiers: DMO and LSTM

Our research utilizes DMO and LSTM for CAD, whose outputs are then averaged.
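As a minimal sketch of the averaging step, the Python snippet below fuses per-sample attack scores from two already-trained models by taking their mean. The score arrays, the function name `fuse_scores`, and the 0.5 decision threshold are assumptions for illustration; the excerpt does not state how the averaged output is thresholded.

```python
import numpy as np


def fuse_scores(lstm_scores, dmo_scores, threshold=0.5):
    """Average two classifiers' attack scores and threshold the mean.

    `threshold` is an assumed value, not taken from the paper.
    """
    lstm_scores = np.asarray(lstm_scores, dtype=float)
    dmo_scores = np.asarray(dmo_scores, dtype=float)
    if lstm_scores.shape != dmo_scores.shape:
        raise ValueError("both classifiers must score the same samples")
    fused = (lstm_scores + dmo_scores) / 2.0
    labels = (fused >= threshold).astype(int)  # 1 = attack, 0 = benign
    return fused, labels


# Hypothetical scores for three flows.
fused, labels = fuse_scores([0.9, 0.2, 0.6], [0.7, 0.4, 0.3])
print(fused, labels)
```

Averaging lets a confident prediction from one model be tempered by the other, so a flow is flagged only when the combined evidence crosses the threshold.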

Proposed bait-based mitigation

The steps in bait-oriented mitigation are as follows:

  • Step 1: The malevolent node sends an RREP when a bait RREQ is transmitted to it.

This is achieved by setting the bait address of the RREQ to the address of a nearby node chosen arbitrarily from the source's one-hop nodes.

The baiting process starts when the bait RREQ is used for initial routing and the source waits for a reply.
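The baiting step described so far can be sketched in Python as below. This is a simplified illustration, not the authors' protocol: node identifiers are plain strings, no packets are actually sent, and treating every RREP sender other than the bait node itself as a suspect is an assumption. The function names are hypothetical, and the subsequent trust computation is not included because its formula is not given in this excerpt.

```python
import random


def choose_bait_address(one_hop_neighbours, rng=random):
    """Pick the bait destination at random from the source's one-hop neighbours."""
    if not one_hop_neighbours:
        raise ValueError("source has no one-hop neighbours to use as bait")
    return rng.choice(sorted(one_hop_neighbours))


def suspects_from_replies(bait_address, rrep_senders):
    """Nodes that answered the bait RREQ, excluding the bait node itself (assumed rule)."""
    return {node for node in rrep_senders if node != bait_address}


neighbours = {"n1", "n2", "n3"}
bait = choose_bait_address(neighbours, rng=random.Random(0))

# Hypothetical replies observed after broadcasting the bait RREQ.
replies = [bait, "m7"]
print(suspects_from_replies(bait, replies))
```

Sorting the neighbour set before sampling keeps the choice reproducible for a given seed.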

If a malevolent node transmits a reply, then, to determine whether it was an attack, trust is computed using the weight parameter as

Simulation set up

The CAD approach was implemented in MATLAB. HC + SE-SGO was compared with HC + SGO [30], HC + SSOA, HC + DHOA, HC + DOX, and HC + FF on diverse metrics such as FPR, MCC, NPV, and specificity. The evaluation was also conducted against classifiers such as SVM, ANN, CNN, LSTM, and DMO. The dataset was downloaded from [39].
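For reference, the metrics named above follow directly from the binary confusion matrix. The Python sketch below (Python rather than MATLAB is used here purely for illustration) computes them from true/false positive and negative counts; the function name and the example counts are hypothetical and do not reproduce any result from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, FPR, specificity, NPV, and MCC from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty classes

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "fpr": ratio(fp, fp + tn),          # false positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for a detector evaluated on 100 flows.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and specificity always sum to 1, and that MCC stays informative on imbalanced data where accuracy alone can be misleading.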

“Dataset description: DDoS attack is a menace to network security that aims at exhausting the target networks with malicious traffic. Although many statistical methods have been

Conclusion

This research proposed a CAD technique that pre-processed input data through enhanced class imbalance processing. Then, flow-based features, improved entropy-based features, and higher-order statistical features were extracted. Further, improved ICA-based feature selection was performed. Finally, detection was done via the HC, which included LSTM and DMO. Once the presence of an attack was detected, mitigation took place via the proposed bait mitigation process. In this work, the SE-SGO algorithm was

Funding

This research did not receive any specific funding.

CRediT authorship contribution statement

Raghunath Kumar Babu D.: Conceptualization, Methodology, Formal analysis, Investigation. A. Packialatha: Resources, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (41)

  • G. Dhiman et al. Seagull optimization algorithm: theory and its applications for large-scale industrial engineering problems. Knowl Based Syst (2019)
  • D. Yin et al. A DDoS attack detection and mitigation with software-defined internet of things framework. IEEE Access (2018)
  • Md.M. MahmudulHasan et al. Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches. IoT (2019)
  • J. Ho. Efficient and robust detection of code-reuse attacks through probabilistic packet inspection in industrial IoT devices. IEEE Access (2018)
  • R.S. QianLi et al. Parallel distributed computing based wireless sensor network anomaly data detection in IoT framework. Cogn Syst Res (2018)
  • A. Azmoodeh et al. Robust malware detection for internet of (Battlefield) things devices using deep eigenspace learning. IEEE Trans Sustain Comput (2019)
  • A. Ali AlZubi et al. Cyber-attack detection in healthcare using cyber-physical system and machine learning techniques. Soft Comput (2021)
  • Q. Su et al. Attack detection and secure state estimation for cyber-physical systems with finite-frequency observers. J Franklin Inst (2020)
  • Q. Jiao et al. Covert attack detection based on hi/ho optimization for cyber-physical systems. IFAC (2020)
  • R. Gifty et al. Privacy and security of big data in cyber-physical systems using Weibull distribution-based intrusion detection. Neural Comput & Appl (2019)