Hybrid classification model with tuned weight for cyber attack detection: Big data perspective
Introduction
Data volumes have grown tremendously across many applications over the last 15 years, ushering in the big data age. Big data has several distinctive characteristics that can be exploited for a variety of purposes [1,2], one of which is detecting threats and attacks. As Alvin Toffler put it, “As our technical capacities grow, so do the side effects and possible risks,” which sums up the society we live in now. Initially, hacking was likened to the public defacing of objects [3], [4], [5], [6], [7], [8], [9], [10], [11]. Hackers did it for amusement and publicity. Today, however, attacks are more deliberate and purposeful. Nations routinely accuse one another of hacking [12,13]. Cyber attacks are on the rise and cause extensive harm to cyber infrastructure as well as data loss [18]. A compromised IoT device may send incorrect data to cloud servers or permit unauthorised access to private corporate documents, economic forecasts, and business facts. This could lead to equipment failure and financial loss [41].
Industrial espionage is also on the rise, with nation-states and competing companies attempting to obtain knowledge or erode a competitor's advantage to improve their own position [7], [8]. This is evident in a variety of areas, including health care, retail, education, the fiscal sector, and government [9,10]. As a result of the increased vulnerability and breaching capabilities, cyber security has emerged as a critical topic in computer science. Because it is difficult to safeguard every attack point, cyber security seeks to reduce the number of attack vectors to a minimum. Because an attacker has to succeed only once, protecting systems has become extremely difficult [11,14].
Attackers outnumber the people attempting to defend systems. This is due to the abundance of available information, which can turn anyone into an attacker [15]. Cyber security mostly concentrates on ensuring that computing systems process their data and exchange it over appropriate channels; any violations that occur are reported and sanctioned by law [40]. With this in mind, cyber security has evolved from a basic prevention-only approach to a more complex PDR paradigm, which stands for Prevent-Detect-Respond. In this developing PDR paradigm, big data is projected to play a significant role [16,17]. The design and management of existing systems face several challenges, including lack of privacy, absence of fault tolerance, poor flexibility, difficulty in handling the data volume, high complexity, and a high rate of data arrival. To overcome these issues, this study introduces a hybrid classification model with tuned weight for cyber attack detection (CAD).
The main contributions are:
1. Introduces a CAD technique in which pre-processing is done using an improved class imbalance model.
2. Extracts flow-based features, improved entropy-based features, and higher-order statistical features.
3. Selects the extracted features via improved ICA and classifies them with a hybrid classifier (HC) that combines DMO and LSTM.
4. Optimizes the LSTM weights using the SE-SGO technique.
5. Deploys the proposed bait-oriented mitigation to recover from attacks.
Sections 2 and 3 review existing CAD schemes and outline the proposed CAD system. Sections 4 and 5 describe pre-processing with feature extraction, and feature selection, respectively. Sections 6 and 7 describe the classifiers and the proposed bait-based mitigation. Section 8 discusses the results.
Section snippets
Related works
In 2019, Gifty et al. [18] focused on the privacy and security issues of handling large data for CPS and analysed current data privacy challenges. For hostile big data sets, they also provided a defense architecture for intrusion recognition and examined its characteristics, failure rates, and reliability rates.
In 2021, Zhang et al. [19] proposed a CAD approach for automated vehicles that relies on a secure estimate of vehicle states, with a navigation system as an example. Different “nonlinear vehicle dynamic
A short exposition on a CAD system
The proposed CAD scheme includes the following steps.
- First, pre-processing is done using an improved class imbalance model.
- Then, flow-based features, improved entropy-based features, and higher-order statistical features are derived.
- Next, features are selected using improved ICA (IICA).
- Detection is then performed by the HC (LSTM and DMO).
- The mean of the LSTM and DMO outcomes is taken as the detected outcome.
- The LSTM weights are optimally chosen via the SE-SGO scheme.
Pre-processing
During pre-processing, an improved class imbalance model [26] is adopted, as shown in Algorithm 1.
Feature extraction
The three types of features are:
- Flow-based features
- Improved entropy
- Higher-order features
Flow-based Features: These features take account of “source-destination IP addresses and ports as well as protocol types, in addition to the transactional features that include flow data like data lengths. For DDoS attacks, the features namely, Source IP address (srcip), Source port number (port),
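As a rough illustration of the entropy-based and higher-order statistical feature families named above, the following minimal sketch computes a standard Shannon entropy over source IP addresses and the skewness and kurtosis of data lengths within a window of flows. It is an assumption-laden example only: the paper's improved entropy formulation is not given in this excerpt, so plain Shannon entropy is used as a stand-in, and the flow tuples, function names, and the choice of skewness and kurtosis as the higher-order statistics are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    Stand-in for the paper's improved entropy, which this excerpt does not define.
    """
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def skewness_and_kurtosis(samples):
    """Third and fourth standardized moments (population form, not excess)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        return 0.0, 0.0  # constant signal: moments are undefined, report zeros
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in samples) / n
    kurt = sum(((x - mean) / std) ** 4 for x in samples) / n
    return skew, kurt


# Hypothetical window of flows: (source IP address, data length in bytes).
flows = [("10.0.0.1", 60), ("10.0.0.1", 60), ("10.0.0.2", 1500), ("10.0.0.3", 60)]
src_entropy = shannon_entropy([ip for ip, _ in flows])
skew, kurt = skewness_and_kurtosis([length for _, length in flows])
```

A low source-address entropy combined with a strongly skewed length distribution is the kind of signal such features are meant to expose; how the paper combines them is not shown here.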
Improved ICA
The improved ICA's steps are given below.
Step 1: Centre the observed signal y by subtracting its mean.
Step 2: Whiten the observed signal y.
Step 3: Compute the new value for the whitened signal w.
Step 4: Normalize w.
Step 5: Compute the Jaccard coefficient, which measures the similarity among samples, as in Eq. (5), where a and b denote the first and second sets of variables.
Step 6: Check whether the algorithm has converged; if it has not, return to Step 4.
Step 7: Take the dot
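The centring, whitening, update, normalisation, and convergence steps listed above resemble a one-unit FastICA iteration, which can be sketched with NumPy as follows. This is a sketch under stated assumptions, not the paper's improved ICA: w is read here as the FastICA weight vector, the tanh nonlinearity and the tolerance are conventional choices rather than values from the source, and because Eq. (5) and the way the Jaccard coefficient enters the iteration are not shown in this excerpt, the coefficient is given only as a standalone helper in its usual set form.

```python
import numpy as np


def center_and_whiten(y):
    """Steps 1-2: subtract each signal's mean, then decorrelate to unit variance."""
    y = y - y.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(y))
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitener @ y


def one_unit_fastica(z, max_iter=200, tol=1e-8, seed=0):
    """Steps 3, 4 and 6: fixed-point update of w, normalisation, convergence check."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        g = np.tanh(w @ z)                 # assumed nonlinearity
        g_prime = 1.0 - g ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w


def jaccard(a, b):
    """Usual Jaccard coefficient: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

The whitening step uses the symmetric inverse square root of the covariance matrix, so the whitened signals have identity covariance before the fixed-point update starts.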
Hybrid classifiers: DMO and LSTM
Our research utilizes DMO and LSTM for CAD, whose outputs are then averaged.
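The averaging of the two classifier outputs can be sketched as below. This is a minimal illustration that assumes each classifier emits an attack probability in [0, 1]; the function names and the 0.5 decision threshold are hypothetical and do not come from the paper.

```python
def hybrid_score(dmo_score, lstm_score):
    """Mean of the DMO and LSTM outputs for one sample."""
    return (dmo_score + lstm_score) / 2.0


def hybrid_predict(dmo_scores, lstm_scores, threshold=0.5):
    """Label a sample as attack (1) when the averaged score reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return [
        int(hybrid_score(d, l) >= threshold)
        for d, l in zip(dmo_scores, lstm_scores)
    ]


# Hypothetical per-sample scores from the two classifiers.
labels = hybrid_predict([0.9, 0.2, 0.6], [0.7, 0.1, 0.3])
```

Averaging lets a confident classifier offset an uncertain one, as in the third sample, where the two scores disagree and the mean falls below the assumed threshold.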
Proposed bait-based mitigation
The steps in bait-oriented mitigation are as follows:
Step 1: A bait route request (RREQ) is transmitted, and a malicious node responds to it with a route reply (RREP).
This is achieved by setting the bait address in the RREQ to the address of a neighbouring node chosen at random from the source's one-hop neighbours.
The baiting process starts when the bait RREQ is used for initial routing and the source waits for a reply.
If a malicious node transmits a reply, then to determine whether it was an attack, trust is computed using the weight parameter as
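The baiting steps described in this section can be sketched as follows. This example covers only the selection of the bait address and the collection of suspicious repliers; the trust computation is not shown in this excerpt and is deliberately left out. The function names are hypothetical, and treating a reply from the bait node itself as benign is an assumption made for illustration.

```python
import random


def choose_bait_address(one_hop_neighbours, rng=random):
    """Pick the address of a randomly chosen one-hop neighbour as the bait destination."""
    return rng.choice(sorted(one_hop_neighbours))


def suspects_from_replies(bait_address, rrep_senders):
    """Nodes other than the bait node that answered the bait RREQ with an RREP.

    Assumption: only the bait node could legitimately claim a route to its own address.
    """
    return {node for node in rrep_senders if node != bait_address}


# Hypothetical topology: the source has three one-hop neighbours.
bait = choose_bait_address({"B", "C", "D"}, random.Random(0))
suspects = suspects_from_replies(bait, [bait, "M"])
```

Nodes returned by `suspects_from_replies` would then be passed to the trust evaluation before being declared attackers.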
Simulation set up
The CAD approach was implemented in MATLAB. The HC + SE-SGO scheme was compared against HC + SGO [30], HC + SSOA, HC + DHOA, HC + DOX, and HC + FF on diverse metrics such as FPR, MCC, NPV, and specificity. Additionally, it was compared against classifiers such as SVM, ANN, CNN, LSTM, and DMO. The dataset was downloaded from [39].
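For reference, the named metrics can be computed from a binary confusion matrix using their standard definitions, as in the sketch below. The counts in the usage lines are made up for illustration and are not results from the paper; the function name is hypothetical.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard FPR, specificity, NPV and MCC from confusion-matrix counts."""
    fpr = fp / (fp + tn)                  # false positive rate
    specificity = tn / (tn + fp)          # true negative rate, equals 1 - FPR
    npv = tn / (tn + fn)                  # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"fpr": fpr, "specificity": specificity, "npv": npv, "mcc": mcc}


# Illustrative counts only.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

MCC is useful alongside the other three because it stays informative when the attack and benign classes are imbalanced.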
“Dataset description: DDoS attack is a menace to network security that aims at exhausting the target networks with malicious traffic. Although many statistical methods have been
Conclusion
This research proposed a CAD technique that pre-processed the input data through enhanced class imbalance processing. Then, flow-based features, improved entropy-based features, and higher-order statistical features were extracted. Next, improved ICA-based feature selection was performed. Finally, detection was carried out by the HC, which comprised LSTM and DMO. Once an attack was detected, mitigation took place via the proposed bait mitigation process. In this work, the SE-SGO algorithm was
Funding
This research did not receive any specific funding.
CRediT authorship contribution statement
Raghunath Kumar Babu D.: Conceptualization, Methodology, Formal analysis, Investigation. A. Packialatha: Resources, Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (41)
- et al. Cyber security challenges for IoT-based smart grid networks. Int J Crit Infrastruct Protect (2019)
- et al. Non-parametric sequence-based learning approach for outlier detection in IoT. Future Gener Comput Syst (2018)
- et al. SFaaS: keeping an eye on IoT fusion environment with security fusion as a service. Future Gener Comput Syst (2018)
- et al. Accurate detection of sitting posture activities in a secure IoT based assisted living environment. Future Gener Comput Syst (2019)
- et al. Ensemble-based spam detection in social IoT using probabilistic data structures. Future Gener Comput Syst (2018)
- et al. REATO: reacting to denial of service attacks in the internet of things. Comput Netw (2018)
- et al. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Gener Comput Syst (2018)
- et al. Modeling and clustering attacker activities in IoT through machine learning techniques. Inf Sci (Ny) (2019)
- et al. Evidence identification in IoT networks based on threat assessment. Future Gener Comput Syst (April 2019)
- et al. An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset. Comput Netw (2020)