Soft Computing Based Classification Technique Using
KDD 99 Data Set for Intrusion Detection System

Mr. Suresh Kashyap¹, Ms. Pooja Agrawal², Mr. Vikas Chandra Pandey³, Mr. Suraj Prasad Keshri⁴

¹ Research Scholar (M.Tech.), Dr. C. V. Raman University, Kargi Road Kota, Bilaspur, India
² Research Scholar (Ph.D.), Dr. C. V. Raman University, Kargi Road Kota, Bilaspur, India
³ Research Scholar (Ph.D.), Dr. C. V. Raman University, Kargi Road Kota, Bilaspur, India
⁴ Research Scholar (M.Tech.), Dr. C. V. Raman University, Kargi Road Kota, Bilaspur, India


International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Abstract

An intrusion detection system (IDS) inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system. Soft computing techniques resemble biological processes more closely than traditional techniques, which are largely based on formal logical systems. Knowledge Discovery in Databases (KDD) is the automated discovery of patterns and relationships in large databases. In this paper we preprocess the KDD Cup 99 data set and compare two neural network approaches on it: Error Back Propagation (EBP), the most widely used training algorithm for feedforward artificial neural networks (FFANNs), and the Radial Basis Function (RBF) neural network, which is based on supervised learning. Our results show that RBF performs better than EBP. The comparison was carried out using MATLAB.

Keywords

Detection methods, Matlab, intrusion detection, network security.

I. INTRODUCTION

An intrusion detection system (IDS) inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system. The original purpose of an IDS is to protect an organization's vital information from outsiders. A signature-based IDS analyzes the information it gathers and compares it against large databases of known attack signatures.
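As a rough illustration of signature comparison, the following Python sketch checks each observed payload against a small signature database by exact substring search. The signature names and byte patterns are hypothetical placeholders rather than entries from any real rule set; real signature-based IDSs use far richer matching languages.

```python
# Hypothetical signature database: signature name -> byte pattern that marks the attack.
SIGNATURES = {
    "hypothetical-dir-traversal": b"../../",
    "hypothetical-shell-spawn": b"/bin/sh",
}


def match_signatures(payload: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


if __name__ == "__main__":
    events = [b"GET /index.html HTTP/1.0", b"GET /../../etc/passwd HTTP/1.0"]
    for event in events:
        hits = match_signatures(event)
        print(event, "->", hits or "no match")
```

Because a match requires the exact stored pattern, a slightly altered payload (for example, a URL-encoded traversal string such as `..%2F..%2F`) is not matched, which is the variant-detection weakness listed in Section II.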

Intrusion detection functions include:

• Monitoring and analyzing both user and system activities.

• Analyzing system configurations and vulnerabilities.

• Assessing system and file integrity.

• Recognizing patterns typical of attacks.

• Analyzing abnormal activity patterns.

• Tracking user policy violations.

II. IDS WITH TRADITIONAL APPROACH

The increasing complexity of modern computing systems makes traditional views of information security impractical, if not impossible. Computing environments are dynamic, with near-constant changes in configurations, software, and usage patterns. Completely securing a given system is already a difficult theoretical task for static systems, and it becomes infeasible given the dynamic nature of today's systems. This creates the need for a more dynamic view of information security: one that recognizes the insufficiency of static descriptions of policy and security mechanisms, and that proposes a dynamic means of providing security that is sufficient for a given system at a given time.

The traditional approach has several drawbacks:

• Signature-based IDSs must be programmed to detect each attack and thus must be constantly updated with signatures of new attacks.

• Many signature-based IDSs have narrowly defined signatures that prevent them from detecting variants of common attacks.

• Anomaly detection approaches usually produce a large number of false alarms due to the unpredictable nature of users and networks.

• Anomaly detection approaches often require extensive “training sets” of system event records in order to characterize normal behavior patterns.

• Application-based IDSs may be more vulnerable than host-based IDSs to being attacked and disabled, since they run as an application on the host they are monitoring.
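The two anomaly-detection drawbacks above can be seen in a minimal statistical detector. This sketch assumes one numeric feature per event (for example, bytes transferred per connection), learns its mean and standard deviation from a training set of normal events, and flags values more than k standard deviations away. The feature, the toy training values, and k = 3 are illustrative assumptions, not part of the paper.

```python
import statistics


class ZScoreDetector:
    """Toy anomaly detector: learns mean/stdev of normal data and flags outliers."""

    def __init__(self, k: float = 3.0):
        self.k = k  # threshold in standard deviations (assumed value)
        self.mean = 0.0
        self.stdev = 1.0

    def fit(self, normal_values: list[float]) -> "ZScoreDetector":
        # Characterize "normal" behaviour from a training set of normal events.
        self.mean = statistics.fmean(normal_values)
        self.stdev = statistics.pstdev(normal_values) or 1.0  # avoid division by zero
        return self

    def is_anomalous(self, value: float) -> bool:
        # Flag any value whose distance from the mean exceeds k standard deviations.
        return abs(value - self.mean) / self.stdev > self.k


if __name__ == "__main__":
    normal_bytes = [480, 510, 495, 505, 520, 490, 500]
    detector = ZScoreDetector().fit(normal_bytes)
    for observed in [502, 530, 5000]:
        print(observed, detector.is_anomalous(observed))
```

With these training values (mean 500, standard deviation about 12.2), a reading of 530 is accepted, but an equally legitimate 540 would already be flagged, which is how such detectors accumulate false alarms.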

III. SOFT COMPUTING

Soft computing became a formal area of study in computer science in the early 1990s. Earlier computational approaches could model and precisely analyze only relatively simple systems. More complex systems arising in biology, medicine, the humanities, management sciences, and similar fields often remained intractable to conventional mathematical and analytical methods.

Components of soft computing include:

• Neural networks (NN).

• Fuzzy systems (FS).

• Evolutionary computation (EC).

• Evolutionary algorithms.

A. Why Are Soft Computing Tools Used for IDS?

Traditional protection techniques such as user authentication, data encryption, avoiding programming errors, and firewalls are used as the first line of defense for computer security. If a weak password is compromised, user authentication cannot prevent unauthorized use. Firewalls are vulnerable to configuration errors and susceptible to ambiguous or undefined security policies, and they are generally unable to protect against malicious mobile code, insider attacks, and unsecured modems. Programming errors cannot be fully avoided, because the complexity of system and application software is growing rapidly, leaving behind exploitable weaknesses. Consequently, computer systems are likely to remain insecure for the foreseeable future. Intrusion detection is useful not only for detecting successful intrusions, but also for monitoring attempts to break security, which provides important information for timely countermeasures.

An Intrusion Detection System (IDS) itself can be defined as the tools, methods, and resources to help identify, assess, and report unauthorized or unapproved network activity.

IV. KDD 99 DATA SET

Knowledge Discovery in Databases (KDD) is the automated discovery of patterns and relationships in large databases. Large databases are not uncommon. Cheaper and larger computer storage capabilities have contributed to the proliferation of such databases in a wide range of fields.

KDD employs methods from various fields such as machine learning, artificial intelligence, pattern recognition, database management and design, statistics, expert systems, and data visualization. KDD has been more formally defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The KDD process is a highly iterative, user-involved, multistep process, as can be seen in Figure 2.

The process starts from organizational data: the operational data gathered in one or several locations.
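The paper preprocesses the KDD Cup 99 records before training (see the Abstract), but the exact preprocessing is not stated. The following is only a sketch under assumptions: each raw line holds 41 comma-separated features followed by a label ending in ".", the three symbolic features (protocol_type, service, flag) are mapped to integer codes in order of first appearance, and every column is then min-max scaled to [0, 1]. The function names are hypothetical.

```python
SYMBOLIC_COLUMNS = (1, 2, 3)  # protocol_type, service, flag (0-based column indices)
NUM_FEATURES = 41


def parse_record(line: str) -> tuple[list[str], str]:
    """Split one KDD 99 CSV line into its 41 raw feature strings and its label."""
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    label = fields[-1].rstrip(".")  # raw labels end with '.', e.g. "smurf."
    return fields[:-1], label


def encode(records: list[list[str]]) -> list[list[float]]:
    """Map symbolic columns to integer codes and all other columns to floats."""
    vocab: dict[int, dict[str, int]] = {c: {} for c in SYMBOLIC_COLUMNS}
    encoded = []
    for raw in records:
        row = []
        for i, value in enumerate(raw):
            if i in vocab:
                # Give a previously unseen symbol the next unused integer code.
                row.append(float(vocab[i].setdefault(value, len(vocab[i]))))
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded


def min_max_scale(rows: list[list[float]]) -> list[list[float]]:
    """Rescale every column to [0, 1]; constant columns become 0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

In practice the symbol vocabularies and the column minima and maxima would be computed on the training data only and reused for the test data; the functions above simply recompute them from whatever rows they receive.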

V. CLASSIFICATION OF DATASET

The records in the data set are classified into five broad categories based on the type of attack:

• Normal: the record represents ordinary activity, with no abnormal condition for the IDS to detect.

• DoS (Denial of Service): Denial of Service (DoS) is a class of attack where an attacker makes a computing or memory resource too busy or too full to handle legitimate requests, thus denying legitimate users access to a machine.

• R2L (Unauthorized Access from a Remote Machine): A remote-to-user (R2L) attack is a class of attack where an attacker sends packets to a machine over a network, then exploits the machine’s vulnerability to illegally gain local access as a user. There are different types of R2L attacks; the most common attack in this class is carried out using social engineering.

• U2Su (Unauthorized Access to Local Super User (Root)): User-to-root exploits are a class of attacks where an attacker starts out with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system. The most common exploits in this class are regular buffer overflows, which are caused by common programming mistakes and environment assumptions.

• Probing (Surveillance and Other Probing): Probing is a class of attack where an attacker scans a network to gather information or find known vulnerabilities. An attacker with a map of the machines and services available on a network can use that information to look for exploits. There are different types of probes: some abuse the computer’s legitimate features, while others use social engineering techniques.

For training on the KDD Cup 99 data set, we assigned a number to each attack type, including the normal class, as shown in the table.
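One possible way to carry out this numbering is sketched below. The mapping assigns each of the 22 attack names in the KDD Cup 99 `training_attack_types` list to one of the five categories above; the integer codes 0 to 4 (following the order of the list above) are an assumption, since the values in the authors' table are not available here.

```python
# Attack name -> category, following the KDD Cup 99 "training_attack_types" list.
CATEGORY_OF = {
    "normal": "normal",
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "dos"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "u2r"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "multihop",
                     "phf", "spy", "warezclient", "warezmaster"], "r2l"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "probe"),
}

# Assumed integer codes for the five classes (not taken from the paper's table).
CLASS_CODE = {"normal": 0, "dos": 1, "r2l": 2, "u2r": 3, "probe": 4}


def class_code(label: str) -> int:
    """Map a raw KDD 99 label such as 'smurf.' to its integer class code."""
    category = CATEGORY_OF[label.rstrip(".")]
    return CLASS_CODE[category]
```

For example, `class_code("smurf.")` returns 1. The corrected test file contains attack names that do not appear in the training list, so those labels would raise a `KeyError` here and would need an extended mapping.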

VI. CLASSIFICATION

Data classification is a methodology to align business requirements to infrastructure, so that infrastructure service delivery properly supports data storage and management.

Classification of objects is probably one of the most common and ancient decision tasks performed by humans. It can be seen as the ability of assigning a specific object to a predefined group or class based on a number of observed attributes of that object. The classification process was primarily related to our natural senses: humans recognize or classify objects based on the data acquired by their natural sensors. The data collected by the sensors is converted to specific features.

Figure 3 shows a schematic view of the classification process.
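To make the idea of assigning an object to a predefined class from its observed attributes concrete, the sketch below uses a deliberately simple classifier, a nearest-centroid rule, which is not one of the neural methods compared in this paper. Each class is summarized by the mean of its training feature vectors, and a new object is assigned to the class whose mean is closest in Euclidean distance. The function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(samples: list[list[float]], labels: list[int]) -> dict[int, list[float]]:
    """Average the feature vectors of each class to get one centroid per class."""
    sums: dict[int, list[float]] = {}
    counts: dict[int, int] = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def classify(x: list[float], centroids: dict[int, list[float]]) -> int:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))
```

The neural classifiers in the following sections replace the fixed distance rule with a decision function learned from the training data.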

VII. CLASSIFICATION THROUGH EBPA

One of the most popular weight-update rules for training neural networks is Error Back Propagation (EBP). However, most EBP-based learning algorithms depend strictly on the architecture of the ANN, and there are many problems associated with the existing EBP algorithm and its variations. Feed-forward neural networks trained with EBP are very versatile systems: they can serve as a statistical method, a nonlinear controller, a filter, an agent behavior system, or any other complex input-output function approximator with generalization capability.

Algorithm:

Training a neural net by back-propagation involves three stages:

• Feed-forward of input training pattern,

• Back-propagation of associated error, and

• Adjustment of weights.

The algorithm is as follows:

Step 1: Initialize the weights (set to random values).

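Step 2: While the stopping condition is false, repeat the following steps.

Step 3: For each training pair, perform the three stages listed above: feed-forward, back-propagation of the error, and weight adjustment.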
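The listing above stops after Step 3. The sketch below fills in the three stages for a network with one hidden layer of sigmoid units, trained pattern by pattern on the squared error. The layer sizes, learning rate, fixed-epoch stopping condition, and toy data are assumptions; this is a generic textbook version of back-propagation, not the authors' MATLAB configuration.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BackpropNet:
    """One-hidden-layer feed-forward net trained by error back-propagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, lr: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        # Step 1: small random weights; the last entry of each row is the bias weight.
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        self.lr = lr

    def forward(self, x: list[float]) -> tuple[list[float], list[float]]:
        # Feed-forward: hidden activations z, then output activations y.
        xb = x + [1.0]
        z = [sigmoid(sum(vi * xi for vi, xi in zip(row, xb))) for row in self.v]
        zb = z + [1.0]
        y = [sigmoid(sum(wi * zi for wi, zi in zip(row, zb))) for row in self.w]
        return z, y

    def train_pattern(self, x: list[float], t: list[float]) -> float:
        z, y = self.forward(x)
        # Back-propagation: output error terms, then hidden error terms
        # (computed with the weights from before this update).
        delta_out = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
        delta_hid = [
            zj * (1 - zj) * sum(delta_out[k] * self.w[k][j] for k in range(len(self.w)))
            for j, zj in enumerate(z)
        ]
        # Weight adjustment: gradient-descent step on the squared error.
        zb, xb = z + [1.0], x + [1.0]
        for k, dk in enumerate(delta_out):
            self.w[k] = [wkj + self.lr * dk * zj for wkj, zj in zip(self.w[k], zb)]
        for j, dj in enumerate(delta_hid):
            self.v[j] = [vji + self.lr * dj * xi for vji, xi in zip(self.v[j], xb)]
        return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))

    def fit(self, data: list[tuple[list[float], list[float]]], epochs: int = 1000) -> list[float]:
        # Step 2: the stopping condition here is simply a fixed number of epochs.
        # Returns, per epoch, the sum of per-pattern errors measured just before each update.
        history = []
        for _ in range(epochs):
            # Step 3: for each training pair, feed forward, back-propagate, adjust weights.
            history.append(sum(self.train_pattern(x, t) for x, t in data))
        return history
```

For example, training `BackpropNet(2, 3, 1)` for a few thousand epochs on the four patterns of logical OR drives the summed error down; for KDD 99 records one would instead use 41 inputs and, for instance, one output unit per class.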

VIII. CLASSIFICATION THROUGH RADIAL BASIS FUNCTION

Radial basis function (RBF) neural networks are based on supervised learning. RBF networks were independently proposed by many researchers and are a popular alternative to the multilayer perceptron (MLP). RBF networks are good at modelling nonlinear data, can be trained in one stage rather than through the iterative process used for the MLP, and learn a given application quickly. They are also useful for problems where the input data are corrupted with additive noise.

Training of RBF neural networks:

The training set consists of m labelled pairs {Xi, di} that represent associations of a given mapping, or samples of a continuous multivariate function. The sum-of-squared-error criterion over the training set can be taken as the error function E to be minimized. The aim is a training method that minimizes E by adaptively updating the free parameters of the RBF network: the receptive field centres μj of the hidden-layer Gaussian units, the receptive field widths σj, and the output-layer weights wij. Because the transfer characteristics of the RBF network are differentiable, one of the training methods considered here is a fully supervised gradient-descent method over E.
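A minimal sketch of this fully supervised gradient-descent training, assuming Gaussian hidden units φj(x) = exp(−‖x − μj‖² / (2σj²)), linear outputs yi = Σj wij φj(x) with no bias term, and the per-pattern squared error E = ½ Σi (di − yi)². With ei = di − yi and gj = Σi ei wij, the gradients are ∂E/∂wij = −ei φj, ∂E/∂μj = −gj φj (x − μj)/σj² and ∂E/∂σj = −gj φj ‖x − μj‖²/σj³. Parameters are updated after each pattern (an online approximation of descent on the summed E); the learning rates, initial width, and choice of centres are illustrative assumptions, not the authors' settings.

```python
import math
import random


class GaussianRBFNet:
    """Gaussian RBF network trained by gradient descent on mu_j, sigma_j and w_ij."""

    def __init__(self, centres: list[list[float]], n_out: int, sigma: float = 1.0,
                 rho_mu: float = 0.01, rho_sigma: float = 0.01, rho_w: float = 0.05,
                 seed: int = 0):
        rng = random.Random(seed)
        self.mu = [list(c) for c in centres]   # receptive field centres mu_j
        self.sigma = [sigma] * len(centres)    # receptive field widths sigma_j
        self.w = [[rng.uniform(-0.1, 0.1) for _ in centres] for _ in range(n_out)]  # w_ij
        self.rho_mu, self.rho_sigma, self.rho_w = rho_mu, rho_sigma, rho_w

    def hidden(self, x: list[float]) -> list[float]:
        # phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))
        return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, m)) / (2 * s * s))
                for m, s in zip(self.mu, self.sigma)]

    def predict(self, x: list[float]) -> list[float]:
        # Linear output layer: y_i = sum_j w_ij * phi_j(x)
        phi = self.hidden(x)
        return [sum(wij * pj for wij, pj in zip(row, phi)) for row in self.w]

    def train_pattern(self, x: list[float], d: list[float]) -> float:
        phi = self.hidden(x)
        y = [sum(wij * pj for wij, pj in zip(row, phi)) for row in self.w]
        e = [di - yi for di, yi in zip(d, y)]  # e_i = d_i - y_i
        # g_j = sum_i e_i * w_ij, using the weights from before this update
        g = [sum(e[i] * self.w[i][j] for i in range(len(e))) for j in range(len(phi))]
        for j, pj in enumerate(phi):
            diff = [a - b for a, b in zip(x, self.mu[j])]
            sq, s = sum(v * v for v in diff), self.sigma[j]
            # mu_j <- mu_j - rho_mu * dE/dmu_j
            self.mu[j] = [m + self.rho_mu * g[j] * pj * v / s ** 2
                          for m, v in zip(self.mu[j], diff)]
            # sigma_j <- sigma_j - rho_sigma * dE/dsigma_j (floored to stay positive)
            self.sigma[j] = max(s + self.rho_sigma * g[j] * pj * sq / s ** 3, 1e-3)
        for i, ei in enumerate(e):
            # w_ij <- w_ij - rho_w * dE/dw_ij, with dE/dw_ij = -e_i * phi_j
            self.w[i] = [wij + self.rho_w * ei * pj for wij, pj in zip(self.w[i], phi)]
        return 0.5 * sum(v * v for v in e)

    def fit(self, data: list[tuple[list[float], list[float]]], epochs: int = 200) -> list[float]:
        # Returns, per epoch, the sum of per-pattern errors measured just before each update.
        return [sum(self.train_pattern(x, d) for x, d in data) for _ in range(epochs)]
```

The floor on σj is a practical safeguard added here, not part of the plain update rule.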

In particular, μj, σj and wij are updated by gradient descent on E:

$$\mu_j \leftarrow \mu_j - \rho_\mu \frac{\partial E}{\partial \mu_j}, \qquad \sigma_j \leftarrow \sigma_j - \rho_\sigma \frac{\partial E}{\partial \sigma_j}, \qquad w_{ij} \leftarrow w_{ij} - \rho_w \frac{\partial E}{\partial w_{ij}}$$

where ρμ, ρσ and ρw are small positive constants. This method is capable of matching or exceeding the performance of neural networks trained with the back-propagation algorithm, while giving training times comparable with those of sigmoidal-type FFNNs [14]. An alternative way of training an RBF network is radically different from the classical training of standard FFNNs: the weights are not changed iteratively by a gradient method aimed at function minimization. Instead, for the chosen type of radial basis function, training reduces to selecting the centres and widths of the functions and then calculating the weights of the output neuron. We then simulated the IDS data in MATLAB using the EBP algorithm and the RBF network and obtained the following results.

IX. COMPARISON

We trained our neural network with both the EBP algorithm and RBF and obtained different outputs. As the two figures above show, on the IDS training data RBF performs well, while EBP gives weaker results.
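The comparison above is qualitative. One way to quantify it, offered as an assumption rather than the metric the authors used, is to compute overall accuracy, the detection rate (fraction of attack records flagged as some attack), and the false-alarm rate (fraction of normal records flagged as attacks) from the true and predicted class codes of each network. The sketch assumes integer class codes with 0 meaning normal; the function name is hypothetical.

```python
def ids_metrics(y_true: list[int], y_pred: list[int], normal: int = 0) -> dict[str, float]:
    """Accuracy, detection rate and false-alarm rate for multi-class IDS labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    attacks = [(t, p) for t, p in zip(y_true, y_pred) if t != normal]
    normals = [(t, p) for t, p in zip(y_true, y_pred) if t == normal]
    detected = sum(p != normal for _, p in attacks)     # attack flagged as any attack class
    false_alarms = sum(p != normal for _, p in normals)  # normal record flagged as an attack
    return {
        "accuracy": correct / len(y_true),
        "detection_rate": detected / len(attacks) if attacks else float("nan"),
        "false_alarm_rate": false_alarms / len(normals) if normals else float("nan"),
    }
```

Under these definitions, an attack assigned to the wrong attack category still counts as detected, though not as correct for accuracy.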

We trained our neural network on a very small amount of data, which may be why our results contain errors. To reduce the error, the following steps can be taken:

1) Using a better initialization of the network's connection weights.

2) Adding further parameters, such as bias neurons.

3) Using more IDS training data.

X. CONCLUSION AND FURTHER RESEARCH

This paper presents the training of neural networks on IDS data. Much work has been done in this field, and a number of soft-computing-based tools have been designed for intrusion detection. This paper is an effort towards simulating IDS data in order to develop an intelligent system for intrusion detection. Our results show that this approach to developing an IDS can be enhanced by using the techniques discussed in the comparison section.

Further research in this field can consider different algorithms for training neural networks on larger amounts of data, and compare them to determine which algorithm is most suitable for IDS data. In addition, a new soft computing tool for IDS could be designed using hybrid soft computing techniques.

References

William Stallings, "Cryptography and Network Security," 4th ed., Pearson Prentice Hall, 2009.
I. Aleksander and H. Morton, "An Introduction to Neural Computing," 2nd ed.
"Neural Networks at Pacific Northwest National Laboratory," http://www.emsl.pnl.gov:2080/docs/cie/neural/neural.homepage.html
Ajith Abraham and Ravi Jain, "Soft Computing Models for Network Intrusion Detection Systems," Department of Computer Science, Oklahoma State University, USA (ajith.abraham@ieee.org); School of Information Science, University of South Australia, Australia (ravi.jain@unisa.edu.au).
I. Aleksander and H. Morton, "An Introduction to Neural Computing," 2nd ed.
Simon Haykin, "Neural Networks," 2nd ed., Pearson Prentice Hall, 2008.
D. Dubois and H. Prade, "Fuzzy Sets and Systems: Theory and Applications," New York: Academic Press, 1980.
B. Kosko, "Neural Networks and Fuzzy Systems," Englewood Cliffs, NJ: Prentice Hall, 1991.
IC1043, "Neural Network & Fuzzy Logic," RMKEC, pp. 1-39.
kddcup.names: a list of features.
http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz: the full data set (18 MB; 743 MB uncompressed).
http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz: a 10% subset (2.1 MB; 75 MB uncompressed).
http://kdd.ics.uci.edu/databases/kddcup99/kddcup.newtestdata_10_percent_unlabeled.gz (1.4 MB; 45 MB uncompressed).
http://kdd.ics.uci.edu/databases/kddcup99/kddcup.testdata.unlabeled.gz (11.2 MB; 430 MB uncompressed).
http://kdd.ics.uci.edu/databases/kddcup99/kddcup.testdata.unlabeled_10_percent.gz (1.4 MB; 45 MB uncompressed).
http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz: test data with corrected labels.
training_attack_types: a list of intrusion types.
typo-correction.txt: a brief note on a typo in the data set that has been corrected.
http://www.sigkdd.org/kddcup/index.php?section=1999&method=data
http://matauranga.wordpress.com/rana/kdd-cup-1999-data-evaluation/
http://www.scribd.com/doc/2346440/Neural-Networks-and-Fuzzy-logic-Control-IC-1403