Applications of Neural Networks in Network Intrusion Detection
8th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2006, Faculty of Electrical Engineering, University of Belgrade, Serbia, September 25-27, 2006. http://neurel.etf.bg.ac.yu, http://www.ewh.ieee.org/reg/8/conferences.html
Applications of Neural Networks in Network Intrusion Detection
Aleksandar Lazarevic, Dragoljub Pokrajac, Jelena Nikolic
Abstract - In this paper, we discuss the applications of multilayer perceptrons for classification of network intrusion detection data characterized by skewed class distributions. We compare several methods for learning from such skewed distributions by manipulating data records. The investigated methods include oversampling, undersampling and generating artificial data records using the SMOTE technique. The presented methods are tested on the KDDCup99 network intrusion dataset and compared using various classification performance metrics. In addition, the influence of the decision margin on recall and misclassification rates is also examined.
Keywords - Neural networks, rare class, network intrusion detection.
I. INTRODUCTION
WITH the growing number of attacks on network infrastructures, the need for techniques to detect and prevent attacks is becoming urgent. Intrusion detection refers to a broad range of techniques that defend against malicious attacks. The most widely deployed methods for intrusion detection employ signature-based detection techniques. These methods extract features from various audit streams, and detect intrusions by comparing the feature values to a set of attack signatures provided by human experts. Such methods can only detect previously known intrusions that have a corresponding signature. The signature database has to be manually revised for each new type of attack that is discovered. Limitations of signature-based methods have led to an increasing interest in intrusion detection techniques based upon data mining [1, 2, 3, 4, 5].
Data mining based intrusion detection techniques generally fall into one of two categories: misuse detection and anomaly detection. In misuse detection
Dragoljub Pokrajac has been partially supported by NIH (2P20RR016472-04), DoD/DoA 45395-MA-ISP and NSF (0320991, HRD-0310163, HRD-0630388) grants.
A. Lazarevic is with United Technologies Research Center, 411 Silver Lane, East Hartford, CT 06108, USA (phone: +1-860-610-7560, fax: +1-860-660-9334, e-mail: lazarea@utrc.utc.com).
D. Pokrajac is with Delaware State University, Applied Mathematics Research Center, 1200 N DuPont Hwy, Dover, DE 19904, USA (phone: +1-302-857-6640, fax: +1-302-857-6552, e-mail: dpokraja@desu.edu).
J. Nikolic is with the Faculty of Electronic Engineering, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia (phone: +381-18-529-105, fax: +381-18-588-399, e-mail: njelena@elfak.ni.ac.yu).
approaches, each instance in a data set is labeled as normal or intrusion (attack) and a learning algorithm is trained over the labeled data. These approaches are able to automatically retrain intrusion detection models on different input data that include new types of attacks, as long as they have been labeled appropriately. The main advantage of misuse detection is that it can accurately detect known attacks, while its drawback is its inability to detect novel, previously unseen attacks.
Traditional anomaly detection approaches, on the other hand, build models of normal data and detect deviations from the normal model in observed data. Anomaly detection applied to intrusion detection and computer security has been an active area of research since it was originally proposed by Denning [6]. Anomaly detection algorithms have the advantage that they can detect new types of intrusions as deviations from normal usage [6, 7]. In this problem, given a set of normal data to train from, and given a new piece of test data, the goal of the intrusion detection algorithm is to determine whether the test data belong to "normal" or to anomalous behavior. We refer to this problem as supervised anomaly detection, since the models are built only according to the normal behavior on the network. In contrast, unsupervised anomaly detection attempts to detect anomalous behavior without using any knowledge about the training data. However, both types of anomaly detection schemes suffer from a high rate of false alarms. This occurs primarily because previously unseen (yet legitimate) system behaviors are also recognized as anomalies, and hence flagged as potential intrusions.
This paper presents the scope and status of our research work in misuse detection. The paper first gives a brief overview of our research in building predictive models for learning from rare classes and proposes several techniques for detecting network intrusions. We present experimental results on the publicly available KDDCup'99 data set [20]. Experimental results on the KDDCup'99 data set have demonstrated that rare class predictive models are much more efficient in the detection of intrusive behavior than standard classification techniques.
II. BACKGROUND

In misuse detection related problems, standard data mining techniques are not applicable due to several specific details that include dealing with skewed class distributions, learning from data streams (intrusions are
1-4244-0433-9/06/$20.00 ©2006 IEEE.
sequences of events) and properly labeling network connections. The problem of skewed class distribution is very pronounced in network intrusion detection, since intrusion, as the class of interest, is much smaller, i.e. rarer, than the class representing normal network behavior. In such scenarios, where normal behavior may typically represent 98-99% of the entire population, a trivial classifier that labels everything with the majority class can achieve 98-99% accuracy. It is apparent that in this case classification accuracy is not sufficient as a standard performance measure. ROC analysis and metrics such as precision, recall and F-value [8, 9, 10] have been used to understand the performance of the learning algorithm on the minority class. A confusion matrix, shown in Table 1, is used to define these metrics. From Table 1, recall, precision and F-value may be
defined as follows:

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

F-value = ((1 + β²) · Recall · Precision) / (β² · Recall + Precision)    (3)

where β corresponds to the relative importance of precision vs. recall and is usually set to 1. The main focus of all learning algorithms is to improve the recall, without sacrificing the precision. However, the recall and precision goals are often conflicting, and attacking them simultaneously may not work well, especially when one class is rare. The F-value incorporates both precision and recall, and the "goodness" of a learning algorithm for the minority class can be measured by the F-value. While ROC curves represent the trade-off between values of TP and FP, the F-value basically incorporates the relative effects/costs of recall and precision into a single number.
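For illustration, the three metrics can be computed directly from the confusion-matrix counts of Table 1. The following Python sketch (function names are ours, not part of the paper's implementation) mirrors Eqs. (1)-(3):

```python
def precision(tp, fp):
    # Eq. (1): fraction of connections flagged as intrusions that really are
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. (2): fraction of actual intrusions that are detected
    return tp / (tp + fn)

def f_value(tp, fp, fn, beta=1.0):
    # Eq. (3): weighted combination of precision and recall;
    # beta expresses the relative importance of precision vs. recall
    p, r = precision(tp, fp), recall(tp, fn)
    return (1 + beta ** 2) * r * p / (beta ** 2 * r + p)
```

With beta = 1 this reduces to the familiar harmonic mean of precision and recall.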
TABLE 1: STANDARD METRICS FOR EVALUATION OF INTRUSIONS

                                      Predicted connection label
                                      Normal                Intrusions (Attacks)
Actual       Normal                   True Negative (TN)    False Alarm (FP)
connection
label        Intrusions (Attacks)     False Negative (FN)   Correctly detected attacks (TP)
III. METHODOLOGY

Researchers have dealt with class imbalance using different techniques, such as manipulating data records (e.g., over-sampling the minority class samples with replacement, under-sampling the majority class [11-14], or generating artificial examples from the minority class), designing new algorithms suitable for learning rare classes (e.g., SHRINK, PN-rule, CREDOS), case-specific feature/rule weighting, boosting based algorithms (SMOTEBoost, RareBoost), and cost sensitive classification (MetaCost, AdaCost, CSB, SSTBoost). In this paper we focus only on techniques based on manipulating data records.
The method based on over-sampling the rare classes simply duplicates the rare class records until the data set contains as many examples as the majority class, so the classes are balanced. Over-sampling does not increase the amount of information, but it increases the misclassification cost of the rare class. However, the effect of over-sampling is to identify more specific decision regions of the minority class in the feature space. This can lead to over-fitting, with the minority class decision region becoming very specific.
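Random over-sampling with replacement, as described above, can be sketched as follows (a minimal illustration; the function name is ours):

```python
import random

def oversample(minority, majority_size, seed=0):
    """Duplicate minority-class records at random (with replacement)
    until the minority class matches the majority class in size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(majority_size - len(minority))]
    return minority + extra
```

Note that every record in the returned, balanced set is an exact copy of an original minority record, which is why no new information is introduced.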
The technique based on under-sampling (down-sizing) the majority class samples the data records from the majority class, choosing data records completely at random, choosing records by the "near miss" technique, or choosing data records that are far from the minority class examples (far from the decision boundaries). The under-sampling technique introduces the sampled data records into the original data set in place of the original data records from the majority class, and typically results in a general loss of information and potentially overly general rules.

SMOTE (Synthetic Minority Oversampling Technique)
was proposed to counter the effect of having few instances of the minority class in a data set [15]. SMOTE creates synthetic instances of the minority class by operating in the "feature space" rather than the "data space". By synthetically generating more instances of the minority class, inductive learners, such as decision trees (e.g., C4.5 [16]) or rule-learners (e.g., RIPPER [17]), are able to broaden their decision regions for the minority class. We deal with nominal (or discrete) and continuous attributes differently in SMOTE. In the nearest neighbor computations for the minority classes, we use Euclidean distance for the continuous features and the Value Distance Metric (with the Euclidean assumption) for the nominal features [15, 18, 19]. The new synthetic minority samples are created as follows:
* For the continuous features:
  o Take the difference between a feature vector (minority class sample) and one of its k nearest neighbors (minority class samples).
  o Multiply this difference by a random number between 0 and 1.
  o Add this difference to the feature value of the original feature vector, thus creating a new feature vector.
* For the nominal features:
  o Take a majority vote between the feature vector in consideration and its k nearest neighbors for the nominal feature value. In the case of a tie, choose at random.
  o Assign that value to the new synthetic minority class sample.
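The steps above can be sketched in Python as follows (a minimal illustration, not the authors' code; function and variable names are ours, and the k nearest minority-class neighbors are assumed to be precomputed):

```python
import random
from collections import Counter

def smote_sample(x, neighbors, cont_idx, nom_idx, seed=0):
    """Create one synthetic minority-class sample from example x and its
    k nearest minority-class neighbors, following the steps above."""
    rng = random.Random(seed)
    nn = rng.choice(neighbors)          # pick one of the k nearest neighbors
    new = list(x)
    for j in cont_idx:
        # continuous: interpolate between x and the chosen neighbor
        new[j] = x[j] + rng.random() * (nn[j] - x[j])
    for j in nom_idx:
        # nominal: majority vote among x and all k neighbors, ties at random
        votes = Counter([x[j]] + [n[j] for n in neighbors])
        top = max(votes.values())
        new[j] = rng.choice(sorted(v for v, c in votes.items() if c == top))
    return new
```

Each continuous feature of the synthetic sample lies on the line segment between the original sample and the chosen neighbor, as described next.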
Using this technique, a new minority class sample is created along the line segment joining a minority class sample and its nearest neighbor, see Fig. 1. Hence, using SMOTE, more general regions are learned for the minority class, allowing the classifiers to better predict unseen examples belonging to the minority class. According to [8], the combination of SMOTE and under-sampling creates potentially optimal classifiers, as a majority of points from the SMOTE and under-sampling combination lie on the convex hull of the family of ROC curves.
Fig. 1. Illustration of the SMOTE technique. Point i is among the k=5 nearest neighbors of a minority class example. A synthetic example is generated between points i and p.
To perform classification (i.e., to distinguish between examples belonging to normal behavior and those related to different attempts of network intrusion), we used artificial neural networks [22]. Artificial neural networks in general model multivariate non-linear functions using special processing nodes, neurons. Each neuron receives signals from the neurons it is connected to, proportionally to the synaptic weights, and generates its output according to its transfer function. In turn, the neuron's output is fed to other neurons. Artificial neural networks have a layered structure, where a neuron from the i-th layer receives signals from neurons from the (i-1)-th layer and sends signals to the (i+1)-th layer of the network. In this paper, we use the multilayer perceptron, a type of layered artificial neural network where the synaptic weights are learned so that the expected squared difference between the desired output signal of the network (outputs of the last, output layer) and the actual outputs is minimized. Typically, classification using neural networks consists of the following two steps: a) learning, where a training set is submitted to the network and the synaptic weights are adapted according to a learning algorithm; b) testing, where the class of an unknown example is decided based on the outputs of the network. In this paper, we deal with a multiclass problem, so the classification is performed as follows: the two largest outputs oi and oj of the output neurons are observed and class i is decided if:
oi − oj > threshold,    (4)

where threshold is a parameter to be determined. When threshold = 0, this reduces to an ordinary neural network classifier. When threshold > 0, we refrain from classification and pronounce a data sample as not classified.
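The decision rule of Eq. (4) can be sketched as follows (a minimal illustration; the function name is ours):

```python
def classify_with_margin(outputs, threshold=0.0):
    """Decide class i from the network outputs, or return None
    (sample not classified) when the two largest outputs do not
    differ by more than the threshold -- cf. Eq. (4)."""
    ranked = sorted(range(len(outputs)), key=outputs.__getitem__, reverse=True)
    i, j = ranked[0], ranked[1]
    return i if outputs[i] - outputs[j] > threshold else None
```

With threshold = 0 and distinct outputs this is the usual argmax classifier; with a positive threshold, ambiguous samples are left unclassified.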
IV. EXPERIMENTS
A. Datasets

Our experiments were performed on the KDD Cup-99 intrusion detection data set from the KDD Cup 1999 competition [20].
In the KDD Cup 1999 competition, the task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" connections. The data set represents a modification of the
DARPA 1998 Intrusion Detection Evaluation Data [21] prepared by MIT Lincoln Lab, and it contains a wide variety of intrusions simulated in a military network environment. The entire data set contains original training data and original test data. The original raw training data corresponds to seven weeks of network traffic and contains around five million network connections. For our experiments we have used a 10% sample originally created by the KDDCup99 organizers [20]. A network connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack, with exactly one specific attack type. The original test data corresponds to two weeks of network traffic and contains 311,029 network connections. In addition to the normal network connections, the data contains four main categories of attacks:
+ DoS (Denial of Service), for example, ping-of-death, teardrop, smurf, SYN flood, etc.;
+ R2L (Remote to Local), unauthorized access from a remote machine, for example, guessing password;
+ U2R (User to Root), unauthorized access to local super-user privileges by a local unprivileged user, for example, various buffer overflow attacks;
+ Probe, surveillance and probing, for example, port-scan, ping-sweep, etc.
The distribution of network connections in the new testdata set is given in Table 2.
TABLE 2: SUMMARY OF KDDCUP99 DATA SET USED IN EXPERIMENTS

Data set: KDDCup-99 Intrusion
Number of majority class instances: DoS 231455, Normal 60593
Number of minority class instances: U2R 246, Probe 4166, R2L 14569
Number of classes: 5
B. Results

Data were pre-processed as follows. Categorical attributes were encoded using "one of c" encoding, where c is the number of distinct attribute values. Attributes with zero standard deviation on the training set are excluded from the training and test data. Subsequently, each attribute on the training set is normalized to have zero mean and unit standard deviation, and the attributes on the test set are normalized accordingly. The class label ci of the i-th example is encoded using the vector v = [v1 v2 v3 v4 v5], where vj = 1 for j = ci and vj = 0.1 for j ≠ ci, j = 1,...,5. In this way we prevented the problems related to output neuron saturation [22].
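The pre-processing steps above can be sketched as follows (a minimal illustration; function names are ours, and the 1/0.1 target values follow the label encoding described above):

```python
def one_of_c(value, categories):
    # "one of c" encoding of a categorical attribute value
    return [1.0 if value == cat else 0.0 for cat in categories]

def standardize(train_col, test_col):
    # Normalize with the training-set mean and standard deviation only;
    # the test set is transformed with the same statistics
    n = len(train_col)
    mean = sum(train_col) / n
    std = (sum((v - mean) ** 2 for v in train_col) / n) ** 0.5
    scale = lambda col: [(v - mean) / std for v in col]
    return scale(train_col), scale(test_col)

def encode_label(c, n_classes=5):
    # Target vector: 1 for the true class, 0.1 elsewhere, which keeps
    # the logistic output neurons away from saturation
    return [1.0 if j == c else 0.1 for j in range(1, n_classes + 1)]
```

Attributes whose training-set standard deviation is zero would be dropped before calling standardize, as they carry no information and would divide by zero.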
For classification, we used three-layer non-linearperceptron [22]. The first layer consists of input neurons,where the number of neurons is equal to the number of
input attributes. The second, hidden layer consists of S1 neurons with the tansig transfer function, described by the following equation:

tansig(x) = (e^x − e^(−x)) / (e^x + e^(−x)),

where the number of neurons S1 is varied in the experiments. The output layer consists of 5 neurons with the logistic sigmoid transfer function:

logsig(x) = 1 / (1 + e^(−x)).

The neural networks were trained using the Conjugate
Gradient with Powell/Beale Restarts algorithm [23]. To improve neural network generalization, an early stopping criterion is used. A validation set consisting of 20% of the data available for training is used to evaluate the mean-square error (MSE) of the trained network, and training stops if the MSE starts to increase. The maximal number of epochs for neural network training was set to 200.

To evaluate the performance of the neural networks,
for each setting (training set, choice of parameters) we repeated the classification experiments ten times and merged the confusion matrices. The precision, recall and F-value were estimated based on the combined confusion matrix. In addition, we estimated the misclassification rate, i.e., the relative number of examples classified to other classes (excluding the examples that are not classified).

We experimented with four training datasets. The trainset1 was the standard KDDCup99 training dataset. The trainset2 was obtained by oversampling the rare classes from the KDDCup99 dataset, resulting in 22611 samples from each class. Using the SMOTE technique, we oversampled the rare classes U2R, Probe and R2L, resulting in 5000 samples per class in trainset3. In this training set, all original samples from the DoS and Normal classes were retained. Finally, trainset4 consists of the same samples from trainset3 corresponding to the rare classes, while DoS and Normal were downsampled so that each contains 5000 examples.
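The early-stopping criterion described above can be sketched as follows (a minimal illustration; function names are ours, and the actual training and validation routines are caller-supplied):

```python
def train_with_early_stopping(run_epoch, validation_mse, max_epochs=200):
    """Run training epochs until the validation MSE starts to increase
    or max_epochs is reached; return (best_epoch, best_mse).
    `run_epoch` performs one training epoch; `validation_mse` evaluates
    the current network on the held-out 20% validation split."""
    best_mse, best_epoch = float('inf'), 0
    for epoch in range(1, max_epochs + 1):
        run_epoch()
        mse = validation_mse()
        if mse >= best_mse:      # validation error no longer improving
            break
        best_mse, best_epoch = mse, epoch
    return best_epoch, best_mse
```

A production implementation would typically also restore the weights from the best epoch; this sketch only reports when that epoch occurred.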
In the first experiment, we varied the number of hidden neurons (S1 = 5, 10, 15, 20, 25, 30) and estimated the classification accuracy on trainset1 with threshold = 0 (ordinary NN classification). Beyond S1 = 10, there was no significant improvement of the classification accuracy. Hence, the rest of the experiments were performed with 10 hidden neurons.
In the second experiment, we compared the recall, precision and F-value on the test set when the neural networks were trained on the four training sets discussed above. The results are reported in Table 3. The best recall rates on the frequent classes (DoS and Normal) were obtained using the original KDDCup99 data, but acceptable results were also obtained using the oversampling and SMOTE techniques (trainsets 2 and 3). On the rare classes Probe and R2L, the best performance was obtained using oversampling (trainset2). Class U2R was the most difficult to detect: the best recall obtained was 17.7% (using SMOTE combined with undersampling). The achieved precision on the DoS class was good using any of the training sets, but the best results were obtained on trainset2 and trainset1. The best precision for the Normal class was also achieved using oversampling. For the rare class Probe, trainset1 and trainset2 provided the best precision. However, for the rare classes U2R and R2L the achieved precision was rather small using any of the training sets.
TABLE 3. PERFORMANCE METRICS ON KDDCUP99 TEST DATA USING DIFFERENT TRAINING SETS, S1=10 HIDDEN NEURONS AND ZERO DECISION THRESHOLD (THE BEST VALUES FOR EACH CLASS ARE UNDERLINED).
Recall
Training set   DoS     R2L     U2R     Probe   Normal
Trainset1      96.7%   2.56%   1.72%   72.3%   97.2%
Trainset2      96.7%   49.7%   1.33%   76.7%   96.7%
Trainset3      90.9%   40.6%   2.20%   75.7%   93.3%
Trainset4      73.4%   53.7%   17.7%   75.0%   45.1%

Precision
Training set   DoS     R2L     U2R     Probe   Normal
Trainset1      99.1%   5.04%   17.5%   75.2%   73.8%
Trainset2      99.4%   6.09%   7.90%   74.6%   75.7%
Trainset3      99.0%   2.60%   12.3%   23.1%   72.3%
Trainset4      91.5%   0.81%   22.3%   5.48%   67.1%

F-value
Training set   DoS     R2L     U2R     Probe   Normal
Trainset1      97.9%   3.40%   3.14%   73.7%   83.9%
Trainset2      98.0%   11.8%   2.28%   75.8%   84.9%
Trainset3      94.8%   4.88%   3.73%   35.4%   81.5%
Trainset4      81.5%   1.60%   19.7%   10.2%   53.9%
The overall performance, measured by the F-value, was best using oversampling, with the exception of class U2R, where classification was best when trainset4 was used for training. Surprisingly, the techniques based on the SMOTE method were not particularly successful.

To have better insight into the character of misclassification, we looked at the combined confusion matrices (obtained by merging the confusion matrices from the 10 experimental repetitions), see Table 4. As we can see, the main reason why classification was poor on the rare classes R2L and U2R using the original training set was the wrong assignment of the "Normal" class label. Oversampling increased the number of samples from the R2L class in the training set and helped more of the examples from this class to be properly classified.

To examine the influence of introducing a decision margin (represented by a threshold value), we plot the dependence of the misclassification rate on the recall rate for different values of the threshold (0; 0.01; 0.05; 0.1; 0.2; 0.3; 0.4), see Fig. 2. For the same threshold, better performance is achieved when the misclassification rate is small and the recall is high (lower right part of the diagrams). Generally, a decision threshold > 0 results in a smaller recall rate (some samples correctly classified with threshold = 0 may not be classified) but also in a smaller number of misclassified examples (examples wrongly classified when the threshold is zero are not assigned to any class). The classifier with a margin is hence useful if, for the specific value of the threshold, the drop in recall is smaller than the drop in misclassification rate. In such
cases, the examples that are not classified may be subject to additional classification by another learning algorithm (or, in a practical setting, by a human expert). For example, for class U2R, when the margin was set to 0.1, the recall rate dropped from 17.7% to 14.5% while the misclassification rate decreased from 82.3% to 70.5% (when the networks were trained on trainset4).
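The per-class recall and misclassification rates used above can be computed as follows (a minimal sketch; the function name is ours). Both rates are taken relative to all examples of the class, so that examples left unclassified by the margin count towards neither:

```python
def recall_and_misclassification(results):
    """results: list of (true_class, predicted_class) pairs for one class,
    where predicted_class is None when the decision margin left the
    example unclassified."""
    n = len(results)
    correct = sum(1 for t, p in results if p == t)
    wrong = sum(1 for t, p in results if p is not None and p != t)
    return correct / n, wrong / n
```

Under this convention, recall plus misclassification rate plus the fraction of unclassified examples sums to 100% for each class.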
TABLE 4. COMBINED CONFUSION MATRICES ON KDDCUP99 TEST DATA USING DIFFERENT TRAINING SETS, S1=10 HIDDEN NEURONS AND ZERO DECISION THRESHOLD.
Trainset1 (rows: actual connection label; columns: predicted connection label)
               DoS       R2L     U2R     Probe   Normal
DoS            2236960   994     99349   22558   65589
R2L            100       63      249     336     1712
U2R            4774      540     2510    519     137347
Probe          4811      482     1888    30121   4358
Normal         9712      72      352     6543    589251

Trainset2, oversampling (rows: actual connection label; columns: predicted connection label)
               DoS       R2L     U2R     Probe   Normal
DoS            2235609   42266   17343   2306    55066
R2L            1 2 3 623 79 383
U2R            4166      10654   1944    164     128761
Probe          1210      1751    2821    32061   3816
Normal         7709      2236    1870    8360    585756

Fig. 2. Recall (percent of correctly classified examples) and misclassification (percentage of examples assigned to a wrong class) for different classes of normal behavior and intrusions and different values of the decision margins. Panels: DoS, R2L, U2R, Probe and Normal; legend: trainset1 (KDDCup99 training data), trainset2 (oversampling), trainset3 (SMOTE), trainset4 (SMOTE with undersampling).

Similarly, for the class Probe, the introduction of threshold = 0.3 on networks trained on trainset2 (oversampling) leads to a decrease of the misclassification rate by 10.5%, while the recall drops by only 4.2%. However, the introduction of a margin was not always beneficial to performance: e.g., on class Probe, networks trained on trainset4 performed better than networks trained on trainset1 with threshold = 0, but for thresholds larger than 0.05 the latter networks had better performance (higher recall for the same misclassification rate). Similarly, for class R2L, the performance of networks trained on trainset2 and trainset3 approached each other for higher values of the decision margin (threshold = 0.4).
V. CONCLUSIONS AND FUTURE WORK
In this paper, we discussed the application of multilayer perceptrons to network intrusion detection. We considered various techniques to alleviate class imbalance in datasets with skewed class distributions. Also, the influence of the decision margin (where classification is not performed if the difference between the two largest outputs of the neural network is smaller than a threshold) on the prediction accuracy was considered. All the experiments were performed on the KDDCup99 network intrusion dataset. The best classification performance was obtained using the over-sampling technique. However, the class of attacks U2R remains difficult to predict (all the examined techniques typically assign samples belonging to this group to the Normal class). Hence, it would be interesting to construct a classifier which would discriminate between class U2R and normal samples. We believe that the use of such a classifier, combined with the networks considered here, could lead to further improvements in prediction performance.
The experiments discussed in this paper did not perform feature selection: instead, all attributes from the dataset (with non-zero variance) are used to train and test the prediction model. Further improvements of accuracy are possible when feature selection/feature extraction techniques are introduced, and this is part of our work in progress.
The SMOTE technique, as proposed in [15], considers all data from each class to belong to one compact set (cluster). A part of our ongoing research effort is to examine techniques which perform unsupervised learning on each class prior to subsequent application of the SMOTE technique on the distinguished clusters.
REFERENCES

[1] W. Lee and S. J. Stolfo, Data Mining Approaches for Intrusion Detection, Proceedings of the 1998 USENIX Security Symposium, 1998.
[2] E. Bloedorn, et al., Data Mining for Network Intrusion Detection: How to Get Started, MITRE Technical Report, August 2001.
[3] J. Luo, Integrating Fuzzy Logic With Data Mining Methods for Intrusion Detection, Master's thesis, Department of Computer Science, Mississippi State University, 1999.
[4] D. Barbara, N. Wu, S. Jajodia, Detecting Novel Network Intrusions Using Bayes Estimators, First SIAM Conference on Data Mining, Chicago, IL, 2001.
[5] S. Manganaris, M. Christensen, D. Serkle, and K. Hermix, "A Data Mining Analysis of RTID Alarms," in Proceedings of the 2nd
International Workshop on Recent Advances in Intrusion Detection (RAID 99), West Lafayette, IN, September 1999.
[6] D.E. Denning, An Intrusion Detection Model, IEEE Transactions on Software Engineering, SE-13:222-232, 1987.
[7] H.S. Javitz and A. Valdes, The NIDES Statistical Component: Description and Justification, Technical Report, Computer Science Laboratory, SRI International, 1993.
[8] F. Provost and T. Fawcett, Robust Classification for Imprecise Environments, Machine Learning, vol. 42/3, pp. 203-231, 2001.
[9] M. Joshi, R. Agarwal, V. Kumar, PNrule: Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule Induction, Proceedings of the ACM SIGMOD Conference on Management of Data, May 2001.
[10] M. Joshi, R. Agarwal, V. Kumar, Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002.
[11] M. Kubat, R. Holte, and S. Matwin, Machine Learning for the Detection of Oil Spills in Satellite Radar Images, Machine Learning, vol. 30, pp. 195-215, 1998.
[12] N. Japkowicz, The Class Imbalance Problem: Significance and Strategies, Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI'2000): Special Track on Inductive Learning, Las Vegas, Nevada, 2000.
[13] D. Lewis and J. Catlett, Heterogeneous Uncertainty Sampling for Supervised Learning, Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA, 148-156, 1994.
[14] C. Ling and C. Li, Data Mining for Direct Marketing: Problems and Solutions, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, 1998.
[15] N. Chawla, K. Bowyer, L. Hall, P. Kegelmeyer, SMOTE: Synthetic Minority Over-Sampling Technique, Journal of Artificial Intelligence Research, vol. 16, 321-357, 2002.
[16] J. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1992.
[17] W. Cohen, Fast Effective Rule Induction, Proceedings of the Twelfth International Conference on Machine Learning, Lake Tahoe, CA, 115-123, 1995.
[18] C. Stanfill and D. Waltz, "Toward Memory-based Reasoning," Communications of the ACM, vol. 29, no. 12, pp. 1213-1228, 1986.
[19] S. Cost and S. Salzberg, "A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features," Machine Learning, vol. 10, no. 1, pp. 57-78, 1993.
[20] KDD-Cup 1999 Task Description, http://kdd.ics.uci.edu/databases/kddcup99/task.html
[21] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. P. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, Evaluating Intrusion Detection Systems: The 1998 DARPA Off-line Intrusion Detection Evaluation, Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX) 2000, Vol. 2, pp. 12-26, IEEE Computer Society Press, Los Alamitos, CA, 2000.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edn, Prentice Hall, 1998.
[23] M. J. D. Powell, "Restart procedures for the conjugate gradient method," Mathematical Programming, vol. 12, pp. 241-254, 1977.