arXiv:1712.01145v2 [cs.CR] 17 Oct 2021

Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection

Ruimin Sun†*, Xiaoyong Yuan‡*, Pan He§, Qile Zhu§, Aokun Chen§, Andre Gregio¶, Daniela Oliveira§, and Xiaolin Li‡‡

†Northeastern University, ‡Michigan Technological University, §University of Florida, ¶Federal University of Parana, ‡‡Cognization Lab

*Equal Contribution

Abstract: Existing malware detectors on safety-critical devices have difficulties in runtime detection due to the performance overhead. In this paper, we introduce PROPEDEUTICA¹, a framework for efficient and effective real-time malware detection, leveraging the best of conventional machine learning (ML) and deep learning (DL) techniques. In PROPEDEUTICA, all software is considered benign when it starts execution and is monitored by a conventional ML classifier for fast detection. If the software receives a borderline classification from the ML detector (e.g., the software is 50% likely to be benign and 50% likely to be malicious), the software is transferred to a more accurate, yet more performance-demanding, DL detector. To address spatial-temporal dynamics and software execution heterogeneity, we introduce a novel DL architecture (DEEPMALWARE) for PROPEDEUTICA with multi-stream inputs. We evaluated PROPEDEUTICA with 9,115 malware samples and 1,338 benign software from various categories for the Windows OS. With a borderline interval of [30%-70%], PROPEDEUTICA achieves an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. Even using only CPU, PROPEDEUTICA can detect malware within less than 0.1 seconds.

Index Terms: deep learning, malware detection, spatial-temporal analysis, multi-stage classification

I. INTRODUCTION

Malware is continuously evolving [1], and existing protection mechanisms have not been coping with its sophistication [2]. The industry still heavily relies on signature-based technology for malware detection [3], but these methods have many limitations: (i) they are effective only for malware with known signatures; (ii) they are not sustainable, given the massive amount of samples released daily; and (iii) they can be evaded by zero-day and polymorphic/metamorphic malware (practical detection 25%-50%) [4].

Behavior-based approaches attempt to identify malware behaviors using instruction sequences [5], [6], computation trace logic [7], and system or API call sequences [8]-[10]. These solutions have been mostly based on conventional machine learning (ML) models, such as K-nearest neighbor, SVM, neural networks, and decision tree algorithms [11]-[14]. However, current solutions based on ML still suffer from high false-positive rates, mainly because of (i) the complexity and diversity of modern benign software and malware [1], [9], [15], [16], which are hard to capture during the learning phase of the algorithms; (ii) sub-optimal feature extraction; (iii) limited training/testing datasets; and (iv) the challenge of concept drift [17], which makes it hard to generalize learning models to reflect future malware behavior.

¹ Propedeutics refers to diagnosing a patient condition by first performing initial non-specialized, low-cost exams, and then proceeding to specialized, possibly expensive diagnostic procedures if preliminary exams are inconclusive.

The accuracy of malware classification depends on gaining sufficient context information about software execution and on extracting meaningful abstraction of software behavior. For system/API-call based malware classification, longer sequences of calls likely contain more information. However, conventional ML-based detectors (e.g., Random Forest [18]) often use short windows of system calls during the training phase to avoid the curse of dimensionality (when the dimension increases, the classification needs more data to support and becomes harder to solve [19]), and may not be able to extract useful features for accurate detection. Thus, the main drawback of current behavior-based approaches is that they might lead to low accuracy and many false positives, because it is hard to analyze complex and longer sequences of malicious behaviors with limited window sizes, especially when malicious and benign behaviors are interposed. For instance, the n-grams of system calls are widely used as features of ML detection, which, however, are prone to introduce blind spots in the detection.

In contrast, emerging deep learning (DL) models [20] are capable of analyzing longer sequences of system calls and making more accurate classification through higher level information extraction, while circumventing the curse of dimensionality. However, DL requires more time to gain enough information for classification and to predict the probability of detection. Moreover, the state-of-the-art DL models consist of deep layers and a huge number of parameters, which results in slow calculation in the prediction. Such a slow process makes DL models infeasible for providing predictions on malware in real time. This trade-off is challenging: fast and perhaps not-so-accurate (ML methods) vs. time-consuming and more accurate classification (DL methods). Applying a single method only can be either inefficient or ineffective, which becomes impractical for real-time malware detection.

In this paper, we introduce and evaluate PROPEDEUTICA, a framework for efficient and effective real-time malware detection, considering both prediction performance and efficacy in the detection phase. PROPEDEUTICA is designed to combine the best of ML and DL methods, i.e., speed in prediction from the ML methods and accuracy from the DL methods. In PROPEDEUTICA, all software in the system is subjected to ML for fast classification. If a piece of software receives a borderline malware classification probability, it is then subjected to additional analysis with a more performance-expensive and more accurate DL classification. The DL classification is performed via our proposed algorithm, DEEPMALWARE. Compared to existing DL methods [21]-[23], DEEPMALWARE can learn spatially local and long-term features and handle software heterogeneity via the application of both ResNext blocks and recurrent neural networks with multi-stream inputs. By leveraging such a multi-stage design, PROPEDEUTICA is capable of providing precise and real-time detection.

We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 1,338 common benign software from various categories for the Windows OS. PROPEDEUTICA leveraged sliding windows of system calls for malware detection. Our proposed deep learning algorithm, DEEPMALWARE, achieved a 97.03% accuracy and a 2.43% false positive rate. By combining conventional ML and DL, PROPEDEUTICA substantially improves the prediction performance while keeping the detection time less than 0.1 seconds, which makes DL feasible for real-time malware detection.

In this paper, we present the following contributions: (i) PROPEDEUTICA, a new framework for efficient and effective real-time malware detection for Windows OS; (ii) an evaluation of PROPEDEUTICA with a collection of 9,115 malware and 1,338 benign software; and (iii) DEEPMALWARE, a novel DL algorithm learning multi-scale spatial-temporal system call features with multi-stream inputs that can handle software heterogeneity.

II. METHODOLOGY

A. Threat Model

PROPEDEUTICA is designed to provide precise real-time malware detection, i.e., a high accuracy rate, few false positives, and less processing time. PROPEDEUTICA is capable of defending against the most prevalent malware threats, and not against directed attacks, such as specifically-crafted, advanced, and persistent threats. Hence, we assume that malware will eventually get in if an organization is targeted by a well-motivated attacker. Therefore, attacks either targeting PROPEDEUTICA's software implementation or the machine learning engine (adversarial machine learning [24], [25]) are outside of PROPEDEUTICA's scope. We discuss the adversarial attacks against malware detection in Section V-B.

PROPEDEUTICA operation assumes that, before it loads, the system is in a pristine state. Thus, PROPEDEUTICA will not scan all running processes at startup, which limits its checking procedure to each new process created.

B. Overall Design

PROPEDEUTICA (Figure 1) consists of (i) a system call reconstruction module, (ii) an ML classifier, and (iii) our newly proposed DEEPMALWARE classifier.

The workflow of PROPEDEUTICA is outlined as follows.

Fig. 1: Architecture and workflow of PROPEDEUTICA for multi-stage malware detection.

STEP 1: PROPEDEUTICA leverages a System Call Interception Driver to intercept and log all system calls invoked in the system and associate them with the PID of the invoking process.
STEP 2: The Reconstruction Module reads and parses logged system calls, compressing the system calls for input to the two classifiers.
STEP 3: The classifiers generate sliding windows of system call traces with the PID of the invoking process.
STEP 4: The Conventional ML Classifier introduces a configurable borderline probability interval [lower bound, upper bound], which determines the range of classification that is considered inconclusive for detection. For example, let's consider that the borderline interval is [30%-70%]. If the ML classifier assigns the label "malware" to a piece of software with a classification probability within the predefined borderline interval, PROPEDEUTICA considers its classification result inconclusive. If the software receives a "malware" classification probability less than 30%, PROPEDEUTICA considers that the software is not malicious. If the classification probability is greater than 70%, PROPEDEUTICA considers the software malicious. For the inconclusive case ([30%-70%] interval), the software continues to run, but is subjected to analysis by the DeepMalware Classifier for a definitive classification.
STEP 5: If the software is classified as benign by the DeepMalware Classifier, it continues running under the monitoring of the Conventional ML classifier.
STEP 6: If the software is considered malicious, PROPEDEUTICA kills the malicious software.
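
The borderline routing in Steps 4 to 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Verdict`, `triage`, and `detect` are hypothetical, the two classifiers are passed in as plain callables that return a malware probability, and the 0.5 decision threshold for the DL stage is an assumption.

```python
from enum import Enum
from typing import Callable, Sequence


class Verdict(Enum):
    BENIGN = "benign"
    BORDERLINE = "borderline"
    MALICIOUS = "malicious"


def triage(p_malware: float, lower: float = 0.30, upper: float = 0.70) -> Verdict:
    """Map the ML classifier's malware probability to a Step 4 outcome."""
    if not 0.0 <= p_malware <= 1.0:
        raise ValueError("p_malware must be a probability in [0, 1]")
    if p_malware < lower:
        return Verdict.BENIGN
    if p_malware > upper:
        return Verdict.MALICIOUS
    return Verdict.BORDERLINE  # inside [lower, upper]: inconclusive


def detect(
    window: Sequence,
    ml_predict: Callable[[Sequence], float],
    dl_predict: Callable[[Sequence], float],
    lower: float = 0.30,
    upper: float = 0.70,
) -> bool:
    """Return True if the process that produced `window` should be killed."""
    verdict = triage(ml_predict(window), lower, upper)
    if verdict is Verdict.BORDERLINE:
        # Only inconclusive windows pay the cost of the slower DL model.
        # Assumption: the DL stage decides at probability 0.5.
        return dl_predict(window) >= 0.5
    return verdict is Verdict.MALICIOUS
```

With this structure, the width of the borderline interval controls how many windows reach the DL stage, which is the knob that trades detection time against accuracy.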

By leveraging the fast speed of ML methods and the high accuracy of DL methods, PROPEDEUTICA aims to achieve precise and real-time detection. The main goal of this paper, using PROPEDEUTICA, is to present the concept of a multi-stage malware detector that combines two distinct classification models: ML and DL. Therefore, any presented implementation should be understood as a proof-of-concept (PoC) to accomplish such a goal, and not as a platform-specific solution. We intend that this PoC leads to a PROPEDEUTICA version that will be implemented in the future by OS vendors to be integrated into their products. Moreover, PROPEDEUTICA does not rely on a GPU-powered system to speed up its detection procedures, which broadens the application of PROPEDEUTICA.

For the sake of simplicity, our PoC was implemented in user land, i.e., as a user process in a non-privileged ring of execution. Thus, PROPEDEUTICA's trusted computing base includes the OS kernel, the learning models running in user land, and the hardware stack. We modeled PROPEDEUTICA's PoC for Microsoft Windows, since it is the OS most targeted by malware writers [26]. We limited our PoC to the 32-bit Microsoft Windows version, which currently is the most popular Windows architecture.

III. IMPLEMENTATION DETAILS

This section discusses implementation details of three main aspects of PROPEDEUTICA's operation: the system call interception driver, the reconstruction module, and our newly proposed DEEPMALWARE algorithm.

A. The System Call Interception Driver

Intercepting system calls is a key step for PROPEDEUTICA operation, as it allows collecting system call invocation data to be input to the ML and DL classifiers. Despite many existing tools for process monitoring, such as Process Monitor [27], the drstrace library [28], Core OS Tool [29], and WinDbg's Logger [30], they suffer from various challenges. Process Monitor provides only coarse-grained file and registry activity. drstrace works at the application level and can be bypassed by malware. WinDbg's Logger starts logging only at the entry point of the execution, so system calls executed by the initialization code in statically imported shared libraries will not be seen. Core OS tools trace coarse-grained events, such as interrupts, memory events, thread creation and termination, etc. Therefore, we opted to implement our own solution to have more flexibility for deploying a real-time detection mechanism. The PROPEDEUTICA collection driver was implemented for Windows 7 SP1 32-bit. The driver hooks into the System Service Dispatch Table (SSDT), which contains an array of function pointers to important system service routines, including system call routines. In a Windows 7 32-bit system, there are 400 entries of system calls [31], of which 155 system calls are interposed by our driver. We selected a comprehensive set of system calls to be able to monitor multiple distinct subsystems, which includes network-related system calls (e.g., NtListenPort and NtAlpcConnectPort), file-related system calls (e.g., NtCreateFile and NtReadFile), memory-related system calls (e.g., NtReadVirtualMemory and NtWriteVirtualMemory), process-related system calls (e.g., NtTerminateProcess and NtResumeProcess), and other types (e.g., NtOpenSemaphore and NtGetPlugPlayEvent).

This system call interception driver is publicly available at [32]. We open sourced our system call interceptor so that future studies can generate real-time software execution traces and evaluate the performance of their ML/DL algorithms in malware detection. Our solution can also be used as a baseline for comparison in future work.

System call logs (timestamp, PID, syscall) were collected using DbgPrint Logger [33], which enables real-time kernel message logging to a specified IP address (localhost or remote IP), thus allowing the logging pool and PROPEDEUTICA to reside on different hosts for scalability and performance. The driver monitors all processes added to the Borderline PIDs list, which keeps track of processes that received a borderline classification from the conventional ML classifier. The time to collect system calls varies among different software. It depends on the functionalities of the software (benign or malicious), the types of system calls invoked, and the frequency of system call invocations. Hence, the time to collect system calls is software dependent.

B. The Reconstruction Module

In PROPEDEUTICA, both the ML and the DEEPMALWARE classifiers use the RECONSTRUCTION MODULE to preprocess the input system call sequences and obtain the same preprocessed data. The RECONSTRUCTION MODULE splits system call sequences according to the PIDs of the processes invoking them and parses them into three types of sequential data: process-wide n-gram sequence, process-wide density sequence, and system-wide frequency feature vector. Then, the RECONSTRUCTION MODULE converts the sequential data into windows using the sliding window method, which is usually used to translate a sequential classification problem into a classical one for every sliding window [34].
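
As a rough sketch of the first reconstruction step, the snippet below splits a system-wide log into per-process call sequences. It assumes that each log record has already been parsed into a `(timestamp, pid, syscall)` tuple; the function name `split_by_pid` and the record layout are illustrative assumptions, not the paper's actual data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (timestamp, pid, syscall name).
Record = Tuple[float, int, str]


def split_by_pid(records: Iterable[Record]) -> Dict[int, List[str]]:
    """Group system calls by invoking PID, preserving time order per process."""
    per_pid: Dict[int, List[str]] = defaultdict(list)
    # sorted() is stable, so records with equal timestamps keep log order.
    for _timestamp, pid, syscall in sorted(records, key=lambda r: r[0]):
        per_pid[pid].append(syscall)
    return dict(per_pid)
```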

Process-wide n-gram sequence and density sequence. We use the n-gram model, widely used in natural language processing (NLP) applications [35], to compress system call sequences. An n-gram is defined as a combination of n contiguous system calls. The n-gram model encodes low-level features with simplicity and scalability, and compresses the data by reducing the length of each sequence with encoded information. The process activity in a Windows system can be intensive depending on the workload, e.g., more than 1,000 system calls per second, resulting in very large sliding windows. Therefore, for PROPEDEUTICA, we decided to further compress the system call sequences and translate them into two-stream sequences: n-gram sequences and density sequences. An n-gram sequence is a list of n-gram units, while a density sequence is the corresponding frequency statistics of repeated n-gram units. There are many possible n-gram variants, such as n-tuple, n-bag, and other non-trivial and hierarchical combinations (e.g., tuples of bags, bags of bags, and bags of grams) [9]. PROPEDEUTICA uses 2-grams because (i) compared with n-bag and n-tuple, the n-gram model is considered the most appropriate for malware system call-based classification [9], and (ii) the embedding layer and the first few convolutional neural layers can automatically extract high-level features from neighboring grams, which can be seen as hierarchical combinations of multiple n-grams. Once n-gram sequences fill up a sliding window, the RECONSTRUCTION MODULE delivers the window of sequences to the ML classifier and then to the DL classifier (in the inconclusive case), and redirects the incoming system calls to new n-gram sequences.
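
The two-stream compression can be illustrated with a short sketch. One assumption is made loudly here: "frequency statistics of repeated n-gram units" is read as run-length encoding of consecutive identical n-grams, which is one plausible interpretation and may differ from the authors' exact scheme. The helper names `ngrams` and `compress` are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(calls: Sequence[str], n: int = 2) -> List[Gram]:
    """Return all runs of n contiguous system calls (2-grams by default)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def compress(grams: Sequence[Gram]) -> Tuple[List[Gram], List[int]]:
    """Collapse consecutive repeats into (n-gram sequence, density sequence).

    Assumption: density = length of each run of identical consecutive n-grams.
    """
    units: List[Gram] = []
    density: List[int] = []
    for unit, run in groupby(grams):
        units.append(unit)
        density.append(sum(1 for _ in run))
    return units, density
```

For a process that loops on the same call, for example repeated `NtReadFile` invocations, the n-gram stream stays short while the density stream records how long the repetition lasted.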

System-wide frequency feature vector. Our learning models leverage system calls as features from all processes running in the system. This holistic (as opposed to process-specific) approach is more effective for malware detection compared to current approaches, because modern malware can be multi-threaded [36], [37]. System-wide information helps the models learn the interactions among all processes in the system. To gain whole-system information, the RECONSTRUCTION MODULE collects the frequency of different types of n-grams from all processes during the sliding window and extracts them as a frequency feature vector. Each element of the vector represents the frequency of an n-gram unit in the system call sequence.
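
A minimal sketch of the system-wide frequency feature vector follows. It assumes a fixed, ordered vocabulary of n-gram units (here a plain list) and a mapping from PID to the n-grams observed during the current window; both the function name `frequency_vector` and these input shapes are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def frequency_vector(
    grams_by_pid: Dict[int, Sequence[Gram]],
    vocabulary: Sequence[Gram],
) -> List[int]:
    """Count every n-gram across all processes, in vocabulary order."""
    counts: Counter = Counter()
    for grams in grams_by_pid.values():
        counts.update(grams)
    # N-grams absent from the window contribute 0; n-grams outside the
    # vocabulary are ignored in this sketch.
    return [counts[unit] for unit in vocabulary]
```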

C. DEEPMALWARE Classifier

PROPEDEUTICA aims to address spatial-temporal dynamics and software execution heterogeneity. Spatially, advanced malware leverages multiple processes to work together to achieve a long-term common goal. Temporally, a malicious process may demonstrate different types of behaviors (benign or malicious) during its lifetime of execution, such as staying dormant or benign at the beginning of the execution. To thoroughly consider available behavior data in space and time, we introduced a novel DL model, DEEPMALWARE. First, to capture the spatial connections between multiple processes, we feed both process-wide features and system-wide features into DEEPMALWARE. The process-wide and system-wide inputs provide detailed information about how the target software (malware or benign software) interacts with the rest of the software in the system. Second, to capture the temporal features, we first introduced the bi-directional LSTM [38] to capture the connection of n-grams. However, due to the gradient vanishing problem of LSTM in processing long sequences, we further introduce multi-scale convolutional neural networks to extract high-level representations of n-gram system calls, so as to reduce the input length of the LSTM and avoid blind spots. To facilitate harmonious coordination, DEEPMALWARE stacks spatial and temporal models for joint training.

System call sequences are taken as input for all processes subjected to DEEPMALWARE analysis (see Figure 2 for a workflow of the classification approach). DEEPMALWARE leverages n-gram sequences of processes and frequency feature vectors of the system. The first two streams (process-wide n-gram sequence and density sequence) model the sequence and density of n-gram system calls of the process. The third stream represents the global frequency feature vector of the whole system. DEEPMALWARE consists of four main components: (i) N-gram Embedding, (ii) ResNext blocks, (iii) Long Short-Term Memory (LSTM) Layers, and (iv) Fully Connected Layers (Figure 3).

N-gram Embedding. DEEPMALWARE adopts an encoding scheme called N-gram Embedding, which converts sparse n-gram units into a dense representation. DEEPMALWARE treats n-gram units as discrete atomic inputs. N-gram Embedding helps to understand the relationship between functionally correlated system calls and provides meaningful information about each n-gram to the learning system. Unlike vector representations such as one-hot vectors (which may cause data sparsity), N-gram Embedding mitigates sparsity and reduces the dimension of input n-gram units. The embedding layer maps sparse system call sequences to a dense vector. This layer also helps to extract semantic knowledge from low-level features (system call sequences) and largely reduces feature dimension. The embedding layer uses 256 neurons, which reduce the dimension of the n-gram model (number of unique n-grams in the evaluation) from 3,526 to 256.
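
Conceptually, an embedding layer is a lookup table from n-gram index to a dense vector. The sketch below shows only that lookup, in plain Python. The class name `NgramEmbedding` is hypothetical and the table is randomly initialized, whereas in a trained model the table entries would be learned jointly with the rest of the network.

```python
import random
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


class NgramEmbedding:
    """Lookup table mapping each n-gram unit to a dense vector of size `dim`."""

    def __init__(self, vocabulary: Sequence[Gram], dim: int = 256, seed: int = 0):
        rng = random.Random(seed)
        # One integer index per unique n-gram (3,526 in the paper's evaluation).
        self.index: Dict[Gram, int] = {unit: i for i, unit in enumerate(vocabulary)}
        # Random initialization only; a real model learns these values.
        self.table: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in vocabulary
        ]

    def __call__(self, units: Sequence[Gram]) -> List[List[float]]:
        """Replace every n-gram in a sequence by its dense vector."""
        return [self.table[self.index[u]] for u in units]
```

Compared with a one-hot encoding, every n-gram costs `dim` numbers instead of one slot per vocabulary entry, which is where the reduction from 3,526 to 256 dimensions comes from.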

ResNext blocks. DEEPMALWARE uses six ResNext blocks to extract different sizes of n-gram system calls in the sliding window. Conventional sliding window methods leverage small windows of system call sequences and therefore experience challenges when modeling long sequences. These conventional methods represent features in a rather simple way (e.g., only counting the total number of system calls in the windows, without local information about each system call), which may be inadequate for a classification task. The design of ResNext follows the implementation in [39]. Each ResNext block aggregates multiple sets of convolutional neural networks with the same topology. ResNext blocks extract different n-grams to avoid blind spots in detection. Residual blocks perform shortcuts between blocks to improve the training of deep layers. Batch normalization is applied after the convolutional layers to speed up the training process, followed by a non-linear ReLU activation function to avoid saturation.

Long Short-Term Memory (LSTM) Layers. The internal dependencies between system calls include meaningful context information or unknown patterns for identifying malicious behaviors. To leverage this information, DEEPMALWARE adopts one of the recurrent architectures, LSTM [38], for its strong capability of learning meaningful temporal/sequential patterns in a sequence and reflecting internal state changes modulated by the recurrent weights. LSTM networks can avoid gradient vanishing and learn long-term dependencies by using forget gates. LSTM layers gather information from the first two streams: process-wide n-gram sequence and density sequence.

Fully Connected Layers. DEEPMALWARE deploys a fully connected layer to encode the system-wide frequency. Its output is then concatenated with the output of the bi-directional LSTMs to gather both sequence-based process information and frequency-based system-wide information. The last fully connected layer, with the softmax function, outputs the prediction with probabilities.
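
The final stage described above (concatenate the two feature sets, apply a linear layer, then softmax) can be sketched in plain Python. The function names `softmax` and `classification_head` and the hand-written weights are illustrative assumptions; a real model would use learned parameters and a tensor library.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(
    seq_features: Sequence[float],   # e.g. output of the bi-directional LSTMs
    sys_features: Sequence[float],   # e.g. encoded system-wide frequency
    weights: Sequence[Sequence[float]],  # one row per class (hypothetical values)
    biases: Sequence[float],
) -> List[float]:
    """Concatenate both feature sets, apply one linear layer, then softmax."""
    x = list(seq_features) + list(sys_features)
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)
```

The returned list holds one probability per class, so the malware entry can be compared directly against a decision threshold.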

Moreover, we adopt three popular techniques to avoid over-training in DEEPMALWARE: 1) We add a Dropout layer [40] with a dropout rate of 0.5 following the fully connected layer. 2) We include batch normalization [41] in the ResNext blocks after the convolutional layers. 3) We use 20% of the training data as a validation set and apply early stopping when the loss on the validation set does not decrease.
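
The early-stopping rule in item 3 can be sketched as a small function over a list of per-epoch validation losses. The `patience` parameter (how many non-improving epochs to tolerate) is an assumption added for illustration, since the text does not state one, and the name `early_stopping_epoch` is hypothetical.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best validation loss).

    Training stops once the validation loss has failed to decrease for
    `patience` consecutive epochs (patience is an assumed hyper-parameter).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses) - 1, best_epoch
```

In practice the model weights saved at the best epoch would be the ones kept for deployment.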

D. Conventional ML Classifier

We consider three pervasively used conventional ML algorithms: Random Forest (RF), eXtreme Gradient Boosting (XGBoost) [42], and Adaptive Boosting (AdaBoost) [43]. Random Forest and boosted trees have been considered the best supervised models on high-dimensional data [44]. We use AdaBoost and XGBoost as representatives of boosted trees.

To select the best features used in machine learning classifiers, we consider both efficacy and effectiveness. In PROPEDEUTICA, we take the frequency of n-gram units in a sequence as input features and estimate the probability that a sequence of system calls was invoked by malware. Specifically, a feature vector is created for each sequence, and each element in the vector represents the frequency of the corresponding n-gram unit. We discuss the details of the ML classifiers in Section IV.

Fig. 2: Workflow of DEEPMALWARE classification approach.

Fig. 3: DEEPMALWARE architecture.

Notice that the ML classifiers in PROPEDEUTICA can be generalized to other ML algorithms, as long as the classifiers can output classification probabilities for borderline classification (i.e., probabilistic ML models). For non-probabilistic ML classifiers, we will adopt a conversion function to provide class probabilities. For example, the classification scores provided by a Support Vector Machine can be mapped into class probabilities via a logistic transformation with trained parameters [45]. K Nearest Neighbor (kNN) methods can be extended with probabilistic methods for estimating the posterior probabilities using distances from neighbors [46]. The classification probabilities of ML classifiers are used for the borderline classification (Fig. 1, Step 4). It is worth noting that PROPEDEUTICA is designed to integrate with any machine learning and deep learning methods in malware detection with minimal effort. Therefore, we select commonly used and representative algorithms to show PROPEDEUTICA's capability of plugging in these algorithms.
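
The logistic transformation mentioned for SVM scores is commonly known as Platt scaling: P(malware | s) = 1 / (1 + exp(A·s + B)), where s is the raw decision score and A, B are fitted on held-out data. The sketch below only applies the transformation; the function name `platt_probability` and the parameter values used in any call are placeholders, and fitting A and B is outside the scope of this illustration.

```python
import math


def platt_probability(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a probability: 1 / (1 + exp(a*score + b)).

    `a` and `b` are assumed to have been fitted beforehand; `a` is typically
    negative so that larger scores give larger probabilities.
    """
    z = a * score + b
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

The resulting probability can then be compared against the borderline interval in the same way as the output of a natively probabilistic classifier.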

IV. EVALUATION

The main goal of our evaluation is to determine PROPEDEUTICA's effectiveness for real-time malware detection. More specifically, we sought to discover to what extent PROPEDEUTICA outperforms ML on prediction performance and DL on classification time.

We first describe the dataset used in our evaluation. Next, we evaluate the performance of our newly proposed DEEPMALWARE algorithm in comparison with the most relevant ML classifiers from the literature. Then, we present the results of our experiments on the proposed PROPEDEUTICA. In our evaluation, we compare the performance of classifiers when operating on a CPU and a GPU. GPUs not only reduce the training time for DL models but also help these models achieve classification time several orders of magnitude less than conventional CPUs in deployment.

We performed our tests on a server running Ubuntu 12.04 with 2 CPUs, 4GB of memory, and a 60GB disk. DEEPMALWARE was implemented using PyTorch [47]. The training and testing procedures ran on Nvidia Tesla M40 GPUs.

A. Software Collection Method

For the malicious part, the dataset consisted of 9,115 Microsoft Windows PE32 format malware samples collected between 2014 and 2018 from threats detected by the incident response team of a major financial institution (who chose to remain anonymous) in the corporate email and Internet access links of the institution's employees, and 7 APTs collected from Rekings [48]. Figure 4 shows the categorization of our malware samples using AVClass [49]². The goal of malware collection is to be as broad as possible. The distribution of our collected malware families (Figure 4) correlates with the distribution of malware detected by AV companies. For the benign part, our dataset was composed of 1,338 samples, including 866 Windows benchmark software, 50 system software, 11 commonly used GUI-based software, and 409 other benign software. GUI-based software (e.g., Chrome, Microsoft Office, Video Player) had keyboard and mouse operations simulated using WinAutomation [50].

B. System Call Dataset

For each experiment, we ran malware and benign software and collected system-wide system calls for five minutes. The experiments were carried out under three different scenarios: (1) running one general malware, one benchmark/GUI-based/other benign software, and dozens of system software; (2) running two general malware, several benchmark/GUI-based/other benign software, and dozens of system software; (3) running one APT, one benchmark/GUI-based/other benign software, and dozens of system software.

Fig. 4. Malware families in our dataset, classified by AVClass.

2 AVClass is an automatic, vendor-agnostic malware labeling tool. It provides a fine-grained family name rather than a generic label (e.g., Trojan, Virus, etc.), which carries little meaning for the malware sample.

We randomly sampled half of the malware and benign software for training and the other half for testing, so that the reported results are not inflated by over-fitting [51]. We collected 493,095 traces in total. While one malware sample is running, multiple benign processes (especially system processes) run concurrently. Therefore, the number of benign process executions we collected was much larger than the number of malicious process executions. Our classifier would thus end up handling an imbalanced dataset, which could adversely affect the training procedure because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to the same number as from malware processes, so that the ML and DL detectors can learn from the same number of positive and negative samples.
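The under-sampling step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name, the fixed random seed, and the toy windows are hypothetical, and the exact sampling procedure used for the reported experiments is not specified in the text.

```python
import random

def undersample(benign, malware, seed=0):
    """Randomly drop majority-class windows so both classes have equal size."""
    rng = random.Random(seed)
    n = min(len(benign), len(malware))
    return rng.sample(benign, n), rng.sample(malware, n)

# Toy data: each window is a list of system call IDs (made-up values).
benign_windows = [[i, i + 1, i + 2] for i in range(1000)]
malware_windows = [[-i, -i - 1, -i - 2] for i in range(50)]

benign_bal, malware_bal = undersample(benign_windows, malware_windows)
print(len(benign_bal), len(malware_bal))
```

Random selection without replacement keeps every retained window unmodified; only the number of benign windows is reduced.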

Window size and stride are two important hyper-parameters in sliding window methods: they indicate, respectively, the total number of n-gram system calls in each window and the number of calls by which the window shifts. In real-time detection, larger window sizes need more time to fill up the sliding window, but provide more context for DL models and therefore achieve higher accuracy. We aim to choose these two hyper-parameters so that malware is detected accurately and in time (before it completes its malicious behavior). In our experiments, we observed that malware can successfully infect a system in one to two seconds, and that 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it. With pair (200, 100), DEEPMALWARE needs 0.5 to 1 second, while with pair (100, 50), DEEPMALWARE needs only 0.1 to 0.5 seconds. The F1-scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97%, respectively.
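To make the two hyper-parameters concrete, the following Python sketch cuts one process trace into windows and estimates how long a window takes to fill. The helper names are hypothetical, and the constant rate of 1,000 calls per second is the peak rate quoted above; the fill time is therefore a lower bound that excludes analysis time and slower processes.

```python
def sliding_windows(trace, size=100, stride=50):
    """Return every full window of `size` calls, advancing `stride` calls each time."""
    return [trace[i:i + size] for i in range(0, len(trace) - size + 1, stride)]

def fill_time_seconds(size, calls_per_second=1000):
    """Time for a process to emit `size` system calls at a constant rate."""
    return size / calls_per_second

trace = list(range(1000))  # stand-in for 1,000 system call IDs of one process
for size, stride in [(100, 50), (200, 100), (500, 250)]:
    n = len(sliding_windows(trace, size, stride))
    print(size, stride, n, fill_time_seconds(size))
```

With a stride of half the window size, consecutive windows overlap by 50%, so each system call (apart from those at the ends of the trace) appears in two windows.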

Although detection performance increases with window size, larger windows mean a longer wait while the software executes and fills up the window. During this wait, the malware is more likely to carry out malicious behavior. Therefore, the window size needs to be selected carefully to balance prediction performance against wait time. Since the performance gain from the larger window is marginal (an F1 score about 2 percentage points higher at window size 500 than at 100), while the waiting time can be reduced to 10%-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).

C. Performance Analysis of standalone ML and DL classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
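For reference, the five metrics can be computed from confusion-matrix counts as in the Python sketch below. It assumes the standard definitions with malware as the positive class, which the text does not spell out; the function name and the counts are made up for illustration.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the five reported metrics from confusion-matrix counts.

    Malware is the positive class; all values are returned as fractions.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_rate": fp / (fp + tn),
    }

# Made-up counts for a balanced test set of 1,000 malware and 1,000 benign windows.
m = detection_metrics(tp=950, fp=80, tn=920, fn=50)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The FP rate is the fraction of benign windows flagged as malware, which is the quantity a real-time detector must keep low to avoid disrupting legitimate software.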

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50 and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest is also the fastest of the machine learning methods. While the DEEPMALWARE model is 3.09 percentage points more accurate than Random Forest (at window size 500), its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.

We did not apply GPU devices to ML models because their classification time on conventional CPUs is much smaller (by at least one order of magnitude) than that measured for DL algorithms. We mark these entries as 'N/A' (not applicable) in Table I. In real life, GPU devices, especially GPUs dedicated to accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1.098 seconds per run, or 0.0003 seconds per window (when using window size 100), which is almost negligible compared with the ML execution time (0.0088 seconds per window) or the DL execution time (0.0383 seconds per window using GPU or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds, and that 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using window sizes of 100 and strides of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA's performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For the different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT, with a 52.5% false positive rate.

D. Performance Analysis of PROPEDEUTICA combining ML and DL classifiers

TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Window Size | Stride | Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 50 | AdaBoost | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| 100 | 50 | Random Forest | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| 100 | 50 | XGBoost | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| 100 | 50 | DEEPMALWARE | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| 200 | 100 | AdaBoost | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| 200 | 100 | Random Forest | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| 200 | 100 | XGBoost | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| 200 | 100 | DEEPMALWARE | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| 500 | 250 | AdaBoost | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| 500 | 250 | Random Forest | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| 500 | 250 | XGBoost | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| 500 | 250 | DEEPMALWARE | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |

TABLE II: Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |

Next, we evaluated PROPEDEUTICA's performance for different borderline intervals. Based on the results from Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, paired with upper bounds of 90%, 80%, 70%, and 60%, respectively. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only about 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
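The borderline policy described above can be sketched as follows in Python. This is a minimal sketch, not the PROPEDEUTICA implementation: the function names, the stub passed as `dl_detector`, and the example probabilities are all hypothetical, and the bounds default to the [30%-70%] interval.

```python
def triage(p_malware, lower=0.30, upper=0.70):
    """First-stage decision from the ML classifier's malware probability."""
    if p_malware < lower:
        return "benign"
    if p_malware > upper:
        return "malicious"
    return "borderline"  # hand over to the DL detector

def classify(p_ml, dl_detector, lower=0.30, upper=0.70):
    """Two-stage decision: the DL detector runs only for borderline cases."""
    verdict = triage(p_ml, lower, upper)
    if verdict != "borderline":
        return verdict, False
    return ("malicious" if dl_detector() else "benign"), True

# Made-up first-stage probabilities and a stub standing in for DEEPMALWARE.
probs = [0.05, 0.45, 0.92, 0.30, 0.71, 0.66]
results = [classify(p, dl_detector=lambda: True) for p in probs]
moved = sum(used_dl for _, used_dl in results)
print(results)
print(f"move percentage: {100 * moved / len(probs):.1f}%")
```

Widening the interval sends more software to the slower second stage, which is the trade-off between move percentage, detection time, and accuracy that Table II quantifies.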

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into the software at the kernel level than API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves higher accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in a real-time setting. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, that combines the speed of ML with the high accuracy of emerging DL models. Only software receiving a borderline classification from the ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging DL methods in an effective real-time detector, which can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

TABLE III: Comparison among DEEPMALWARE and other DL-based malware detection in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN'16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD'17 [22] | DNN | 26,078 traces | Accuracy: 93.99% (see footnote 3) |
| LNCS'16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process that receives a borderline classification from Random Forest and is then moved to DEEPMALWARE. DEEPMALWARE classifies it as benign, so the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, where the malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution. This makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls that exhibits normal behavior. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that the ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The framework proposed in PROPEDEUTICA can be further extended to learn from weakly labeled samples using these advanced weakly-supervised approaches.

3 The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state-of-the-art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.'s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM), and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) to classify malware families using Cuckoo's Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as belonging to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis (by providing a detailed execution report), while PROPEDEUTICA focuses on real-time detection and delivers on-the-fly detection results to end users. PROPEDEUTICA also improves on the Anubis experiments by monitoring a more diverse set of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at restoring a live system and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data-flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label the address and control dependency tags that should be propagated. Canali et al. [9] evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the high-level atoms with arguments model performs the best. Further, on-the-fly detection has been reported to suffer from even higher false-positive rates because of the diversity of applications and the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them both as benign/malicious and into 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by handling spatial-temporal features with ResNeXt designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by our novel algorithm DEEPMALWARE, a deep learning method that is more accurate but more computationally expensive. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware over both the short and the long term. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.

On one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, "A look into 30 years of malware development from a software metrics perspective," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, "Triggerscope: Towards detecting logic bombs in android applications," in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, "NetSTAT: A Network-Based Intrusion Detection Approach," 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com/

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, "Semantics-aware malware detection," in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP '05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, "Mining specifications of malicious behavior," in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE '07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, "Detecting malicious code by model checking," Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, "Effective and efficient malware detection at the end host," in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM'09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "A quantitative study of accuracy in system call-based malware detection," in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, "Ttanalyze: A tool for analyzing malware," in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion detection using sequences of system calls," Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, "Scalable, behavior-based malware clustering," in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, "A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection," International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, "Automatic analysis of malware behavior using machine learning," Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "Accessminer: using system-centric models for malware protection," in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., "Scaling learning algorithms towards ai," Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, "Dl4md: A deep learning framework for intelligent malware detection," in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, "Adversary resistant deep neural networks with an application to malware detection," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, "Deep learning for classification of malware system call sequences," in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, "Adversarial examples: Attacks and defenses for deep learning," IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, "Adversarial examples: Opportunities and challenges," IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, "Windows 7 most hit by wannacry ransomware," http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, "Process monitor v3.40," https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, "drstrace," http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, "Core os tools," https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, "Windbg logger," https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at https://github.com/gracesrm/windows-system-call-hook

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, "A sense of self for Unix processes," in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schütze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, "Panorama: capturing system-wide information flow for malware detection and analysis," in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, "Prison: Tracking process interactions to contain malware," in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, "An empirical evaluation of supervised learning in high dimensions," in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, "A probabilistic nearest neighbour method for statistical pattern recognition," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018). [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
  • III Implementation Details
    • III-A The System Call Interception Driver
    • III-B The Reconstruction Module
    • III-C DeepMalware Classifier
    • III-D Conventional ML Classifier
  • IV Evaluation
    • IV-A Software Collection Method
    • IV-B System Call Dataset
    • IV-C Performance Analysis of standalone ML and DL classifiers
    • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
    • IV-E Comparison with Other DL Malware Detectors
  • V Discussion
    • V-A Recurrent Loop in Propedeutica
    • V-B Adversarial Attacks against Malware Detection
    • V-C Weakly Supervised Anomaly Detection
  • VI Related Work
    • VI-A Behavior-based Malware Detection
    • VI-B ML-based Malware Detection
    • VI-C DL-based Malware Detection
  • VII Conclusion
  • References


the best of ML and DL methods, i.e., speed in prediction from the ML methods and accuracy from the DL methods. In PROPEDEUTICA, all software in the system is subjected to ML for fast classification. If a piece of software receives a borderline malware classification probability, it is then subjected to additional analysis with a more performance-expensive and more accurate DL classification. The DL classification is performed via our proposed algorithm, DEEPMALWARE. Compared to existing DL methods [21]–[23], DEEPMALWARE can learn spatially local and long-term features and handle software heterogeneity via the application of both ResNext blocks and recurrent neural networks with multi-stream inputs. By leveraging such a multi-stage design, PROPEDEUTICA is capable of providing precise and real-time detection.

We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 1,338 common benign software from various categories for the Windows OS. PROPEDEUTICA leveraged sliding windows of system calls for malware detection. Our proposed deep learning algorithm, DEEPMALWARE, achieved 97.03% accuracy and a 2.43% false positive rate. By combining conventional ML and DL, PROPEDEUTICA substantially improves the prediction performance while keeping the detection time under 0.1 seconds, which makes DL feasible for real-time malware detection.

In this paper, we present the following contributions: (i) PROPEDEUTICA, a new framework for efficient and effective real-time malware detection for Windows OS; (ii) an evaluation of PROPEDEUTICA with a collection of 9,115 malware and 1,338 benign software; and (iii) DEEPMALWARE, a novel DL algorithm learning multi-scale spatial-temporal system call features with multi-stream inputs that can handle software heterogeneity.

II. METHODOLOGY

A. Threat Model

PROPEDEUTICA is designed to provide precise real-time malware detection, i.e., a high accuracy rate, few false positives, and less processing time. PROPEDEUTICA is capable of defending against the most prevalent malware threats, but not against directed attacks such as specifically crafted, advanced, and persistent threats. Hence, we assume that malware will eventually get in if an organization is targeted by a well-motivated attacker. Therefore, attacks targeting either PROPEDEUTICA's software implementation or the machine learning engine (adversarial machine learning [24], [25]) are outside of PROPEDEUTICA's scope. We discuss adversarial attacks against malware detection in Section V-B.

PROPEDEUTICA's operation assumes that, before it loads, the system is in a pristine state. Thus, PROPEDEUTICA will not scan all running processes at startup, which limits its checking procedure to each new process created.

B. Overall Design

PROPEDEUTICA (Figure 1) consists of (i) a system call reconstruction module, (ii) an ML classifier, and (iii) our newly proposed DEEPMALWARE classifier.

The workflow of PROPEDEUTICA is outlined as follows.

Fig. 1. Architecture and workflow of PROPEDEUTICA for multi-stage malware detection.

STEP 1: PROPEDEUTICA leverages a System Call Interception Driver to intercept and log all system calls invoked in the system and associate them with the PID of the invoking process.

STEP 2: The Reconstruction Module reads and parses logged system calls, compressing the system calls for input to the two classifiers.

STEP 3: The classifiers generate sliding windows of system call traces with the PID of the invoking process.

STEP 4: The Conventional ML Classifier introduces a configurable borderline probability interval [lower bound, upper bound], which determines the range of classification that is considered inconclusive for detection. For example, consider a borderline interval of [30%-70%]. If the ML classifier assigns a piece of software a “malware” classification probability within the predefined borderline interval, PROPEDEUTICA considers its classification result inconclusive. If the software receives a “malware” classification probability of less than 30%, PROPEDEUTICA considers that the software is not malicious. If the classification probability is greater than 70%, PROPEDEUTICA considers the software malicious. For the inconclusive case ([30%-70%] interval), the software continues to run but is subjected to analysis by the DeepMalware Classifier for a definitive classification.

STEP 5: If the software is classified as benign by the DeepMalware Classifier, it continues running under the monitoring of the Conventional ML classifier.

STEP 6: If the software is considered malicious, PROPEDEUTICA kills the malicious software.
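The routing logic of Steps 4 to 6 can be illustrated with a minimal sketch. The function name `propedeutica_decide`, the two callables standing in for the classifiers, and the 0.5 decision threshold applied to the DL output are assumptions made for illustration only; the text specifies just the borderline interval on the ML probability.

```python
from typing import Callable, Sequence, Tuple

BENIGN, MALICIOUS = "benign", "malicious"


def propedeutica_decide(
    window: Sequence[int],
    ml_malware_prob: Callable[[Sequence[int]], float],
    dl_malware_prob: Callable[[Sequence[int]], float],
    lower: float = 0.30,
    upper: float = 0.70,
) -> Tuple[str, str]:
    """Return (label, stage) for one sliding window of system calls.

    `lower` and `upper` are the bounds of the borderline interval.
    """
    p_ml = ml_malware_prob(window)
    if p_ml < lower:
        return BENIGN, "ml"      # conclusive: keep monitoring with ML
    if p_ml > upper:
        return MALICIOUS, "ml"   # conclusive: the process would be killed
    # Borderline case: escalate to the slower, more accurate DL classifier.
    # The 0.5 cut-off is an assumption, not taken from the paper.
    p_dl = dl_malware_prob(window)
    return (MALICIOUS if p_dl >= 0.5 else BENIGN), "dl"
```

Only windows that fall inside the interval pay the cost of the DL model, which is how the design trades detection time against accuracy.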

By leveraging the fast speed of ML methods and the high accuracy of DL methods, PROPEDEUTICA aims to achieve precise and real-time detection. The main goal of this paper, using PROPEDEUTICA, is to present the concept of a multi-stage malware detector that combines two distinct classification models: ML and DL. Therefore, any presented implementation should be understood as a proof-of-concept (PoC) to accomplish such a goal, and not as a platform-specific solution. We intend that this PoC leads to a PROPEDEUTICA version that will be implemented in the future by OS vendors to be integrated into their products. Moreover, PROPEDEUTICA does not rely on a GPU-powered system to speed up its detection procedures, which broadens the application of PROPEDEUTICA.

For the sake of simplicity, our PoC was implemented in user land, i.e., as a user process in a non-privileged ring of execution. Thus, PROPEDEUTICA's trusted computing base includes the OS kernel, the learning models running in user land, and the hardware stack. We modeled PROPEDEUTICA's PoC for Microsoft Windows, since it is the OS most targeted by malware writers [26]. We limited our PoC to the 32-bit Microsoft Windows version, which is currently the most popular Windows architecture.

III. IMPLEMENTATION DETAILS

This section discusses implementation details of three main aspects of PROPEDEUTICA's operation: the system call interception driver, the reconstruction module, and our newly proposed DEEPMALWARE algorithm.

A. The System Call Interception Driver

Intercepting system calls is a key step for PROPEDEUTICA's operation, as it allows collecting system call invocation data to be input to the ML and DL classifiers. Although many tools for process monitoring exist, such as Process Monitor [27], the drstrace library [28], Core OS Tool [29], and WinDbg's Logger [30], they suffer from various challenges. Process Monitor provides only coarse-grained file and registry activity. drstrace works at the application level and can be bypassed by malware. WinDbg's Logger starts logging only at the entry point of the execution, so system calls executed by the initialization code in statically imported shared libraries will not be seen. Core OS tools trace coarse-grained events, such as interrupts, memory events, and thread creation and termination. Therefore, we opted to implement our own solution to have more flexibility for deploying a real-time detection mechanism. The PROPEDEUTICA collection driver was implemented for Windows 7 SP1 32-bit. The driver hooks into the System Service Dispatch Table (SSDT), which contains an array of function pointers to important system service routines, including system call routines. In a Windows 7 32-bit system, there are 400 entries of system calls [31], of which 155 system calls are interposed by our driver. We selected a comprehensive set of system calls to be able to monitor multiple distinct subsystems, which includes network-related system calls (e.g., NtListenPort and NtAlpcConnectPort), file-related system calls (e.g., NtCreateFile and NtReadFile), memory-related system calls (e.g., NtReadVirtualMemory and NtWriteVirtualMemory), process-related system calls (e.g., NtTerminateProcess and NtResumeProcess), and other types (e.g., NtOpenSemaphore and NtGetPlugPlayEvent).

This system call interception driver is publicly available at [32]. We open-sourced our system call interceptor so that future studies can generate real-time software execution traces and evaluate the performance of their ML/DL algorithms in malware detection. Our solution can also be used as a baseline for comparison in future work.

System call logs (timestamp, PID, syscall) were collected using DbgPrint Logger [33], which enables real-time kernel message logging to a specified IP address (localhost or remote IP), thus allowing the logging pool and PROPEDEUTICA to reside on different hosts for scalability and performance. The driver monitors all processes added to the Borderline PIDs list, which keeps track of processes that received a borderline classification from the conventional ML classifier. The time to collect system calls varies among different software. It depends on the functionalities of the software (benign or malicious), the types of system calls invoked, and the frequency of system call invocations. Hence, the time to collect system calls is software dependent.

B. The Reconstruction Module

In PROPEDEUTICA, both the ML and the DEEPMALWARE classifiers use the RECONSTRUCTION MODULE to preprocess the input system call sequences and obtain the same preprocessed data. The RECONSTRUCTION MODULE splits system call sequences according to the PIDs of the processes invoking them and parses them into three types of sequential data: process-wide n-gram sequence, process-wide density sequence, and system-wide frequency feature vector. Then, the RECONSTRUCTION MODULE converts the sequential data into windows using the sliding window method, which is usually used to translate a sequential classification problem into a classical one for every sliding window [34].
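As a rough illustration of the first and last stages described above, the sketch below groups logged records by PID and cuts each per-process sequence into fixed-size windows. The record layout `(timestamp, pid, syscall)` follows the log tuple mentioned earlier; the helper names and the `size` and `stride` parameters are hypothetical, since the window length and step are not given in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

LogRecord = Tuple[float, int, str]  # (timestamp, pid, syscall)


def split_by_pid(log: Iterable[LogRecord]) -> Dict[int, List[str]]:
    """Group system calls by the PID of the invoking process, in log order."""
    per_pid: Dict[int, List[str]] = defaultdict(list)
    for _timestamp, pid, syscall in log:
        per_pid[pid].append(syscall)
    return dict(per_pid)


def sliding_windows(seq: List[str], size: int, stride: int = 1) -> Iterator[List[str]]:
    """Yield consecutive windows of `size` calls, advancing by `stride`."""
    for start in range(0, len(seq) - size + 1, stride):
        yield seq[start:start + size]
```

Each yielded window can then be classified independently, which is what turns the sequential problem into a conventional per-window classification task.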

Process-wide n-gram sequence and density sequence. We use the n-gram model, widely used in natural language processing (NLP) applications [35], to compress system call sequences. An n-gram is defined as a combination of n contiguous system calls. The n-gram model encodes low-level features with simplicity and scalability and compresses the data by reducing the length of each sequence with encoded information. The process activity in a Windows system can be intensive depending on the workload, e.g., more than 1,000 system calls per second, resulting in very large sliding windows. Therefore, for PROPEDEUTICA, we decided to further compress the system call sequences and translate them into two-stream sequences: n-gram sequences and density sequences. An n-gram sequence is a list of n-gram units, while a density sequence is the corresponding frequency statistics of repeated n-gram units. There are many possible n-gram variants, such as n-tuple, n-bag, and other non-trivial and hierarchical combinations (e.g., tuples of bags, bags of bags, and bags of grams) [9]. PROPEDEUTICA uses 2-gram because (i) compared with n-bag and n-tuple, the n-gram model is considered the most appropriate for malware system call-based classification [9], and (ii) the embedding layer and the first few convolutional neural layers can automatically extract high-level features from neighbor grams, which can be seen as hierarchical combinations of multiple n-grams. Once n-gram sequences fill up a sliding window, the RECONSTRUCTION MODULE delivers the window of sequences to the ML classifier and then to the DL classifier (in the inconclusive case), and redirects the incoming system calls to new n-gram sequences.
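One plausible reading of the two-stream compression is sketched below. It assumes that "repeated n-gram units" means consecutive identical units, so the density sequence is a run-length count aligned with a de-duplicated n-gram sequence; the paper may define density differently, and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(calls: Sequence[str], n: int = 2) -> List[NGram]:
    """All contiguous n-grams of a system call sequence (2-grams by default)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def compress(units: Sequence[NGram]) -> Tuple[List[NGram], List[int]]:
    """Collapse runs of identical consecutive n-gram units.

    Returns (ngram_seq, density_seq), where density_seq[i] is the number of
    times ngram_seq[i] was repeated back to back (an assumed definition).
    """
    ngram_seq: List[NGram] = []
    density_seq: List[int] = []
    for unit, run in groupby(units):
        ngram_seq.append(unit)
        density_seq.append(sum(1 for _ in run))
    return ngram_seq, density_seq
```

For a process that issues four NtReadFile calls followed by one NtWriteFile, the four 2-grams collapse to two units with densities 3 and 1, which shows how tight loops of identical calls shrink the window.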

System-wide frequency feature vector. Our learning models leverage system calls as features from all processes running in the system. This holistic, as opposed to process-specific, approach is more effective for malware detection compared to current approaches because modern malware can be multi-threaded [36], [37]. System-wide information helps the models learn the interactions among all processes in the system. To gain whole-system information, the RECONSTRUCTION MODULE collects the frequency of different types of n-grams from all processes during the sliding window and extracts them as a frequency feature vector. Each element of the vector represents the frequency of an n-gram unit in the system call sequence.
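A minimal sketch of the system-wide frequency feature vector follows. It assumes a fixed, ordered vocabulary of n-gram units built beforehand (for instance from the training set) and that units outside the vocabulary are ignored; both choices, and the name `frequency_vector`, are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def frequency_vector(
    calls_by_pid: Dict[int, Sequence[str]],
    vocabulary: Sequence[NGram],
    n: int = 2,
) -> List[int]:
    """Count n-gram units over all processes seen in one sliding window."""
    index = {unit: i for i, unit in enumerate(vocabulary)}
    counts: Counter = Counter()
    for calls in calls_by_pid.values():
        for i in range(len(calls) - n + 1):
            counts[tuple(calls[i:i + n])] += 1
    vector = [0] * len(vocabulary)
    for unit, count in counts.items():
        if unit in index:  # units outside the vocabulary are dropped
            vector[index[unit]] = count
    return vector
```

Because n-grams are formed per process and only the counts are pooled, calls from different PIDs are never joined into a spurious n-gram.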

C. DEEPMALWARE Classifier

PROPEDEUTICA aims to address spatial-temporal dynamics and software execution heterogeneity. Spatially, advanced malware leverages multiple processes to work together to achieve a long-term common goal. Temporally, a malicious process may demonstrate different types of behaviors (benign or malicious) during its lifetime of execution, such as staying dormant or benign at the beginning of the execution. To thoroughly consider available behavior data in space and time, we introduced a novel DL model, DEEPMALWARE. First, to capture the spatial connections between multiple processes, we feed both process-wide features and system-wide features into DEEPMALWARE. The process-wide and system-wide inputs provide detailed information about how the target software (malware or benign software) interacts with the rest of the software in the system. Second, to capture the temporal features, we first introduced the bi-directional LSTM [38] to capture the connection of n-grams. However, due to the gradient vanishing problem of LSTM in processing long sequences, we further introduce multi-scale convolutional neural networks to extract a high-level representation of n-gram system calls, so as to reduce the input length of the LSTM and avoid blind spots. To facilitate harmonious coordination, DEEPMALWARE stacks spatial and temporal models for joint training.

System call sequences are taken as input for all processes subjected to DEEPMALWARE analysis (see Figure 2 for a workflow of the classification approach). DEEPMALWARE leverages n-gram sequences of processes and frequency feature vectors of the system. The first two streams (process-wide n-gram sequence and density sequence) model the sequence and density of n-gram system calls of the process. The third stream represents the global frequency feature vector of the whole system. DEEPMALWARE consists of four main components: (i) N-gram Embedding, (ii) ResNext blocks, (iii) Long Short-Term Memory (LSTM) Layers, and (iv) Fully Connected Layers (Figure 3).

N-gram Embedding. DEEPMALWARE adopts an encoding scheme called N-gram Embedding, which converts sparse n-gram units into a dense representation. DEEPMALWARE treats n-gram units as discrete atomic inputs. N-gram Embedding helps to understand the relationship between functionally correlated system calls and provides meaningful information about each n-gram to the learning system. Unlike vector representations such as one-hot vectors (which may cause data sparsity), N-gram Embedding mitigates sparsity and reduces the dimension of input n-gram units. The embedding layer maps sparse system call sequences to a dense vector. This layer also helps to extract semantic knowledge from low-level features (system call sequences) and largely reduces the feature dimension. The embedding layer uses 256 neurons, which reduce the dimension of the n-gram model (number of unique n-grams in the evaluation) from 3,526 to 256.

ResNext blocks. DEEPMALWARE uses six ResNext blocks to extract different sizes of n-gram system calls in the sliding window. Conventional sliding window methods leverage small windows of system call sequences and therefore experience challenges when modeling long sequences. These conventional methods represent features in a rather simple way (e.g., only counting the total number of system calls in the windows, without local information about each system call), which may be inadequate for a classification task. The design of ResNext follows the implementation in [39]. Each ResNext block aggregates multiple sets of convolutional neural networks with the same topology. ResNext blocks extract different n-grams to avoid blind spots in detection. Residual blocks perform shortcuts between blocks to improve the training of deep layers. Batch normalization is applied after the convolutional layers to speed up the training process, and a non-linear activation function (ReLU) is used to avoid saturation.

Long Short-Term Memory (LSTM) Layers. The internal dependencies between system calls include meaningful context information or unknown patterns for identifying malicious behaviors. To leverage this information, DEEPMALWARE adopts one of the recurrent architectures, LSTM [38], for its strong capability of learning meaningful temporal/sequential patterns in a sequence and reflecting internal state changes modulated by the recurrent weights. LSTM networks can mitigate gradient vanishing and learn long-term dependencies by using forget gates. LSTM layers gather information from the first two streams: process-wide n-gram sequence and density sequence.

Fully Connected Layers. DEEPMALWARE deploys a fully connected layer to encode the system-wide frequency. Its output is then concatenated with the output of the bi-directional LSTMs to gather both sequence-based process information and frequency-based system-wide information. The last fully connected layer, with the softmax function, outputs the prediction with probabilities.
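The data flow through the four components can be summarized in a compact form. The notation below is introduced here for illustration and does not appear in the paper: the symbols for the three input streams, the hidden representations, and the weights are all assumed names, and how the density stream is combined with the embedded n-gram stream is not spelled out in this section, so it is written simply as a second argument.

```latex
\begin{aligned}
\mathbf{h}_{\mathrm{proc}} &= \mathrm{BiLSTM}\big(\mathrm{ResNext}\big(\mathrm{Embed}(\mathbf{x}_{\mathrm{ngram}}),\ \mathbf{x}_{\mathrm{density}}\big)\big) \\
\mathbf{h}_{\mathrm{sys}} &= \mathrm{FC}(\mathbf{x}_{\mathrm{freq}}) \\
\hat{\mathbf{y}} &= \mathrm{softmax}\big(\mathbf{W}\,[\mathbf{h}_{\mathrm{proc}};\ \mathbf{h}_{\mathrm{sys}}] + \mathbf{b}\big)
\end{aligned}
```

Here $[\cdot\,;\cdot]$ denotes concatenation, and $\hat{\mathbf{y}}$ holds the benign and malware class probabilities produced by the last fully connected layer.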

Moreover, we adopt three popular techniques to avoid overtraining in DEEPMALWARE: 1) We add a Dropout layer [40] with a dropout rate of 0.5 following the fully connected layer. 2) We include batch normalization [41] in the ResNext blocks after the convolutional layers. 3) We use 20% of the training data as a validation set and apply early stopping when the loss on the validation set does not decrease.

D. Conventional ML Classifier

We consider three pervasively used conventional ML algorithms: Random Forest (RF), eXtreme Gradient Boosting (XGBoost) [42], and Adaptive Boosting (AdaBoost) [43]. Random Forest and boosted trees have been considered the best supervised models on high-dimensional data [44]. We use AdaBoost and XGBoost as representatives of boosted trees.
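The early stopping rule in item 3 can be sketched independently of any DL framework. The two callables, `max_epochs`, and the `patience` parameter are hypothetical; the text only states that training stops when the validation loss does not decrease, which corresponds to a patience of one epoch here.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    validation_loss: Callable[[], float],
    max_epochs: int = 100,
    patience: int = 1,
) -> Tuple[float, int]:
    """Train until the validation loss fails to improve `patience` times in a row.

    Returns (best validation loss, number of epochs actually run).
    """
    best = float("inf")
    stale = 0
    epochs_run = 0
    for _ in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        epochs_run += 1
        if loss < best:
            best, stale = loss, 0   # improvement: reset the counter
        else:
            stale += 1              # no improvement on the validation set
            if stale >= patience:
                break
    return best, epochs_run
```

In practice one would also keep a copy of the model weights from the best epoch; that bookkeeping is omitted to keep the sketch short.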

To select the best features used in machine learning classifiers, we consider both efficacy and effectiveness. In PROPEDEUTICA, we take the frequency of n-gram units in a sequence as input features and estimate the probability that a sequence of system calls was invoked by malware. Specifically, a feature vector is created for each sequence, and each element in the vector represents the frequency of the corresponding n-gram unit. We discuss the details of the ML classifiers in Section IV.

Fig. 2. Workflow of DEEPMALWARE classification approach.

Fig. 3. DEEPMALWARE architecture.

Notice that the ML classifiers in PROPEDEUTICA can be generalized to other ML algorithms, provided that they can output classification probabilities for borderline classification (i.e., probabilistic ML models). For non-probabilistic ML classifiers, we will adopt a conversion function to provide class probabilities. For example, the classification scores provided by a Support Vector Machine can be mapped into class probabilities via a logistic transformation with trained parameters [45]. K Nearest Neighbor (kNN) methods can be extended with probabilistic methods for estimating the posterior probabilities using distances from neighbors [46]. The classification probabilities of ML classifiers are used for the borderline classification (Fig. 1, Step 4). It is worth noting that PROPEDEUTICA is designed to integrate with any machine learning and deep learning methods in malware detection with minimal effort. Therefore, we select commonly used and representative algorithms to show PROPEDEUTICA's capability in plugging in these algorithms.
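The logistic transformation mentioned for SVM scores is commonly written as P(y = 1 | f) = 1 / (1 + exp(A f + B)), where f is the raw decision score and A and B are fitted on held-out data. The sketch below only applies the mapping; the parameter values used in the usage note are made up for illustration, and fitting A and B is not shown.

```python
import math


def platt_probability(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a probability via 1 / (1 + exp(a*score + b)).

    `a` and `b` are assumed to have been fitted beforehand; `a` is typically
    negative so that larger scores give higher probabilities.
    """
    z = a * score + b
    if z >= 0:
        # Equivalent form that avoids overflow in exp() for large positive z.
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

With hypothetical parameters a = -2.0 and b = 0.0, a score of 0 maps to a probability of 0.5, which would fall inside a [30%-70%] borderline interval and trigger the DL stage, while scores far from the decision boundary map outside it.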

IV. EVALUATION

The main goal of our evaluation is to determine PROPEDEUTICA's effectiveness for real-time malware detection. More specifically, we sought to discover to what extent PROPEDEUTICA outperforms ML on prediction performance and DL on classification time.

We first describe the dataset used in our evaluation. Next, we evaluate the performance of our newly proposed DEEPMALWARE algorithm in comparison with the most relevant ML classifiers from the literature. Then, we present the results of our experiments on the proposed PROPEDEUTICA. In our evaluation, we compare the performance of classifiers when operating on a CPU and a GPU. GPUs not only reduce the training time for DL models but also help these models achieve classification times several orders of magnitude lower than conventional CPUs in deployment.

We performed our tests on a server running Ubuntu 12.04 with 2 CPUs, 4GB memory, and a 60GB disk. DEEPMALWARE was implemented using PyTorch [47]. The training and testing procedures ran on Nvidia Tesla M40 GPUs.

A. Software Collection Method

For the malicious part, the dataset consisted of 9,115 Microsoft Windows PE32 format malware samples collected between 2014 and 2018 from threats detected by the incident response team of a major financial institution (who chose to remain anonymous) in the corporate email and Internet access links of the institution's employees, and 7 APTs collected from Rekings [48]. Figure 4 shows the categorization of our malware samples using AVClass [49].² The goal of malware collection is to be as broad as possible. The distribution of our collected malware families (Figure 4) correlates with the distribution of malware detected by AV companies. For the benign part, our dataset was composed of 1,338 samples, including 866 Windows benchmark software, 50 system software, 11 commonly used GUI-based software, and 409 other benign software. GUI-based software (e.g., Chrome, Microsoft Office, Video Player) had keyboard and mouse operations simulated using WinAutomation [50].

B. System Call Dataset

For each experiment, we ran malware and benign software and collected system-wide system calls for five minutes. The experiments were carried out under three different scenarios: (1) running one general malware, one benchmark/GUI-based/other benign software, and dozens of system software; (2) running two general malware, several benchmark/GUI-based/other benign software, and dozens of system software; (3) running one APT, one benchmark/GUI-based/other benign software, and dozens of system software.

Fig. 4: Malware families in our dataset, classified by AVClass.

² AVClass is an automatic, vendor-agnostic malware labeling tool. It provides a fine-grained family name rather than a generic label (e.g., Trojan, Virus, etc.), which carries little meaning for the malware sample.

We randomly sampled half of the malware and benign software for training and the other half for testing to avoid overfitting [51]. We collected 493,095 traces in total. During the run of one malware sample, multiple benign processes (especially system processes) would run concurrently. Therefore, the number of benign process executions we collected would be much larger than that of malicious process executions. Our classifier would end up handling imbalanced datasets, which could adversely affect the training procedure because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to match the number from malware processes, so that the ML and DL detectors can learn from the same number of positive and negative samples.
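The under-sampling step can be sketched as follows. This is a minimal illustration under the assumption that windows are drawn uniformly at random; the paper does not state how the retained benign windows were selected, and the function and variable names are hypothetical.

```python
import random


def undersample(benign_windows, malware_windows, seed=0):
    """Return equally sized random subsets of the two classes.

    Both inputs are lists of sliding windows. The larger class is
    reduced to the size of the smaller one (assumed: uniform sampling
    without replacement).
    """
    rng = random.Random(seed)
    n = min(len(benign_windows), len(malware_windows))
    benign_kept = rng.sample(benign_windows, n)
    malware_kept = rng.sample(malware_windows, n)
    return benign_kept, malware_kept


# Toy data: 1000 benign windows versus 50 malware windows.
benign = [("benign", i) for i in range(1000)]
malware = [("malware", i) for i in range(50)]
kept_benign, kept_malware = undersample(benign, malware)
print(len(kept_benign), len(kept_malware))
```

Fixing the seed keeps the retained subset reproducible across training runs, which makes comparisons between classifiers easier to interpret.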

Window size and stride are two important hyper-parameters in sliding window methods, indicating the total and the shifting number of n-gram system calls in each process. In real-time detection, large window sizes need more time to fill up the sliding window, but provide more context for DL models and therefore achieve higher accuracy. We aim to choose the two hyper-parameters so that malware is detected accurately and in time (before completing its malicious behavior). In our experiments, we observed that malware can successfully infect the system in one to two seconds, and that 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it; with pair (200, 100), it needs 0.5 to 1 second; and with pair (100, 50), it needs only 0.1 to 0.5 seconds. The F1 scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97%, respectively.
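To make the roles of window size and stride concrete, the sketch below cuts one process trace into overlapping windows and estimates how long a window takes to fill at the observed rate of roughly 1,000 system calls per second. The function names are hypothetical, and the fill-time estimate ignores analysis time and processes that invoke system calls at a lower rate.

```python
def sliding_windows(calls, size=100, stride=50):
    """Split a list of system calls into overlapping windows.

    Only full windows are returned; a trailing partial window is dropped.
    """
    last_start = len(calls) - size
    return [calls[i:i + size] for i in range(0, last_start + 1, stride)]


def fill_time_seconds(size, calls_per_second=1000):
    """Lower bound on the time needed to fill one window."""
    return size / calls_per_second


# Toy trace of 250 system call identifiers.
trace = list(range(250))
windows = sliding_windows(trace, size=100, stride=50)
print(len(windows), windows[1][0], windows[1][-1])

for size in (100, 200, 500):
    print(size, fill_time_seconds(size))
```

At 1,000 calls per second, a window of 100 calls fills in about 0.1 seconds and a window of 500 calls in about 0.5 seconds, which is consistent with the lower ends of the ranges reported above.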

Although detection performance increases with window size, large window sizes imply a longer wait for the detector, since the software must be allowed to execute until the window fills up. During this long wait, the malware is more likely to conduct malicious behaviors. Therefore, a proper window size needs to be carefully selected to balance prediction performance and wait time. Since the performance gain with respect to the increased window size is marginal (around a 2% increase in F1 score from 100 to 500), while the waiting time can be reduced to 10%-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).

C. Performance Analysis of Standalone ML and DL Classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
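For reference, the five metrics can be computed from confusion-matrix counts as in the sketch below, treating malware as the positive class. The counts used in the example are made up for illustration and are not taken from our experiments.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics with malware as the positive class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fp_rate": safe_div(fp, fp + tn),
    }


# Hypothetical counts on a balanced test set of 200 windows.
for name, value in detection_metrics(tp=90, fp=10, tn=90, fn=10).items():
    print(f"{name}: {value:.4f}")
```

On a balanced test set, accuracy equals the mean of recall and (1 - FP rate), which is a quick sanity check when reading the result tables.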

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50, and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest is also the fastest of all the machine learning methods. While the DEEPMALWARE model is 3.09% more accurate than Random Forest, its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.

We did not apply GPU devices to ML models because their classification time with conventional CPUs is much smaller (by at least one order of magnitude) than that measured for DL algorithms. We denote this as "N/A" (not applicable) in Table I. In real life, GPU devices, especially GPUs dedicated to accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1098 seconds per run, or 0.0003 seconds per window (when using window size 100), which is almost negligible compared with the ML execution time (0.0088 seconds per window) or the DL execution time (0.0383 seconds per window using GPU, or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds, and that 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window, because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using window sizes of 100 and strides of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA's performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT with a 525 false positive rate.

D. Performance Analysis of PROPEDEUTICA Combining ML and DL Classifiers

Next, we evaluated PROPEDEUTICA's performance for different borderline intervals. Based on the results from Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. PROPEDEUTICA is evaluated with various borderline intervals. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered


TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

Model          Window Size  Stride  Accuracy (%)  Precision (%)  Recall (%)  F1 Score (%)  FP Rate (%)  Detection Time with GPU (s)  Detection Time with CPU (s)
AdaBoost       100          50      79.25         78.11          81.31       79.67         22.80        N/A                          0.0187
Random Forest  100          50      89.05         84.63          95.44       89.71         17.35        N/A                          0.0089
XGBoost        100          50      84.78         90.99          77.22       83.54          7.66        N/A                          0.0116
DEEPMALWARE    100          50      94.84         92.63          97.43       94.97          7.76        0.0383                       1.4104
AdaBoost       200          100     78.27         72.84          90.19       80.59         33.64        N/A                          0.0132
Random Forest  200          100     93.49         89.99          97.90       93.77         10.91        N/A                          0.0063
XGBoost        200          100     70.81         63.65          97.08       76.89         70.81        N/A                          0.0073
DEEPMALWARE    200          100     94.96         93.05          97.20       95.08          7.27        0.0543                       1.3784
AdaBoost       500          250     81.81         79.44          85.86       82.52         22.24        N/A                          0.0108
Random Forest  500          250     93.94         97.16          90.54       93.73          2.65        N/A                          0.0048
XGBoost        500          250     79.19         94.14          62.27       74.95          3.88        N/A                          0.0062
DEEPMALWARE    500          250     97.03         97.54          96.50       97.02          2.43        0.0797                       1.3344

TABLE II: Comparison of different borderline policies for PROPEDEUTICA, combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

Lower Bound  Upper Bound  Accuracy (%)  Precision (%)  Recall (%)  F1 Score (%)  FP Rate (%)  Detection Time with GPU (s)  Detection Time with CPU (s)  Move Percentage (%)
10%          90%          94.71         92.41          97.43       94.85         8.00         0.0223                       0.381                        55.95
20%          80%          94.61         92.23          97.43       94.76         8.21         0.0173                       0.1543                       48.32
30%          70%          94.34         91.77          97.43       94.51         8.75         0.0146                       0.0884                       41.45
40%          60%          93.72         90.79          97.37       93.96         9.92         0.0123                       0.0497                       34.96

benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, and upper bounds of 60%, 70%, 80%, and 90%. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
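The borderline policy described in this subsection can be sketched as a small decision function. This is an illustrative sketch rather than our implementation: the function name, the string labels, and the callable standing in for DEEPMALWARE are hypothetical, and the treatment of probabilities exactly on a bound as borderline is an assumption.

```python
def triage(p_malware_ml, dl_classify, lower=0.30, upper=0.70):
    """Two-stage decision for one window of system calls.

    p_malware_ml: malware probability from the fast ML classifier.
    dl_classify:  zero-argument callable that runs the slower DL
                  classifier and returns "benign" or "malicious"; it is
                  only invoked for borderline cases.
    Returns a (label, stage) tuple.
    """
    if p_malware_ml < lower:
        return ("benign", "ML")
    if p_malware_ml > upper:
        return ("malicious", "ML")
    # Borderline: defer to the more accurate, more expensive DL stage.
    return (dl_classify(), "DL")


def fake_dl():
    """Stand-in for the DL classifier, used only in this example."""
    return "malicious"


for p in (0.05, 0.50, 0.95):
    print(p, triage(p, fake_dl))
```

Passing the DL stage as a callable keeps the expensive model off the fast path, so its cost is only paid for the fraction of windows reported as the move percentage in Table II.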

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into software at the kernel level than API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves better detection performance in terms of accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, which combines the speed of ML with the high accuracy of emerging DL models. Only software receiving a borderline classification from an ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging modern DL methods in an effective real-time detector, which can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process that receives a borderline classification from Random Forest and is then moved to DEEPMALWARE. DEEPMALWARE classifies it as benign, so the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.

TABLE III: Comparison among DEEPMALWARE and other DL-based malware detectors in the literature.

Work            Model        Dataset         Performance
DMIN'16 [21]    DL4MD        45,000 traces   Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8%
KDD'17 [22]     DNN          26,078 traces   Accuracy: 93.99% ³
LNCS'16 [23]    LSTM         4,753 traces    Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4%
PROPEDEUTICA    DEEPMALWARE  493,095 traces  Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0%

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, in which malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution. This makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls that exhibits normal behavior. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

³ The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.'s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM), and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) estimation to classify malware families using Cuckoo Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as belonging to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis (by providing a detailed execution report). PROPEDEUTICA, on the other hand, focuses on real-time detection and monitors a more diverse set of system calls. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox targets live-system restoration and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically. Anubis [79] works best as an offline malware analyzer by providing a detailed execution report, while PROPEDEUTICA delivers on-the-fly detection results to end users. PROPEDEUTICA improves on the experiments of Anubis by monitoring more kinds of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the model based on high-level atoms with arguments performs best, which corroborates our experimental results. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and of the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN, and achieved a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls, and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them both as benign/malicious and into 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimensionality reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by capturing spatial-temporal features with ResNeXt designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more performance-expensive and more accurate deep learning methods, using our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware in both short-term and long-term ways. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.

On the one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, "A look into 30 years of malware development from a software metrics perspective," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, "Triggerscope: Towards detecting logic bombs in android applications," in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, "NetSTAT: A Network-Based Intrusion Detection Approach," 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, "Semantics-aware malware detection," in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP '05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, "Mining specifications of malicious behavior," in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE '07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, "Detecting malicious code by model checking," Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, "Effective and efficient malware detection at the end host," in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM'09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "A quantitative study of accuracy in system call-based malware detection," in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, "Ttanalyze: A tool for analyzing malware," in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion detection using sequences of system calls," Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, "Scalable, behavior-based malware clustering," in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, "A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection," International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, "Automatic analysis of malware behavior using machine learning," Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "Accessminer: using system-centric models for malware protection," in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., "Scaling learning algorithms towards AI," Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, "Dl4md: A deep learning framework for intelligent malware detection," in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, "Adversary resistant deep neural networks with an application to malware detection," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, "Deep learning for classification of malware system call sequences," in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, "Adversarial examples: Attacks and defenses for deep learning," IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, "Adversarial examples: Opportunities and challenges," IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, "Windows 7 most hit by wannacry ransomware," http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, "Process monitor v3.40," https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, "drstrace," http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, "Core os tools," https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, "Windbg logger," https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at https://github.com/gracesrm/windows-system-call-hook

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, "A sense of self for Unix processes," in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, "Panorama: capturing system-wide information flow for malware detection and analysis," in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, "Prison: Tracking process interactions to contain malware," in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, "An empirical evaluation of supervised learning in high dimensions," in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, "A probabilistic nearest neighbour method for statistical pattern recognition," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018). [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, "Avclass: A tool for massive malware labeling," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, "Adversarial perturbations against deep neural networks for malware classification," arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Deceiving end-to-end deep learning malware detectors using adversarial examples," arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, "Black-box attacks against rnn based malware detection algorithms," in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, "Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples," in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, "Generation & evaluation of adversarial examples for malware obfuscation," in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, "Adversarial examples for cnn-based malware detectors," IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, "Exploring adversarial examples in malware detection," in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, "Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model," arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, "Black-box adversarial attacks against deep learning based malware binaries detection with gan," in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Adversarial examples on discrete sequences for beating whole-binary malware detection," arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, "A brief introduction to weakly supervised learning," National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, "Deep anomaly detection on attributed networks," in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery," in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, "Ganomaly: Semi-supervised anomaly detection via adversarial training," in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, "Adversarially learned anomaly detection," in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, "Feature encoding with autoencoders for weakly supervised anomaly detection," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, "Deep weakly-supervised anomaly detection," arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, "Deep anomaly detection with deviation networks," in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, "Toward automated dynamic malware analysis using cwsandbox," IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, "Learning and classification of malware behavior," in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, "Amal: High-fidelity, behavior-based automated malware analysis and classification," Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, "Adaptive semantics-aware malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, "Cuckoo sandbox," https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, "Automated classification and analysis of internet malware," in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, "Comprehensive analysis and detection of flash-based malware," in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, "Analyzing unknown binaries," http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, "Barebox: efficient malware analysis on bare-metal," in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, "Efficient detection of split personalities in malware," in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, "Differential slicing: Identifying causal execution differences for security applications," in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, "Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld," in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, "Generation of a new ids test dataset: Time to retire the kdd collection," in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, "Combining file content and file relations for cloud based malware detection," in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, "Applying bag of system calls for anomalous behavior detection of applications in linux containers," in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze nsl-kdd intrusion detection dataset," International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, "Malicious sequential pattern mining for automatic malware detection," Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, "Mean birds: Detecting aggression and bullying on twitter," 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, "Mamadroid: Detecting android malware by building markov chains of behavioral models," in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., "Deep speech 2: End-to-end speech recognition in english and mandarin," in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., "Wavenet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, "A hybrid malicious code detection method based on deep learning," methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, "Malware classification with recurrent networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, "Deepsign: Deep learning for automatic malware signature generation and classification," in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, "Deep neural network based malware detection using two dimensional binary program features," in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, "Mtnet: a multi-task neural network for dynamic malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, "Large-scale malware classification using random projections and neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, "Droiddeeplearner: Identifying android malware using deep learning," in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, "Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs," in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, "Transcend: Detecting concept drift in malware classification models," 2017.

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
  • III Implementation Details
    • III-A The System Call Interception Driver
    • III-B The Reconstruction Module
    • III-C DeepMalware Classifier
    • III-D Conventional ML Classifier
  • IV Evaluation
    • IV-A Software Collection Method
    • IV-B System Call Dataset
    • IV-C Performance Analysis of standalone ML and DL classifiers
    • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
    • IV-E Comparison with Other DL Malware Detectors
  • V Discussion
    • V-A Recurrent Loop in Propedeutica
    • V-B Adversarial Attacks against Malware Detection
    • V-C Weakly Supervised Anomaly Detection
  • VI Related Work
    • VI-A Behavior-based Malware Detection
    • VI-B ML-based Malware Detection
    • VI-C DL-based Malware Detection
  • VII Conclusion
  • References


includes the OS kernel, the learning models running in userland, and the hardware stack. We modeled PROPEDEUTICA's PoC for Microsoft Windows since it is the OS most targeted by malware writers [26]. We limited our PoC to the 32-bit Microsoft Windows version, which currently is the most popular Windows architecture.

III. IMPLEMENTATION DETAILS

This section discusses implementation details of three main aspects of PROPEDEUTICA's operation: the system call interception driver, the reconstruction module, and our newly proposed DEEPMALWARE algorithm.

A. The System Call Interception Driver

Intercepting system calls is a key step in PROPEDEUTICA's operation, as it allows collecting the system call invocation data that is input to the ML and DL classifiers. Many tools for process monitoring already exist, such as Process Monitor [27], the drstrace library [28], Core OS Tool [29], and WinDbg's Logger [30], but each has limitations. Process Monitor provides only coarse-grained file and registry activity. drstrace works at the application level and can be bypassed by malware. WinDbg's Logger starts logging only at the entry point of the execution, so system calls executed by the initialization code in statically imported shared libraries will not be seen. Core OS tools trace coarse-grained events such as interrupts, memory events, and thread creation and termination. Therefore, we opted to implement our own solution to have more flexibility for deploying a real-time detection mechanism. The PROPEDEUTICA collection driver was implemented for Windows 7 SP1 32-bit. The driver hooks into the System Service Dispatch Table (SSDT), which contains an array of function pointers to important system service routines, including system call routines. In a Windows 7 32-bit system, there are 400 system call entries [31], of which 155 are interposed by our driver. We selected a comprehensive set of system calls so as to monitor multiple distinct subsystems: network-related system calls (e.g., NtListenPort and NtAlpcConnectPort), file-related system calls (e.g., NtCreateFile and NtReadFile), memory-related system calls (e.g., NtReadVirtualMemory and NtWriteVirtualMemory), process-related system calls (e.g., NtTerminateProcess and NtResumeProcess), and other types (e.g., NtOpenSemaphore and NtGetPlugPlayEvent).

This system call interception driver is publicly available at [32]. We open sourced our system call interceptor so that future studies can generate real-time software execution traces and evaluate the performance of their ML/DL algorithms in malware detection. Our solution can also be used as a baseline for comparison in future work.

System call logs (timestamp, PID, syscall) were collected using DbgPrint Logger [33], which enables real-time kernel message logging to a specified IP address (localhost or remote IP), thus allowing the logging pool and PROPEDEUTICA to reside on different hosts for scalability and performance. The driver monitors all processes added to the Borderline PIDs list, which keeps track of processes that received a borderline classification from the conventional ML classifier. The time to collect system calls is software dependent: it varies with the functionality of the software (benign or malicious), the types of system calls invoked, and the frequency of system call invocations.

B. The Reconstruction Module

In PROPEDEUTICA, both the ML and the DEEPMALWARE classifiers use the RECONSTRUCTION MODULE to preprocess the input system call sequences, so both obtain the same preprocessed data. The RECONSTRUCTION MODULE splits system call sequences according to the PIDs of the processes invoking them and parses them into three types of sequential data: the process-wide n-gram sequence, the process-wide density sequence, and the system-wide frequency feature vector. Then, the RECONSTRUCTION MODULE converts the sequential data into windows using the sliding window method, which is commonly used to translate a sequential classification problem into a classical one for every sliding window [34].
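The two preprocessing steps just described, splitting by PID and cutting each per-process sequence into windows, can be illustrated with a minimal sketch. It assumes log records are (timestamp, PID, syscall) tuples, as in the driver description; the function names `split_by_pid` and `sliding_windows` are hypothetical and not taken from the PROPEDEUTICA code base. In the paper the windows are built over n-gram units; raw call names are used here only to keep the example short.

```python
from collections import defaultdict


def split_by_pid(log):
    """Group (timestamp, pid, syscall) records into one call list per PID."""
    per_pid = defaultdict(list)
    for _timestamp, pid, syscall in log:
        per_pid[pid].append(syscall)
    return dict(per_pid)


def sliding_windows(sequence, size, stride):
    """Return every full window of `size` items, advancing by `stride` items."""
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, stride)]


# Toy log with two interleaved processes (PIDs 4 and 8).
log = [(0, 4, "NtCreateFile"), (1, 8, "NtReadFile"), (2, 4, "NtReadFile"),
       (3, 4, "NtWriteVirtualMemory"), (4, 4, "NtTerminateProcess")]
calls = split_by_pid(log)
windows = sliding_windows(calls[4], size=2, stride=1)
```

A sequence shorter than the window size yields no window at all, which mirrors the observation that a process invoking few system calls takes longer to produce its first window.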

Process-wide n-gram sequence and density sequence. We use the n-gram model, widely used in natural language processing (NLP) applications [35], to compress system call sequences. An n-gram is defined as a combination of n contiguous system calls. The n-gram model encodes low-level features with simplicity and scalability, and compresses the data by reducing the length of each sequence with encoded information. The process activity in a Windows system can be intensive depending on the workload (e.g., more than 1,000 system calls per second), resulting in very large sliding windows. Therefore, for PROPEDEUTICA, we decided to further compress the system call sequences and translate them into two-stream sequences: n-gram sequences and density sequences. An n-gram sequence is a list of n-gram units, while a density sequence holds the corresponding frequency statistics of repeated n-gram units. There are many possible n-gram variants, such as n-tuple, n-bag, and other non-trivial and hierarchical combinations (e.g., tuples of bags, bags of bags, and bags of grams) [9]. PROPEDEUTICA uses 2-grams because (i) compared with n-bag and n-tuple, the n-gram model is considered the most appropriate for system call-based malware classification [9], and (ii) the embedding layer and the first few convolutional neural layers can automatically extract high-level features from neighboring grams, which can be seen as hierarchical combinations of multiple n-grams. Once the n-gram sequences fill up a sliding window, the RECONSTRUCTION MODULE delivers the window of sequences to the ML classifier (and then to the DL classifier in the inconclusive case) and redirects the incoming system calls to new n-gram sequences.
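As an illustration of the two-stream representation, the sketch below builds 2-grams from a call list and then collapses runs of identical consecutive 2-grams, recording each run length in a parallel density list. Reading "frequency statistics of repeated n-gram units" as run-length counts is an assumption made for this example, and the names `to_ngrams` and `compress` are hypothetical.

```python
def to_ngrams(calls, n=2):
    """Return all n-grams (tuples of n contiguous calls) in order."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def compress(ngrams):
    """Collapse runs of identical consecutive n-grams.

    Returns (units, density): units holds one entry per run, and density
    holds how many times that unit repeated in the run (assumed reading
    of the paper's density sequence).
    """
    units, density = [], []
    for gram in ngrams:
        if units and units[-1] == gram:
            density[-1] += 1          # same unit as before: extend the run
        else:
            units.append(gram)        # new unit: start a new run
            density.append(1)
    return units, density


calls = ["NtReadFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtCreateFile"]
units, density = compress(to_ngrams(calls))
```

Under this reading, a tight loop of repeated reads occupies a single position in the n-gram stream, so a window of fixed length covers a longer stretch of execution.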

System-wide frequency feature vector. Our learning models leverage system calls from all processes running in the system as features. This holistic approach, as opposed to a process-specific one, is more effective for malware detection than current approaches because modern malware can be multi-threaded [36], [37]. System-wide information helps the models learn the interactions among all processes in the system. To gain whole-system information, the RECONSTRUCTION MODULE collects the frequency of the different types of n-grams from all processes during the sliding window and extracts them as a frequency feature vector. Each element of the vector represents the frequency of an n-gram unit in the system call sequence.
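A frequency feature vector of this kind can be sketched as a count over a fixed n-gram vocabulary. The vocabulary ordering and the name `frequency_vector` are assumptions for illustration; the paper does not specify how vector positions are assigned.

```python
from collections import Counter


def frequency_vector(per_process_ngrams, vocabulary):
    """Count each vocabulary n-gram across all processes in one window.

    per_process_ngrams maps a PID to the n-grams it produced in the window;
    vocabulary is an ordered list of known n-gram units (assumed fixed).
    """
    counts = Counter()
    for ngrams in per_process_ngrams.values():
        counts.update(ngrams)
    # Counter returns 0 for n-grams that did not occur in this window.
    return [counts[gram] for gram in vocabulary]


vocabulary = [("NtCreateFile", "NtReadFile"),
              ("NtReadFile", "NtReadFile"),
              ("NtOpenSemaphore", "NtReadFile")]
window = {
    4: [("NtCreateFile", "NtReadFile"), ("NtReadFile", "NtReadFile")],
    8: [("NtReadFile", "NtReadFile")],
}
vector = frequency_vector(window, vocabulary)
```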

C. DEEPMALWARE Classifier

PROPEDEUTICA aims to address spatial-temporal dynamics and software execution heterogeneity. Spatially, advanced malware leverages multiple processes that work together to achieve a long-term common goal. Temporally, a malicious process may demonstrate different types of behavior (benign or malicious) during its lifetime, such as staying dormant or benign at the beginning of execution. To thoroughly consider the available behavior data in space and time, we introduced a novel DL model, DEEPMALWARE. First, to capture the spatial connections between multiple processes, we feed both process-wide and system-wide features into DEEPMALWARE. These inputs provide detailed information about how the target software (malware or benign software) interacts with the rest of the software in the system. Second, to capture the temporal features, we first introduced a bi-directional LSTM [38] to capture the connections between n-grams. However, because of the vanishing gradient problem of LSTMs when processing long sequences, we further introduce multi-scale convolutional neural networks that extract a high-level representation of n-gram system calls, so as to reduce the input length of the LSTM and avoid blind spots. To facilitate harmonious coordination, DEEPMALWARE stacks the spatial and temporal models for joint training.

System call sequences are taken as input for all processes subjected to DEEPMALWARE analysis (see Figure 2 for a workflow of the classification approach). DEEPMALWARE leverages the n-gram sequences of processes and the frequency feature vectors of the system. The first two streams (the process-wide n-gram sequence and density sequence) model the sequence and density of the n-gram system calls of the process. The third stream represents the global frequency feature vector of the whole system. DEEPMALWARE consists of four main components: (i) N-gram Embedding, (ii) ResNext blocks, (iii) Long Short-Term Memory (LSTM) Layers, and (iv) Fully Connected Layers (Figure 3).

N-gram Embedding. DEEPMALWARE adopts an encoding scheme called N-gram Embedding, which converts sparse n-gram units into a dense representation. DEEPMALWARE treats n-gram units as discrete atomic inputs. N-gram Embedding helps to capture the relationship between functionally correlated system calls and provides meaningful information about each n-gram to the learning system. Unlike vector representations such as one-hot vectors (which may cause data sparsity), N-gram Embedding mitigates sparsity and reduces the dimension of the input n-gram units. The embedding layer maps sparse system call sequences to a dense vector. This layer also helps to extract semantic knowledge from low-level features (system call sequences) and largely reduces the feature dimension. The embedding layer uses 256 neurons, which reduces the dimension of the n-gram model (the number of unique n-grams in the evaluation) from 3,526 to 256.

ResNext blocks. DEEPMALWARE uses six ResNext blocks to extract different sizes of n-gram system calls in the sliding window. Conventional sliding window methods leverage small windows of system call sequences and therefore experience challenges when modeling long sequences. These conventional methods represent features in a rather simple way (e.g., only counting the total number of system calls in the windows, without local information about each system call), which may be inadequate for a classification task. The design of ResNext follows the implementation in [39]. Each ResNext block aggregates multiple sets of convolutional neural networks with the same topology. ResNext blocks extract different n-grams to avoid blind spots in detection. Residual blocks perform shortcuts between blocks to improve the training of deep layers. Batch normalization is applied after the convolutional layers to speed up the training process, together with a non-linear activation function (ReLU) to avoid saturation.

Long Short-Term Memory (LSTM) Layers. The internal dependencies between system calls include meaningful context information or unknown patterns for identifying malicious behaviors. To leverage this information, DEEPMALWARE adopts one of the recurrent architectures, LSTM [38], for its strong capability of learning meaningful temporal/sequential patterns in a sequence and reflecting internal state changes modulated by the recurrent weights. LSTM networks can avoid gradient vanishing and learn long-term dependencies by using forget gates. The LSTM layers gather information from the first two streams: the process-wide n-gram sequence and density sequence.

Fully Connected Layers. DEEPMALWARE deploys a fully connected layer to encode the system-wide frequency feature vector. Its output is then concatenated with the output of the bi-directional LSTMs to gather both sequence-based process information and frequency-based system-wide information. The last fully connected layer, with the softmax function, outputs the prediction with probabilities.

Moreover, we adopt three popular techniques to avoid over-training in DEEPMALWARE: (1) we add a Dropout layer [40] with a dropout rate of 0.5 following the fully connected layer; (2) we include batch normalization [41] in the ResNext blocks after the convolutional layers; and (3) we use 20% of the training data as a validation set and apply early stopping when the loss on the validation set does not decrease.
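The third technique can be sketched without any DL framework. The example below holds out 20% of the training samples and finds the epoch at which training would stop given a list of per-epoch validation losses. The `patience` parameter and both function names are assumptions introduced for illustration; the paper only states that training stops when the validation loss does not decrease.

```python
def holdout_split(samples, fraction=0.2):
    """Reserve the last `fraction` of samples as a validation set."""
    n_val = int(len(samples) * fraction)
    return samples[:len(samples) - n_val], samples[len(samples) - n_val:]


def early_stopping_epoch(val_losses, patience=1):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs (patience is
    a hypothetical knob, not a value from the paper).
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs


train, val = holdout_split(list(range(10)))
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.5], patience=1)
```

In practice the samples would be shuffled before the split; the slice-based split is kept deterministic here so the example is easy to follow.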

D. Conventional ML Classifier

We consider three pervasively used conventional ML algorithms: Random Forest (RF), eXtreme Gradient Boosting (XGBoost) [42], and Adaptive Boosting (AdaBoost) [43]. Random Forest and boosted trees have been considered the best supervised models on high-dimensional data [44]. We use AdaBoost and XGBoost as representatives of boosted trees.

To select the best features used in machine learning classifiers, we consider both efficacy and effectiveness. In PROPEDEUTICA, we take the frequency of n-gram units in a sequence as input features and estimate the probability that a sequence of system calls was invoked by malware. Specifically, a feature vector is created for each sequence, and each element in the vector represents the frequency of the corresponding n-gram unit. We discuss the details of the ML classifiers in Section IV.

Fig. 2. Workflow of DEEPMALWARE classification approach.

Fig. 3. DEEPMALWARE architecture.

Notice that the ML classifiers in PROPEDEUTICA can be generalized to other ML algorithms as long as they can output classification probabilities for borderline classification (i.e., probabilistic ML models). For non-probabilistic ML classifiers, we adopt a conversion function to provide class probabilities. For example, the classification scores provided by a Support Vector Machine can be mapped into class probabilities via a logistic transformation with trained parameters [45]. K Nearest Neighbor (kNN) methods can be extended with probabilistic methods for estimating the posterior probabilities using distances from neighbors [46]. The classification probabilities of the ML classifiers are used for the borderline classification (Fig. 1, Step 4). It is worth noting that PROPEDEUTICA is designed to integrate with any machine learning and deep learning methods in malware detection with minimal effort. Therefore, we select commonly used and representative algorithms to show PROPEDEUTICA's capability for plugging in these algorithms.
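The conversion and routing steps can be sketched as follows. The logistic form 1 / (1 + exp(A·f + B)) is the one proposed in [45]; the parameter values used below are placeholders, since in practice A and B are fitted on held-out data. The default thresholds follow the [30%-70%] borderline interval quoted in the abstract, and the function names, the return labels, and the choice to route on the malware-class probability are assumptions for illustration.

```python
import math


def platt_probability(score, a, b):
    """Map a raw classifier score f to a probability 1 / (1 + exp(a*f + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def route(p_malware, lower=0.30, upper=0.70):
    """Decide what happens after the fast ML classifier has scored a window.

    Probabilities inside [lower, upper] are borderline and would be handed
    to the slower DL classifier; the bounds themselves count as borderline.
    """
    if p_malware < lower:
        return "benign"
    if p_malware > upper:
        return "malware"
    return "borderline"


# Placeholder parameters (a=-1, b=0 reduces the mapping to a plain sigmoid).
decisions = [route(platt_probability(s, a=-1.0, b=0.0)) for s in (-2.0, 0.0, 2.0)]
```

Widening the interval sends more windows to the DL classifier, trading classification time for accuracy; narrowing it does the opposite.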

IV. EVALUATION

The main goal of our evaluation is to determine PROPEDEUTICA's effectiveness for real-time malware detection. More specifically, we sought to discover to what extent PROPEDEUTICA outperforms ML on prediction performance and DL on classification time.

We first describe the dataset used in our evaluation. Next, we evaluate the performance of our newly proposed DEEPMALWARE algorithm in comparison with the most relevant ML classifiers from the literature. Then, we present the results of our experiments on the proposed PROPEDEUTICA. In our evaluation, we compare the performance of the classifiers when operating on a CPU and on a GPU. GPUs not only reduce the training time for DL models but also help these models achieve classification times several orders of magnitude lower than conventional CPUs in deployment.

We performed our test on a server running Ubuntu 12.04 with 2 CPUs, 4GB of memory, and a 60GB disk. DEEPMALWARE was implemented using PyTorch [47]. The training and testing procedures ran on Nvidia Tesla M40 GPUs.

A. Software Collection Method

For the malicious part, the dataset consisted of 9,115 Microsoft Windows PE32 format malware samples collected between 2014 and 2018 from threats detected by the incident response team of a major financial institution (who chose to remain anonymous) in the corporate email and Internet access links of the institution's employees, and 7 APTs collected from Rekings [48]. Figure 4 shows the categorization of our malware samples using AVClass [49]². The goal of malware collection is to be as broad as possible. The distribution of our collected malware families (Figure 4) correlates with the distribution of malware detected by AV companies. For the benign part, our dataset was composed of 1,338 samples, including 866 Windows benchmark software, 50 system software, 11 commonly used GUI-based software, and 409 other benign software. GUI-based software (e.g., Chrome, Microsoft Office, Video Player) had keyboard and mouse operations simulated using WinAutomation [50].

B. System Call Dataset

For each experiment, we run malware and benign software and collect system-wide system calls for five minutes. The experiments were carried out under three different scenarios: (1) running one general malware, one benchmark/GUI-based/other benign software, and dozens of system software; (2) running two general malware, several benchmark/GUI-based/other benign software, and dozens of system software; (3) running one APT, one benchmark/GUI-based/other benign software, and dozens of system software.

² AVClass is an automatic, vendor-agnostic malware labeling tool. It provides a fine-grained family name rather than a generic label (e.g., Trojan, Virus, etc.), which contains scarce meaning for the malware sample.

Fig. 4. Malware families in our dataset classified by AVClass.

We randomly sampled half of the malware and benign software for training and the other half for testing to avoid over-fitting [51]. We collected 493,095 traces in total. During the running of one malware sample, multiple benign processes (especially system processes) would run concurrently. Therefore, the number of benign process executions we collected would be much larger than that of malicious process executions. Our classifier would end up handling imbalanced datasets, which could adversely affect the training procedure because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to match the number from malware processes, so that the ML and DL detectors can learn from the same number of positive and negative samples.
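The under-sampling step amounts to drawing a random subset of benign windows of the same size as the malware set. A minimal sketch follows; the function name `undersample` and the fixed seed are assumptions made so the example is reproducible, and the paper does not state how the benign windows to keep were chosen.

```python
import random


def undersample(benign, malicious, seed=0):
    """Randomly drop benign windows until both classes have the same size."""
    rng = random.Random(seed)
    if len(benign) > len(malicious):
        benign = rng.sample(benign, len(malicious))  # sample without replacement
    return benign, malicious


benign_windows = [f"benign-{i}" for i in range(10)]
malware_windows = [f"malware-{i}" for i in range(3)]
balanced_benign, balanced_malware = undersample(benign_windows, malware_windows)
```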

Window size and stride are two important hyper-parameters in sliding window methods, indicating the total and the shifting number of n-gram system calls in each process. In real-time detection, large window sizes need more time to fill up the sliding window, but provide more context for DL models and therefore achieve higher accuracy. We aim to choose the two hyper-parameters so that malware is detected accurately and in time (before completing its malicious behavior). In our experiments, we observed that malware can successfully infect the system in one to two seconds, and 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it. With pair (200, 100), DEEPMALWARE needs 0.5 to 1 second, while with pair (100, 50), DEEPMALWARE needs only 0.1 to 0.5 seconds. The F1-scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97%, respectively.

Although detection performance increases with window size, a large window means a longer wait for the detector while the software executes and fills up the window. During this wait, the malware is more likely to carry out malicious behavior. Therefore, the window size needs to be carefully selected so as to balance prediction performance and wait time. Since the performance gain from a larger window is marginal (around a 2% increase in F1 score when going from window size 100 to 500), while the waiting time can be reduced to 10-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).

C. Performance Analysis of Standalone ML and DL Classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
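For reference, these five metrics can be computed from confusion-matrix counts as in the sketch below. It assumes that malware is the positive class (so a false positive is a benign window flagged as malicious); the function name and the example counts are illustrative and are not taken from the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the five reported metrics from confusion-matrix counts.

    Assumes malware is the positive class. No guard against empty classes
    is included, so every denominator must be non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fp_rate = fp / (fp + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fp_rate": fp_rate,
    }


# Made-up counts for a balanced test set of 200 windows.
print(detection_metrics(tp=90, fp=10, tn=90, fn=10))
```

On a balanced test set, accuracy equals the mean of recall and (1 - FP rate), which is a convenient consistency check when reading such tables.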

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50 and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest is also the fastest of the machine learning methods. While the DEEPMALWARE model is 3.09% more accurate than Random Forest, its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.

We did not apply GPU devices to ML models because their classification time with conventional CPUs is much smaller (by at least one order of magnitude) than that measured for DL algorithms. We denote this as ‘N/A’ (not applicable) in Table I. In real life, GPU devices, especially GPUs dedicated to accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1098 seconds per run, or 0.0003 seconds per window (when using window size 100), which is almost negligible compared with ML (0.0088 seconds per window) or DL execution time (0.0383 seconds per window using GPU or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds and that 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using window sizes of 100 and strides of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA’s performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT with a 525 false positive rate.

D. Performance Analysis of PROPEDEUTICA Combining ML and DL Classifiers

Next, we evaluated PROPEDEUTICA’s performance for different borderline intervals. Based on the results from Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered


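TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Models | Window Size | Stride | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 100 | 50 | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| Random Forest | 100 | 50 | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| XGBoost | 100 | 50 | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| DEEPMALWARE | 100 | 50 | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| AdaBoost | 200 | 100 | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| Random Forest | 200 | 100 | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| XGBoost | 200 | 100 | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| DEEPMALWARE | 200 | 100 | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| AdaBoost | 500 | 250 | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| Random Forest | 500 | 250 | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| XGBoost | 500 | 250 | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| DEEPMALWARE | 500 | 250 | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |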

TABLE II: Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |

benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40% and upper bounds of 60%, 70%, 80%, and 90%. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
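The borderline policy described above can be sketched as a small triage function. This is an illustration under stated assumptions rather than the PROPEDEUTICA implementation: `triage`, `move_percentage`, and the callable `dl_classifier` are hypothetical names, both bounds are treated as part of the borderline interval, and the 0.5 decision threshold applied to the DL output is assumed because the text does not specify one.

```python
def triage(p_malware_ml, dl_classifier, window, lower=0.30, upper=0.70):
    """Two-stage decision: fast ML first, DL only for borderline cases.

    Returns (label, moved), where `moved` tells whether the DL stage ran.
    `dl_classifier` is a hypothetical callable returning a malware
    probability for `window`; the 0.5 cut-off is an assumption.
    """
    if p_malware_ml < lower:
        return "benign", False
    if p_malware_ml > upper:
        return "malicious", False
    p_malware_dl = dl_classifier(window)
    return ("malicious" if p_malware_dl >= 0.5 else "benign"), True


def move_percentage(ml_probabilities, lower=0.30, upper=0.70):
    """Percentage of samples whose ML probability is borderline."""
    moved = sum(lower <= p <= upper for p in ml_probabilities)
    return 100.0 * moved / len(ml_probabilities)


def fake_dl(window):
    # Stand-in for the expensive DL detector; always reports 0.8.
    return 0.8


print(triage(0.10, fake_dl, window=None))  # confident benign, DL skipped
print(triage(0.55, fake_dl, window=None))  # borderline, DL decides
print(move_percentage([0.1, 0.5, 0.9, 0.4]))
```

Narrowing the interval lowers the move percentage, and with it the share of samples that pay the DL cost, which matches the trend in Table II.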

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into the software at the kernel level than API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves higher accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, which achieves both the speed of ML and the high accuracy of emerging DL models. Only software receiving a borderline classification from an ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging DL methods in an effective real-time detector that can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest


TABLE III: Comparison among DEEPMALWARE and other DL-based malware detectors in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN’16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD’17 [22] | DNN | 26,078 traces | Accuracy: 93.99% ³ |
| LNCS’16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

and DEEPMALWARE. For example, consider a process that receives a borderline classification from Random Forest and is then moved to DEEPMALWARE. DEEPMALWARE then classifies it as benign, so the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.
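One possible way to realize this planned mitigation is sketched below. It is purely hypothetical: the text does not say how the two predictions would be combined, so the weighted average, the function name `combined_score`, and the weight `dl_weight` are assumptions chosen only to show how a remembered DL verdict can pull a borderline Random Forest score out of the borderline interval.

```python
def combined_score(p_rf, p_dl_prev=None, dl_weight=0.5):
    """Blend the current Random Forest probability with the last DL verdict.

    Hypothetical scheme: a weighted average. With no earlier DL prediction
    for the process, the Random Forest probability is used unchanged.
    """
    if p_dl_prev is None:
        return p_rf
    return (1.0 - dl_weight) * p_rf + dl_weight * p_dl_prev


LOWER, UPPER = 0.30, 0.70

first_pass = combined_score(0.5)                 # no DL history yet
second_pass = combined_score(0.5, p_dl_prev=0.0)  # DL previously said benign
print(first_pass, LOWER <= first_pass <= UPPER)    # borderline, goes to DL
print(second_pass, LOWER <= second_pass <= UPPER)  # no longer borderline
```

In this sketch, a process that DEEPMALWARE has already judged benign is not sent back to the DL stage unless the Random Forest score rises noticeably, which breaks the loop described above.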

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, where malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution, which makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls with normal behaviors. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years, and it has become impractical to analyze and identify all of them for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial

³ The performance in terms of precision and recall is not provided in [22].

networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.’s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM) and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) to classify malware families using Cuckoo’s Sandbox [76]. Although these techniques were successfully applied for malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as pertaining to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis (by providing a detailed execution report). PROPEDEUTICA, on the other hand, focuses on real-time detection and monitors a more diverse set of system calls. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at live system restoration and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but at a larger scale, and uses a testbed to run malware automatically. Anubis [79] works best as an offline malware analyzer by providing a detailed execution report, while PROPEDEUTICA delivers on-the-fly detection results to end users. PROPEDEUTICA improves on the experiments of Anubis by monitoring more kinds of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the high-level atoms with arguments model performs the best, which corroborates our experimental results. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and obtained a relatively high classification rate on part of the now-outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them both as benign/malicious and into 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by handling spatial-temporal features with ResNeXt designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend’s concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by our novel deep learning algorithm, DEEPMALWARE, which is more computationally expensive but more accurate. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware over both short and long terms. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.

On one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP ’05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at https://github.com/gracesrm/windows-system-call-hook

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G Pang C Shen and A van den Hengel ldquoDeep anomaly detectionwith deviation networksrdquo in Proceedings of the 25th ACM SIGKDDinternational conference on knowledge discovery amp data mining 2019pp 353ndash362

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.



from all processes during the sliding window and extracts them as a frequency feature vector. Each element of the vector represents the frequency of an n-gram unit in the system call sequence.
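The frequency feature vector described above can be sketched in a few lines. This is a minimal illustration, not the Reconstruction Module itself: the function name, the choice n = 2, the example vocabulary, and the system call names are assumptions made for the example.

```python
from collections import Counter

def ngram_frequency_vector(syscalls, vocabulary, n=2):
    """Count how often each known n-gram occurs in one window of system calls.

    `vocabulary` is a hypothetical ordered list of n-gram tuples; element i of
    the returned vector is the frequency of vocabulary[i] in the window.
    """
    # zip over n shifted copies of the window yields consecutive n-grams
    grams = zip(*(syscalls[i:] for i in range(n)))
    counts = Counter(grams)
    return [counts.get(gram, 0) for gram in vocabulary]

# Illustrative window and vocabulary (assumed values, not taken from the dataset)
window = ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"]
vocabulary = [
    ("NtOpenFile", "NtReadFile"),
    ("NtReadFile", "NtClose"),
    ("NtClose", "NtWriteFile"),
]
vector = ngram_frequency_vector(window, vocabulary)
```

N-grams that never occur in the window simply receive a zero entry, so every window maps to a vector of the same length.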

C. DEEPMALWARE Classifier

PROPEDEUTICA aims to address spatial-temporal dynamics and software execution heterogeneity. Spatially, advanced malware leverages multiple processes that work together to achieve a long-term common goal. Temporally, a malicious process may demonstrate different types of behavior (benign or malicious) during its lifetime, such as staying dormant or benign at the beginning of execution. To thoroughly consider the available behavior data in space and time, we introduce a novel DL model, DEEPMALWARE. First, to capture the spatial connections between multiple processes, we feed both process-wide and system-wide features into DEEPMALWARE. The process-wide and system-wide inputs provide detailed information about how the target software (malware or benign software) interacts with the rest of the software in the system. Second, to capture the temporal features, we first introduce a bi-directional LSTM [38] to capture the connections between n-grams. However, due to the vanishing-gradient problem of LSTMs when processing long sequences, we further introduce multi-scale convolutional neural networks to extract high-level representations of n-gram system calls, so as to reduce the input length of the LSTM and avoid blind spots. To facilitate coordination between the two, DEEPMALWARE stacks the spatial and temporal models for joint training.

System call sequences are taken as input for all processes subjected to DEEPMALWARE analysis (see Figure 2 for the workflow of the classification approach). DEEPMALWARE leverages n-gram sequences of processes and frequency feature vectors of the system. The first two streams (the process-wide n-gram sequence and the density sequence) model the sequence and density of n-gram system calls of the process. The third stream represents the global frequency feature vector of the whole system. DEEPMALWARE consists of four main components: (i) N-gram Embedding, (ii) ResNext blocks, (iii) Long Short-Term Memory (LSTM) layers, and (iv) Fully Connected layers (Figure 3).

N-gram Embedding. DEEPMALWARE adopts an encoding scheme called N-gram Embedding, which converts sparse n-gram units into a dense representation. DEEPMALWARE treats n-gram units as discrete atomic inputs. N-gram Embedding helps to capture the relationship between functionally correlated system calls and provides meaningful information about each n-gram to the learning system. Unlike vector representations such as one-hot vectors (which may cause data sparsity), N-gram Embedding mitigates sparsity and reduces the dimension of the input n-gram units. The embedding layer maps sparse system call sequences to dense vectors. This layer also helps to extract semantic knowledge from low-level features (system call sequences) and largely reduces the feature dimension. The embedding layer uses 256 neurons, which reduces the dimension of the n-gram model (the number of unique n-grams in the evaluation) from 3,526 to 256.

ResNext blocks. DEEPMALWARE uses six ResNext blocks to extract n-gram system calls of different sizes in the sliding window. Conventional sliding window methods leverage small windows of system call sequences and therefore experience challenges when modeling long sequences. These conventional methods represent features in a rather simple way (e.g., only counting the total number of system calls in the windows, without local information about each system call), which may be inadequate for a classification task. The design of ResNext follows the implementation in [39]. Each ResNext block aggregates multiple sets of convolutional neural networks with the same topology. ResNext blocks extract different n-grams to avoid blind spots in detection. Residual blocks provide shortcuts between blocks to improve the training of deep layers. Batch normalization is applied after the convolutional layers to speed up the training process, followed by a non-linear ReLU activation function to avoid saturation.

Long Short-Term Memory (LSTM) Layers. The internal dependencies between system calls include meaningful context information or unknown patterns for identifying malicious behaviors. To leverage this information, DEEPMALWARE adopts one of the recurrent architectures, LSTM [38], for its strong capability of learning meaningful temporal/sequential patterns in a sequence and reflecting internal state changes modulated by the recurrent weights. LSTM networks can avoid vanishing gradients and learn long-term dependencies by using forget gates. The LSTM layers gather information from the first two streams: the process-wide n-gram sequence and the density sequence.

Fully Connected Layers. DEEPMALWARE deploys a fully connected layer to encode the system-wide frequency. Its output is then concatenated with the output of the bi-directional LSTMs to gather both sequence-based process information and frequency-based system-wide information. The last fully connected layer, with the softmax function, outputs the prediction with probabilities.

Moreover, we adopt three popular techniques to avoid over-training in DEEPMALWARE: 1) we add a Dropout layer [40] with a dropout rate of 0.5 following the fully connected layer; 2) we include batch normalization [41] in the ResNext blocks after the convolutional layers; 3) we use 20% of the training data as a validation set and apply early stopping when the loss on the validation set does not decrease.
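The early-stopping rule in item 3 can be illustrated with a short, framework-independent sketch. The `patience` value and the loss values below are assumptions for the example; the text only states that training stops when the validation loss no longer decreases.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs (patience is a hypothetical setting).
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

# Illustrative validation losses, one per epoch (made-up numbers)
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stop = early_stopping_epoch(losses)
```

In this example the loss stops improving after epoch 2, so training halts at epoch 5 and never reaches the later, lower value; a larger patience trades extra training time for the chance to escape such plateaus.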

D. Conventional ML Classifier

We consider three pervasively used conventional ML algorithms: Random Forest (RF), eXtreme Gradient Boosting (XGBoost) [42], and Adaptive Boosting (AdaBoost) [43]. Random Forest and boosted trees have been considered the best supervised models on high-dimensional data [44]. We use AdaBoost and XGBoost as representatives of boosted trees.

To select the best features used in machine learning classifiers, we consider both efficacy and effectiveness. In PROPEDEUTICA, we take the frequency of n-gram units in a sequence as input features and estimate the probability that a sequence of system calls was invoked by malware. Specifically, a feature vector is created for each sequence, and each element in the vector represents the frequency of the corresponding n-gram unit. We discuss the details of the ML classifiers in Section IV.

Fig. 2. Workflow of the DEEPMALWARE classification approach.

Fig. 3. DEEPMALWARE architecture.

Notice that the ML classifiers in PROPEDEUTICA can be generalized to other ML algorithms, provided they can output classification probabilities for borderline classification (i.e., probabilistic ML models). For non-probabilistic ML classifiers, we will adopt a conversion function to provide class probabilities. For example, the classification scores provided by a Support Vector Machine can be mapped into class probabilities via a logistic transformation with trained parameters [45]. K Nearest Neighbor (kNN) methods can be extended with probabilistic methods for estimating the posterior probabilities using distances from neighbors [46]. The classification probabilities of the ML classifiers are used for the borderline classification (Fig. 1, Step 4). It is worth noting that PROPEDEUTICA is designed to integrate with any machine learning and deep learning methods in malware detection with minimal effort. Therefore, we select commonly used and representative algorithms to show PROPEDEUTICA’s capability of plugging in these algorithms.
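As an illustration of the logistic transformation mentioned above, the sketch below maps an SVM decision score to a probability using a sigmoid with two parameters. The function name and the parameter values `a` and `b` are assumptions for the example; in practice they would be fitted on held-out data rather than chosen by hand.

```python
import math

def score_to_probability(score, a, b):
    """Map a raw decision score to P(malware) via 1 / (1 + exp(a * score + b)).

    `a` and `b` are the trained parameters of the logistic transformation;
    the values used below are hypothetical placeholders.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))

# Placeholder parameters: a negative `a` makes larger scores more malicious
A, B = -2.0, 0.0
p_on_boundary = score_to_probability(0.0, A, B)    # score on the decision boundary
p_far_positive = score_to_probability(1.5, A, B)   # score well inside the malware side
```

A score on the decision boundary maps to 0.5, which is exactly the kind of output that would fall inside a borderline interval, while scores far from the boundary map close to 0 or 1.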

IV. EVALUATION

The main goal of our evaluation is to determine PROPEDEUTICA’s effectiveness for real-time malware detection. More specifically, we sought to discover to what extent PROPEDEUTICA outperforms ML on prediction performance and DL on classification time.

We first describe the dataset used in our evaluation. Next, we evaluate the performance of our newly proposed DEEPMALWARE algorithm in comparison with the most relevant ML classifiers from the literature. Then, we present the results of our experiments on the proposed PROPEDEUTICA. In our evaluation, we compare the performance of classifiers when operating on a CPU and on a GPU. GPUs not only reduce the training time for DL models, but also help these models achieve classification times several orders of magnitude lower than conventional CPUs in deployment.

We performed our tests on a server running Ubuntu 12.04 with 2 CPUs, 4 GB of memory, and a 60 GB disk. DEEPMALWARE was implemented using PyTorch [47]. The training and testing procedures ran on Nvidia Tesla M40 GPUs.

A. Software Collection Method

For the malicious part, the dataset consisted of 9,115 Microsoft Windows PE32-format malware samples collected between 2014 and 2018 from threats detected by the incident response team of a major financial institution (which chose to remain anonymous) in the corporate email and Internet access links of the institution’s employees, plus 7 APTs collected from Rekings [48]. Figure 4 shows the categorization of our malware samples using AVClass [49].² The goal of malware collection is to be as broad as possible. The distribution of our collected malware families (Figure 4) correlates with the distribution of malware detected by AV companies. For the benign part, our dataset was composed of 1,338 samples, including 866 Windows benchmark software, 50 system software, 11 commonly used GUI-based software, and 409 other benign software. GUI-based software (e.g., Chrome, Microsoft Office, Video Player) had keyboard and mouse operations simulated using WinAutomation [50].

B. System Call Dataset

For each experiment, we run malware and benign software and collect system-wide system calls for five minutes. The experiments were carried out under three different scenarios: (1) running one general malware, one benchmark/GUI-based/other benign software, and dozens of system software; (2) running two general malware, several benchmark/GUI-based/other benign software, and dozens of system software; (3) running one APT, one benchmark/GUI-based/other benign software, and dozens of system software.

² AVClass is an automatic, vendor-agnostic malware labeling tool. It provides a fine-grained family name rather than a generic label (e.g., Trojan, Virus, etc.), which contains scarce meaning for the malware sample.

Fig. 4. Malware families in our dataset, classified by AVClass.

We randomly sampled half of the malware and benign software for training and the other half for testing to avoid over-fitting [51]. We collected 493,095 traces in total. During the running of one malware sample, multiple benign processes (especially system processes) would run concurrently. Therefore, the number of benign process executions we collected would be much larger than that of malicious process executions. Our classifier would end up handling imbalanced datasets, which could adversely affect the training procedure, because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to match that of malware processes, so that the ML and DL detectors can learn from the same number of positive and negative samples.
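The under-sampling step can be sketched as follows. This is an illustrative version that assumes benign windows are dropped uniformly at random; the text does not state how the retained windows were chosen, and the function name and seed are hypothetical.

```python
import random

def undersample_benign(benign_windows, malware_windows, seed=0):
    """Keep only as many benign windows as there are malware windows.

    Random selection with a fixed seed is an assumption made for this sketch.
    """
    if len(benign_windows) <= len(malware_windows):
        return list(benign_windows)  # already balanced or benign is the minority
    rng = random.Random(seed)
    return rng.sample(benign_windows, len(malware_windows))

# Toy data: integers stand in for sliding windows of system calls
benign = list(range(1000))
malware = list(range(50))
balanced_benign = undersample_benign(benign, malware)
```

After this step both classes contribute the same number of windows, so a classifier can no longer reach high accuracy simply by predicting the majority (benign) class.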

Window size and stride are two important hyper-parameters in sliding window methods, indicating the total and the shifting number of n-gram system calls in each process. In real-time detection, large window sizes need more time to fill up the sliding window, but provide more context for DL models and therefore achieve higher accuracy. We aim to choose these two hyper-parameters so that malware is detected accurately and in time (before it completes its malicious behavior). In our experiments, we observed that malware can successfully infect the system in one to two seconds, and 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it. With pair (200, 100), DEEPMALWARE needs 0.5 to 1 second, while with pair (100, 50), DEEPMALWARE needs only 0.1 to 0.5 seconds. The F1 scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97%, respectively.

Although the detection performance increases with the window size, large window sizes imply a longer wait for the detector, during which the software is allowed to execute and fill up the windows. During this long wait, the malware is more likely to conduct malicious behaviors. Therefore, a proper window size needs to be carefully selected so as to balance the prediction performance and the wait time. Since the performance gain from the increased window size is marginal (around a 2% increase in F1 score when the window size grows from 100 to 500), while the waiting time can be reduced to 10-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).
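To make the roles of window size and stride concrete, the following sketch cuts a per-process trace into overlapping windows. The function name is hypothetical, and the defaults of 100 and 50 simply mirror the pair selected above.

```python
def sliding_windows(sequence, window_size=100, stride=50):
    """Yield consecutive windows over a trace of n-gram units.

    Each window holds `window_size` items and starts `stride` items after the
    previous one, so neighbouring windows overlap by window_size - stride items.
    """
    for start in range(0, len(sequence) - window_size + 1, stride):
        yield sequence[start:start + window_size]

# Toy trace: integers stand in for the n-gram units of one process
trace = list(range(300))
windows = list(sliding_windows(trace))
```

With a stride of half the window size, every n-gram unit (apart from those at the very start and end of the trace) is seen by two windows, and a new window becomes available for classification after every 50 additional units.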

C. Performance Analysis of Standalone ML and DL Classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
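For reference, the five metrics can be computed from confusion-matrix counts as sketched below, using their standard definitions and treating malware as the positive class. The function name and the counts are illustrative assumptions, not values from the evaluation.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp/fn: malware windows classified as malware/benign;
    tn/fp: benign windows classified as benign/malware.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_rate": fp / (fp + tn),
    }

# Made-up counts on a balanced test set of 200 windows
metrics = detection_metrics(tp=90, fp=10, tn=90, fn=10)
```

The FP rate is reported alongside accuracy because, in deployment, benign processes vastly outnumber malicious ones, so even a small FP rate can translate into many false alarms.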

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50 and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest also achieves the fastest speed among all the machine learning methods. While the DEEPMALWARE model is 3.09% more accurate than Random Forest (at window size 500 and stride 250), its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.

We did not apply GPU devices to ML models because their classification time with conventional CPUs is much smaller (by at least one order of magnitude) than that measured for DL algorithms. ‘N/A’ denotes not applicable in Table I. In real life, GPU devices, especially GPUs dedicated to accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1098 seconds per run, or 0.0003 seconds per window (when using a window size of 100), which is almost negligible compared with the ML (0.0088 seconds per window) or DL execution time (0.0383 seconds per window using GPU, or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds, and 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window, because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using a window size of 100 and a stride of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA’s performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT with a 525 false positive rate.

D. Performance Analysis of PROPEDEUTICA Combining ML and DL Classifiers

TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Window Size | Stride | Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 50 | AdaBoost | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| 100 | 50 | Random Forest | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| 100 | 50 | XGBoost | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| 100 | 50 | DEEPMALWARE | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| 200 | 100 | AdaBoost | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| 200 | 100 | Random Forest | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| 200 | 100 | XGBoost | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| 200 | 100 | DEEPMALWARE | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| 500 | 250 | AdaBoost | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| 500 | 250 | Random Forest | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| 500 | 250 | XGBoost | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| 500 | 250 | DEEPMALWARE | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |

TABLE II: Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |

Next, we evaluated PROPEDEUTICA’s performance for different borderline intervals. Based on the results in Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, paired with upper bounds of 90%, 80%, 70%, and 60%, respectively. We observe that when a small borderline interval is configured (e.g., [30%-70%] or [40%-60%]), the detection time is reduced to less than 0.1 seconds even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only about 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
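The two-stage decision rule described in this section can be sketched as follows. The function name, the callable interfaces of the two models, and the 0.5 threshold applied to the DL output are assumptions made for illustration; the default bounds mirror the [30%-70%] borderline interval.

```python
def two_stage_classify(window, fast_model, slow_model, lower=0.30, upper=0.70):
    """Classify one window with a fast model, escalating borderline cases.

    `fast_model` and `slow_model` are hypothetical callables returning the
    probability that the window is malicious (e.g. Random Forest and
    DEEPMALWARE). Returns (label, stage that made the decision).
    """
    p_fast = fast_model(window)
    if p_fast < lower:
        return "benign", "ML"
    if p_fast > upper:
        return "malware", "ML"
    # Borderline: defer to the slower, more accurate model.
    p_slow = slow_model(window)
    return ("malware" if p_slow >= 0.5 else "benign"), "DL"

# Stand-in models that read precomputed probabilities from a dict
fast = lambda w: w["p_ml"]
slow = lambda w: w["p_dl"]

confident_benign = two_stage_classify({"p_ml": 0.10, "p_dl": 0.90}, fast, slow)
borderline = two_stage_classify({"p_ml": 0.50, "p_dl": 0.90}, fast, slow)
confident_malware = two_stage_classify({"p_ml": 0.95, "p_dl": 0.10}, fast, slow)
```

Narrowing the interval sends fewer windows to the slow model, which lowers the average detection time at the cost of trusting more of the fast model's less certain predictions.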

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into the software at the kernel level, compared with API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves much better detection performance in terms of accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, which combines the fast speed of ML with the high accuracy of emerging DL models. Only software receiving a borderline classification from the ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging modern DL methods in an effective real-time detector, which can meet the needs for high accuracy, a low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process receiving a borderline classification from Random Forest and then being moved to DEEPMALWARE. DEEPMALWARE then classifies it as benign, and the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.

TABLE III: Comparison among DEEPMALWARE and other DL-based malware detectors in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN’16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD’17 [22] | DNN | 26,078 traces | Accuracy: 93.99% ³ |
| LNCS’16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, in which malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution, which makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls with normal behaviors. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that the ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

³ The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.’s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM) and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) to classify malware families using Cuckoo Sandbox [76]. Although these techniques were successfully applied for malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware pertaining to one of the existing clusters.
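As an illustration of the q-gram representation mentioned above, the sketch below counts overlapping windows of q consecutive calls in a trace. It is a generic sliding-window count, not the exact feature pipeline of [14] or [73]; the function name `qgram_counts` and the sample trace are hypothetical.

```python
from collections import Counter
from typing import Sequence


def qgram_counts(calls: Sequence[str], q: int) -> Counter:
    """Count every window of q consecutive system calls in a trace."""
    if q < 1:
        raise ValueError("q must be at least 1")
    # A trace shorter than q yields no windows and an empty Counter.
    windows = (tuple(calls[i:i + q]) for i in range(len(calls) - q + 1))
    return Counter(windows)


# Hypothetical trace; the call names are only examples.
trace = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtReadFile", "NtWriteFile"]
features = qgram_counts(trace, q=2)
print(features[("NtReadFile", "NtWriteFile")])  # 2
```

The resulting counts can serve as a sparse feature vector for a conventional classifier such as an SVM.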

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis (by providing a detailed execution report). PROPEDEUTICA, on the other hand, focuses on real-time detection and monitors a more diverse set of system calls. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at live system restoring and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically. Whereas Anubis [79] targets offline analysis, PROPEDEUTICA delivers on-the-fly detection results to end users. PROPEDEUTICA improves on the experiments of Anubis by monitoring more kinds of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the model based on high-level atoms with arguments performs the best. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL for malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and detected malware as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them into both benign/malicious and also 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang et al. [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by handling spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend’s concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by DEEPMALWARE, our novel deep learning algorithm, which is more accurate but more performance-expensive. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware in both short and long-term ways. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.
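The multi-stage decision summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PROPEDEUTICA implementation: the function name `cascade_classify`, the stand-in detectors, the inclusive interval endpoints, and the 0.5 threshold applied to the DL output are all hypothetical. Both detectors are assumed to return the probability that a sample is malicious.

```python
from typing import Callable, Tuple

Prob = Callable[[object], float]  # returns P(malicious) for a sample


def cascade_classify(sample, ml_prob: Prob, dl_prob: Prob,
                     lower: float = 0.30, upper: float = 0.70) -> Tuple[bool, str]:
    """Return (is_malicious, stage) for one sample.

    The fast ML detector decides unless its probability is borderline,
    in which case the slower DL detector makes the final call.
    """
    p = ml_prob(sample)
    if lower <= p <= upper:
        return dl_prob(sample) >= 0.5, "dl"
    return p > upper, "ml"


# Hypothetical stand-in detectors keyed by sample name.
ml_scores = {"a.exe": 0.05, "b.exe": 0.55, "c.exe": 0.95}
dl_scores = {"a.exe": 0.10, "b.exe": 0.80, "c.exe": 0.90}

results = {name: cascade_classify(name, ml_scores.get, dl_scores.get)
           for name in ml_scores}
forwarded = sum(stage == "dl" for _, stage in results.values()) / len(results)
print(results["b.exe"], forwarded)
```

Widening the interval forwards more samples to the DL stage, trading detection time for accuracy; the fraction `forwarded` plays the role of the 41.45% figure reported above.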

On one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP ’05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at <https://github.com/gracesrm/windows-system-call-hook>.

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.


Fig. 2. Workflow of DEEPMALWARE classification approach.

Fig. 3. DEEPMALWARE architecture.

n-gram unit. We discuss the details of ML classifiers in Section IV.

Notice that the ML classifiers in PROPEDEUTICA can be generalized to other ML algorithms, provided that they can output classification probabilities for borderline classification (i.e., probabilistic ML models). For non-probabilistic ML classifiers, we adopt a conversion function to provide class probabilities. For example, the classification scores provided by a Support Vector Machine can be mapped into class probabilities via a logistic transformation with trained parameters [45]. K Nearest Neighbor (kNN) methods can be extended with probabilistic methods for estimating the posterior probabilities using distances from neighbors [46]. The classification probabilities of ML classifiers are used for the borderline classification (Fig. 1, Step 4). It is worth noting that PROPEDEUTICA is designed to integrate with any machine learning and deep learning methods in malware detection with minimal effort. Therefore, we select commonly used and representative algorithms to show PROPEDEUTICA’s capability in plugging in these algorithms.
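The logistic transformation of SVM scores mentioned above is commonly written as P(y = 1 | f) = 1 / (1 + exp(A f + B)), where f is the raw decision score and A, B are fitted on held-out data [45]. The following is a minimal sketch of that mapping only. The function name `platt_probability` and the parameter values are hypothetical; fitting A and B is not shown.

```python
import math


def platt_probability(score: float, a: float, b: float) -> float:
    """Map a raw SVM decision score to a class probability via Platt's sigmoid."""
    z = a * score + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical fitted parameters; a is negative so that larger scores
# map to larger probabilities of the positive (malicious) class.
A, B = -2.0, 0.0
for f in (-1.5, 0.0, 1.5):
    print(f, round(platt_probability(f, A, B), 3))
```

The resulting probability is what the borderline check consumes, so any classifier that exposes a real-valued score could be plugged in once such a calibration has been fitted.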

IV. EVALUATION

The main goal of our evaluation is to determine PROPEDEUTICA’s effectiveness for real-time malware detection. More specifically, we sought to discover to what extent PROPEDEUTICA outperforms ML on prediction performance and DL on classification time.

We first describe the dataset used in our evaluation. Next, we evaluate the performance of our newly proposed DEEPMALWARE algorithm in comparison with the most relevant ML classifiers from the literature. Then, we present the results of our experiments on the proposed PROPEDEUTICA. In our evaluation, we compare the performance of classifiers when operating on a CPU and a GPU. GPUs not only reduce the training time for DL models, but also help these models achieve classification times several orders of magnitude less than conventional CPUs in deployment.

We performed our test on a server running Ubuntu 12.04 with 2 CPUs, 4GB memory, and 60GB disk. DEEPMALWARE was implemented using PyTorch [47]. The training and testing procedures ran on Nvidia Tesla M40 GPUs.

A. Software Collection Method

For the malicious part, the dataset consisted of 9,115 Microsoft Windows PE32 format malware samples collected between 2014 and 2018 from threats detected by the incident response team of a major financial institution (who chose to remain anonymous) in the corporate email and Internet access links of the institution's employees, and 7 APTs collected from Rekings [48]. Figure 4 shows the categorization of our malware samples using AVClass [49].² The goal of malware collection is to be as broad as possible. The distribution of our collected malware families (Figure 4) correlates with the distribution of malware detected by AV companies. For the benign part, our dataset was composed of 1,338 samples, including 866 Windows benchmark software, 50 system software, 11 commonly used GUI-based software, and 409 other benign software. GUI-based software (e.g., Chrome, Microsoft Office, Video Player) had keyboard and mouse operations simulated using WinAutomation [50].

B. System Call Dataset

For each experiment, we ran malware and benign software and collected system-wide system calls for five minutes. The experiments were carried out under three different scenarios: (1) running one general malware, one benchmark/GUI-based/other benign software, and dozens of system software; (2) running two general malware, several benchmark/GUI-based/other benign software, and dozens of system software; (3) running one APT, one benchmark/GUI-based/other benign software, and dozens of system software.

² AVClass is an automatic, vendor-agnostic malware labeling tool. It provides a fine-grained family name rather than a generic label (e.g., Trojan, Virus, etc.), which carries little meaning for the malware sample.

Fig. 4. Malware families in our dataset classified by AVClass.

We randomly sampled half of the malware and benign software for training and the other half for testing to avoid overfitting [51]. We collected 493,095 traces in total. During the running of one malware sample, multiple benign processes (especially system processes) would run concurrently. Therefore, the number of benign process executions we collected was much larger than that of malicious process executions. Our classifier would thus end up handling imbalanced datasets, which could adversely affect the training procedure, because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to match that of malware processes, so that the ML and DL detectors can learn from the same number of positive and negative samples.
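The under-sampling step can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: windows are represented by placeholder integers, label 0 stands for benign and 1 for malicious, and the helper name `undersample` is hypothetical. The sketch trims whichever class is larger, which in the setting described above is the benign class.

```python
import random

def undersample(windows, labels, seed=0):
    """Randomly drop windows from the larger class until both classes have equal size."""
    rng = random.Random(seed)
    benign = [w for w, y in zip(windows, labels) if y == 0]
    malicious = [w for w, y in zip(windows, labels) if y == 1]
    n = min(len(benign), len(malicious))
    balanced = [(w, 0) for w in rng.sample(benign, n)]
    balanced += [(w, 1) for w in rng.sample(malicious, n)]
    rng.shuffle(balanced)  # avoid presenting all benign windows first
    return balanced

# Toy data: 8 benign windows and 3 malicious windows (placeholder integers).
windows = list(range(11))
labels = [0] * 8 + [1] * 3
balanced = undersample(windows, labels)
print(len(balanced), sum(y for _, y in balanced))
```

Fixing the random seed keeps the balanced subset reproducible across training runs.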

Window size and stride are two important hyper-parameters in sliding window methods, indicating the total and the shifting number of n-gram system calls in each process. In real-time detection, large window sizes need more time to fill up the sliding window, provide more context for DL models, and therefore achieve higher accuracy. We aim to choose these two hyper-parameters so that malware is detected accurately and in time (before it completes its malicious behavior). In our experiments, we observed that malware can successfully infect the system in one to two seconds, and 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it. With pair (200, 100), DEEPMALWARE needs 0.5 to 1 second, while with pair (100, 50), DEEPMALWARE needs only 0.1 to 0.5 seconds. The F1 scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97%, respectively.
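To make the roles of window size and stride concrete, the following sketch cuts a per-process system call trace into overlapping windows. It is an illustration only: the trace is a list of placeholder integers standing in for system call identifiers, and the function name `sliding_windows` is our own.

```python
def sliding_windows(calls, window_size=100, stride=50):
    """Yield consecutive windows of `window_size` calls, each shifted by `stride` calls."""
    for start in range(0, len(calls) - window_size + 1, stride):
        yield calls[start:start + window_size]

trace = list(range(300))  # stand-in for 300 system call IDs from one process
wins = list(sliding_windows(trace, window_size=100, stride=50))
print(len(wins), wins[1][0], wins[-1][-1])
```

With a stride of half the window size, each system call (apart from those at the very start and end of the trace) appears in two consecutive windows, so a detector sees a fresh window every 50 calls once the first 100 have arrived.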

Although detection performance increases with the window size, large window sizes mean a longer wait for the detector, since the software must be allowed to execute and fill up the window. During this long wait, the malware is more likely to conduct malicious behaviors. Therefore, a proper window size needs to be carefully selected to balance prediction performance and wait time. Since the performance gain with respect to the increased window size is marginal (around a 2% increase in F1 score when going from window size 100 to 500), while the waiting time can be reduced to 10%-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).

C. Performance Analysis of standalone ML and DL classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
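For reference, these five metrics can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions with malware as the positive class; the counts are made-up toy values rather than results from our experiments, and the function assumes that none of the denominators is zero.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics, with malware as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_rate": fp / (fp + tn),  # share of benign windows flagged as malware
    }

# Toy counts: 100 malicious and 100 benign windows.
m = detection_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

On a balanced test set such as this toy example, accuracy equals the average of recall and (1 - FP rate), which is a convenient sanity check when reading a results table.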

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50, and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest is also the fastest of all the machine learning methods. While the DEEPMALWARE model is 3.09% more accurate than Random Forest, its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.
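The slowdown factors can be checked with simple arithmetic. The sketch below takes the per-window detection times from the window size 100 row of Table I (Random Forest on CPU, DEEPMALWARE on GPU and CPU) and computes the ratios; the variable names are ours, and other window sizes give different ratios.

```python
# Per-window detection times in seconds (Table I, window size 100, stride 50).
rf_cpu = 0.0089   # Random Forest on CPU
dm_gpu = 0.0383   # DEEPMALWARE on GPU
dm_cpu = 1.4104   # DEEPMALWARE on CPU

cpu_slowdown = dm_cpu / rf_cpu
gpu_slowdown = dm_gpu / rf_cpu
print(f"CPU slowdown: {cpu_slowdown:.1f}x, GPU slowdown: {gpu_slowdown:.1f}x")
```

For this row, the CPU ratio is on the order of 150 and the GPU ratio is a little over 4, in line with the rounded figures quoted above.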

We did not apply GPU devices to ML models because their classification time with conventional CPUs is much smaller (at least one order of magnitude) than that measured for DL algorithms. We denote this as 'N/A' (not applicable) in Table I. In real life, GPU devices, especially GPUs dedicated to accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1098 seconds per run, or 0.0003 seconds per window (when using a window size of 100), which is almost negligible compared with the ML execution time (0.0088 seconds per window) or the DL execution time (0.0383 seconds per window using GPU, or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds, and 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window, because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using window sizes of 100 and strides of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA's performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT with a 525 false positive rate.

D. Performance Analysis of PROPEDEUTICA combining ML and DL classifiers

Next, we evaluated PROPEDEUTICA's performance for different borderline intervals. Based on the results from Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. PROPEDEUTICA is evaluated on various borderline intervals. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, and upper bounds of 60%, 70%, 80%, and 90%. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds, even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.

TABLE I. Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Models | Window Size | Stride | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 100 | 50 | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| Random Forest | 100 | 50 | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| XGBoost | 100 | 50 | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| DEEPMALWARE | 100 | 50 | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| AdaBoost | 200 | 100 | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| Random Forest | 200 | 100 | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| XGBoost | 200 | 100 | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| DEEPMALWARE | 200 | 100 | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| AdaBoost | 500 | 250 | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| Random Forest | 500 | 250 | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| XGBoost | 500 | 250 | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| DEEPMALWARE | 500 | 250 | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |

TABLE II. Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |
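The borderline policy evaluated in this subsection can be summarized in a few lines of code. The sketch below is an illustration under stated assumptions rather than the actual implementation: `fake_dl` is a stand-in for DEEPMALWARE, the 0.5 cut-off applied to the DL output is our assumption, and the list of ML probabilities is invented for the example.

```python
from typing import Callable, List, Tuple

def triage(p_ml: float, window: List[int],
           dl_classifier: Callable[[List[int]], float],
           lower: float = 0.30, upper: float = 0.70) -> Tuple[str, str]:
    """Return (label, stage) for one window given the ML malware probability."""
    if p_ml < lower:
        return "benign", "ML"       # confident benign: the DL model is not invoked
    if p_ml > upper:
        return "malicious", "ML"    # confident malicious: the DL model is not invoked
    # Borderline: defer to the slower DL model (0.5 cut-off is an assumption).
    p_dl = dl_classifier(window)
    return ("malicious" if p_dl >= 0.5 else "benign"), "DL"

def fake_dl(window: List[int]) -> float:
    """Stand-in for DEEPMALWARE: always reports a 0.9 malware probability."""
    return 0.9

ml_probs = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
results = [triage(p, [], fake_dl) for p in ml_probs]
moved = sum(1 for _, stage in results if stage == "DL")
move_percentage = 100.0 * moved / len(results)
print(results)
print(round(move_percentage, 2))
```

Narrowing the interval (for example, to 40%-60%) sends fewer windows to the DL stage, which is the mechanism behind the lower move percentages and detection times for the narrower intervals in Table II.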

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into the software at the kernel level, compared with API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves much better detection performance in terms of accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

TABLE III. Comparison among DEEPMALWARE and other DL-based malware detection in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN'16 [21] | DL4MD | 45,000 traces | Accuracy 95.65%, Precision 95.46%, Recall 95.8% |
| KDD'17 [22] | DNN | 26,078 traces | Accuracy 93.99% ³ |
| LNCS'16 [23] | LSTM | 4,753 traces | Accuracy 89.4%, Precision 85.6%, Recall 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy 97.0%, Precision 97.5%, Recall 97.0% |

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, which combines the speed of ML with the high accuracy of emerging DL models. Only software receiving a borderline classification from an ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging modern DL methods in an effective real-time detector, which can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process receiving a borderline classification from Random Forest and then being moved to DEEPMALWARE. Then DEEPMALWARE classifies it as benign, and the software continues being analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.
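One possible way to realize this mitigation is sketched below. It is purely hypothetical, since the combination rule is left to future work: here the previous DEEPMALWARE probability and the new Random Forest probability are blended with a fixed weight (`weight_dl`, an assumed parameter), and the blended value is compared against the borderline interval.

```python
def combined_probability(p_rf: float, p_dl_prev: float, weight_dl: float = 0.5) -> float:
    """Blend the current Random Forest probability with the last DL probability."""
    return weight_dl * p_dl_prev + (1.0 - weight_dl) * p_rf

def needs_dl(p_rf, p_dl_prev=None, lower=0.30, upper=0.70):
    """Decide whether a window should be sent (again) to the DL stage."""
    p = p_rf if p_dl_prev is None else combined_probability(p_rf, p_dl_prev)
    return lower <= p <= upper

print(needs_dl(0.45))        # first visit: borderline, goes to the DL stage
print(needs_dl(0.45, 0.05))  # after a confident benign DL verdict: stays in the ML stage
```

In this toy example, a process that Random Forest keeps scoring at 0.45 is no longer sent back to the DL stage once DEEPMALWARE has judged it benign with high confidence, which breaks the loop described above.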

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, where malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution, which makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls with normal behaviors. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

³ The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.'s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM), and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) to classify malware families using Cuckoo's Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as pertaining to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis (by providing a detailed execution report). PROPEDEUTICA, on the other hand, focuses on real-time detection and monitors a more diverse set of system calls. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at live system restoring and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically. Anubis [79] works best for offline malware analysis by providing a detailed execution report, while PROPEDEUTICA delivers on-the-fly detection results to end users. PROPEDEUTICA improves on the experiments of Anubis by monitoring more kinds of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that models based on high-level atoms with arguments perform the best, which corroborates our experimental results. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them into both benign/malicious and also 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang [101] extracted features from the Android Manifest and API functions as the input of the deep learning classifier. Hou et al. collected system calls from the kernel, then constructed weighted directed graphs and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by capturing spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more performance-expensive but more accurate deep learning methods, through our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware in both short- and long-term ways. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.

On the one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP '05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE '07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM'09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun, et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at <https://github.com/gracesrm/windows-system-call-hook>.

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, "An empirical evaluation of supervised learning in high dimensions," in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, "A probabilistic nearest neighbour method for statistical pattern recognition," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, "Avclass: A tool for massive malware labeling," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, "Adversarial perturbations against deep neural networks for malware classification," arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Deceiving end-to-end deep learning malware detectors using adversarial examples," arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, "Black-box attacks against rnn based malware detection algorithms," in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, "Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples," in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, "Generation & evaluation of adversarial examples for malware obfuscation," in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, "Adversarial examples for cnn-based malware detectors," IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, "Exploring adversarial examples in malware detection," in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, "Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model," arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, "Black-box adversarial attacks against deep learning based malware binaries detection with gan," in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Adversarial examples on discrete sequences for beating whole-binary malware detection," arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, "A brief introduction to weakly supervised learning," National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, "Deep anomaly detection on attributed networks," in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery," in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, "Ganomaly: Semi-supervised anomaly detection via adversarial training," in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, "Adversarially learned anomaly detection," in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, "Feature encoding with autoencoders for weakly supervised anomaly detection," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, "Deep weakly-supervised anomaly detection," arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, "Deep anomaly detection with deviation networks," in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, "Toward automated dynamic malware analysis using cwsandbox," IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Düssel, and P. Laskov, "Learning and classification of malware behavior," in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, "Amal: High-fidelity, behavior-based automated malware analysis and classification," Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, "Adaptive semantics-aware malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, "Cuckoo sandbox," https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, "Automated classification and analysis of internet malware," in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, "Comprehensive analysis and detection of flash-based malware," in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, "Analyzing unknown binaries," http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, "Barebox: efficient malware analysis on bare-metal," in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, "Efficient detection of split personalities in malware," in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, "Differential slicing: Identifying causal execution differences for security applications," in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, "Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld," in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, "Generation of a new ids test dataset: Time to retire the kdd collection," in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, "Combining file content and file relations for cloud based malware detection," in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, "Applying bag of system calls for anomalous behavior detection of applications in linux containers," in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze nsl-kdd intrusion detection dataset," International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, "Malicious sequential pattern mining for automatic malware detection," Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, "Mean birds: Detecting aggression and bullying on twitter," 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, "Mamadroid: Detecting android malware by building markov chains of behavioral models," in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., "Deep speech 2: End-to-end speech recognition in english and mandarin," in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., "Wavenet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, "A hybrid malicious code detection method based on deep learning," methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, "Malware classification with recurrent networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, "Deepsign: Deep learning for automatic malware signature generation and classification," in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, "Deep neural network based malware detection using two dimensional binary program features," in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, "Mtnet: a multi-task neural network for dynamic malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, "Large-scale malware classification using random projections and neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, "Droiddeeplearner: Identifying android malware using deep learning," in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, "Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs," in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, "Transcend: Detecting concept drift in malware classification models," 2017.

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
  • III Implementation Details
    • III-A The System Call Interception Driver
    • III-B The Reconstruction Module
    • III-C DeepMalware Classifier
    • III-D Conventional ML Classifier
  • IV Evaluation
    • IV-A Software Collection Method
    • IV-B System Call Dataset
    • IV-C Performance Analysis of standalone ML and DL classifiers
    • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
    • IV-E Comparison with Other DL Malware Detectors
  • V Discussion
    • V-A Recurrent Loop in Propedeutica
    • V-B Adversarial Attacks against Malware Detection
    • V-C Weakly Supervised Anomaly Detection
  • VI Related Work
    • VI-A Behavior-based Malware Detection
    • VI-B ML-based Malware Detection
    • VI-C DL-based Malware Detection
  • VII Conclusion
  • References


Fig. 4. Malware families in our dataset, classified by AVClass.

which could adversely affect the training procedure, because the classifier would be prone to learn from the majority class. Hence, we under-sampled the dataset by reducing the number of sliding windows from benign processes to match the number from malware processes, so that the ML and DL detectors learn from the same number of positive and negative samples.
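The under-sampling step can be illustrated with a minimal sketch. It assumes random sampling without replacement; the text does not state how the discarded benign windows were chosen, and the function name `undersample` and the toy data are hypothetical.

```python
import random

def undersample(benign_windows, malware_windows, seed=0):
    """Randomly drop windows from the larger class until both classes match.

    Assumption: windows are dropped uniformly at random. In the setting
    described in the text, the benign class is the larger one.
    """
    rng = random.Random(seed)
    n = min(len(benign_windows), len(malware_windows))
    return rng.sample(benign_windows, n), rng.sample(malware_windows, n)

# Toy stand-ins for sliding windows (hypothetical data).
benign = [f"benign_{i}" for i in range(10)]
malware = [f"malware_{i}" for i in range(4)]
balanced_benign, balanced_malware = undersample(benign, malware)
print(len(balanced_benign), len(balanced_malware))  # 4 4
```

A fixed seed keeps the selection reproducible across training runs.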

Window size and stride are two important hyper-parameters in sliding window methods: the window size is the total number of n-gram system calls in each window, and the stride is the number of system calls by which the window shifts. In real-time detection, large window sizes need more time to fill up the sliding window, but they provide more context for DL models and therefore achieve higher accuracy. We aim to choose these two hyper-parameters so that malware is detected accurately and in time (before it completes its malicious behavior). In our experiments, we observed that malware can successfully infect a system in one to two seconds, and that 1,000 system calls can be invoked in one second. Therefore, we selected and compared three window size and stride pairs: (100, 50), (200, 100), and (500, 250). The results showed that with pair (500, 250), DEEPMALWARE needs 1 to 2 seconds to fill up the window and analyze it. With pair (200, 100), DEEPMALWARE needs 0.5 to 1 second, while with pair (100, 50), DEEPMALWARE needs only 0.1 to 0.5 seconds. The F1 scores for DEEPMALWARE are 97.02%, 95.08%, and 94.97% for window sizes 500, 200, and 100, respectively.
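As an illustration of how window size and stride interact, the following sketch cuts a system call trace into overlapping windows. It assumes that a trailing partial window is discarded, which the text does not specify; the function name `sliding_windows` and the integer trace are hypothetical.

```python
def sliding_windows(trace, window_size=100, stride=50):
    """Yield consecutive windows of `window_size` calls, shifted by `stride`.

    Assumption: a trailing remainder shorter than `window_size` is dropped.
    """
    for start in range(0, len(trace) - window_size + 1, stride):
        yield trace[start:start + window_size]

# Stand-in trace: 250 system call IDs (hypothetical data).
trace = list(range(250))
windows = list(sliding_windows(trace, window_size=100, stride=50))
print(len(windows))  # 4
```

With a stride of half the window size, as in all three pairs above, each system call away from the trace boundaries appears in two windows.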

Although detection performance improves as the window size increases, larger windows mean a longer wait while the software executes and fills up the window. During this wait, the malware is more likely to carry out malicious behavior. Therefore, the window size must be selected carefully to balance prediction performance against wait time. Since the performance gain from a larger window is marginal (around a 2% increase in F1 score from window size 100 to 500), while the waiting time can be reduced to 10%-20%, we set the window size and stride to 100 and 50, respectively, for the following evaluation. In practice, the window size can be altered in our PROPEDEUTICA framework to meet the required detection performance (e.g., a required accuracy or an acceptable FP rate).
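The trade-off can be made concrete with a short calculation. The sketch below assumes a constant rate of 1,000 system calls per second (the figure quoted in the text) and therefore gives only a lower bound on the time needed to fill a window; the F1 values are those reported for DEEPMALWARE, and the names are hypothetical.

```python
CALLS_PER_SECOND = 1000  # rate quoted in the text; real processes vary

# DEEPMALWARE F1 scores (%) per window size, as reported in the text.
f1_by_window = {100: 94.97, 200: 95.08, 500: 97.02}

def fill_time(window_size, rate=CALLS_PER_SECOND):
    """Lower bound (seconds) on the time needed to fill one window."""
    return window_size / rate

for size, f1 in f1_by_window.items():
    print(f"window={size}: fill time >= {fill_time(size):.1f} s, F1={f1:.2f}%")

gain = f1_by_window[500] - f1_by_window[100]
print(f"F1 gain from window size 100 to 500: {gain:.2f} points")
```

Under this assumption, the smallest window fills five times faster than the largest while giving up roughly two F1 points.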

C. Performance Analysis of standalone ML and DL classifiers

We started by comparing the performance of DEEPMALWARE with relevant ML and DL classifiers. Our metrics for model performance were accuracy, precision, recall, F1 score, and false positive (FP) rate.
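For reference, these five metrics can be computed from confusion-matrix counts as sketched below. The sketch assumes that malware is the positive class; the function name `detection_metrics` and the example counts are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts.

    Assumption: malware is the positive class, so `fp` counts benign
    windows that were flagged as malicious.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_rate": fp / (fp + tn),
    }

# Hypothetical counts on a balanced test set of 200 windows.
for name, value in detection_metrics(tp=90, fp=10, tn=90, fn=10).items():
    print(f"{name}: {value:.2%}")
```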

Table I shows the detection performance and execution time of using a single ML or DL model. DEEPMALWARE achieves the best performance among all the models. With window size 500 and stride 250, DEEPMALWARE achieved an accuracy of 97.03% and a false positive rate of 2.43%. The performance of DEEPMALWARE increased with the increase in window size and stride. Random Forest performs second best, with an accuracy of 89.05% at window size 100 and stride 50 and an accuracy of 93.94% at window size 500 and stride 250, approximately 10% better than AdaBoost and 5% better than XGBoost. Random Forest also achieves the fastest speed among all the machine learning methods. While the DEEPMALWARE model is 3.09% more accurate than Random Forest, its classification time is approximately 100 times slower on CPU and 3 to 5 times slower on GPU.

We did not apply GPU devices to ML models because their classification time with conventional CPUs is much smaller (at least one order of magnitude) than that measured for DL algorithms. We denote 'N/A' as not applicable in Table I. In real life, GPU devices, especially specialized DL GPUs for accelerating DL training/testing, are still not sufficiently widespread in end-user or corporate devices. In addition, the preprocessing time of the Reconstruction Module is around 1.098 seconds per run, or 0.0003 seconds per window (when using a window size of 100), which is almost negligible compared with the ML (0.0088 seconds per window) or DL execution time (0.0383 seconds per window using GPU or 1.4104 seconds per window using CPU).

We observed that malware could successfully infect the target system in one to two seconds, and that 1,000 system calls can be invoked in one second. Moreover, traces from one process may take a significant amount of time to fill up one sliding window, because the process might be invoking system calls at a low rate. Thus, in the remainder of this section, we chose to compare all the algorithms using window sizes of 100 and strides of 50 system calls. For all window sizes and strides, the best ML model was Random Forest. Therefore, for PROPEDEUTICA, we chose Random Forest for ML classification and DEEPMALWARE for DL classification.

In addition, we evaluated PROPEDEUTICA's performance on seven APTs with various borderline intervals. PROPEDEUTICA successfully detected all the APTs, and six of them were subjected to DEEPMALWARE classification. For different borderline intervals, the false positive rate was approximately 10%. Random Forest in isolation detected one APT, with a 52.5% false positive rate.

D. Performance Analysis of PROPEDEUTICA combining ML and DL classifiers

TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Window Size | Stride | Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 50 | AdaBoost | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| 100 | 50 | Random Forest | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| 100 | 50 | XGBoost | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| 100 | 50 | DEEPMALWARE | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| 200 | 100 | AdaBoost | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| 200 | 100 | Random Forest | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| 200 | 100 | XGBoost | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| 200 | 100 | DEEPMALWARE | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| 500 | 250 | AdaBoost | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| 500 | 250 | Random Forest | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| 500 | 250 | XGBoost | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| 500 | 250 | DEEPMALWARE | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |

TABLE II: Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |

Next, we evaluated PROPEDEUTICA's performance for different borderline intervals. Based on the results from Table I, we chose Random Forest as the ML classifier and DEEPMALWARE as the DL classifier, with window size 100 and stride 50. As discussed before, in PROPEDEUTICA, if a piece of software receives a malware classification probability from Random Forest smaller than the lower bound, it is considered benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, paired with upper bounds of 90%, 80%, 70%, and 60%, respectively. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds, even using CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
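The borderline policy can be sketched as a small routing function. This is an illustrative sketch rather than the framework's implementation: the two models are passed in as hypothetical callables that return a malware probability, and the 0.5 cut-off applied to the DL output is an assumption.

```python
from typing import Callable, Sequence, Tuple

Window = Sequence[int]  # one sliding window of system call IDs

def two_stage_classify(
    window: Window,
    ml_prob: Callable[[Window], float],
    dl_prob: Callable[[Window], float],
    lower: float = 0.30,
    upper: float = 0.70,
    dl_threshold: float = 0.5,  # assumed cut-off for the DL verdict
) -> Tuple[str, str]:
    """Return (verdict, stage), where stage names the model that decided."""
    p_ml = ml_prob(window)
    if p_ml < lower:
        return "benign", "ML"
    if p_ml > upper:
        return "malicious", "ML"
    # Borderline case: defer to the slower, more accurate DL model.
    p_dl = dl_prob(window)
    return ("malicious" if p_dl >= dl_threshold else "benign"), "DL"

# Hypothetical stand-in models that return fixed malware probabilities.
print(two_stage_classify([1, 2, 3], lambda w: 0.05, lambda w: 0.9))  # ('benign', 'ML')
print(two_stage_classify([1, 2, 3], lambda w: 0.55, lambda w: 0.9))  # ('malicious', 'DL')
```

Narrowing the interval between `lower` and `upper` sends fewer windows to the DL model, which is the lever that Table II varies.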

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into the software at the kernel level, compared with API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves better detection performance in terms of accuracy, precision, and recall. More importantly, current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, combining the fast speed of ML and the high accuracy of emerging DL models. Only software receiving a borderline classification from the ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging modern DL methods in an effective real-time detector, which can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

TABLE III: Comparison among DEEPMALWARE and other DL-based malware detectors in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN'16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD'17 [22] | DNN | 26,078 traces | Accuracy: 93.99% ³ |
| LNCS'16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process that receives a borderline classification from Random Forest and is then moved to DEEPMALWARE. DEEPMALWARE classifies it as benign, so the software continues being analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.
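One possible way to realize this planned mitigation is sketched below. It is purely illustrative: the weighted average, the weight `alpha`, and the function names are assumptions, since the text does not specify how the two predictions would be combined.

```python
from typing import Optional

def blended_probability(p_rf: float, p_dl_prev: Optional[float],
                        alpha: float = 0.6) -> float:
    """Blend the current Random Forest probability with the last DL one.

    Assumption: a simple weighted average, with `alpha` as the weight
    given to the previous DL prediction.
    """
    if p_dl_prev is None:  # the process has not been seen by the DL model
        return p_rf
    return alpha * p_dl_prev + (1.0 - alpha) * p_rf

def first_stage(p_rf: float, p_dl_prev: Optional[float] = None,
                lower: float = 0.30, upper: float = 0.70) -> str:
    """First-stage decision that remembers an earlier DL verdict."""
    p = blended_probability(p_rf, p_dl_prev)
    if p < lower:
        return "benign"
    if p > upper:
        return "malicious"
    return "borderline"

print(first_stage(0.5))       # borderline: no DL history yet
print(first_stage(0.5, 0.1))  # benign: 0.6 * 0.1 + 0.4 * 0.5 = 0.26
```

In this sketch, a process that the DL model recently judged benign needs a noticeably higher Random Forest probability before it is sent back for DL analysis, which breaks the loop described above.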

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, in which malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution. This makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls with normal behaviors. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that the ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years, and it has become impractical to analyze and identify all malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

³ The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.'s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM), and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed a maximum-a-posteriori (MAP) approach to classify malware families using Cuckoo Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as pertaining to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis, since it provides a detailed execution report. PROPEDEUTICA, on the other hand, focuses on real-time detection and monitors a more diverse set of system calls. Kirat et al. introduced BareBox, which hooks system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at live system restoration and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically. PROPEDEUTICA delivers on-the-fly detection results to end users, and improves on the Anubis experiments by monitoring more kinds of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with the detection.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works use tainting techniques: for example, VMScope [82], TTAnalyze [10], and Panorama [36] emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label the address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection [15]. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that models based on high-level atoms with arguments perform the best. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and of the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model that combines file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and detected malware as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them into both benign/malicious and also 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang et al. [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by handling the spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addresses the performance of DL-based malware detection algorithms running in a real system. Compared with recent work [21]–[23], our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subject to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by more performance-expensive and more accurate deep learning methods, through our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware in both short- and long-term ways. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using a GPU is 0.0146 seconds on average. Even using a CPU, the detection time is less than 0.1 seconds.

On the one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP ’05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32

[32] anonymity. (2018) Propedeutica driver, publicly available at https://github.com/gracesrm/windows-system-call-hook

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.

[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.

[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.



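TABLE I: Detection performance of standalone ML or DL classifiers. DEEPMALWARE achieves the best performance among all the models. Random Forest is approximately 8% more accurate and much faster than the other machine learning models.

| Models | Window Size | Stride | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 100 | 50 | 79.25 | 78.11 | 81.31 | 79.67 | 22.80 | N/A | 0.0187 |
| Random Forest | 100 | 50 | 89.05 | 84.63 | 95.44 | 89.71 | 17.35 | N/A | 0.0089 |
| XGBoost | 100 | 50 | 84.78 | 90.99 | 77.22 | 83.54 | 7.66 | N/A | 0.0116 |
| DEEPMALWARE | 100 | 50 | 94.84 | 92.63 | 97.43 | 94.97 | 7.76 | 0.0383 | 1.4104 |
| AdaBoost | 200 | 100 | 78.27 | 72.84 | 90.19 | 80.59 | 33.64 | N/A | 0.0132 |
| Random Forest | 200 | 100 | 93.49 | 89.99 | 97.90 | 93.77 | 10.91 | N/A | 0.0063 |
| XGBoost | 200 | 100 | 70.81 | 63.65 | 97.08 | 76.89 | 70.81 | N/A | 0.0073 |
| DEEPMALWARE | 200 | 100 | 94.96 | 93.05 | 97.20 | 95.08 | 7.27 | 0.0543 | 1.3784 |
| AdaBoost | 500 | 250 | 81.81 | 79.44 | 85.86 | 82.52 | 22.24 | N/A | 0.0108 |
| Random Forest | 500 | 250 | 93.94 | 97.16 | 90.54 | 93.73 | 2.65 | N/A | 0.0048 |
| XGBoost | 500 | 250 | 79.19 | 94.14 | 62.27 | 74.95 | 3.88 | N/A | 0.0062 |
| DEEPMALWARE | 500 | 250 | 97.03 | 97.54 | 96.50 | 97.02 | 2.43 | 0.0797 | 1.3344 |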
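As a quick consistency check on Table I, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function names are hypothetical, and the confusion-matrix definitions (malware as the positive class, FP rate = FP / (FP + TN)) are the standard ones, assumed here rather than quoted from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rates_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification rates, with malware as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
        "fp_rate": fp / (fp + tn),  # benign samples wrongly flagged as malware
    }


# Random Forest, window size 100 (Table I): precision 84.63, recall 95.44.
print(round(f1_from_precision_recall(84.63, 95.44), 2))  # 89.71
```

The recomputed value matches the F1 score reported for that row; the same check reproduces 97.02 for DEEPMALWARE at window size 500.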

TABLE II: Comparison of different borderline policies for PROPEDEUTICA combining ML and DL classifiers. Borderline intervals are described with a lower bound and an upper bound. The move percentage represents the percentage of software in the system that received a borderline classification with Random Forest (according to the borderline interval) and was subjected to further analysis with DEEPMALWARE.

| Lower Bound (%) | Upper Bound (%) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | FP Rate (%) | Detection Time with GPU (s) | Detection Time with CPU (s) | Move Percentage (%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 90 | 94.71 | 92.41 | 97.43 | 94.85 | 8.00 | 0.0223 | 0.381 | 55.95 |
| 20 | 80 | 94.61 | 92.23 | 97.43 | 94.76 | 8.21 | 0.0173 | 0.1543 | 48.32 |
| 30 | 70 | 94.34 | 91.77 | 97.43 | 94.51 | 8.75 | 0.0146 | 0.0884 | 41.45 |
| 40 | 60 | 93.72 | 90.79 | 97.37 | 93.96 | 9.92 | 0.0123 | 0.0497 | 34.96 |

benign software. If the probability is greater than the upper bound, it is considered malicious. However, if the classification falls within the borderline interval, the software is subjected to DEEPMALWARE. Table II shows the performance of PROPEDEUTICA using various (configurable) borderline intervals when combining ML and DL methods. We chose lower bounds of 10%, 20%, 30%, and 40%, and upper bounds of 60%, 70%, 80%, and 90%. We observe that when a small borderline interval is configured (e.g., 30%-70% or 40%-60%), the detection time is reduced to less than 0.1 seconds even using a CPU, with minimal degradation of prediction performance. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. This highlights the potential of PROPEDEUTICA: approximately 60% of the samples were quickly classified with high accuracy as malware or benign software using an initial triage with the faster Random Forest. Only 40% of the samples needed to be subjected to a more expensive analysis using DEEPMALWARE. PROPEDEUTICA achieves high prediction performance while making it feasible to use expensive DL algorithms in real-time malware detection.
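The borderline policy described above can be sketched in a few lines of Python. This is a minimal illustration, not the PROPEDEUTICA implementation: the names `triage` and `classify`, the stand-in model callables, the inclusive treatment of the interval endpoints, and the 0.5 decision threshold for the second stage are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

Trace = Sequence[int]                # a window of system call identifiers
Model = Callable[[Trace], float]     # returns an estimated P(malicious)


def triage(p_malicious: float, lower: float = 0.30, upper: float = 0.70) -> str:
    """First-stage decision from the fast classifier's probability."""
    if p_malicious < lower:
        return "benign"
    if p_malicious > upper:
        return "malicious"
    return "borderline"              # endpoints treated as borderline (assumption)


def classify(trace: Trace, fast: Model, slow: Model,
             lower: float = 0.30, upper: float = 0.70) -> Tuple[str, str]:
    """Return (verdict, stage), where stage is 'ml' or 'dl'."""
    verdict = triage(fast(trace), lower, upper)
    if verdict != "borderline":
        return verdict, "ml"
    # Only borderline samples pay for the expensive second-stage model.
    p_dl = slow(trace)
    return ("malicious" if p_dl >= 0.5 else "benign"), "dl"


# Stand-in models for illustration only.
print(classify([1, 2, 3], fast=lambda t: 0.10, slow=lambda t: 0.90))  # ('benign', 'ml')
print(classify([1, 2, 3], fast=lambda t: 0.55, slow=lambda t: 0.90))  # ('malicious', 'dl')
```

Narrowing the interval (for example to 40%-60%) sends fewer samples to the second stage, which is the trade-off between detection time and accuracy that Table II quantifies.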

E. Comparison with Other DL Malware Detectors

We compared PROPEDEUTICA with other DL-based malware detectors in the literature that leverage system/API calls as features. The features used in our work and in [23] are Windows system calls. The features used in [21] and [22] are Windows API calls. System calls provide more insight into software at the kernel level than API calls collected at the user level. Moreover, we evaluated the detection performance on a much larger dataset with more traces, which provides more accurate evaluation results.

In Table III, we summarize the obtained results. Compared with the existing state-of-the-art DL-based malware detectors, our approach achieves much better detection performance in terms of accuracy, precision, and recall. More importantly, the current literature mainly focuses on offline analysis [21]–[23] and lacks analysis of the detection time in real time. We evaluated DEEPMALWARE on a much larger dataset, with better prediction performance and a detection time guarantee.

V. DISCUSSION

We proposed a real-time malware detection framework, PROPEDEUTICA, which combines the speed of ML with the high accuracy of emerging DL models. Only software receiving a borderline classification from an ML detector needed further analysis by DEEPMALWARE, which saved computational resources, shortened the detection time, and improved accuracy. The key contribution of PROPEDEUTICA is the intelligent combination of ML with emerging modern DL methods in an effective real-time detector, which can meet the needs of high accuracy, low false-positive rate, and short detection time required to counter the next generation of malware. In this section, we discuss several challenges in the current design and potential directions for improving PROPEDEUTICA in future work.

A. Recurrent Loop in PROPEDEUTICA

TABLE III: Comparison between DEEPMALWARE and other DL-based malware detectors in the literature.

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN’16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD’17 [22] | DNN | 26,078 traces | Accuracy: 93.99% (footnote 3) |
| LNCS’16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

PROPEDEUTICA can run into a worst-case scenario in which a process continuously loops between Random Forest and DEEPMALWARE. For example, consider a process receiving a borderline classification from Random Forest and then being moved to DEEPMALWARE. DEEPMALWARE then classifies it as benign, and the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.
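One possible way to realize the planned mitigation is to blend the cached second-stage probability into the first-stage score before applying the borderline test. The sketch below is purely hypothetical: the paper does not specify how the two predictions would be combined, and the linear blend, the `weight_dl` parameter, and the function names are assumptions chosen for illustration.

```python
from typing import Optional


def combined_score(p_rf: float, p_dl_prev: Optional[float],
                   weight_dl: float = 0.5) -> float:
    """Blend the Random Forest probability with a cached DL probability.

    If the process has never been sent to the DL stage, the Random Forest
    probability is used unchanged.
    """
    if p_dl_prev is None:
        return p_rf
    return (1.0 - weight_dl) * p_rf + weight_dl * p_dl_prev


def is_borderline(p: float, lower: float = 0.30, upper: float = 0.70) -> bool:
    """True when the score falls inside the borderline interval."""
    return lower <= p <= upper


# First pass: Random Forest alone is undecided, so the process moves to DL.
print(is_borderline(combined_score(0.45, None)))   # True
# Later pass: DL previously judged the process benign (0.05), so the blended
# score falls below the lower bound and the process is not moved again.
print(is_borderline(combined_score(0.45, 0.05)))   # False
```

With this kind of blend, a confident earlier DL verdict pulls a repeatedly borderline process out of the interval, which is what breaks the loop; how quickly the cached verdict should expire is left open here.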

B. Adversarial Attacks against Malware Detection

Despite initial successes in malware classification, a resourceful and motivated adversary can bypass the protection mechanisms using evasion attacks. A plethora of recent studies have demonstrated that ML and DL models are vulnerable to evasion attacks, such as adversarial examples [24], [25], [52], [53]. By adding a small perturbation to the input features of machine learning models, adversaries can mislead the detection system into classifying malware as benign software [54]–[61]. However, most adversarial attacks target static malware detection, in which the malware is not executed and machine learning conducts detection on the static code. For example, Park et al. injected dummy code into malware source code using an obfuscation technique [57]. Liu et al. manipulated the malware images generated from malware binaries using adversarial attacks [56]. Only [62] generated adversarial sequences to attack dynamic behavior-based malware detection systems. However, their victim detection model is much simpler than our proposed model. In our work, we propose a framework for dynamic malware detection, where malware is investigated during execution. This makes adversarial attacks much easier to detect, since it is challenging to generate a sequence of system/API calls with normal behaviors. In the future, we plan to thoroughly investigate the robustness of PROPEDEUTICA against adversarial attacks.

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.

3 The performance in terms of precision and recall is not provided in [22].

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently understudied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.'s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM), and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed maximum-a-posteriori (MAP) estimation to classify malware families using Cuckoo Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as pertaining to one of the existing clusters.

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis, since it provides a detailed execution report. PROPEDEUTICA, on the other hand, focuses on real-time detection, delivers on-the-fly detection results to end users, and monitors a more diverse set of system calls. PROPEDEUTICA also improves on the experiments of Anubis by running each experiment for a longer time and by using cutting-edge DL models to help with detection. Kirat et al. introduced BareBox, which hooks system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox aims at live system restoration and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works use tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], which emulate environments with QEMU and trace the data flow of whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label the address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the model based on high-level atoms with arguments performs best. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and of the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model that combines file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of the system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN, and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls, and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them both as benign/malicious and into 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang et al. [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by capturing spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with the recent work [21]–[23], our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subject to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by a more performance-expensive and more accurate deep learning method, our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware over both the short and the long term. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.
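The two-stage routing summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the stub deep learning detector, and the 0.5 decision threshold in the second stage are hypothetical, while the [30%-70%] borderline interval is the one reported in the paper.

```python
def classify(ml_score, dl_detector, trace,
             lower=0.30, upper=0.70, dl_threshold=0.5):
    """Two-stage decision for one monitored process.

    ml_score    -- malicious probability from the fast ML detector (0..1)
    dl_detector -- callable returning a malicious probability for a trace
    Returns (label, stage), where stage names the detector that decided.
    """
    if ml_score < lower:
        return "benign", "ML"
    if ml_score > upper:
        return "malicious", "ML"
    # Borderline score: escalate to the slower, more accurate detector.
    dl_score = dl_detector(trace)
    label = "malicious" if dl_score >= dl_threshold else "benign"
    return label, "DL"


def stub_dl_detector(trace):
    """Hypothetical stand-in for DEEPMALWARE (assumption for the demo)."""
    return 0.9 if "NtWriteVirtualMemory" in trace else 0.1


decisions = [
    classify(0.10, stub_dl_detector, ["NtClose"]),
    classify(0.55, stub_dl_detector, ["NtWriteVirtualMemory"]),
    classify(0.95, stub_dl_detector, ["NtClose"]),
]
print(decisions)
```

Only the borderline case pays the cost of the second stage, which is why the fraction of samples escalated (41.45% in the reported configuration) directly controls the overall overhead.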

On the one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, "A look into 30 years of malware development from a software metrics perspective," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, "Triggerscope: Towards detecting logic bombs in android applications," in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, "NetSTAT: A Network-Based Intrusion Detection Approach," 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, "Semantics-aware malware detection," in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP '05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, "Mining specifications of malicious behavior," in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE '07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, "Detecting malicious code by model checking," Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, "Effective and efficient malware detection at the end host," in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM'09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "A quantitative study of accuracy in system call-based malware detection," in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, "Ttanalyze: A tool for analyzing malware," in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion detection using sequences of system calls," Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, "Scalable, behavior-based malware clustering," in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, "A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection," International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, "Automatic analysis of malware behavior using machine learning," Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, "Accessminer: using system-centric models for malware protection," in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., "Scaling learning algorithms towards ai," Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, "Dl4md: A deep learning framework for intelligent malware detection," in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, "Adversary resistant deep neural networks with an application to malware detection," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, "Deep learning for classification of malware system call sequences," in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, "Adversarial examples: Attacks and defenses for deep learning," IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, "Adversarial examples: Opportunities and challenges," IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, "Windows 7 most hit by wannacry ransomware," http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, "Process monitor v3.40," https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, "drstrace," http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, "Core os tools," https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, "Windbg logger," https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at <https://github.com/gracesrm/windows-system-call-hook>.

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, "A sense of self for Unix processes," in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, "Panorama: capturing system-wide information flow for malware detection and analysis," in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, "Prison: Tracking process interactions to contain malware," in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, "An empirical evaluation of supervised learning in high dimensions," in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, "A probabilistic nearest neighbour method for statistical pattern recognition," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, "Avclass: A tool for massive malware labeling," in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] C. M. Bishop, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, "Adversarial perturbations against deep neural networks for malware classification," arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Deceiving end-to-end deep learning malware detectors using adversarial examples," arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, "Black-box attacks against rnn based malware detection algorithms," in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, "Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples," in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, "Generation & evaluation of adversarial examples for malware obfuscation," in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, "Adversarial examples for cnn-based malware detectors," IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, "Exploring adversarial examples in malware detection," in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, "Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model," arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, "Black-box adversarial attacks against deep learning based malware binaries detection with gan," in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, "Adversarial examples on discrete sequences for beating whole-binary malware detection," arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, "A brief introduction to weakly supervised learning," National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, "Deep anomaly detection on attributed networks," in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery," in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, "Ganomaly: Semi-supervised anomaly detection via adversarial training," in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, "Adversarially learned anomaly detection," in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, "Feature encoding with autoencoders for weakly supervised anomaly detection," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, "Deep weakly-supervised anomaly detection," arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, "Deep anomaly detection with deviation networks," in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, "Toward automated dynamic malware analysis using cwsandbox," IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, "Learning and classification of malware behavior," in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, "Amal: High-fidelity, behavior-based automated malware analysis and classification," Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, "Adaptive semantics-aware malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, "Cuckoo sandbox," https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, "Automated classification and analysis of internet malware," in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, "Comprehensive analysis and detection of flash-based malware," in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, "Analyzing unknown binaries," http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, "Barebox: efficient malware analysis on bare-metal," in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, "Efficient detection of split personalities in malware," in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, "Differential slicing: Identifying causal execution differences for security applications," in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, "Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld," in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, "Generation of a new ids test dataset: Time to retire the kdd collection," in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, "Combining file content and file relations for cloud based malware detection," in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, "Applying bag of system calls for anomalous behavior detection of applications in linux containers," in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze nsl-kdd intrusion detection dataset," International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, "Malicious sequential pattern mining for automatic malware detection," Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, "Mean birds: Detecting aggression and bullying on twitter," 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, "Mamadroid: Detecting android malware by building markov chains of behavioral models," in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., "Deep speech 2: End-to-end speech recognition in english and mandarin," in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., "Wavenet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, "A hybrid malicious code detection method based on deep learning," methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, "Malware classification with recurrent networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, "Deepsign: Deep learning for automatic malware signature generation and classification," in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, "Deep neural network based malware detection using two dimensional binary program features," in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, "Mtnet: a multi-task neural network for dynamic malware classification," in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, "Large-scale malware classification using random projections and neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, "Droiddeeplearner: Identifying android malware using deep learning," in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, "Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs," in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, "Transcend: Detecting concept drift in malware classification models," 2017.


TABLE III: Comparison among DEEPMALWARE and other DL-based malware detection in the literature

| Work | Model | Dataset | Performance |
|---|---|---|---|
| DMIN'16 [21] | DL4MD | 45,000 traces | Accuracy: 95.65%, Precision: 95.46%, Recall: 95.8% |
| KDD'17 [22] | DNN | 26,078 traces | Accuracy: 93.99% (footnote 3) |
| LNCS'16 [23] | LSTM | 4,753 traces | Accuracy: 89.4%, Precision: 85.6%, Recall: 89.4% |
| PROPEDEUTICA | DEEPMALWARE | 493,095 traces | Accuracy: 97.0%, Precision: 97.5%, Recall: 97.0% |

and DEEPMALWARE. For example, consider a process that receives a borderline classification from Random Forest and is then moved to DEEPMALWARE. DEEPMALWARE classifies it as benign, so the software continues to be analyzed by Random Forest, which again provides a borderline classification for the process, and so on. We plan to mitigate such recurrent processes by combining the previous prediction of DEEPMALWARE with the prediction of Random Forest in the first stage.
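One possible way to realize the planned mitigation is sketched below. The paper states only that the previous DEEPMALWARE prediction would be combined with the Random Forest prediction; the weighted average, the weight of 0.5, and the function names used here are assumptions made purely for illustration.

```python
def fused_score(rf_score, previous_dl_score=None, dl_weight=0.5):
    """Blend the first-stage score with an earlier second-stage verdict.

    rf_score          -- malicious probability from Random Forest (0..1)
    previous_dl_score -- malicious probability from an earlier DEEPMALWARE
                         analysis of the same process, or None if the
                         process has never been escalated
    dl_weight         -- hypothetical weight given to the earlier verdict
    """
    if previous_dl_score is None:
        return rf_score
    return (1.0 - dl_weight) * rf_score + dl_weight * previous_dl_score


def is_borderline(score, lower=0.30, upper=0.70):
    """True if the score falls inside the borderline interval."""
    return lower <= score <= upper


# A process that Random Forest keeps rating as borderline, but that
# DEEPMALWARE has already judged to be very likely benign.
rf_score = 0.45
fused = fused_score(rf_score, previous_dl_score=0.05)
print(is_borderline(rf_score), is_borderline(fused))
```

With these illustrative numbers the fused score drops below the borderline interval, so the process would not be sent to the second stage again unless its first-stage score rises substantially.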

B Adversarial Attacks against Malware Detection

Despite initial successes on malware classification a re-sourceful and motivated adversary can bypass the protectionmechanisms using evasion attacks A plethora of recent studieshave demonstrated that ML and DL models are vulnerable toevasion attacks such as adversarial examples [24] [25] [52][53] By adding a small perturbation to the input features ofmachine learning models adversaries can mislead the detec-tion system to classify a malware as a benign software [54]ndash[61] However most adversarial attacks target at static malwaredetection malware is not executed and machine learningconducts detection on the static code For example Park etalinjected dummy code into malware source code using anobfuscation technique [57] Liu etal manipulated the malwareimages generated from malware binaries using adversarialattacks [56] Only [62] generated adversarial sequences toattack dynamic behavior-based malware detection systemsHowever their victim detection model is much simpler thanour proposed model In our work we propose a frameworkfor dynamic malware detection where malware is investigatedduring execution which makes the adversarial attacks mucheasier to detect since it is challenging to generate a sequenceof systemAPI calls with normal behaviors In the future weplan to thoroughly investigate the robustness of PROPEDEU-TICA against adversarial attacks

3 The performance in terms of precision and recall is not provided in [22].

C. Weakly Supervised Anomaly Detection

In our work, we assume that the malware samples are well annotated and that the ML and DL models are trained on these samples, i.e., fully supervised anomaly detection. However, malware samples have been generated and have evolved rapidly in recent years. It becomes impractical to analyze and identify all the malware samples for model training. Therefore, a weakly-supervised learning paradigm [63] is urgently needed for malware detection, where only a limited number of labeled positive samples are provided and a large number of positive samples are not labeled. Although unsupervised learning approaches leveraging Autoencoders and generative adversarial networks (GANs) can be used in weakly supervised learning by learning features that are robust to small deviations in negative samples [64]–[69], the labeled positive samples are not fully utilized. Recently, Pang et al. proposed DevNet and PRO to learn anomaly scores using deep learning models for better use of labeled positive data [70], [71]. The proposed framework in PROPEDEUTICA can be further extended with the capability of learning from weakly labeled samples using these advanced weakly-supervised approaches.
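To make the idea of learning anomaly scores from a few labeled positives more concrete, the following is a minimal sketch of a deviation-style loss in the spirit of [71]. It is an illustration under stated assumptions, not the cited implementation: the function names, the margin of 5, the 5000 reference draws from a standard normal prior, and the treatment of unlabeled samples as label 0 are all assumed here.

```python
import random
import statistics
from typing import Tuple


def reference_stats(n_ref: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of reference scores drawn from N(0, 1).

    The prior and the sample count are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n_ref)]
    return statistics.mean(ref), statistics.stdev(ref)


def deviation_loss(score: float, label: int, mu_ref: float,
                   sigma_ref: float, margin: float = 5.0) -> float:
    """Loss for one sample given its anomaly score.

    label == 1: labeled malware; the score should exceed the reference
                mean by at least `margin` standard deviations.
    label == 0: unlabeled sample, treated as benign; the score should
                stay close to the reference mean.
    """
    dev = (score - mu_ref) / sigma_ref
    if label == 1:
        return max(0.0, margin - dev)
    return abs(dev)


mu, sigma = reference_stats()
loss_labeled = deviation_loss(score=2.0, label=1, mu_ref=mu, sigma_ref=sigma)
loss_unlabeled = deviation_loss(score=0.1, label=0, mu_ref=mu, sigma_ref=sigma)
```

In such a scheme, the score would come from a trainable network, and the loss pushes the few labeled malware samples far into the upper tail of the score distribution while keeping the bulk of unlabeled samples near its center.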

VI. RELATED WORK

Our work pertains to fields related to behavior-based malware detection. In this section, we summarize the state of the art in these areas and highlight topics that are currently under-studied.

A. Behavior-based Malware Detection

Dynamic behavior-based malware detection [5], [15] evolved from Forrest et al.’s seminal work [34] on detecting anomalies using system calls as features. Christodorescu et al. [5], [6] extract high-level and unique behavior from the disassembled binary to detect malware and its variants. The detection is based on predefined templates covering potentially malicious behaviors, such as mass-mailing and unpacking. Willems et al. proposed CWSandbox, a dynamic analysis system that monitors malware API calls in a virtual environment [72]. Rieck et al. [73] leveraged API calls as features to classify malware into families using Support Vector Machines (SVM) and processed system calls into q-gram representations using an algorithm similar to k-nearest neighbors (kNN) [14]. Mohaisen et al. [74] introduced AMAL, a framework to dynamically analyze and classify malware using SVM, linear regression, classification trees, and kNN. Kolosnjaji et al. [75] proposed a maximum-a-posteriori (MAP) approach to classify malware families using Cuckoo Sandbox [76]. Although these techniques were successfully applied to malware classification, they did not consider benign samples [77] and were limited to labeling unknown malware as belonging to one of the existing clusters.
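Several of the approaches above turn a system call trace into q-gram (n-gram) features before classification. As a minimal sketch of that representation, assuming a trace is simply a list of call names, the snippet below counts contiguous windows of calls. The helper names and the example trace are hypothetical and are not taken from any of the cited systems.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def call_ngrams(trace: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield every contiguous window of n calls from a trace."""
    for i in range(len(trace) - n + 1):
        yield tuple(trace[i:i + n])


def ngram_counts(trace: List[str], n: int = 2) -> Counter:
    """Count n-gram occurrences; the counts act as a sparse feature vector."""
    return Counter(call_ngrams(trace, n))


# Hypothetical trace; the call names are illustrative only.
trace = ["NtOpenFile", "NtReadFile", "NtWriteFile",
         "NtReadFile", "NtWriteFile", "NtClose"]
features = ngram_counts(trace, n=2)
```

Longer windows capture more ordering context but make the feature space sparser, which is one reason sequence models are attractive for long traces.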

Wressnegger et al. [78] proposed Gordon for detecting Flash-based malware. Gordon considered execution behavior from benign and malicious Flash applications and generated n-grams for an SVM model. Bayer et al. used a modified version of QEMU to monitor API calls and breakpoints [10]. This approach was later used to build Anubis [79]. Anubis is better suited for offline malware analysis because it provides a detailed execution report, whereas PROPEDEUTICA delivers on-the-fly detection results to end users. PROPEDEUTICA also improves on the Anubis experiments by monitoring a more diverse set of system calls, running each experiment for a longer time, and using cutting-edge DL models to help with detection. Kirat et al. introduced BareBox, which hooked system calls directly from the SSDT [80]. BareBox runs on bare metal and can potentially obtain behavioral traces from malware equipped with anti-analysis techniques. BareBox targets live system restoration and was evaluated on only 42 malware samples. PROPEDEUTICA also uses system call hooking to monitor software behavior, but operates at a larger scale and uses a testbed to run malware automatically.

Some lines of work focus on developing tools to detect evasive malware [81]. PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at the kernel level. Other works using tainting techniques, e.g., VMScope [82], TTAnalyze [10], and Panorama [36], emulate environments with QEMU and trace the data flow from whole-system processes. PROPEDEUTICA differs from such works by monitoring software behavior through system-wide system call hooking instead of data tainting, while maintaining the interactions among different processes in a lightweight manner. Contrary to PROPEDEUTICA, such tools might encounter challenges when deployed in practice. For example, Panorama [36] relies on a human to manually label address and control dependency tags that should be propagated. Canali et al. evaluated different types of models for malware detection. Their findings confirm that the accuracy of some widely used learning models is poor, independently of the values of their parameters, and that the model based on high-level atoms with arguments performs the best. Further, the paper points out that on-the-fly detection suffers from even higher false-positive rates because of the diversity of applications and the system calls invoked [15]. All these findings corroborate the results of our paper.

B. ML-based Malware Detection

Xie et al. proposed a one-class SVM model with different kernel functions [83] to classify anomalous system calls in the ADFA-LD dataset [84]. Ye et al. proposed a semi-parametric classification model for combining file content and file relation information to improve the performance of file sample classification [85]. Abed et al. used bags of system calls to detect malicious applications in Linux containers [86]. Kumar et al. used K-means clustering [87] to differentiate legitimate and malicious behaviors based on the NSL-KDD dataset. Fan et al. used a sequence mining algorithm to discover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discovered patterns for malware detection [88].

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL to malware detection. Deep learning has achieved great success in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN and achieved a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls and treated malware detection as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them both as benign/malicious and into 100 malware families based on shared features. Dahl et al. detected malware files with neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang et al. [101] extracted features from the Android Manifest and API functions as the input of a deep learning classifier. Hou et al. collected system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approach by handling the spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend’s concept drift model in PROPEDEUTICA for re-training.
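Random projection, mentioned above as a way to shrink sparse input features, can be sketched in a few lines. The snippet is a minimal illustration that assumes a dense Gaussian projection matrix and a sparse input given as an index-to-value dictionary. The cited work may use a different (for example, sparse) projection matrix, and all names and dimensions here are hypothetical.

```python
import random
from typing import Dict, List


def make_projection(input_dim: int, output_dim: int,
                    seed: int = 0) -> List[List[float]]:
    """Build an input_dim x output_dim matrix with scaled Gaussian entries."""
    rng = random.Random(seed)
    scale = 1.0 / output_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(output_dim)]
            for _ in range(input_dim)]


def project(sparse: Dict[int, float],
            matrix: List[List[float]]) -> List[float]:
    """Project a sparse vector (index -> value) to a dense low-dimensional one.

    Only the rows of the non-zero features are touched, so the cost grows
    with the number of non-zero entries rather than with input_dim.
    """
    out = [0.0] * len(matrix[0])
    for idx, value in sparse.items():
        for j, weight in enumerate(matrix[idx]):
            out[j] += value * weight
    return out


# Hypothetical sizes: 1000 sparse n-gram features reduced to 16 dimensions.
matrix = make_projection(input_dim=1000, output_dim=16)
dense = project({3: 1.0, 42: 2.0, 999: 1.0}, matrix)
```

The projected vector can then be passed to a neural network of manageable input size, while pairwise distances between samples are approximately preserved.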

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by a more accurate, but more computationally expensive, deep learning method: our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware over both short-term and long-term horizons. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using a GPU is 0.0146 seconds on average. Even using only a CPU, the detection time is less than 0.1 seconds.
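The multi-stage decision summarized above can be sketched as a small routing function. This is a minimal illustration, not the PROPEDEUTICA implementation: it assumes both detectors return the probability that a trace is malicious, that the borderline interval is [0.30, 0.70], and that the second stage decides at 0.5. The function name, the stand-in models, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Trace = Sequence[str]
Model = Callable[[Trace], float]  # returns P(malware) in [0, 1]


def cascade_classify(trace: Trace, fast_model: Model, slow_model: Model,
                     lower: float = 0.30,
                     upper: float = 0.70) -> Tuple[str, str]:
    """Return (verdict, stage) for one trace.

    The fast conventional ML model decides confident cases on its own.
    Scores inside [lower, upper] are borderline and are forwarded to the
    slower, more accurate DL model.
    """
    p_fast = fast_model(trace)
    if p_fast < lower:
        return "benign", "ml"
    if p_fast > upper:
        return "malware", "ml"
    p_slow = slow_model(trace)
    return ("malware" if p_slow >= 0.5 else "benign"), "dl"


# Stand-in models with fixed outputs, for illustration only.
verdict, stage = cascade_classify(
    ["NtOpenFile", "NtWriteFile"],
    fast_model=lambda t: 0.55,   # borderline score from the ML stage
    slow_model=lambda t: 0.92,   # confident score from the DL stage
)
```

Widening the interval sends more samples to the expensive second stage and narrowing it favors speed, which is the accuracy versus latency trade-off the borderline interval controls.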

On one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both the high accuracy and the short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP ’05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32

[32] anonymity. (2018) Propedeutica driver publicly available at <https://github.com/gracesrm/windows-system-call-hook>.

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schütze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.


[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018). [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] C. M. Bishop, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Düssel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.


[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.


9

in live system restoring and analyzed on only 42 malwaresamples PROPEDEUTICA also uses system call hooking tomonitor software behavior but is larger scale and uses atestbed to run malware automatically Anubis [79] worksbest for offline malware analyzers by providing a detailedexecution report while PROPEDEUTICA delivers on-thy-flydetection results to end users PROPEDEUTICA improves theexperiment of Anubis by monitoring more kinds of systemcalls running a longer time for each experiment and usingcutting-edge DL models to help with the detection

Some lines of work focus on developing tools to detectevasive malware [81] PROPEDEUTICA is resilient to anti-analysis because its monitoring mechanism operates at thekernel level Other works using tainting techniques egVMScope [82] TTAnalyze [10] and Panorama [36] emu-late environments with QEMU and trace the data-flow fromwhole-system processes PROPEDEUTICA differs from suchworks by monitoring software behavior through system-widesystem call hooking instead of data tainting while maintainingthe interactions among different processes in a lightweightmanner Contrary to PROPEDEUTICA such tools might en-counter challenges when deployed in practice For examplePanorama [36] relies on a human to manually label address andcontrol dependency tags that should be propagated Canali etal evaluated different types of models for malware detectionTheir findings confirm that the accuracy of some widelyused learning models is poor independently of the values oftheir parameters and high-level atoms with arguments modelperforms the best which corroborates our experimental resultsFurther the paper points out that on-the-fly detection suffersfrom even higher false-positive rates because of the diversityof applications and the system calls invoked [15] All thesefindings corroborate the results of our paper

B ML-based Malware Detection

Xie et al proposed a one-class SVM model with differ-ent kernel functions [83] to classify anomalous system callsin the ADFA-LD dataset [84] Ye et al proposed a semi-parametric classification model for combining file content andfile relation information to improve the performance of filesample classification [85] Abed et al used bags of systemcalls to detect malicious applications in Linux containers [86]Kumar et al used K-means clustering [87] to differentiatelegitimate and malicious behaviors based on the NSL-KDDdataset Fan et al used a sequence mining algorithm todiscover malicious sequential patterns and trained an All-Nearest-Neighbor (ANN) classifier based on these discoveredpatterns for malware detection [88]

The Random Forest algorithm has been applied to classification problems as diverse as offensive tweets, malware detection, de-anonymization, suspicious Web pages, and unsolicited email messages [89], [90]. Conventional ML-based malware detectors suffer, however, from high false-positive rates because of the diverse nature of system calls invoked by applications, as well as the diversity of applications [15].

C. DL-based Malware Detection

There are recent efforts to apply DL for malware detection. Deep learning has made great successes in speech recognition [91], language translation [92], speech synthesis [93], and other sequential data [94]. Li et al. proposed a hybrid malicious code classification model based on AutoEncoder and DBN, and acquired a relatively high classification rate on part of the now outdated KDD99 dataset [95]. Pascanu et al. first applied deep neural networks (recurrent neural networks and echo state networks) to model the sequential behaviors of malware. They collected API system call sequences and C run-time library calls, and detected malware as a binary classification problem [96]. David et al. [97] used a deep belief network with denoising autoencoders to automatically generate malware signatures for classification. Saxe and Berlin [98] proposed a DL-based malware detection technique with two-dimensional binary program features and provided a Bayesian model to calibrate detection. Huang and Stokes [99] designed a neural network to extract high-level features and classify them into both benign/malicious and also 100 malware families based on shared features. Dahl et al. detected malware files by neural networks and used random projections to reduce the dimension of sparse input features [100]. Wang et al. [101] extracted features from Android Manifest and API functions to be the input of the deep learning classifier. Hou et al. collected the system calls from the kernel, constructed weighted directed graphs, and used a DL framework for dimension reduction [102]. All these DL-based methods rely on handcrafted features. Recently, Kolosnjaji et al. [23] proposed a DL method to detect and predict malware families based on system call sequences. Our proposed DEEPMALWARE has shown superior performance over their approaches by coping with the spatial-temporal features with ResNext designs. To cope with the fast-evolving nature of malware, Transcend [103] addresses concept drift in malware classification. Transcend can detect aging machine learning models before their degradation. As future work, we plan to leverage Transcend's concept drift model in PROPEDEUTICA for re-training.

Our work addressed the performance of DL-based malware detection algorithms running in a real system. Compared with the recent work ([21]–[23]), our experiments show that deep learning produces fewer false positives than conventional machine learning models. Meanwhile, conventional machine learning algorithms run about 2 to 3 orders of magnitude faster than deep learning ones.

VII. CONCLUSION

In this paper, we introduced PROPEDEUTICA, a novel paradigm and framework for real-time malware detection, which combines the best of conventional machine learning and deep learning techniques using multi-stage classification. In PROPEDEUTICA, all software in the system starts execution subjected to a conventional machine learning detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis by a more performance-expensive and more accurate deep learning method, our novel algorithm DEEPMALWARE. DEEPMALWARE utilizes both process-wide and system-wide system calls and predicts malware in both short- and long-term ways. With a borderline interval of [30%-70%], PROPEDEUTICA achieved an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. The detection time using GPU is 0.0146 seconds on average. Even using CPU, the detection time is less than 0.1 seconds.
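The multi-stage flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PROPEDEUTICA implementation: the function names, the toy scorers, the treatment of scores exactly on the interval boundaries as borderline, and the 0.5 decision threshold for the deep learning stage are all hypothetical.

```python
def two_stage_classify(sample, ml_prob, dl_prob, lower=0.30, upper=0.70):
    """Classify `sample` with a fast detector, escalating if borderline.

    ml_prob / dl_prob: callables returning a malicious probability in [0, 1].
    dl_prob stands in for the slower deep learning detector and is only
    invoked when the fast score falls inside [lower, upper].
    Returns (label, stage), where stage is "ml" or "dl".
    """
    p = ml_prob(sample)
    if p < lower:
        return "benign", "ml"
    if p > upper:
        return "malicious", "ml"
    # Borderline case: defer to the more accurate, more expensive model.
    # The 0.5 cut-off is an assumption of this sketch.
    label = "malicious" if dl_prob(sample) >= 0.5 else "benign"
    return label, "dl"

# Hypothetical stand-in scorers that read precomputed scores.
def toy_ml_prob(sample):
    return sample["ml_score"]

def toy_dl_prob(sample):
    return sample["dl_score"]

samples = [
    {"ml_score": 0.10, "dl_score": 0.05},
    {"ml_score": 0.55, "dl_score": 0.90},
    {"ml_score": 0.45, "dl_score": 0.20},
    {"ml_score": 0.95, "dl_score": 0.99},
]
results = [two_stage_classify(s, toy_ml_prob, toy_dl_prob) for s in samples]
dl_fraction = sum(stage == "dl" for _, stage in results) / len(results)
```

Widening the borderline interval sends a larger fraction of samples to the slow stage, which is the accuracy-versus-latency trade-off that the reported 41.45% escalation rate reflects.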

On one hand, our work provided evidence that conventional machine learning and emerging deep learning methods in isolation might be inadequate to provide both high accuracy and short detection time required for real-time malware detection. On the other hand, our proposed PROPEDEUTICA combines the best of ML and DL methods and has the potential to change the design of the next generation of practical real-time malware detectors.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 1801599. This material is based upon work supported by (while serving at) the National Science Foundation.

REFERENCES

[1] A. Calleja, J. Tapiador, and J. Caballero, “A look into 30 years of malware development from a software metrics perspective,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 325–345.

[2] Y. Fratantonio, A. Bianchi, W. Robertson, E. Kirda, C. Kruegel, and G. Vigna, “Triggerscope: Towards detecting logic bombs in android applications,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 377–396.

[3] G. Vigna and R. A. Kemmerer, “NetSTAT: A Network-Based Intrusion Detection Approach,” 1998.

[4] BROMIUM INC. (2010) Bromium end point protection. [Online]. Available: https://www.bromium.com

[5] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-aware malware detection,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy, ser. SP ’05, 2005.

[6] M. Christodorescu, S. Jha, and C. Kruegel, “Mining specifications of malicious behavior,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07, 2007, pp. 5–14.

[7] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 3548, pp. 174–187, 2005.

[8] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09, 2009, pp. 351–366.

[9] D. Canali, A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “A quantitative study of accuracy in system call-based malware detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA 2012, 2012, pp. 122–132.

[10] U. Bayer, C. Kruegel, and E. Kirda, “Ttanalyze: A tool for analyzing malware,” in 15th European Institute for Computer Antivirus Research (EICAR), 2006.

[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion detection using sequences of system calls,” Journal of computer security, vol. 6, no. 3, pp. 151–180, 1998.

[12] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering,” in NDSS, vol. 9, 2009, pp. 8–11.

[13] S. Revathi and A. Malathi, “A detailed analysis on nsl-kdd dataset using various machine learning techniques for intrusion detection,” International Journal of Engineering Research and Technology. ESRSA Publications, 2013.

[14] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.

[15] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “Accessminer: using system-centric models for malware protection,” in Proceedings of the 17th ACM conference on Computer and communications security, 2010, pp. 399–412.

[16] Palo Alto Networks. (2013, March) The modern malware review. [Online]. Available: https://media.paloaltonetworks.com/documents/The-Modern-Malware-Review-March-2013.pdf

[17] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden contexts,” Machine learning, vol. 23, no. 1, pp. 69–101, 1996.

[18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[19] Y. Bengio, Y. LeCun et al., “Scaling learning algorithms towards ai,” Large-scale kernel machines, vol. 34, no. 5, pp. 1–41, 2007.

[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[21] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li, “Dl4md: A deep learning framework for intelligent malware detection,” in Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, p. 61.

[22] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary resistant deep neural networks with an application to malware detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 1145–1153.

[23] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2016, pp. 137–149.

[24] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.

[25] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.

[26] I. Arghire, “Windows 7 most hit by wannacry ransomware,” http://www.securityweek.com/windows-7-most-hit-wannacry-ransomware, 2017.

[27] M. Russinovich, “Process monitor v3.40,” https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, 2017.

[28] drmemory, “drstrace,” http://drmemory.org/strace_for_windows.html, 2017.

[29] A. B. Insung Park, “Core os tools,” https://msdn.microsoft.com/en-us/magazine/ee412263.aspx, 2009.

[30] Microsoft, “Windbg logger,” https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/logger-and-logviewer, 2017.

[31] M. Jurczyk. (2017) Ntapi. [Online]. Available: http://j00ru.vexillium.org/syscalls/nt/32/

[32] anonymity. (2018) Propedeutica driver publicly available at https://github.com/gracesrm/windows-system-call-hook

[33] A. A. Telyatnikov. (2016) DbgPrint Logger. [Online]. Available: https://alter.org.ua/soft/win/dbgdump/

[34] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A sense of self for Unix processes,” in IEEE Security and Privacy, 1996, pp. 120–128.

[35] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.

[36] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide information flow for malware detection and analysis,” in ACM conference on computer and communications security, 2007, pp. 116–127.

[37] B. Caillat, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Prison: Tracking process interactions to contain malware,” in IEEE 17th High Performance Computing and Communications (HPCC), IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 1282–1291.

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.

[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018). [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.

[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.

10

an accuracy of 9434 and a false positive rate of 875 with4145 of the samples moved for DEEPMALWARE analysisThe detection time using GPU is 00146 seconds on averageEven using CPU the detection time is less than 01 seconds

On one hand our work provided evidence that conventionalmachine learning and emerging deep learning methods in iso-lation might be inadequate to provide both high accuracy andshort detection time required for real-time malware detectionOn the other hand our proposed PROPEDEUTICA combinesthe best of ML and DL methods and has the potential tochange the design of the next generation of practical real-timemalware detectors

ACKNOWLEDGMENT

This material is based upon work supported by the NationalScience Foundation under Grant No 1801599 This material isbased upon work supported by (while serving at) the NationalScience Foundation

REFERENCES

[1] A Calleja J Tapiador and J Caballero ldquoA look into 30 years ofmalware development from a software metrics perspectiverdquo in Inter-national Symposium on Research in Attacks Intrusions and DefensesSpringer 2016 pp 325ndash345

[2] Y Fratantonio A Bianchi W Robertson E Kirda C Kruegel andG Vigna ldquoTriggerscope Towards detecting logic bombs in androidapplicationsrdquo in Security and Privacy (SP) 2016 IEEE Symposiumon IEEE 2016 pp 377ndash396

[3] G Vigna and R A Kemmerer ldquoNetSTAT A Network-Based IntrusionDetection Approachrdquo 1998

[4] BROMIUM INC (2010) Bromium end point protection [Online]Available httpswwwbromiumcom

[5] M Christodorescu S Jha S A Seshia D Song and R E BryantldquoSemantics-aware malware detectionrdquo in Proceedings of the 2005 IEEESymposium on Security and Privacy ser SP rsquo05 2005

[6] M Christodorescu S Jha and C Kruegel ldquoMining specifications ofmalicious behaviorrdquo in Proceedings of the 6th Joint Meeting of theEuropean Software Engineering Conference and the ACM SIGSOFTSymposium on The Foundations of Software Engineering ser ESEC-FSE rsquo07 2007 pp 5ndash14

[7] J Kinder S Katzenbeisser C Schallhart and H Veith ldquoDetectingmalicious code by model checkingrdquo Detection of Intrusions andMalware and Vulnerability Assessment vol 3548 pp 174ndash187 2005

[8] C Kolbitsch P M Comparetti C Kruegel E Kirda X Zhou andX Wang ldquoEffective and efficient malware detection at the end hostrdquo inProceedings of the 18th Conference on USENIX Security Symposiumser SSYMrsquo09 2009 pp 351ndash366

[9] D Canali A Lanzi D Balzarotti C Kruegel M Christodorescuand E Kirda ldquoA quantitative study of accuracy in system call-based malware detectionrdquo in Proceedings of the 2012 InternationalSymposium on Software Testing and Analysis ser ISSTA 2012 2012pp 122ndash132

[10] U Bayer C Kruegel and E Kirda ldquoTtanalyze A tool for analyzingmalwarerdquo in 15th European Institute for Computer Antivirus Research(EICAR) 2006

[11] S A Hofmeyr S Forrest and A Somayaji ldquoIntrusion detection usingsequences of system callsrdquo Journal of computer security vol 6 no 3pp 151ndash180 1998

[12] U Bayer P M Comparetti C Hlauschek C Kruegel and E KirdaldquoScalable behavior-based malware clusteringrdquo in NDSS vol 9 2009pp 8ndash11

[13] S Revathi and A Malathi ldquoA detailed analysis on nsl-kdd datasetusing various machine learning techniques for intrusion detectionrdquoInternational Journal of Engineering Research and Technology ESRSAPublications 2013

[14] K Rieck P Trinius C Willems and T Holz ldquoAutomatic analysisof malware behavior using machine learningrdquo Journal of ComputerSecurity vol 19 no 4 pp 639ndash668 2011

[15] A Lanzi D Balzarotti C Kruegel M Christodorescu and E KirdaldquoAccessminer using system-centric models for malware protectionrdquo inProceedings of the 17th ACM conference on Computer and communi-cations security 2010 pp 399ndash412

[16] Palo Alto Networks (2013 March) The modern malware review[Online] Available httpsmediapaloaltonetworkscomdocumentsThe-Modern-Malware-Review-March-2013pdf

[17] G Widmer and M Kubat ldquoLearning in the presence of concept driftand hidden contextsrdquo Machine learning vol 23 no 1 pp 69ndash1011996

[18] L Breiman ldquoRandom forestsrdquo Machine learning vol 45 no 1 pp5ndash32 2001

[19] Y Bengio Y LeCun et al ldquoScaling learning algorithms towards airdquoLarge-scale kernel machines vol 34 no 5 pp 1ndash41 2007

[20] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquo Nature vol521 no 7553 pp 436ndash444 2015

[21] W Hardy L Chen S Hou Y Ye and X Li ldquoDl4md A deep learningframework for intelligent malware detectionrdquo in Proceedings of theInternational Conference on Data Mining (DMIN) The SteeringCommittee of The World Congress in Computer Science ComputerEngineering and Applied Computing (WorldComp) 2016 p 61

[22] Q Wang W Guo K Zhang A G Ororbia II X Xing X Liuand C L Giles ldquoAdversary resistant deep neural networks with anapplication to malware detectionrdquo in Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Discovery and DataMining ACM 2017 pp 1145ndash1153

[23] B Kolosnjaji A Zarras G Webster and C Eckert ldquoDeep learning forclassification of malware system call sequencesrdquo in Australasian JointConference on Artificial Intelligence Springer 2016 pp 137ndash149

[24] X Yuan P He Q Zhu and X Li ldquoAdversarial examples Attacks anddefenses for deep learningrdquo IEEE transactions on neural networks andlearning systems vol 30 no 9 pp 2805ndash2824 2019

[25] J Zhang and C Li ldquoAdversarial examples Opportunities and chal-lengesrdquo IEEE transactions on neural networks and learning systemsvol 31 no 7 pp 2578ndash2593 2019

[26] I Arghire ldquoWindows 7 most hit by wannacry ransomwarerdquo httpwwwsecurityweekcomwindows-7-most-hit-wannacry-ransomware2017

[27] M Russinovich ldquoProcess monitor v340rdquo httpstechnetmicrosoftcomen-ussysinternalsbb896645aspx 2017

[28] drmemory ldquodrstracerdquo httpdrmemoryorgstrace for windowshtml2017

[29] A B Insung Park ldquoCore os toolsrdquo httpsmsdnmicrosoftcomen-usmagazineee412263aspx 2009

[30] Microsoft ldquoWindbg loggerrdquo httpsdocsmicrosoftcomen-uswindows-hardwaredriversdebuggerlogger-and-logviewer 2017

[31] M Jurczyk (2017) Ntapi [Online] Available httpj00ruvexilliumorgsyscallsnt32

[32] anonymity (2018) Propedeutica driver publicly available atlthttpsgithubcomgracesrmwindows-system-call-hookgt

[33] A A Telyatnikov (2016) DbgPrint Logger [Online] Availablehttpsalterorguasoftwindbgdump

[34] S Forrest S A Hofmeyr A Somayaji and T A Longstaff ldquoA senseof self for Unix processesrdquo in IEEE Security and Privacy 1996 pp120ndash128

[35] C D Manning and H Schutze Foundations of statistical naturallanguage processing MIT press 1999

[36] H Yin D Song M Egele C Kruegel and E Kirda ldquoPanoramacapturing system-wide information flow for malware detection andanalysisrdquo in ACM conference on computer and communications se-curity 2007 pp 116ndash127

[37] B Caillat B Gilbert R Kemmerer C Kruegel and G VignaldquoPrison Tracking process interactions to contain malwarerdquo in IEE 17thHigh Performance Computing and Communications (HPCC) IEEE 7thInternational Symposium on Cyberspace Safety and Security (CSS)and IEEE 12th International Conference on Embedded Software andSystems (ICESS) IEEE 2015 pp 1282ndash1291

[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[39] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.

[40] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.

[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[42] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016, pp. 785–794.

[43] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in European conference on computational learning theory. Springer, 1995, pp. 23–37.

[44] R. Caruana, N. Karampatziakis, and A. Yessenalina, “An empirical evaluation of supervised learning in high dimensions,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 96–103.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in ADVANCES IN LARGE MARGIN CLASSIFIERS. MIT Press, 1999, pp. 61–74.

[46] C. Holmes and N. Adams, “A probabilistic nearest neighbour method for statistical pattern recognition,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 2, pp. 295–306, 2002.

[47] PyTorch. (2018) [Online]. Available: http://pytorch.org

[48] Rekings. (2018) Security through insecurity. [Online]. Available: https://rekings.org

[49] M. Sebastian, R. Rivera, P. Kotzias, and J. Caballero, “Avclass: A tool for massive malware labeling,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2016, pp. 230–253.

[50] Winautomation. (2017) Powerful desktop automation software. [Online]. Available: http://www.winautomation.com

[51] M. B. Christopher, PATTERN RECOGNITION AND MACHINE LEARNING. Springer-Verlag New York, 2016.

[52] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP). IEEE, 2017, pp. 39–57.

[53] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.

[54] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” arXiv preprint arXiv:1802.04528, 2018.

[55] W. Hu and Y. Tan, “Black-box attacks against rnn based malware detection algorithms,” in Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[56] X. Liu, J. Zhang, Y. Lin, and H. Li, “Atmpa: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS). IEEE, 2019, pp. 1–10.

[57] D. Park, H. Khan, and B. Yener, “Generation & evaluation of adversarial examples for malware obfuscation,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1283–1290.

[58] B. Chen, Z. Ren, C. Yu, I. Hussain, and J. Liu, “Adversarial examples for cnn-based malware detectors,” IEEE Access, vol. 7, pp. 54 360–54 371, 2019.

[59] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019, pp. 8–14.

[60] M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, and H. Chen, “Binary black-box evasion attacks against deep learning-based static malware detectors with adversarial byte-level language model,” arXiv preprint arXiv:2012.07994, 2020.

[61] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with gan,” in ECAI 2020. IOS Press, 2020, pp. 2536–2542.

[62] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Adversarial examples on discrete sequences for beating whole-binary malware detection,” arXiv preprint arXiv:1802.04528, 2018.

[63] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” National science review, vol. 5, no. 1, pp. 44–53, 2018.

[64] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.

[65] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 594–602.

[66] T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.

[67] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637.

[68] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.

[69] Y. Zhou, X. Song, Y. Zhang, F. Liu, C. Zhu, and L. Liu, “Feature encoding with autoencoders for weakly supervised anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2021.

[70] G. Pang, C. Shen, H. Jin, and A. v. d. Hengel, “Deep weakly-supervised anomaly detection,” arXiv preprint arXiv:1910.13601, 2019.

[71] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 353–362.

[72] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,” IEEE Security & Privacy, vol. 5, no. 2, 2007.

[73] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125.

[74] A. Mohaisen, O. Alrawi, and M. Mohaisen, “Amal: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251–266, 2015.

[75] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, and C. Eckert, “Adaptive semantics-aware malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 419–439.

[76] C. Guarnieri, “Cuckoo sandbox,” https://www.cuckoosandbox.org, 2017.

[77] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Conference on Recent Advances in Intrusion Detection (RAID). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 178–197.

[78] C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, “Comprehensive analysis and detection of flash-based malware,” in Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, 2016, pp. 101–121.

[79] Anubis, “Analyzing unknown binaries,” http://anubis.iseclab.org, 2010.

[80] D. Kirat, G. Vigna, and C. Kruegel, “Barebox: efficient malware analysis on bare-metal,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 403–412.

[81] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, “Efficient detection of split personalities in malware,” in Network and Distributed System Security Symposium, 2010.

[82] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song, “Differential slicing: Identifying causal execution differences for security applications,” in Security and Privacy (SP). IEEE, 2011, pp. 347–362.

[83] M. Xie, J. Hu, and J. Slay, “Evaluating host-based anomaly detection systems: Application of the one-class svm algorithm to adfa-ld,” in Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2014, pp. 978–982.

[84] G. Creech and J. Hu, “Generation of a new ids test dataset: Time to retire the kdd collection,” in 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, pp. 4487–4492.

[85] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 222–230.

[86] A. S. Abed, T. C. Clancy, and D. S. Levy, “Applying bag of system calls for anomalous behavior detection of applications in linux containers,” in 2015 IEEE Globecom Workshops. IEEE, 2015, pp. 1–5.

[87] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze nsl-kdd intrusion detection dataset,” International Journal of Soft, 2013.

[88] Y. Fan, Y. Ye, and L. Chen, “Malicious sequential pattern mining for automatic malware detection,” Expert Systems with Applications, vol. 52, pp. 16–25, 2016.

[89] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali, “Mean birds: Detecting aggression and bullying on twitter,” 2017.

[90] E. Mariconti, L. Onwuzurike, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” in Network and Distributed System Security Symposium, 2017.

[91] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[92] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.

[93] S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

[94] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[95] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” methods, vol. 9, no. 5, 2015.

[96] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 1916–1920.

[97] O. E. David and N. S. Netanyahu, “Deepsign: Deep learning for automatic malware signature generation and classification,” in Neural Networks (IJCNN), 2015, pp. 1–8.

[98] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2015, pp. 11–20.

[99] W. Huang and J. W. Stokes, “Mtnet: a multi-task neural network for dynamic malware classification,” in Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 399–418.

[100] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 3422–3426.

[101] Z. Wang, J. Cai, S. Cheng, and W. Li, “Droiddeeplearner: Identifying android malware using deep learning,” in Sarnoff Symposium, 2016 IEEE 37th. IEEE, 2016, pp. 160–165.

[102] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4maldroid: A deep learning framework for android malware detection based on linux kernel system call graphs,” in Web Intelligence Workshops (WIW), IEEE/WIC/ACM International Conference on. IEEE, 2016, pp. 104–111.

[103] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” 2017.

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
  • III Implementation Details
    • III-A The System Call Interception Driver
    • III-B The Reconstruction Module
    • III-C DeepMalware Classifier
    • III-D Conventional ML Classifier
  • IV Evaluation
    • IV-A Software Collection Method
    • IV-B System Call Dataset
    • IV-C Performance Analysis of standalone ML and DL classifiers
    • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
    • IV-E Comparison with Other DL Malware Detectors
  • V Discussion
    • V-A Recurrent Loop in Propedeutica
    • V-B Adversarial Attacks against Malware Detection
    • V-C Weakly Supervised Anomaly Detection
  • VI Related Work
    • VI-A Behavior-based Malware Detection
    • VI-B ML-based Malware Detection
    • VI-C DL-based Malware Detection
  • VII Conclusion
  • References

11

[41] S Ioffe and C Szegedy ldquoBatch normalization Accelerating deepnetwork training by reducing internal covariate shiftrdquo arXiv preprintarXiv150203167 2015

[42] T Chen and C Guestrin ldquoXgboost A scalable tree boosting systemrdquoin Proceedings of the 22nd acm sigkdd international conference onknowledge discovery and data mining ACM 2016 pp 785ndash794

[43] Y Freund and R E Schapire ldquoA decision-theoretic generalizationof on-line learning and an application to boostingrdquo in Europeanconference on computational learning theory Springer 1995 pp23ndash37

[44] R Caruana N Karampatziakis and A Yessenalina ldquoAn empiricalevaluation of supervised learning in high dimensionsrdquo in Proceedingsof the 25th international conference on Machine learning ACM2008 pp 96ndash103

[45] J C Platt ldquoProbabilistic outputs for support vector machines andcomparisons to regularized likelihood methodsrdquo in ADVANCES INLARGE MARGIN CLASSIFIERS MIT Press 1999 pp 61ndash74

[46] C Holmes and N Adams ldquoA probabilistic nearest neighbour methodfor statistical pattern recognitionrdquo Journal of the Royal StatisticalSociety Series B (Statistical Methodology) vol 64 no 2 pp 295ndash3062002

[47] PyTorch (2018) [Online] Available httppytorchorg[48] Rekings (2018) Security through insecurity [Online] Available

httpsrekingsorg[49] M Sebastian R Rivera P Kotzias and J Caballero ldquoAvclass A

tool for massive malware labelingrdquo in International Symposium onResearch in Attacks Intrusions and Defenses Springer 2016 pp230ndash253

[50] Winautomation (2017) Powerful desktop automation software[Online] Available httpwwwwinautomationcom

[51] M B Christopher PATTERN RECOGNITION AND MACHINELEARNING Springer-Verlag New York 2016

[52] N Carlini and D Wagner ldquoTowards evaluating the robustness of neuralnetworksrdquo in Security and Privacy (SP) IEEE 2017 pp 39ndash57

[53] K Grosse N Papernot P Manoharan M Backes and P McDanielldquoAdversarial perturbations against deep neural networks for malwareclassificationrdquo arXiv preprint arXiv160604435 2016

[54] F Kreuk A Barak S Aviv-Reuven M Baruch B Pinkas andJ Keshet ldquoDeceiving end-to-end deep learning malware detectors usingadversarial examplesrdquo arXiv preprint arXiv180204528 2018

[55] W Hu and Y Tan ldquoBlack-box attacks against rnn based malware detec-tion algorithmsrdquo in Workshops at the Thirty-Second AAAI Conferenceon Artificial Intelligence 2018

[56] X Liu J Zhang Y Lin and H Li ldquoAtmpa Attacking machinelearning-based malware visualization detection methods via adversarialexamplesrdquo in 2019 IEEEACM 27th International Symposium onQuality of Service (IWQoS) IEEE 2019 pp 1ndash10

[57] D Park H Khan and B Yener ldquoGeneration amp evaluation of adver-sarial examples for malware obfuscationrdquo in 2019 18th IEEE Interna-tional Conference On Machine Learning And Applications (ICMLA)IEEE 2019 pp 1283ndash1290

[58] B Chen Z Ren C Yu I Hussain and J Liu ldquoAdversarial examplesfor cnn-based malware detectorsrdquo IEEE Access vol 7 pp 54 360ndash54 371 2019

[59] O Suciu S E Coull and J Johns ldquoExploring adversarial examplesin malware detectionrdquo in 2019 IEEE Security and Privacy Workshops(SPW) IEEE 2019 pp 8ndash14

[60] M Ebrahimi N Zhang J Hu M T Raza and H Chen ldquoBinaryblack-box evasion attacks against deep learning-based static malwaredetectors with adversarial byte-level language modelrdquo arXiv preprintarXiv201207994 2020

[61] J Yuan S Zhou L Lin F Wang and J Cui ldquoBlack-box adversarialattacks against deep learning based malware binaries detection withganrdquo in ECAI 2020 IOS Press 2020 pp 2536ndash2542

[62] F Kreuk A Barak S Aviv-Reuven M Baruch B Pinkas andJ Keshet ldquoAdversarial examples on discrete sequences for beatingwhole-binary malware detectionrdquo arXiv preprint arXiv1802045282018

[63] Z-H Zhou ldquoA brief introduction to weakly supervised learningrdquoNational science review vol 5 no 1 pp 44ndash53 2018

[64] P Vincent H Larochelle I Lajoie Y Bengio and P-A ManzagolldquoStacked denoising autoencoders Learning useful representations in adeep network with a local denoising criterionrdquo Journal of MachineLearning Research vol 11 no Dec pp 3371ndash3408 2010

[65] K Ding J Li R Bhanushali and H Liu ldquoDeep anomaly detection onattributed networksrdquo in Proceedings of the 2019 SIAM InternationalConference on Data Mining SIAM 2019 pp 594ndash602

[66] T Schlegl P Seebock S M Waldstein U Schmidt-Erfurth andG Langs ldquoUnsupervised anomaly detection with generative adversarialnetworks to guide marker discoveryrdquo in International conference oninformation processing in medical imaging Springer 2017 pp 146ndash157

[67] S Akcay A Atapour-Abarghouei and T P Breckon ldquoGanomalySemi-supervised anomaly detection via adversarial trainingrdquo in Asianconference on computer vision Springer 2018 pp 622ndash637

[68] H Zenati M Romain C-S Foo B Lecouat and V ChandrasekharldquoAdversarially learned anomaly detectionrdquo in 2018 IEEE Internationalconference on data mining (ICDM) IEEE 2018 pp 727ndash736

[69] Y Zhou X Song Y Zhang F Liu C Zhu and L Liu ldquoFeatureencoding with autoencoders for weakly supervised anomaly detectionrdquoIEEE Transactions on Neural Networks and Learning Systems pp 1ndash12 2021

[70] G Pang C Shen H Jin and A v d Hengel ldquoDeep weakly-supervisedanomaly detectionrdquo arXiv preprint arXiv191013601 2019

[71] G Pang C Shen and A van den Hengel ldquoDeep anomaly detectionwith deviation networksrdquo in Proceedings of the 25th ACM SIGKDDinternational conference on knowledge discovery amp data mining 2019pp 353ndash362

[72] C Willems T Holz and F Freiling ldquoToward automated dynamicmalware analysis using cwsandboxrdquo IEEE Security amp Privacy vol 5no 2 2007

[73] K Rieck T Holz C Willems P Dussel and P Laskov ldquoLearningand classification of malware behaviorrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability AssessmentSpringer 2008 pp 108ndash125

[74] A Mohaisen O Alrawi and M Mohaisen ldquoAmal High-fidelitybehavior-based automated malware analysis and classificationrdquo Com-puters amp Security vol 52 pp 251 ndash 266 2015

[75] B Kolosnjaji A Zarras T Lengyel G Webster and C EckertldquoAdaptive semantics-aware malware classificationrdquo in Detection ofIntrusions and Malware and Vulnerability Assessment Springer2016 pp 419ndash439

[76] C Guarnieri ldquoCuckoo sandboxrdquo httpswwwcuckoosandboxorg2017

[77] M Bailey J Oberheide J Andersen Z M Mao F Jahanian andJ Nazario ldquoAutomated classification and analysis of internet malwarerdquoin International Conference on Recent Advances in Intrusion Detection(RAID) Berlin Heidelberg Springer-Verlag 2007 pp 178ndash197

[78] C Wressnegger F Yamaguchi D Arp and K Rieck ldquoComprehen-sive analysis and detection of flash-based malwarerdquo in Detection ofIntrusions and Malware and Vulnerability Assessment DIMVA 2016pp 101ndash121

[79] Anubis ldquoAnalyzing unknown binariesrdquo httpanubisiseclaborg 2010[80] D Kirat G Vigna and C Kruegel ldquoBarebox efficient malware

analysis on bare-metalrdquo in Proceedings of the 27th Annual ComputerSecurity Applications Conference ACM 2011 pp 403ndash412

[81] D Balzarotti M Cova C Karlberger E Kirda C Kruegel andG Vigna ldquoEfficient detection of split personalities in malwarerdquo inNetwork and Distributed System Security Symposium 2010

[82] N M Johnson J Caballero K Z Chen S McCamant P PoosankamD Reynaud and D Song ldquoDifferential slicing Identifying causalexecution differences for security applicationsrdquo in Security and Privacy(SP) IEEE 2011 pp 347ndash362

[83] M Xie J Hu and J Slay ldquoEvaluating host-based anomaly detectionsystems Application of the one-class svm algorithm to adfa-ldrdquo inFuzzy Systems and Knowledge Discovery (FSKD) IEEE 2014 pp978ndash982

[84] G Creech and J Hu ldquoGeneration of a new ids test dataset Time toretire the kdd collectionrdquo in 2013 IEEE Wireless Communications andNetworking Conference (WCNC) IEEE 2013 pp 4487ndash4492

[85] Y Ye T Li S Zhu W Zhuang E Tas U Gupta and M Abdulhayo-glu ldquoCombining file content and file relations for cloud based malwaredetectionrdquo in Proceedings of the 17th ACM SIGKDD internationalconference on Knowledge discovery and data mining ACM 2011pp 222ndash230

[86] A S Abed T C Clancy and D S Levy ldquoApplying bag of system callsfor anomalous behavior detection of applications in linux containersrdquoin 2015 IEEE Globecom Workshops IEEE 2015 pp 1ndash5

[87] V Kumar H Chauhan and D Panwar ldquoK-means clustering approachto analyze nsl-kdd intrusion detection datasetrdquo International Journalof Soft 2013

[88] Y Fan Y Ye and L Chen ldquoMalicious sequential pattern miningfor automatic malware detectionrdquo Expert Systems with Applicationsvol 52 pp 16ndash25 2016

12

[89] D Chatzakou N Kourtellis J Blackburn E De Cristofaro G Stringh-ini and A Vakali ldquoMean birds Detecting aggression and bullying ontwitterrdquo 2017

[90] E Mariconti L Onwuzurike P Andriotis E D Cristofaro G Rossand G Stringhini ldquoMamadroid Detecting android malware by buildingmarkov chains of behavioral modelsrdquo in Network and DistributedSystem Security Symposium 2017

[91] D Amodei S Ananthanarayanan R Anubhai J Bai E BattenbergC Case J Casper B Catanzaro Q Cheng G Chen et al ldquoDeepspeech 2 End-to-end speech recognition in english and mandarinrdquo inInternational Conference on Machine Learning 2016 pp 173ndash182

[92] I Sutskever O Vinyals and Q V Le ldquoSequence to sequence learningwith neural networksrdquo in Advances in neural information processingsystems 2014 pp 3104ndash3112

[93] S Dieleman H Zen K Simonyan O Vinyals A Graves N Kalch-brenner A Senior K Kavukcuoglu et al ldquoWavenet A generativemodel for raw audiordquo arXiv preprint arXiv160903499 2016

[94] J Schmidhuber ldquoDeep learning in neural networks An overviewrdquoNeural Networks vol 61 pp 85ndash117 2015

[95] Y Li R Ma and R Jiao ldquoA hybrid malicious code detection methodbased on deep learningrdquo methods vol 9 no 5 2015

[96] R Pascanu J W Stokes H Sanossian M Marinescu and A ThomasldquoMalware classification with recurrent networksrdquo in 2015 IEEE In-ternational Conference on Acoustics Speech and Signal Processing(ICASSP) IEEE 2015 pp 1916ndash1920

[97] O E David and N S Netanyahu ldquoDeepsign Deep learning forautomatic malware signature generation and classificationrdquo in NeuralNetworks (IJCNN) 2015 pp 1ndash8

[98] J Saxe and K Berlin ldquoDeep neural network based malware detectionusing two dimensional binary program featuresrdquo in 2015 10th Interna-tional Conference on Malicious and Unwanted Software (MALWARE)IEEE 2015 pp 11ndash20

[99] W Huang and J W Stokes ldquoMtnet a multi-task neural networkfor dynamic malware classificationrdquo in Detection of Intrusions andMalware and Vulnerability Assessment Springer 2016 pp 399ndash418

[100] G E Dahl J W Stokes L Deng and D Yu ldquoLarge-scale malwareclassification using random projections and neural networksrdquo in 2013IEEE International Conference on Acoustics Speech and Signal Pro-cessing IEEE 2013 pp 3422ndash3426

[101] Z Wang J Cai S Cheng and W Li ldquoDroiddeeplearner Identifyingandroid malware using deep learningrdquo in Sarnoff Symposium 2016IEEE 37th IEEE 2016 pp 160ndash165

[102] S Hou A Saas L Chen and Y Ye ldquoDeep4maldroid A deep learningframework for android malware detection based on linux kernel systemcall graphsrdquo in Web Intelligence Workshops (WIW) IEEEWICACMInternational Conference on IEEE 2016 pp 104ndash111

[103] R Jordaney K Sharad S K Dash Z Wang D Papini I Nouretdinovand L Cavallaro ldquoTranscend Detecting concept drift in malwareclassification modelsrdquo 2017

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
      • III Implementation Details
        • III-A The System Call Interception Driver
        • III-B The Reconstruction Module
        • III-C DeepMalware Classifier
        • III-D Conventional ML Classifier
          • IV Evaluation
            • IV-A Software Collection Method
            • IV-B System Call Dataset
            • IV-C black Performance Analysis of standalone ML and DL classifiers
            • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
            • IV-E Comparison with Other DL Malware Detectors
              • V Discussion
                • V-A Recurrent Loop in Propedeutica
                • V-B Adversarial Attacks against Malware Detection
                • V-C Weakly Supervised Anomaly Detection
                  • VI Related Work
                    • VI-A Behavior-based Malware Detection
                    • VI-B ML-based Malware Detection
                    • VI-C DL-based Malware Detection
                      • VII Conclusion
                      • References

12

[89] D Chatzakou N Kourtellis J Blackburn E De Cristofaro G Stringh-ini and A Vakali ldquoMean birds Detecting aggression and bullying ontwitterrdquo 2017

[90] E Mariconti L Onwuzurike P Andriotis E D Cristofaro G Rossand G Stringhini ldquoMamadroid Detecting android malware by buildingmarkov chains of behavioral modelsrdquo in Network and DistributedSystem Security Symposium 2017

[91] D Amodei S Ananthanarayanan R Anubhai J Bai E BattenbergC Case J Casper B Catanzaro Q Cheng G Chen et al ldquoDeepspeech 2 End-to-end speech recognition in english and mandarinrdquo inInternational Conference on Machine Learning 2016 pp 173ndash182

[92] I Sutskever O Vinyals and Q V Le ldquoSequence to sequence learningwith neural networksrdquo in Advances in neural information processingsystems 2014 pp 3104ndash3112

[93] S Dieleman H Zen K Simonyan O Vinyals A Graves N Kalch-brenner A Senior K Kavukcuoglu et al ldquoWavenet A generativemodel for raw audiordquo arXiv preprint arXiv160903499 2016

[94] J Schmidhuber ldquoDeep learning in neural networks An overviewrdquoNeural Networks vol 61 pp 85ndash117 2015

[95] Y Li R Ma and R Jiao ldquoA hybrid malicious code detection methodbased on deep learningrdquo methods vol 9 no 5 2015

[96] R Pascanu J W Stokes H Sanossian M Marinescu and A ThomasldquoMalware classification with recurrent networksrdquo in 2015 IEEE In-ternational Conference on Acoustics Speech and Signal Processing(ICASSP) IEEE 2015 pp 1916ndash1920

[97] O E David and N S Netanyahu ldquoDeepsign Deep learning forautomatic malware signature generation and classificationrdquo in NeuralNetworks (IJCNN) 2015 pp 1ndash8

[98] J Saxe and K Berlin ldquoDeep neural network based malware detectionusing two dimensional binary program featuresrdquo in 2015 10th Interna-tional Conference on Malicious and Unwanted Software (MALWARE)IEEE 2015 pp 11ndash20

[99] W Huang and J W Stokes ldquoMtnet a multi-task neural networkfor dynamic malware classificationrdquo in Detection of Intrusions andMalware and Vulnerability Assessment Springer 2016 pp 399ndash418

[100] G E Dahl J W Stokes L Deng and D Yu ldquoLarge-scale malwareclassification using random projections and neural networksrdquo in 2013IEEE International Conference on Acoustics Speech and Signal Pro-cessing IEEE 2013 pp 3422ndash3426

[101] Z Wang J Cai S Cheng and W Li ldquoDroiddeeplearner Identifyingandroid malware using deep learningrdquo in Sarnoff Symposium 2016IEEE 37th IEEE 2016 pp 160ndash165

[102] S Hou A Saas L Chen and Y Ye ldquoDeep4maldroid A deep learningframework for android malware detection based on linux kernel systemcall graphsrdquo in Web Intelligence Workshops (WIW) IEEEWICACMInternational Conference on IEEE 2016 pp 104ndash111

[103] R Jordaney K Sharad S K Dash Z Wang D Papini I Nouretdinovand L Cavallaro ldquoTranscend Detecting concept drift in malwareclassification modelsrdquo 2017

  • I Introduction
  • II Methodology
    • II-A Threat Model
    • II-B Overall Design
      • III Implementation Details
        • III-A The System Call Interception Driver
        • III-B The Reconstruction Module
        • III-C DeepMalware Classifier
        • III-D Conventional ML Classifier
          • IV Evaluation
            • IV-A Software Collection Method
            • IV-B System Call Dataset
            • IV-C black Performance Analysis of standalone ML and DL classifiers
            • IV-D Performance Analysis of Propedeutica combining ML and DL classifiers
            • IV-E Comparison with Other DL Malware Detectors
              • V Discussion
                • V-A Recurrent Loop in Propedeutica
                • V-B Adversarial Attacks against Malware Detection
                • V-C Weakly Supervised Anomaly Detection
                  • VI Related Work
                    • VI-A Behavior-based Malware Detection
                    • VI-B ML-based Malware Detection
                    • VI-C DL-based Malware Detection
                      • VII Conclusion
                      • References