A Complete Formalized Knowledge Representation Model for Advanced Digital Forensics Timeline Analysis

Yoan Chabot a,b, Aurélie Bertaux a, Christophe Nicolle a, Tahar Kechadi b

a CheckSem Team, Laboratoire Le2i, UMR CNRS 6306, Faculté des sciences Mirande, Université de Bourgogne, BP47870, 21078 Dijon, France
b School of Computer Science & Informatics, University College Dublin, Belfield, Dublin 4, Ireland

Abstract

Having a clear view of events that occurred over time is a difficult objective to achieve in digital investigations (DI). Event reconstruction, which allows investigators to understand the timeline of a crime, is one of the most important steps of a DI process. This complex task requires the exploration of a large number of events due to the pervasiveness of new technologies nowadays. Any evidence produced at the end of the investigative process must also meet the requirements of the courts, such as reproducibility, verifiability, validation, etc. For this purpose, we propose a new methodology, supported by theoretical concepts, that can assist investigators through the whole process, including the construction and the interpretation of the events describing the case. The proposed approach is based on a model which integrates knowledge of experts from the fields of digital forensics and software development to allow a semantically rich representation of events related to the incident. The main purpose of this model is to allow the analysis of these events in an automatic and efficient way. This paper describes the approach and then focuses on the main conceptual and formal aspects: a formal incident modelization and operators for timeline reconstruction and analysis.

Keywords: Digital Forensics, Timeline Analysis, Event Reconstruction, Knowledge Management, Ontology

1. Introduction

Due to the rapid evolution of digital technologies and their pervasiveness in everyday life, the digital forensics field is facing challenges that were anecdotal a few years ago. Existing digital forensics toolkits, such as EnCase or FTK, simplify and facilitate the work during an investigation. However, the scope of these tools is limited to the collection and examination of evidence (i.e. studying its properties), which are the first two steps of the investigation process, as defined in (Palmer, 2001). To extract acceptable evidence, it is also necessary to deduce new knowledge such as the causes of the current state of the evidence (Carrier & Spafford, 2004b). The field of event reconstruction aims at solving this issue: event reconstruction can be seen as a process that takes as input a set of events and outputs a timeline of the events describing the case. Several approaches have been proposed to carry out event reconstruction, which try to extract events and then represent them in a single timeline (super-timeline (Gudhjonsson, 2010)). This timeline provides a global overview of the events occurring before, during and after a given incident. However, because the number of events can be very large, the produced timeline may be quite complicated to analyse. This makes the interpretation of the timeline, and therefore the decision making, very difficult. In addition, event reconstruction is a complex process where each conclusion must be supported by evidence rigorously collected, giving it full credibility.

Email address: [email protected] (Yoan Chabot)

In this paper, we first address these problems by proposing an approach to reconstruct scenarios from suspect data and analyse them using semantic tools and knowledge from experts. Secondly, this paper answers the challenge of correctness of the whole investigative process with a formal incident modelization and timeline reconstruction and analysis operators. The paper is organised as follows. Section 2 reviews important issues of the event reconstruction problem and the various approaches proposed so far. The SADFC (Semantic Analysis of Digital Forensic Cases) approach is described in Section 3, and the formal advanced timeline reconstruction and analysis model is presented in Section 4. Finally, a case study illustrating the key characteristics of the proposed approach is given in Section 5.

2. Related Works

Event reconstruction raises many issues, which are directly related to the size of the data, the complexity of the digital investigation process, and the challenges of IT infrastructures. For instance, Table 1 compares some existing approaches (their strengths (✓), limitations (✗), and partial or inadequate solutions (~)) with regard to some key issues, such as heterogeneity, automatic knowledge extraction, the use of proven theory as support, analysis capabilities, and preservation of data integrity. While some of these challenges have been a focus for many researchers and developers for the last decade, the size of data volumes (Richard III & Roussev, 2006) and data heterogeneity are still very challenging. The first (large data sizes) introduces many challenges at every phase of the investigation process, from the data collection to the interpretation of the results.


Approach / Criterion                                   | Auto Extraction | Heterogeneity | Analysis | Theory | Data integrity
ECF (Chen et al., 2003)                                |        ✓        |       ✓       |    ✗     |   ✗    |       ✗
FORE (Schatz et al., 2004b)                            |        ✓        |       ✓       |    ~     |   ✗    |       ✗
Finite state machine (Gladyshev & Patel, 2004)         |        ✗        |       ~       |    ~     |   ✓    |       ✗
Zeitline (Buchholz & Falk, 2005)                       |        ~        |       ✓       |    ✗     |   ✗    |       ✓
Neural networks (Khan & Wakeman, 2006)                 |        ~        |       ~       |    ✗     |   ~    |       ✗
CyberForensic TimeLab (Olsson & Boldt, 2009)           |        ✓        |       ✓       |    ✗     |   ✗    |       ✗
log2timeline (Gudhjonsson, 2010)                       |        ✓        |       ✓       |    ✗     |   ✗    |       ✗
Timeline reconstruction (Hargreaves & Patterson, 2012) |        ✓        |       ✓       |    ~     |   ~    |       ✗

Table 1: Evaluation of approaches (✓ = strength, ✗ = limitation, ~ = partial or inadequate solution)

The second (data heterogeneity) is usually due to multiple footprint sources such as log files, information contained in file systems, etc. We can classify event heterogeneity into three categories:

• Format: The information encoding is not the same among sources due to the formatting. Therefore, depending on the source, footprint data may be different.

• Temporal: The use of different sources from different machines may cause timing problems. First, there are some issues due to the use of different time zones and unsynchronised clocks. Second, the temporal heterogeneity can be due to the use of different formats or granularities (e.g. 2 seconds in FAT file systems, 100 nanoseconds in NTFS file systems).

• Semantic: The same event can be interpreted or represented in different ways. For example, an event describing the visit of a webpage may appear in different ways in web browser logs and server logs.

In order to gather all the events found in footprint sources in a single timeline, a good handling of all these forms of heterogeneity is required. This leads to the development of an automated information processing approach which is able to extract knowledge from these heterogeneous sources. In addition, once extracted, this knowledge should be federated within the same model so as to facilitate its interpretation and future analysis. The effectiveness of such an approach can be assessed by the following criteria:

• Efficient automated tools that can extract events and build a timeline (Criterion 1 in Table 1).

• The ability to process multiple and various footprint sources and to federate the information collected in a coherent and structured way (Criterion 2 in Table 1).

• The ability to assist users during the timeline analysis. This encompasses many aspects such as making the timeline easier to read, identifying correlations between events or producing conclusions from the knowledge contained in the timeline (Criterion 3 in Table 1).

For the majority of existing approaches, solutions are provided to automatically extract events and construct the timeline. (Chen et al., 2003) introduced a set of automated extractors to collect events and store them in a canonical database, which allows a temporally ordered sequence of events to be generated quickly.

These automatic extractors, a widely used concept, can also generate the timeline (Olsson & Boldt, 2009) (Gudhjonsson, 2010) (Hargreaves & Patterson, 2012). However, current tools extract data in its raw form without a good understanding of the meaning of footprints, which makes their analysis more difficult. In order to deal with the semantic heterogeneity, the FORE (Forensics of Rich Events) system stores the events in an ontology (Schatz et al., 2004a,b). This ontology uses the notions of entity and event to represent the state change of an object over time. Nevertheless, the time model implemented in this ontology is not accurate enough (use of instants rather than intervals) to represent events accurately. In addition, the semantic coverage of this ontology can be improved. Another storage structure is proposed in (Buchholz & Falk, 2005). They proposed a scalable structure, a variant of a balanced binary search tree. However, one of the problems of this approach is that one has to build the case by manually selecting relevant events.
Some automatic event extraction approaches are also able to process heterogeneous sources. This usually consists of creating a set of extractors dedicated to each type of footprint source. In (Gudhjonsson, 2010), the authors highlight several limitations of the current event reconstruction systems such as the use of a small number of event sources, which makes the timeline vulnerable to anti-forensics techniques (e.g., alteration of timestamps). They proposed to use a large number of event sources to ensure the high quality of the timeline and to minimise the impact of anti-forensics techniques. log2timeline uses various sources including Windows logs, the history of various web browsers, log files of other software resources, etc. Multiple heterogeneous sources require a consistent representation of the events in order to process them. The majority of approaches propose their own model for event representation. The most commonly used event features include the date and time of the event (an instant, or an interval when the duration of events is included) and information about the nature of the event.
Regarding the timeline analysis, few solutions have been proposed. The FORE approach attempts to identify event correlations by connecting events with causal or effect links. In (Gladyshev & Patel, 2004), the authors try to perform the event reconstruction by representing the system behaviour as a finite state machine. The event reconstruction process can be seen as a search for sequences of transitions that satisfy the constraints imposed by the evidence.


Then, some scenarios are removed using the evidence collected (thus, there is no automation of the extraction process in this approach). In (Hargreaves & Patterson, 2012), the authors proposed a pattern-based process to produce high-level events ("human-understandable" events) from a timeline containing low-level events (events directly extracted from the sources). Although the proposed approach is relevant, it handles only one of the many aspects of the analysis, by making the timeline easier for the investigator to read. Therefore, other aspects such as causality analysis between events are not covered by this approach.
All approaches have to satisfy some key requirements such as credibility, integrity, and reproducibility of the digital evidence (Baryamureeba & Tushabe, 2004). In recent years, the protagonists of digital forensics have moved away from investigative techniques based on the investigator's experience towards techniques based on proven theories. It is also necessary to provide a clear explanation of the evidence found. In addition, one has to ensure that the tools used do not modify the data collected on crime scenes. Thus, it is necessary to develop tools that extract evidence while preserving the integrity of the data. Finally, a formal and standard definition of the reconstruction process is needed to ensure the reproducibility of the investigation process and the credibility of the results. A clearly-defined investigation model allows the process used to obtain the results to be explained. To this end, we believe that the following criteria are crucial for such techniques: the use of a theoretical model to support the proposed approach (criterion 4 in Table 1) and the ability to maintain data integrity (criterion 5 in Table 1).
As a prelude to his work, Gladyshev (Gladyshev & Patel, 2004) argued that a formalisation of the event reconstruction problem is needed to simplify the automation of the process and to ensure the completeness of the reconstruction. In (Khan & Wakeman, 2006), the authors proposed an event reconstruction system based on neural networks. The use of a machine learning technique appears to be a suitable solution only if it is possible to know the assumptions and reasoning used to obtain the final results. However, neural network behaviour is not entirely transparent (especially during the training step). Thus, the approach adopted by Khan does not seem to fulfil the goal of making the reasoning explicit. In (Hargreaves & Patterson, 2012), the system keeps information about the analysis which leads to the inference of each high-level event. This gives the opportunity to provide further information when needed. As for the preservation of data integrity, the approach described in (Buchholz & Falk, 2005) uses a set of restrictions to prevent the alteration of the evidence. Regarding the process model, numerous investigation models have been proposed; however, none of them is designed for automated or semi-automated investigation. Existing models (DFRWS model (Palmer, 2001), End to End Digital Investigation (Stephenson, 2003), Event-Based Digital Forensics Investigation Framework (Carrier & Spafford, 2004a), Enhanced Digital Investigation Process Model (Baryamureeba & Tushabe, 2004), Extended Model of Cybercrime Investigation (Ciardhuain, 2004), Framework for a Digital Forensics Investigation (Kohn et al., 2006)) are designed to guide human investigators

by providing a list of tasks to perform. Thus, the proposed models are not accurate enough to provide a framework for the development of automated investigation tools. Among the limitations of the existing frameworks, the characterization of the data flow through the model is absent and the meaning of the steps is not clear. The creation of a well-defined and explicit framework is needed to allow an easy translation of the investigation process into algorithms.
We can see in Table 1 that existing approaches are not able to fulfil all the criteria. Two limitations of the existing approaches are the lack of automation of the timeline analysis and the absence of theoretical foundations to explain the conclusions produced by the tools. The assistance provided to the user should not be limited to the construction of the timeline. It is necessary to deal with a large number of events (criterion 3), the heterogeneity of event sources, the need to federate events in a suitable model (see criterion 1 and criterion 2), data integrity, and the correctness and validation of the whole investigative process. The purpose of this study is to propose an approach that satisfies all the criteria presented in Table 1 while producing all the necessary evidence in an efficient way.

3. SADFC Approach

To reach these objectives, we propose the following approach:

• Identify and model the knowledge related to an incident and the knowledge related to the investigation process used to determine the circumstances of the incident:

– Introduce a formalization of the event reconstruction problem by formally defining the entities involved in an incident.

– Define a knowledge representation model based on the previous definitions, used to store knowledge about an incident and knowledge used to solve the case (to give credibility to the results and to ensure the reproducibility of the process).

• Provide extraction methods for the knowledge contained in heterogeneous sources to populate the knowledge model.

• Provide tools to assist investigators in the analysis of the knowledge extracted from the incident.

Unlike conventional approaches, the SADFC approach uses techniques from knowledge management, semantic web and data mining at various phases of the investigation process. Issues related to knowledge management and ontology are two central elements of this approach as they deal with a large part of the mentioned challenges. According to (Gruber et al., 1993), "an ontology is an explicit, formal specification of a shared conceptualization". Thus, an ontology allows the knowledge generated during an investigation (knowledge about footprints, events, objects, etc.) to be represented.


Ontology provides several advantages, such as the possibility to use automatic processes to reason on knowledge thanks to its formal and explicit nature, the availability of rich semantics to represent knowledge (richer than databases due to its sophisticated semantic concepts (Martinez-Cruz et al., 2012)) and the possibility of building a common vision of a topic that can be shared by investigators and software developers. Ontology has already proved its relevance in computer forensics (Schatz et al., 2004a,b). The use of ontology is also motivated by its successful use in other fields such as biology (Schulze-Kremer, 1998) and life-cycle management (Vanlande et al., 2008).
SADFC is a synergy of the three elements presented below, which, once assembled, constitute a coherent package describing the methods, processes and technological solutions needed for event reconstruction:

• Knowledge Model for Advanced Digital Forensics Timeline Analysis, which is presented in the following part of the paper.

• Investigation Process Model: The aim of this model is to define the various phases of the event reconstruction process: their types, the order between them and the data flow through the whole process.

• Ontology-centred architecture: This architecture consists of several modules which implement some of the key functions such as footprint extraction, knowledge management, ontology and the visualisation of the final timeline. This architecture is based on an ontology which implements the knowledge model proposed in our approach. As the ontologies proposed in (Schatz et al., 2004a,b) are not advanced enough, we have designed and developed a new ontology. For instance, some relations included in our knowledge model are absent from the ontology of the FORE system.

In the following sections, we present a formalized knowledge model for advanced forensics timeline analysis, while the investigation process model and the architecture will be discussed in detail in other documents.

4. Advanced Timeline Analysis Model

This section describes the knowledge model used in the SADFC approach to perform an in-depth analysis of timelines while fulfilling legal requirements. Models proposed in the literature (Schatz et al., 2004a,b) are still limited in terms of the amount of knowledge they can store about events and, therefore, in their analysis capabilities. Temporal characteristics predominate over other aspects such as the interaction of events with objects, processes or people. The knowledge representing the relations between events and other entities is not sufficiently diversified for the subsequent analysis required. Thus, we propose a rich knowledge representation containing a large set of entities and relations and allowing automated analysis processes to be built. In addition, the proposed model is designed to meet the legal requirements and contains knowledge allowing the investigative process to be reproduced and full credibility to be given to the results.

It should be noted that we assume the data has been processed upstream to ensure its accuracy and correctness. A digital forensic investigation process model including processes for consistency checking of data and filtering will be proposed in future work.
We will first formally define the entities and relations composing the model and then introduce a set of operators to manipulate the knowledge. An overview of the proposed knowledge model is given in Figure 1.

Figure 1: Knowledge model

4.1. Formal modelling of an incident

We first define the entities of our knowledge model and then detail the four composed relations using this knowledge.

4.1.1. Subject, Object, Event and Footprint
A crime scene is a space where a set of events E = {e1, e2, ..., ei} takes place. An event is a single action occurring at a given time and lasting a certain duration. An event may be the drafting of a document, the reading of a webpage or a conversation via instant messaging software.

Subject. During its life cycle, an event involves subjects. Let S be the set containing subjects, covering human actors and processes (e.g. the Firefox web browser, the Windows operating system, etc.); a subject x ∈ S corresponds to an entity involved in one or more events e ∈ E and is defined by x = {a ∈ As | x αs a} where:

• As is a set containing all the attributes which can be used to describe a subject. A subject attribute may be the first name and the last name of a person, the identifier of a web session, the name of a Windows session, etc.

• αs is the relation used to link a subject with the attributes of As describing it.

Object. During its life cycle, an event can also interact with objects. An object may be a webpage, a file or a registry key, for example. An object x ∈ O is defined by x = {a ∈ Ao | x αo a} where:

• Ao is a set containing all the attributes which can be used to describe an object. An object attribute may be, for example, a filename, the location of an object on a hard disk, etc.

• αo is the relation used to link an object with the attributes of Ao describing it.


• O ⊆ ℘(Ao) is the set of objects, meaning that o ∈ O belongs to the power set of Ao. An object is a composition of one or several attributes of Ao.

Note that, for easier human understanding, an object can also be seen as a composition of attributes and objects (because objects are sets of attributes): e.g. a registry key is an object composed of several attributes such as its value and its key name. This registry key is also an attribute of the object representing the database containing all the keys of the system.

Event. Each event takes place in a time interval defined by a start time and an end time. These boundaries define the life cycle of the event. The use of a time interval allows the notion of uncertainty to be represented (Liebig et al., 1999). For example, when the start time of an event cannot be determined accurately, the use of a time interval allows it to be approximated. The use of intervals requires the introduction of a specific algebra (e.g. to order events). In our work, the Allen algebra, illustrated in Table 2, is used (Allen, 1983). The authors of this paper are aware of the problems caused by temporal heterogeneity and anti-forensics techniques on the quality and the accuracy of timestamps (granularity, timestamp offset and alteration, time zone, etc.). Several works try to characterize the phenomena related to the use of time information in digital forensic investigations and to propose techniques to solve these problems (Schatz et al., 2006), (Gladyshev & Patel, 2005), (Forte, 2004). In this paper, we assume that all timestamps used in our model are adjusted and normalized beforehand by a process that will be the subject of future work (the proposed model is generic enough to incorporate future solutions). We consider that each function returns the value 1 if the events meet the constraints and 0 otherwise.

Function      | Constraint
before(X,Y)   | xtend < ytstart
equal(X,Y)    | xtstart = ytstart && xtend = ytend
meets(X,Y)    | xtend = ytstart
overlaps(X,Y) | xtstart < ytstart && xtend > ytstart
during(X,Y)   | xtstart > ytstart && xtend < ytend
starts(X,Y)   | xtstart = ytstart
finishes(X,Y) | xtend = ytend

Table 2: Allen algebra
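To make these constraints concrete, the following minimal Python sketch (our own illustration, not part of the original model) encodes each Allen function of Table 2 as a predicate returning 1 when its constraint holds and 0 otherwise; the Interval class and its t_start/t_end fields are assumptions standing for any entity carrying a start and an end time.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Interval:
    # Hypothetical container: any entity carrying a start and an end time.
    t_start: datetime
    t_end: datetime

# Each predicate returns 1 if the constraint of Table 2 is met, 0 otherwise.
def before(x, y):    return int(x.t_end < y.t_start)
def equal(x, y):     return int(x.t_start == y.t_start and x.t_end == y.t_end)
def meets(x, y):     return int(x.t_end == y.t_start)
def overlaps(x, y):  return int(x.t_start < y.t_start and x.t_end > y.t_start)
def during(x, y):    return int(x.t_start > y.t_start and x.t_end < y.t_end)
def starts(x, y):    return int(x.t_start == y.t_start)
def finishes(x, y):  return int(x.t_end == y.t_end)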

An event e ∈ E is defined by e = {tstart, tend, l, Se, Oe, Ee} where:

• tstart is the start time of the event, tend is the end time of the event and l is the location where the event took place. This location may be a machine (represented by an IP address) and its geolocalisation.

• Se is a set containing all the subjects involved in the event: Se = {s ∈ S | e ∈ E, s σs e} where σs is a composed relation used to link an event e ∈ E with a subject s ∈ S. The relation σs is defined below.

• Oe is a set containing all the objects related to the event e: Oe = {o ∈ O | e ∈ E, e σo o} where σo is a composed relation used to link an event e ∈ E with an object o ∈ O. The relation σo is defined below.

• Ee is the set containing all the events with which the event is correlated: Ee = {x ∈ E | e ∈ E, e σe x} where σe is a composed relation used to link an event e ∈ E with an event x ∈ E. The relation σe is defined below.

Footprint. According to (Ribaux, 2013), a footprint is the sign of a past activity and a piece of information allowing past events to be reconstructed. A footprint may be a log entry or a web history entry, for example, as a log entry gives information about software activities and web histories provide information about the user's behaviour on the Web. Let F be a set containing all the footprints related to a case; a footprint x ∈ F is defined by x = {a ∈ Af | x αf a} where:

• Af is a set containing all the attributes which can be used to describe a footprint. For example, a bookmark contains attributes such as the title of the bookmark, the date of creation and the webpage pointed to by the bookmark.

• αf is the relation used to link a footprint with the attributes of Af used to describe it.

Footprints are the only available information to define past events and can be used by investigators to reconstruct the events which happened during an incident. However, the imperfect and incomplete nature of footprints can lead to approximate results. It is therefore not always possible to determine which event is associated with a given footprint. In addition, it is not always possible to fully reconstruct an event from a footprint. Thus, a footprint can be used to identify one or several features:

• The temporal features of an event. For example, a footprint extracted from the table moz_formhistory (database FormHistory of the web browser Firefox) can be used to establish the time at which a form field has been filled.

• A relation between an event and an object. For example, a footprint extracted from the table moz_historyvisits (database Places of the web browser Firefox) can be used to link an event representing the visit of a webpage to this webpage.

• A relation between an event and a subject. For example, all the footprints produced by the web browser Firefox are stored in a folder named with the profile name of the user. This allows each Firefox event to be linked to the user designated by this name.


• The features of an object. For example, a footprint extracted from the table moz_places (database Places) can be used to determine the URL and the title of a webpage.

• The features of a subject. For example, a footprint extracted from the table moz_historyvisits (database Places) can be used to determine the session identifier of a user.
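As a reading aid, the entity definitions above can be transcribed almost literally into code. The following minimal Python sketch is our own illustration: the class and field names (attributes, subjects, objects, correlated, location) are assumptions, and attribute sets are represented as frozensets of (name, value) pairs.

from dataclasses import dataclass, field
from datetime import datetime
from typing import FrozenSet, Set, Tuple

Attribute = Tuple[str, str]  # (attribute name, value)

@dataclass(frozen=True)
class Subject:
    # A subject is a set of attributes of As (e.g. first name, session id).
    attributes: FrozenSet[Attribute]

@dataclass(frozen=True)
class Object:
    # An object is a set of attributes of Ao (e.g. filename, location on disk).
    attributes: FrozenSet[Attribute]

@dataclass(frozen=True)
class Footprint:
    # A footprint is a set of attributes of Af (e.g. bookmark title, date).
    attributes: FrozenSet[Attribute]

@dataclass(eq=False)  # identity-based equality so events can be stored in sets
class Event:
    # e = {tstart, tend, l, Se, Oe, Ee}
    t_start: datetime
    t_end: datetime
    location: str                                           # l, e.g. an IP address
    subjects: FrozenSet[Subject] = frozenset()              # Se
    objects: FrozenSet[Object] = frozenset()                # Oe
    correlated: Set["Event"] = field(default_factory=set)   # Ee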

4.1.2. Relations
To link the entities presented before, the four composed relations introduced in the previous part are detailed here.

Subjects Relations. σs is composed of two types of relations to link an event e ∈ E with a subject s ∈ S and can be defined as σs = s isInvolved e ∨ s undergoes e:

• Relation of participation: s isInvolved e means that s initiated or was involved in e. For example, the user of a computer is involved in an event representing the login to the session, etc.

• Relation of repercussion: s undergoes e means that s is affected by the execution of e. For example, a user is affected by the removal of one of his files, etc.

Objects Relations. σo is composed of four types of relations to link an event e ∈ E with an object o ∈ O and can be defined as σo = e creates o ∨ e removes o ∨ e modifies o ∨ e uses o:

• Relation of creation: e creates o means that o does not exist before the execution of e and that o is created by e.

• Relation of suppression: e removes o means that o does not exist anymore after the execution of e and that o is deleted by e.

• Relation of modification: e modifies o means that one or more attributes of o are modified during the execution of e.

• Relation of usage: e uses o means that one or more attributes of o are used by e to carry out its task.

Events Relations. σe is composed of relations used to link two events x, e ∈ E and can be defined as σe = x composes e ∨ e composes x ∨ x causes e ∨ e causes x. In our work, x isCorrelated e means that x is linked to e on the basis of multiple criteria: the use of common resources, the participation of a common person or process, and the temporal position of the events. We distinguish two special cases of the relation of correlation:

• Relation of composition: x composes e means that x is an event composing e. For example, an event representing a Windows session is composed of all the events initiated by the user during this session. Let x = {txstart, txend, Sx, Ox, Ex} be an event composing e = {testart, teend, Se, Oe, Ee}; the relation of composition implies a set of constraints. First, a temporal constraint requires that sub-events take place during the parent event. Using Allen relations, if x composes e then equal(x, e) or during(x, e) or starts(x, e) or finishes(x, e). Sub-events also have constraints on the participating subjects as well as on the objects with which the event interacts: if x composes e then Sx ⊆ Se and Ox ⊆ Oe. Thus, x composes e = [equal(x, e) ∨ during(x, e) ∨ starts(x, e) ∨ finishes(x, e)] ∧ (Sx ⊆ Se) ∧ (Ox ⊆ Oe).

• Relation of causality: x causes e means that x has to happen to allow e to happen. For example, an event describing the download of a file from a server is caused by the event describing the connection to this server. An event can have several causes and can be the cause of several events. Let e = {testart, teend, Se, Oe, Ee} be an event caused by x = {txstart, txend, Sx, Ox, Ex}; the relation of causality implies a temporal constraint requiring that the cause must happen before the consequence. Using the Allen algebra, x causes e = [before(x, e) ∨ meets(x, e) ∨ overlaps(x, e) ∨ starts(x, e)] ∧ [(Sx ∩ Se ≠ ∅) ∨ (Ox ∩ Oe ≠ ∅)].
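A minimal sketch of how these constraints could be checked mechanically is given below; it reuses the Allen predicates sketched after Table 2 and the hypothetical Event structure sketched in Section 4.1.1 (subjects and objects held as sets), and it encodes the two formulas above as written.

def may_compose(x, y):
    # x composes y: x is temporally contained in y (equal, during, starts or
    # finishes) and its subjects and objects are subsets of those of y.
    temporal = equal(x, y) or during(x, y) or starts(x, y) or finishes(x, y)
    return bool(temporal) and x.subjects <= y.subjects and x.objects <= y.objects

def may_cause(x, y):
    # x causes y: x does not start after y (before, meets, overlaps or starts)
    # and the two events share at least one subject or one object.
    temporal = before(x, y) or meets(x, y) or overlaps(x, y) or starts(x, y)
    shared = bool(x.subjects & y.subjects) or bool(x.objects & y.objects)
    return bool(temporal) and shared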

Footprints relation. σf is a relation used to link a footprint f ∈ F with an entity en ∈ E ∪ O ∪ S. This relation is called the relation of support: f supports en means that f is used to deduce one or more attributes of en. We define a function support which can be used to know the footprints used to deduce a given entity: support(en ∈ E ∪ O ∪ S) = {f ∈ F | f σf en}. For example, an entry of a web history can be used to reconstruct an event representing the visit of a webpage by a user.

4.1.3. Crime Scene
In our work, we define a crime scene CS as an environment in which an incident takes place, defined by CS = {PCS, DCS}. PCS is a set containing the physical crime scenes. At the beginning of an investigation, PCS is initialized with the location where the incident takes place. However, in an investigation, the crime scene is not limited to only one building. Due to network communication, for example, the initial physical crime scene may be extended to a set of new physical crime scenes if one of the protagonists communicated with another person through the network, downloaded a file from a remote server, etc. In these cases, the seizure of the remote machines (and, by extension, the creation of new physical crime scenes and digital crime scenes) should be taken into account as it may be relevant for the investigation. DCS is a set containing the digital crime scenes. Unlike Carrier (Carrier et al., 2003), there is no distinction between primary and secondary digital crime scenes in our work, in order to simplify the modelization.
A crime scene is also related to the events that take place on it: ECS = {EiCS ∪ EcCS ∪ EnCS}. For easier notation, we write here E = {Ei ∪ Ec ∪ En} where:

• Ei is a set containing the illicit events: Ei = {e ∈ E | e σi i, i ∈ I}, with I a set containing all the actions considered as infractions by the law and σi the relation linking an event with an action. For example, an event representing the upload of defamatory documents to a website is an event of Ei.


• Ec is a set containing the correlated events: Ec = {e ∈ E | e σi l, l ∈ L, e σe x, x ∈ Ei}, where L is the set of actions considered legal. This set contains all the legal events which are linked with one or more illicit events.

• En = {E \ (Ei ∪ Ec)} is a set containing the events which are not relevant for the investigation.
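To illustrate this partition, here is a minimal Python sketch; events is assumed to be a set of event objects, while the is_illicit predicate (standing for the σi relation against the infraction set I) and the correlated function (returning the events linked to e through σe) are hypothetical inputs.

def partition_events(events, is_illicit, correlated):
    # Ei: events matching an action considered an infraction (relation σi).
    Ei = {e for e in events if is_illicit(e)}
    # Ec: legal events linked (σe) to at least one illicit event.
    Ec = {e for e in events - Ei if any(x in Ei for x in correlated(e))}
    # En: remaining events, not relevant for the investigation.
    En = events - Ei - Ec
    return Ei, Ec, En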

4.2. Event Reconstruction and Analysis Operators
Since investigators have ensured the preservation of the crime scene, it becomes a protected static environment containing a set of footprints. After the collection of all the footprints of the crime scene, the goal of event reconstruction is to move from the static crime scene to a timeline describing the dynamics of an incident which happened in the past. Describing an incident means identifying all the events Einc = Ei ∪ Ec using the footprints of F. The chronologically ordered events describing an incident are called the scenario of the incident. In the SADFC approach, four operators are defined to carry out the event reconstruction. These operators are illustrated in Section 5.
The aim of the extraction operators is to identify and extract the relevant information contained in digital footprints from various sources. Sources are chosen according to the definition of the perimeter of the crime scene. The relevance or irrelevance of the information contained in footprints is determined by the investigators according to the goals of the investigation. For example, in a case involving illegal downloads of files, the investigator will pay more attention to information related to the user's behaviour on the web, while the logs of word processing software will be ignored. The mapping operators create the entities (events, objects and subjects) associated with the extracted footprints. These operators take the form of mapping rules allowing attributes extracted from footprints to be connected with attributes of events, objects and subjects. A large part of the features of an event can be determined by extraction operators from the footprints collected in the crime scene. However, the identification of some kinds of features requires the use of advanced techniques such as inference. The inference operators allow new knowledge about entities to be deduced from existing knowledge. Unlike extraction operators, which use the knowledge of footprints, inference operators use the knowledge about events, objects and subjects (knowledge generated by the mapping operators). The analysis operators are used to help the investigators during the interpretation of the timeline. These operators can be used to identify relations between events or to highlight the relevant information of the timeline. In our work, we introduce an operator dedicated to the identification of event correlations. The correlation between two events e, x ∈ E is measured by the following function:

Correlation(e, x) = CorrelationT(e, x) + CorrelationS(e, x) + CorrelationO(e, x) + CorrelationKBR(e, x)   (1)

Correlation(e, x) can be weighted to give more importance to one of the correlation functions. Correlation(e, x) can also be ordered and thresholded to deal with data volume constraints by selecting the most significant correlations. These four correlations are described in the following way:

Temporal Correlation, CorrelationT(e, x). First of all, a set of assumptions about the temporal aspect is defined (according to the Allen algebra given in Table 2):

• The greater the relative temporal difference between the two events in before(e, x) is, the lower the temporal relatedness is, and reciprocally.

• The temporal relatedness is high for the functions meets(e, x), overlaps(e, x), during(e, x) and finishes(e, x), and maximal for starts(e, x) and equals(e, x).

Thus, CorrelationT(e, x) = α ∗ starts(e, x) + α ∗ equals(e, x) + meets(e, x) + overlaps(e, x) + during(e, x) + finishes(e, x) + before(e, x)   (2)

where starts(e, x), equals(e, x), meets(e, x), overlaps(e, x), during(e, x) and finishes(e, x) are binary functions and before(e, x) = 1 / (xtstart − etend). The previous assumptions state that the closer two events are in time, the more likely it is that these events are correlated. Because of time granularities and multi-tasking computers, if two events start at the same time, the relatedness is more important. That is why this importance can be highlighted with an α factor.

Subject Correlation, CorrelationS(e, x) and Object Correlation, CorrelationO(e, x). These two correlations respectively quantify correlations regarding the subjects involved in each event and the objects used, generated, modified or removed by the events. The following hypotheses are defined according to the core idea of several domains such as data mining, for example formal concept analysis (Ganter et al., 1997), which groups objects with regard to the attributes they share, or statistics, such as principal component analysis, which groups observations by measuring how far they are spread (variance).

• The relatedness between e and x increases proportionally to the number of common subjects they share with regard to the relations of participation and repercussion: CorrelationS(e, x) = |Se ∩ Sx| / max(|Se|, |Sx|).

• The relatedness between e and x increases proportionally to the number of objects they share with regard to the relations of creation, suppression, modification and usage: CorrelationO(e, x) = |Oe ∩ Ox| / max(|Oe|, |Ox|).

Rule-based Correlation, CorrelationKBR(e, x). In addition to the previous factors (time, subject, object), rules based on expert knowledge can be used to correlate events: CorrelationKBR(e, x) = Σ(r=1..n) rule_r(e, x), with rule_r(e, x) = 1 if the rule is satisfied and 0 otherwise. Therefore, the greater the number of satisfied rules is, the greater the value of CorrelationKBR is. All correlation functions return a score between 0 and 1 except CorrelationKBR. This specificity allows more importance to be given to the expert knowledge synthesized in the rules.
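Putting the four components together, the following minimal Python sketch is one possible reading of equations (1) and (2); it reuses the Allen predicates and the Event structure sketched earlier, the time gap in before is expressed in seconds and α defaults to 2, both being assumptions of this illustration rather than values prescribed by the model.

def correlation_T(e, x, alpha=2.0):
    # Equation (2): α weights the starts/equals cases; the before term decays
    # with the gap between the end of e and the start of x.
    score = (alpha * starts(e, x) + alpha * equal(e, x) + meets(e, x)
             + overlaps(e, x) + during(e, x) + finishes(e, x))
    if before(e, x):
        gap = (x.t_start - e.t_end).total_seconds()  # strictly positive here
        score += 1.0 / gap
    return score

def correlation_S(e, x):
    # |Se ∩ Sx| / max(|Se|, |Sx|); 0 when either event has no subjects.
    if not e.subjects or not x.subjects:
        return 0.0
    return len(e.subjects & x.subjects) / max(len(e.subjects), len(x.subjects))

def correlation_O(e, x):
    # |Oe ∩ Ox| / max(|Oe|, |Ox|); 0 when either event has no objects.
    if not e.objects or not x.objects:
        return 0.0
    return len(e.objects & x.objects) / max(len(e.objects), len(x.objects))

def correlation_KBR(e, x, rules=()):
    # Sum of the satisfied expert rules; each rule returns 1 or 0.
    return sum(rule(e, x) for rule in rules)

def correlation(e, x, rules=()):
    # Equation (1): sum of the four components (weights could be added here).
    return (correlation_T(e, x) + correlation_S(e, x)
            + correlation_O(e, x) + correlation_KBR(e, x, rules))

The rules argument is a plain tuple of callables so that expert rules can be plugged in without touching the scoring logic, in line with the open-ended nature of CorrelationKBR.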


Figure 2: Overview of the knowledge generated during the case

5. Case Study

The aim of this section is to illustrate, with an example, how this knowledge model can be used to formalize a computer forensics case and reconstruct the incident by using the proposed operators. To illustrate the capabilities of our model, we designed a fictitious investigation concerning a company manager who contacted a private investigator after suspicions about one of his employees. The manager believes that this employee uses the internet connection of the company to illegally download files for personal purposes. As the investigation process model is not presented yet, we used in this case study a process designed for the needs of this paper and composed of five steps: the definition of the crime scene, the collection of footprints, the creation of entities (event, object, subject), the enrichment of the extracted knowledge and, finally, the construction and analysis of the timeline.

5.1. Crime Scene Definition
At the start of the investigation, the investigator evaluates the size of the crime scene. As the suspicions concern the workstation of the employee, the open space of the company where this computer is located is designated as the physical crime scene. Based on the testimony, and to reduce the complexity of the investigation, only the workstation of the suspected employee is added to the DCS set. To establish whether or not the doubts about the employee are founded, the investigator in charge of the case seizes the machine used by the employee and starts using the theoretical tools proposed in this paper to track the user's past activities.

5.2. Footprints Collection
After defining the machine used by the employee as the crime scene, the investigator collects footprints on it. To carry out this step, the extraction operators are used. According to the objectives of the investigation, the investigator chooses to collect the footprints left by the web browser used by the employee, as he knows that he can find there information about downloads performed by the user. The output of the extraction is a set containing several types of web browser footprints (id is a unique identifier for each element populating our model):

• fWebpage(id, webpageID, pageTitle, URL, hostname): footprint giving information about a webpage.

• fVisit(id, date, session, pageID): footprint of a visit of a webpage.

• fBookmark(id, date, bookmarkTitle, pageID): footprint of the creation of a bookmark.

• fDownload(id, start, end, filename, pageSourceID, URLTarget): footprint of a file download, where start and end are the dates on which the download starts and ends, pageSourceID is the webpage from where the download is launched and URLTarget is the destination used to store the file.

The result of the extraction is given in Figure 3 and is illustrated in the left part of Figure 2.

fWebpage(1, 153, "BBC News Home", "http://www.bbc.com/news/", "www.bbc.com").
fWebpage(2, 165, "Torrent, Streaming, Crack, Serial...", "http://www.warez.com", "www.warez.com").
fWebpage(3, 28, "Warez: Films", "http://www.warez.com/films", "www.warez.com").
fVisit(4, "2013-08-14T10:35:43", 351, 165).
fVisit(5, "2013-08-14T10:37:02", 351, 28).
fVisit(6, "2013-08-14T10:55:41", 410, 153).
fBookmark(7, "2013-08-14T10:55:59", "News", 153).
fBookmark(8, "2013-08-14T10:35:53", "Download", 165).
fDownload(9, "2013-08-14T10:37:20", "2013-08-14T11:22:12", "changingSeasons.divx", 28, "C:\Users\UserA\Desktop\").

Figure 3: Output of footprints extraction
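As an illustration of what such an extraction operator could look like in practice, the sketch below reads fWebpage and fVisit footprints from a copy of Firefox's places.sqlite database; the table and column names follow the Firefox schema referred to in Section 4.1.1 (moz_places, moz_historyvisits), while the conversion of visit_date from microseconds and the simplified tuple layout of the footprints are assumptions of this example.

import sqlite3
from datetime import datetime, timezone

def extract_firefox_footprints(places_db_path):
    # Extraction operator (sketch): fWebpage and fVisit footprints taken from
    # a seized copy of places.sqlite.
    con = sqlite3.connect(places_db_path)
    try:
        webpages = [("fWebpage", page_id, title, url)
                    for page_id, url, title
                    in con.execute("SELECT id, url, title FROM moz_places")]
        visits = []
        query = "SELECT id, visit_date, session, place_id FROM moz_historyvisits"
        for visit_id, date_us, session, page_id in con.execute(query):
            # visit_date is stored as microseconds since the Unix epoch.
            date = datetime.fromtimestamp(date_us / 1_000_000, tz=timezone.utc)
            visits.append(("fVisit", visit_id, date.isoformat(), session, page_id))
        return webpages + visits
    finally:
        con.close()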

5.3. Entities Creation

The output of this step is a set containing the entities (event, object, subject) which can be recovered using the information left in the digital crime scene. To carry out this step, the mapping operators are used. The output of the mapping is the set of entities given in Figure 4 and illustrated in the right part of Figure 2. Entities are linked together by the relations defined in Section 4.1.2. support(x, y) means that the footprint identified by the id x has been used to create the entity identified by the id y. participation(x, y) means that the subject x is involved in the event y. usage(x, y) means that the event x used the object y. A subject is defined by subject(id, session), where id is a unique identifier while session is the session number associated with the user. The hypothesis that the duration of a bookmark creation or of a webpage visit is null is used (start time equal to end time).


object(10, 153, "BBC News Home", "http://www.bbc.com/news/", "www.bbc.com"). (a)
object(11, 165, "Torrent, Streaming, Crack, Serial...", "http://www.warez.com", "www.warez.com"). (b)
object(12, 28, "Warez: Films", "http://www.warez.com/films", "www.warez.com"). (c)
support(1, 10). support(2, 11). support(3, 12).
support(4, 13). subject(13, 351). (d)
support(6, 14). subject(14, 410). (e)
event(15, "2013-08-14T10:35:43", "2013-08-14T10:35:43", "153.168.1.1"). (f)
support(4, 15). usage(15, 11). participation(13, 15).
event(16, "2013-08-14T10:37:02", "2013-08-14T10:37:02", "153.168.1.1"). (g)
support(5, 16). usage(16, 12). participation(13, 16).
event(17, "2013-08-14T10:55:41", "2013-08-14T10:55:41", "153.168.1.1"). (h)
support(6, 17). usage(17, 10). participation(14, 17).
event(18, "2013-08-14T10:55:59", "2013-08-14T10:55:59", "153.168.1.1"). (i)
event(19, "2013-08-14T10:35:53", "2013-08-14T10:35:53", "153.168.1.1"). (j)
event(20, "2013-08-14T10:37:20", "2013-08-14T11:22:12", "153.168.1.1"). (k)
support(7, 18). usage(18, 10). support(8, 19).
usage(19, 11). support(9, 20). usage(20, 12).

Figure 4: Entities created by the mapping process
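To give an idea of how a mapping operator of this kind could be written, here is a minimal sketch of a single mapping rule producing the facts shown in Figure 4 for a visit footprint; the representation of facts as tuples, the identifier allocation and the machine IP argument are illustrative assumptions, and the object and subject identifiers are supposed to come from other mapping rules.

def map_visit_footprint(fvisit, page_object_id, subject_id, machine_ip, event_id):
    # Mapping rule (sketch): an fVisit footprint yields a zero-duration event
    # plus its support, usage and participation relations.
    _, footprint_id, date, _session, _page_id = fvisit
    return [
        # Visits are modelled with a null duration: start time equals end time.
        ("event", event_id, date, date, machine_ip),
        ("support", footprint_id, event_id),   # the footprint supports the event
        ("usage", event_id, page_object_id),   # the event uses the visited webpage
        ("participation", subject_id, event_id),
    ]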

5.4. Knowledge Enrichment
This step is particularly useful to improve the results of the analysis step as it allows new knowledge about entities to be discovered. For example, the only available information to determine the subject involved in an event generated by a web browser is the session identifier found in some digital footprints extracted from the web browser. To identify the subject involved in other events, we use an inference operator based on the following assumption. Let ei be the first visit of a webpage for the session s, ej the last visit of s, ti the start date of ei and tj the end date of ej; an event occurring on the machine at a date within the time interval defined by ti and tj involves the person who owns the session s. In this case study, the person who is involved in the event id=19 (j) in Figure 4 (creation of a bookmark for the webpage "http://www.warez.com") is unknown. However, using the previous inference rule, it is possible to infer that this person is the subject id=13 (d), using the fact that this user performed a visit of a webpage before and after the event id=19 (j) (respectively the events id=15 (f) and id=16 (g)). Thus, the fact participation(13, 19) can be added.
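The inference rule described above can be sketched as follows; the mappings visits_by_session (session id to the list of visit events of that session) and session_owner (session id to the corresponding subject id) are assumed to have been produced by the mapping step, and events carry the t_start/t_end fields of the Event structure sketched earlier.

def infer_session_participation(events, visits_by_session, session_owner):
    # Inference operator (sketch): attribute an event to the owner of a session
    # when the event falls inside the session's activity window.
    new_facts = []
    for session, visits in visits_by_session.items():
        t_first = min(v.t_start for v in visits)   # ti: start of the first visit
        t_last = max(v.t_end for v in visits)      # tj: end of the last visit
        for e in events:
            if t_first <= e.t_start and e.t_end <= t_last:
                new_facts.append(("participation", session_owner[session], e))
    return new_facts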

5.5. Timeline Construction and Analysis
After collecting knowledge about events and related entities, the last step is to build the associated timeline and analyse it. A graphical representation of the timeline is given in Figure 5. To find correlations between events, the operator introduced in Section 4.2 is used. To illustrate the computation of correlations, three events defined in Figure 4 are used. In this example, the correlations between the events id=15 (f) and id=17 (h) (called "pair A") and between the events id=15 (f) and id=19 (j) (called "pair B") are computed. First, the subject correlation and the object correlation are computed. Regarding pair A, the two events share neither subjects nor objects. Regarding pair B, the two events use a common resource, which is the webpage "http://www.warez.com" (object id=11 (b)). A common subject is also involved in both events: the subject id=13 (d), identified by the session number 351. Second, the temporal correlation is computed. For both pairs, the case before(x, e), where one event occurs before the other, applies. Thus, the temporal correlation is greater for pair B than for pair A, as the relative difference between the events id=15 (f) and id=19 (j) is lower than the difference between the events id=15 (f) and id=17 (h). To conclude, pair B is more correlated than pair A due to the use of a common object (object id=11 (b)), the participation of a common subject (subject id=13 (d)) and the temporal proximity between these two events. At the end of the analysis step, the investigator gets a timeline enriched with information about the correlations between events. The investigator starts the interpretation of the results by identifying the event id=15 (f), a visit of the website "warez.com", which appears to be a web platform providing links to illegal files. Then, the correlation results show that the events id=15 (f) and id=19 (j) are highly correlated. This correlation allows the conclusion that the visit of the website is voluntary and that this website has some relevance for the user (indeed, he uses a bookmark to facilitate a subsequent visit). The analysis of other events and other possible correlations can lead to the conclusion that the user intentionally visited the website "warez.com" and used it to download illegal files.

6. Conclusion and Future Works

In digital investigations, one of the most important challenges is the reconstruction and the analysis of past events, because of the heterogeneity of the data and the volume of data to process. To answer this challenge, we presented the SADFC approach. It reduces the tedious character of the analysis thanks to automatic analysis processes that are helpful for the investigators. Thus, investigators can focus on the tasks for which their expertise and experience are most needed, such as the interpretation of results, the validation of hypotheses, etc. Moreover, SADFC implements mechanisms allowing legal requirements to be satisfied. These mechanisms rely on the formal definitions on which this paper focuses: a formalization of event reconstruction containing formal definitions of the entities involved in an incident, and four sets of operators allowing the knowledge contained in the model to be extracted, manipulated and analysed. The case study presented in the previous section has shown the relevance of this model. Indeed, the use of a semantically rich representation covering new semantic aspects (in addition to temporal aspects) allows advanced analysis processes to be built. In particular, we have presented the possibilities introduced by our model to correlate events. The identification of correlated events makes it possible to highlight valuable information for the investigators.
Future work will be to develop the two other components of our approach. First, we will design an investigation process model providing a framework for the development of automated investigation tools.


Figure 5: Timeline of the incident

This process model will include a definition of the steps composing a digital investigation, from the preservation of the crime scene to the presentation of the conclusions in court, passing through the definition of the crime scene, the collection of footprints, and the construction and analysis of the timeline. During the development of this model, the focus is on the precision and completeness of the definition of each step as well as of its inputs and outputs. Second, we will implement the theoretical model presented in this paper. This implementation will consist of a reference architecture based on an ontology derived from our knowledge model, coupled with inference mechanisms and analysis algorithms. This architecture will be used to validate our model.

Acknowledgements

The above work is part of a collaborative research project between the CheckSem team (University of Burgundy) and the UCD School of Computer Science and Informatics, and is supported by a grant from UCD and the Burgundy region (France). The authors would like to thank Dr Florence Mendes for valuable comments on the formal part of this paper.

7. References

Allen, J. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26, 832–843.

Baryamureeba, V., & Tushabe, F. (2004). The enhanced digital investigation process model. In Proceedings of the Fourth Digital Forensic Research Workshop. Citeseer.

Buchholz, F., & Falk, C. (2005). Design and implementation of Zeitline: a forensic timeline editor. In Digital Forensic Research Workshop.

Carrier, B., & Spafford, E. (2004a). An event-based digital forensic investigation framework. In Digital Forensic Research Workshop.

Carrier, B., Spafford, E. et al. (2003). Getting physical with the digital investigation process. International Journal of Digital Evidence, 2, 1–20.

Carrier, B. D., & Spafford, E. H. (2004b). Defining event reconstruction of digital crime scenes. Journal of Forensic Sciences, 49, 1291.

Chen, K., Clark, A., De Vel, O., & Mohay, G. (2003). ECF - event correlation for forensics. In First Australian Computer Network and Information Forensics Conference (pp. 1–10). Perth, Australia: Edith Cowan University.

Ciardhuain, S. (2004). An extended model of cybercrime investigations. International Journal of Digital Evidence, 3, 1–22.

Forte, D. V. (2004). The art of log correlation: Tools and techniques for correlating events and log files. Computer Fraud & Security, 2004, 15–17.

Ganter, B., Wille, R., & Franzke, C. (1997). Formal concept analysis: mathematical foundations. Springer-Verlag New York, Inc.

Gladyshev, P., & Patel, A. (2004). Finite state machine approach to digital event reconstruction. Digital Investigation, 1, 130–149.

Gladyshev, P., & Patel, A. (2005). Formalising event time bounding in digital investigations. International Journal of Digital Evidence, 4, 1–14.

Gruber, T. R. et al. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199–220.

Gudhjonsson, K. (2010). Mastering the super timeline with log2timeline. SANS Reading Room.

Hargreaves, C., & Patterson, J. (2012). An automated timeline reconstruction approach for digital forensic investigations. Digital Investigation, 9, 69–79.

Khan, M., & Wakeman, I. (2006). Machine learning for post-event timeline reconstruction. In First Conference on Advances in Computer Security and Forensics, Liverpool, UK (pp. 112–121).

Kohn, M., Eloff, J., & Olivier, M. (2006). Framework for a digital forensic investigation. In Proceedings of the ISSA 2006 from Insight to Foresight Conference, Sandton, South Africa (published electronically).

Liebig, C., Cilia, M., & Buchmann, A. (1999). Event composition in time-dependent distributed systems. In Proceedings of the Fourth IECIS International Conference on Cooperative Information Systems (pp. 70–78). Washington, DC, USA: IEEE Computer Society.

Martinez-Cruz, C., Blanco, I. J., & Vila, M. A. (2012). Ontologies versus relational databases: are they so different? A comparison. Artificial Intelligence Review, 38, 271–290.

Olsson, J., & Boldt, M. (2009). Computer forensic timeline visualization tool. Digital Investigation, 6, 78–87.

Palmer, G. (2001). A road map for digital forensic research. In First Digital Forensic Research Workshop, Utica, New York (pp. 27–30).

Ribaux, O. (2013). Science forensique. http://www.criminologie.com/article/science-forensique.

Richard III, G., & Roussev, V. (2006). Digital forensics tools: the next generation. In Digital Crime and Forensic Science in Cyberspace (pp. 75–90). Idea Group Publishing.

Schatz, B., Mohay, G., & Clark, A. (2004a). Rich event representation for computer forensics. Proceedings of the Fifth Asia-Pacific Industrial Engineering and Management Systems Conference (APIEMS 2004), 2, 1–16.

Schatz, B., Mohay, G., & Clark, A. (2006). A correlation method for establishing provenance of timestamps in digital evidence. Digital Investigation, 3, 98–107.

Schatz, B., Mohay, G. M., & Clark, A. (2004b). Generalising event forensics across multiple domains. In C. Valli (Ed.), 2nd Australian Computer Networks Information and Forensics Conference (pp. 136–144). Perth, Australia: School of Computer Networks Information and Forensics Conference, Edith Cowan University.

Schulze-Kremer, S. (1998). Ontologies for molecular biology. In Proceedings of the Third Pacific Symposium on Biocomputing (pp. 695–706). AAAI Press, volume 3.

Stephenson, P. (2003). A comprehensive approach to digital incident investigation. Information Security Technical Report, 8, 42–54.

Vanlande, R., Nicolle, C., & Cruz, C. (2008). IFC and building lifecycle management. Automation in Construction, 18, 70–78.
