Processing Hand-drawn Avatars for Forensics


Cheong Lee Mei & Siti Salmah Yasiran

ABSTRACT

Hand-drawn avatars used as online self-representations can support criminal investigation in cyberspace. These iconic self-representations can serve as evidence supporting the physical evidence in a forensic investigation. Cyber evidence needs to be collected together with general evidence so that documents can be classified for effective investigation, including document retention and exchange. This paper identifies hand-drawn avatars as online self-representations that may indicate the identity of the user. Informal knowledge about avatars as online self-representation is acquired by considering the broadest possible categories of hand-drawn avatars among 210 participants aged 21 to 22 years with no prior knowledge of readily available online avatars. The participants were asked to draw a graphical image as a self-representation, and each drawn avatar was visually inspected for its suitability for scanning into a digital image. The digital image is scaled to fit the display rectangle, and this scaled image can be examined for hidden information. The digital image is then rasterised into a binary (black-and-white) bitmap, in which the object is easier for the human eye to locate, and a filled area plot is produced from the rasterised image. These operations are displayed through a graphical user interface called Avatar4pic. The avatar is subsequently decomposed and reconstructed using the 2D Haar wavelet transform to reduce storage in a forensic information system.

Keywords: Avatars, hand-drawn avatars, online self-representation, forensics, hidden information.

INTRODUCTION

The affordances of the Internet include freedom, democracy, and communication with people around the world. These in turn have generated anxieties about the potential of the Internet to corrupt vulnerable minds and facilitate crime. Online activities such as instant messaging, chats, blogs, and forums often leave trails of digital evidence. The process of investigating digital evidence related to an incident under investigation is commonly referred to as Digital Forensic Investigation (DFI). Digital evidence of an incident is any digital data that supports or refutes a hypothesis about the incident (Carrier and Spafford, 2004). The Digital Forensics Research Workshop (DFRWS) model for digital forensic analysis (Palmer, 2001) defines the linear process succinctly:

The use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorised actions shown to be disruptive to planned operations.


Prosiding Seminar Hasil Penyelidikan Sektor Pengajian Tinggi 2013

This study is based on the DFRWS model and focuses on an empirical collection of hand-drawn avatars. The avatars are scanned into digital images and processed using the MATLAB toolbox. This paper is organised as follows. Section 2 reviews the background knowledge and related work. Section 3 describes the collection of hand-drawn avatars, while Section 4 demonstrates the processing of hand-drawn avatars for forensics. Section 5 presents the discussion and conclusion, including further work.

BACKGROUND KNOWLEDGE

Most images are created, stored and distributed in digital formats that are fairly easy to edit and tamper with. A manipulated image is often the result of applying several of the tools provided by image processing software. Consequently, most work in digital image forensics has revolved around proving the authenticity and integrity of digital images. Different forensic tools address different manipulations: single and multiple compression (Farid, 2009, Lukáš and Fridrich, 2003); resampling (Farid, 2009, Mahdian and Saic, 2008); cut-and-paste forgeries (Lin et al., 2009, Luo et al., 2007); and copy-move forgeries (Amerini et al., 2011, Bayram et al., 2008). Each tool produces an output such as an enhanced image, an image comparison, or an image visualisation, which the court can readily use to interpret what happened at the crime scene.

Compared with other evidence such as DNA or fingerprints, a question that frequently arises is whether the images have been tampered with, given the artefacts that forensic image processing may introduce. It is therefore difficult to use in court an image processing method that has not been validated beforehand.

Besides image processing, pattern recognition is often applied to forensic image databases. Images can be searched with a Query By Example (QBE) framework, which formulates similarity queries over the images (Huijsmans and Smeulders, 1999). In QBE, a user formulates a query by providing an example that is similar to the object the user would like to find. The search method involves acquiring the image, extracting features from the digitised pattern, and storing them in the database. A matching algorithm then searches the database, and the authentication decision depends on the degree of similarity.
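The matching step can be illustrated with a minimal C++ sketch. It assumes that feature extraction has already reduced each stored avatar to a fixed-length numeric feature vector, that similarity is measured by Euclidean distance, and that a match is accepted only within a chosen maximum distance. The names `Record`, `feature_distance`, `best_match` and `max_distance` are hypothetical and are not part of the QBE framework cited above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical database record: an image ID plus its stored feature vector.
struct Record {
    std::string id;
    std::vector<double> features;
};

// Euclidean distance between two feature vectors of equal length.
double feature_distance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Query by example: returns the ID of the most similar record, or an empty
// string when even the best match lies farther than max_distance (no match).
std::string best_match(const std::vector<Record>& db,
                       const std::vector<double>& query,
                       double max_distance) {
    std::string best_id;
    double best = std::numeric_limits<double>::infinity();
    for (const Record& r : db) {
        const double d = feature_distance(r.features, query);
        if (d < best) {
            best = d;
            best_id = r.id;
        }
    }
    return best <= max_distance ? best_id : std::string();
}
```

The threshold plays the role of the authentication decision: a query close to a stored example is accepted as that record, while a query far from every record is rejected.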

Four major approaches to pattern recognition have been identified (Geradts and Bijhold, 2001), namely the syntactic, structural, neural network, and statistical classifier approaches. In the syntactic approach, the ridge patterns and minutiae are approximated by a string of primitives, whereas in the structural approach, features based on minutiae are extracted and represented using a graph data structure, and matching is done using the topology of the features. The neural network approach requires a feature vector to be constructed and then classified by a neural network classifier. In the statistical classifier approach, statistical classifiers are used instead.

In summary, most forensic image databases are primarily biometric (e.g. DNA, fingerprints, irises, facial features). It is pertinent to note that, in crime investigation, profiling is also possible through traces of cyber activity, such as the use of avatars as online self-representations.


ICT, Teknologi dan Kejuruteraan

COLLECTION OF EMPIRICAL AVATARS

Informal knowledge about avatars as online self-representation is acquired by considering the broadest possible categories of hand-drawn avatars. A total of 210 participants aged 21 to 22 years, with no prior knowledge of readily available online avatars, were asked to draw a graphical image as a self-representation for online use. Each participant was given a sheet of A5 paper and a pencil to draw an avatar. The avatars were visually inspected for their suitability to be scanned into digital images, and were then scanned and stored as JPEG images. Additional information on the gendering and motivation of the avatars was explored and published in an earlier paper (Cheong and Fariza, 2012). The hand-drawn avatars were processed using the MATLAB toolbox.

PROCESSING HAND-DRAWN AVATARS

Each hand-drawn avatar is stored as a JPEG image with an ID in a database. An image can be easily retrieved with the following MATLAB code.

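[filename, pathname] = uigetfile('*.jpg', 'Select an Image');
if isequal(filename, 0), return; end   % dialog was cancelled
image1 = imread(fullfile(pathname, filename));
axes(handles.axes1);
imshow(image1);
handles.image1 = image1;               % keep the image for the other callbacks
guidata(hObject, handles);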

The graphical user interface, called Avatar4pic, was designed in MATLAB.

Hiding information

Figure 1 depicts four images (clockwise from top left): the digital avatar; the avatar intensity-scaled to fit a given rectangle; a binary bitmap of the avatar; and a filled area plot of the avatar.

Figure 1: Image1 in Avatar4pic


The digital avatar is scaled to fit the display rectangle with the following code.

image1 = handles.image1;
axes(handles.axes2);
imagesc(image1);   % each element of image1 corresponds to a rectangular area in the image
handles.image1 = image1;
guidata(hObject, handles);
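For a single-channel (greyscale) image, `imagesc` uses the full range of the current colormap, so the smallest value in `image1` is drawn with the first colour and the largest with the last (an RGB image is shown as truecolour, without this scaling). The C++ sketch below illustrates that linear intensity stretch for an 8-bit image stored row by row in a `std::vector`; the function name `scale_to_full_range` and the 0 to 255 output range are assumptions made for illustration, not part of Avatar4pic.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linearly stretches pixel values so that the minimum maps to 0 and the
// maximum to 255, mirroring how imagesc spreads data over the colormap.
std::vector<std::uint8_t> scale_to_full_range(const std::vector<std::uint8_t>& img) {
    std::vector<std::uint8_t> out(img.size(), 0);
    if (img.empty()) return out;
    const auto mm = std::minmax_element(img.begin(), img.end());
    const int lo = *mm.first;
    const int hi = *mm.second;
    if (hi == lo) return out;  // flat image: no range to stretch, all zeros
    for (std::size_t i = 0; i < img.size(); ++i) {
        // Integer arithmetic, rounded to the nearest output level.
        out[i] = static_cast<std::uint8_t>(((img[i] - lo) * 255 + (hi - lo) / 2) / (hi - lo));
    }
    return out;
}
```

For example, pixel values 50, 100 and 150 are stretched to 0, 128 and 255, which raises the contrast of a faint pencil drawing without changing its shape.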

This scaled image can be examined for any hidden information. The digital image is then rasterised into a binary (black-and-white) bitmap. A binary image contains pixels of only two intensity values, one for the background and one for the object, so the contrast of the object is the difference between those two values. This makes the object easier for the human eye to locate in the image. The following code snippet performs the rasterisation: graythresh computes a global threshold level using Otsu's method, and im2bw converts the image to binary using that level.

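imageB = handles.image1;
level = graythresh(imageB);        % global threshold in [0, 1] from Otsu's method
imageA = im2bw(imageB, level);     % pixels brighter than level -> 1, others -> 0
axes(handles.axes3);
imshow(imageA);
handles.imageA = imageA;           % keep the bitmap for the filled area plot
guidata(hObject, handles);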
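MATLAB's `graythresh` selects the global threshold with Otsu's method, and `im2bw` sets pixels brighter than the threshold to 1 (white) and all others to 0 (black). A C++ sketch of the same two steps for an 8-bit greyscale image follows. It returns the threshold as a grey level from 0 to 255 rather than the normalised level in [0, 1] that `graythresh` returns, and the names `otsu_threshold` and `binarise` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method: choose the grey level t that maximises the between-class
// variance of the two pixel classes {value <= t} and {value > t}.
int otsu_threshold(const std::vector<std::uint8_t>& img) {
    std::vector<double> hist(256, 0.0);
    for (std::uint8_t v : img) hist[v] += 1.0;
    const double total = static_cast<double>(img.size());
    double sum_all = 0.0;
    for (int t = 0; t < 256; ++t) sum_all += t * hist[t];

    double w0 = 0.0, sum0 = 0.0, best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w0 += hist[t];                 // number of pixels at or below t
        if (w0 == 0.0) continue;
        const double w1 = total - w0;  // number of pixels above t
        if (w1 == 0.0) break;
        sum0 += t * hist[t];
        const double mu0 = sum0 / w0;
        const double mu1 = (sum_all - sum0) / w1;
        const double between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
        if (between > best_var) {
            best_var = between;
            best_t = t;
        }
    }
    return best_t;
}

// Pixels brighter than t become 1 (white); all others become 0 (black).
std::vector<std::uint8_t> binarise(const std::vector<std::uint8_t>& img, int t) {
    std::vector<std::uint8_t> bw(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        bw[i] = static_cast<std::uint8_t>(img[i] > t ? 1 : 0);
    return bw;
}
```

On a typical scanned pencil drawing, the dark strokes fall below the threshold and become 0, while the paper becomes 1.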

A filled area plot of the rasterised image is then produced by the following code snippet.

b = handles.imageA;
axes(handles.axes4);
sub = double(b(1:min(256, end), 1:min(256, end)));   % top-left block of at most 256 x 256 pixels
area(sub, 'DisplayName', 'b(1:256,1:256)');
figure(gcf);

The area function displays the elements of its input Y (here the cropped bitmap b) as one or more curves and fills the area beneath each curve. When Y is a matrix, the curves are stacked, showing the relative contribution of each row element to the total height of the curve at each x interval.
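The stacking can be made concrete with a small C++ sketch (an illustration, not MATLAB's implementation): at each x position, that is, each row of Y, the stacked curves reach the running sums of that row's elements, so the top curve's height equals the row total. For a 0/1 bitmap this total is simply the number of white pixels in the row, so a few added pencil strokes (assumed dark, hence 0) change the outline only slightly. The function name `stacked_heights` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// For a matrix Y (rows = x positions, columns = stacked series), returns the
// height reached by each stacked curve: H[x][k] = Y[x][0] + ... + Y[x][k].
std::vector<std::vector<int>> stacked_heights(const std::vector<std::vector<int>>& Y) {
    std::vector<std::vector<int>> H(Y.size());
    for (std::size_t x = 0; x < Y.size(); ++x) {
        int running = 0;
        for (int v : Y[x]) {
            running += v;
            H[x].push_back(running);
        }
    }
    return H;
}
```

The last entry of each row of H is the envelope of the filled area plot at that x position.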

Figure 2 shows the avatar Image2, in which additional details were added to the original Image1. There is hardly any difference in the filled area plot, which did not capture the additional information. The artefacts of image processing were, however, noticeable in the original and binary images.

Figure 2: Image2 with additional information

Storing avatars

It is pertinent to store the information in an image at an acceptable quality, and wavelets offer effective solutions to this problem. Wavelets are used as a non-linear set of bases: unlike linear approximation, which uses the same fixed set of basis functions for every input function, wavelet approximation selects the basis functions that represent a given input function most efficiently, according to the function being approximated. Thus wavelets can provide a great deal of compression in image processing.

The goal of compression is to minimise the number of bits needed to represent an image. The Haar transform is the simplest orthogonal wavelet transform. It is computed by iteratively averaging and differencing the even and odd samples of the signal. Since the avatar is two-dimensional, averages and differences are computed in the horizontal, vertical and diagonal directions. To transform the input matrix, the 1D Haar transform is first applied to each row, and then to each column of the resulting matrix. This gives the transformed matrix for one level; repeating it on the block of averages gives further levels.

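#include <cmath>
#include <vector>

// One level of the orthonormal 1D Haar transform, applied in place to the
// first w entries of vec (n is the full length of vec). The w/2 scaled sums
// are written to vec[0..w/2-1] and the w/2 scaled differences to
// vec[w/2..w-1]; any remaining entries are left unchanged.
void haar1(float *vec, int n, int w)
{
    std::vector<float> vecp(n, 0.0f);   // scratch buffer (replaces new[]/delete[])
    const float s = std::sqrt(2.0f);
    w /= 2;
    for (int i = 0; i < w; i++)
    {
        vecp[i]     = (vec[2*i] + vec[2*i+1]) / s;   // scaled sum of the pair
        vecp[i + w] = (vec[2*i] - vec[2*i+1]) / s;   // scaled difference of the pair
    }
    for (int i = 0; i < 2*w; i++)
        vec[i] = vecp[i];
}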


The algorithm takes the sum and the difference of every pair of numbers in the input vector and divides each by the square root of 2. The process is then repeated on the resulting vector of summed terms: haar1 performs one such step on the first w entries, and the caller halves w at each level.
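As a worked illustration (the input vector is chosen arbitrarily), two steps of this procedure on x = (4, 2, 5, 5) give the following. Dividing by √2 makes the transform orthonormal, so the sum of squares (the energy) is preserved.

```latex
\begin{aligned}
x &= (4,\ 2,\ 5,\ 5)\\
\text{step 1: } &\Big(\tfrac{4+2}{\sqrt{2}},\ \tfrac{5+5}{\sqrt{2}},\ \tfrac{4-2}{\sqrt{2}},\ \tfrac{5-5}{\sqrt{2}}\Big) = (3\sqrt{2},\ 5\sqrt{2},\ \sqrt{2},\ 0)\\
\text{step 2: } &\Big(\tfrac{3\sqrt{2}+5\sqrt{2}}{\sqrt{2}},\ \tfrac{3\sqrt{2}-5\sqrt{2}}{\sqrt{2}},\ \sqrt{2},\ 0\Big) = (8,\ -2,\ \sqrt{2},\ 0)\\
\|x\|^2 &= 16+4+25+25 = 70 = 8^2+(-2)^2+(\sqrt{2})^2+0^2
\end{aligned}
```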

The 2D Haar transform is used extensively in image compression. The following code snippet applies it to a whole matrix.

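#include <vector>

// Full 2D Haar decomposition of a rows x cols matrix, in place.
// Each pass transforms the rows and then the columns of the current
// top-left w x h block of averages, then halves w and h, until the
// block is reduced to a single coefficient.
void haar2(float **matrix, int rows, int cols)
{
    std::vector<float> temp_row(cols), temp_col(rows);
    int w = cols, h = rows;
    while (w > 1 || h > 1)
    {
        if (w > 1)
        {
            for (int i = 0; i < h; i++)            // row pass on the first h rows
            {
                for (int j = 0; j < cols; j++) temp_row[j] = matrix[i][j];
                haar1(temp_row.data(), cols, w);
                for (int j = 0; j < cols; j++) matrix[i][j] = temp_row[j];
            }
        }
        if (h > 1)
        {
            for (int i = 0; i < w; i++)            // column pass on the first w columns
            {
                for (int j = 0; j < rows; j++) temp_col[j] = matrix[j][i];
                haar1(temp_col.data(), rows, h);
                for (int j = 0; j < rows; j++) matrix[j][i] = temp_col[j];
            }
        }
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
}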


Figure 3 shows another avatar, Image3, in Avatar4pic. The 2D Haar transform is used to compress Image3.

Figure 3: Image3 in Avatar4pic

Figure 4 shows the three-level decomposition (dwt) of Image3, the reconstruction (idwt) shown as the synthesised image, and the approximation at level 3. It can be seen that the level-3 approximation appears visually cleaner than the original, since fine detail is discarded.
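The reconstruction (idwt) inverts each averaging and differencing step: from a scaled sum a and a scaled difference d, the original pair is x0 = (a + d)/√2 and x1 = (a - d)/√2. The C++ sketch below shows this for the 1D case only, assuming the signal length is divisible by 2 raised to the number of levels; a 2D version would undo the column pass and then the row pass of each level, starting from the coarsest. The function names are hypothetical, and the step layout follows haar1 rather than MATLAB's idwt2.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Forward step on the first w entries (w even): scaled sums, then differences.
void haar_step(std::vector<double>& v, std::size_t w) {
    const double s = std::sqrt(2.0);
    const std::vector<double> t(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(w));
    for (std::size_t i = 0; i < w / 2; ++i) {
        v[i]         = (t[2 * i] + t[2 * i + 1]) / s;
        v[i + w / 2] = (t[2 * i] - t[2 * i + 1]) / s;
    }
}

// Inverse step: rebuilds each pair from its scaled sum a and difference d.
void inverse_haar_step(std::vector<double>& v, std::size_t w) {
    const double s = std::sqrt(2.0);
    const std::vector<double> t(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(w));
    for (std::size_t i = 0; i < w / 2; ++i) {
        v[2 * i]     = (t[i] + t[i + w / 2]) / s;  // x0 = (a + d) / sqrt(2)
        v[2 * i + 1] = (t[i] - t[i + w / 2]) / s;  // x1 = (a - d) / sqrt(2)
    }
}

// Decomposition to `levels` (>= 1) levels: widths n, n/2, ..., n/2^(levels-1).
void decompose(std::vector<double>& v, int levels) {
    std::size_t w = v.size();
    for (int l = 0; l < levels; ++l, w /= 2) haar_step(v, w);
}

// Reconstruction: the same widths in reverse order, coarsest first.
void reconstruct(std::vector<double>& v, int levels) {
    std::size_t w = v.size() >> (levels - 1);
    for (int l = 0; l < levels; ++l, w *= 2) inverse_haar_step(v, w);
}
```

Because every step is orthonormal, decomposing and then reconstructing returns the original signal up to floating-point rounding; storage is saved only when small coefficients are discarded in between.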

Figure 4: Decomposition and Reconstruction of Image3


Figure 5 shows the 2D Haar transformation of Image3 over three iterations in tree view mode. Each level of the transformation produces four images. The image on the left of each level retains most of the energy, so it is an approximation of the entire image.

Figure 5: 2D Haar transformation of Image3 with 3 iterations

The remaining three images show the differences in the vertical, diagonal, and horizontal directions, respectively, and consist primarily of values that are zero or near zero. Image3 is stored in a matrix of 700 x 847 pixels. The horizontal details hold information about horizontal edges in the image: a large value indicates a large horizontal change as we move down the image, and a small value indicates little horizontal change. Similarly, the vertical details hold information about vertical edges, recording changes as we move across the image. The diagonal details contain differences across both columns and rows, and so measure changes along 45-degree lines.
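The three kinds of detail can be written out for a single 2 x 2 block of pixels. Applying one level of haar1 to the rows and then to the columns, with its sign convention of first minus second, gives the four coefficients below (the letters a to d are illustrative). A uniform block (a = b = c = d) produces zero detail, which is why flat regions of paper contribute almost nothing except to the approximation.

```latex
\begin{pmatrix} a & b\\ c & d \end{pmatrix}
\;\longmapsto\;
\begin{pmatrix}
\dfrac{a+b+c+d}{2} & \dfrac{(a+c)-(b+d)}{2}\\[2ex]
\dfrac{(a+b)-(c+d)}{2} & \dfrac{(a+d)-(b+c)}{2}
\end{pmatrix}
=
\begin{pmatrix}
\text{approximation} & \text{vertical detail}\\
\text{horizontal detail} & \text{diagonal detail}
\end{pmatrix}
```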

The 2D Haar transform is iterated three times. Wherever there is little change in the image, the detail coefficients are close to zero, so the image can be coded effectively and its storage space drastically reduced.
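The storage saving can be sketched in C++. The example below, an illustration rather than the Avatar4pic code, limits the row-then-column Haar passes of haar2 to three levels and then applies hard thresholding: coefficients whose magnitude falls below a chosen tolerance are set to zero, and only the survivors need to be stored. It assumes both image dimensions are divisible by 2^3 = 8 (a 700 x 847 image such as Image3 would first need padding or cropping), and the names `haar_step_n`, `haar2_levels` and `hard_threshold` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One orthonormal Haar step on the first n entries of v (n even).
void haar_step_n(std::vector<double>& v, std::size_t n) {
    const double s = std::sqrt(2.0);
    const std::vector<double> t(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n));
    for (std::size_t i = 0; i < n / 2; ++i) {
        v[i]         = (t[2 * i] + t[2 * i + 1]) / s;
        v[i + n / 2] = (t[2 * i] - t[2 * i + 1]) / s;
    }
}

// `levels` passes of rows-then-columns on the shrinking top-left block.
void haar2_levels(Matrix& m, int levels) {
    std::size_t h = m.size(), w = m[0].size();
    for (int l = 0; l < levels; ++l, h /= 2, w /= 2) {
        for (std::size_t i = 0; i < h; ++i) haar_step_n(m[i], w);  // rows
        std::vector<double> col(h);
        for (std::size_t j = 0; j < w; ++j) {                      // columns
            for (std::size_t i = 0; i < h; ++i) col[i] = m[i][j];
            haar_step_n(col, h);
            for (std::size_t i = 0; i < h; ++i) m[i][j] = col[i];
        }
    }
}

// Zeroes every coefficient with |c| < tol; returns how many remain non-zero.
std::size_t hard_threshold(Matrix& m, double tol) {
    std::size_t kept = 0;
    for (auto& row : m) {
        for (double& c : row) {
            if (std::fabs(c) < tol) c = 0.0;
            else ++kept;
        }
    }
    return kept;
}
```

On an 8 x 8 block of uniform value 10, three levels leave a single non-zero coefficient (80); making one pixel dark raises the count to 10, namely one approximation plus three details per level. For a real avatar, a larger tolerance trades reconstruction error for further storage savings.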

Figure 6 shows the energy distribution for the original image (global threshold) and for three iterations of the 2D Haar transform (pink). The horizontal axis is the number of pixels: for a given value p, the height represents the percentage of energy stored in the p largest pixels of the image. Note that the iterated 2D Haar transform reaches 100% of the energy much faster than the original image does.
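The curve in Figure 6 can be computed by squaring the values, sorting them in decreasing order, and accumulating. A C++ sketch follows; the name `cumulative_energy` is hypothetical, and its input may be either the pixel values or the Haar coefficients.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Cumulative energy curve: entry p-1 is the percentage of the total energy
// (sum of squares) held by the p largest-magnitude values.
std::vector<double> cumulative_energy(const std::vector<double>& values) {
    std::vector<double> sq;
    sq.reserve(values.size());
    for (double v : values) sq.push_back(v * v);
    std::sort(sq.begin(), sq.end(), std::greater<double>());
    double total = 0.0;
    for (double e : sq) total += e;
    std::vector<double> curve;
    curve.reserve(sq.size());
    double running = 0.0;
    for (double e : sq) {
        running += e;
        curve.push_back(total > 0.0 ? 100.0 * running / total : 100.0);
    }
    return curve;
}
```

For instance, the largest value of (4, 2, 5, 5) holds about 36% of its energy, whereas the largest coefficient of its two-level Haar transform (8, -2, √2, 0) holds about 91%, so the transformed curve climbs to 100% much sooner.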

Figure 6: Energy distribution of Image3

DISCUSSION AND CONCLUSION

The hand-drawn avatars were visually checked for their suitability to be scanned into digital images. Each digital image was then rasterised into a bitmap and plotted as a filled area. The filled area plot of an image is similar to that of its embellished form, as Figure 2 shows. Embellishment can be a way of hiding information, which serves as intangible evidence in forensics. Intangible evidence is non-specific and not provable on its own, but is derived from wider analysis; it provides an understanding of the motives, relationships and actions of a suspect rather than evidence per se. Hidden information can take many forms, such as coded text, icons, lines, and geometric figures. This aspect requires further exploration.

Despite being cropped to remove the surrounding white space, some of the images were too large to fit the predetermined size of the filled area plot. This implies that avatar bitmaps are demanding on the storage space of a forensic information system. To circumvent the problem, an avatar bitmap can instead be decomposed and reconstructed using the iterated 2D Haar wavelet transform. The image is thus coded and its storage greatly reduced, as the energy distribution shows.

ACKNOWLEDGEMENTS

This exploratory study is part of an ongoing project on digital cultures relating to ageism, anonymity, and gendering for a forensic information system. It is fully funded by the Ministry of Higher Education Malaysia under grant code 600-RMI/ERGS 5/3 (31/2012).


REFERENCES

AMERINI, I., BALLAN, L., CALDELLI, R., DEL BIMBO, A. & SERRA, G. 2011. A SIFT-Based Forensic Method for Copy–Move Attack Detection and Transformation Recovery. Information Forensics and Security, IEEE Transactions on, 6, 1099-1110.

BAYRAM, S., SENCAR, H. T. & MEMON, N. 2008. A survey of copy-move forgery detection techniques. In: IEEE Western New York Image Processing Workshop. Citeseer, 538-542.

CARRIER, B. & SPAFFORD, E. H. 2004. An event-based digital forensic investigation framework. In: Digital Forensic Research Workshop.

CHEONG, L. & FARIZA, H. 2012. Gendering of avatars for an online self-representation. In: Proceedings of Seminar Penyelidikan KPT, Vol. 3, 465-472.

FARID, H. 2009. Exposing digital forgeries from JPEG ghosts. Information Forensics and Security, IEEE Transactions on, 4, 154-160.

GERADTS, Z. & BIJHOLD, J. 2001. New developments in forensic image processing and pattern recognition. Science & Justice, 41, 159-166.

HUIJSMANS, N. & SMEULDERS, A. W. 1999. Visual Information and Information Systems: Third International Conference, VISUAL’99, Amsterdam, The Netherlands, June 2-4, 1999, Proceedings. Springer.

LIN, Z., HE, J., TANG, X. & TANG, C.-K. 2009. Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis. Pattern Recognition, 42, 2492-2501.

LUKÁŠ, J. & FRIDRICH, J. 2003. Estimation of primary quantization matrix in double compressed JPEG images. In: Proc. Digital Forensic Research Workshop. 5-8.

LUO, W., QU, Z., HUANG, J. & QIU, G. 2007. A novel method for detecting cropped and recompressed image block. In: Acoustics, Speech and Signal Processing, ICASSP 2007. IEEE, II-217-II-220.

MAHDIAN, B. & SAIC, S. 2008. Blind authentication using periodic properties of interpolation. Information Forensics and Security, IEEE Transactions on, 3, 529-538.

PALMER, G. 2001. A road map for digital forensic research. In: First Digital Forensic Research Workshop. Utica, New York, 27-30.

Cheong Lee Mei

Institute of Forensic Science

Faculty of Computer and Mathematical Sciences

Universiti Teknologi MARA Malaysia

Email: [email protected]

Siti Salmah Yasiran

Faculty of Computer and Mathematical Sciences

Universiti Teknologi MARA Malaysia

Email: [email protected]
