Processing Hand-drawn Avatars for Forensics
Cheong Lee Mei & Siti Salmah Yasiran
ABSTRACT
Hand-drawn avatars used as online self-representations can support criminal investigation in cyberspace.
These iconic self-representations are important as supporting evidence for other physical evidence in
forensic investigation. Cyber evidence needs to be collected together with general evidence so that
documents can be classified for effective investigation, including document retention and exchange. This
paper identifies hand-drawn avatars as online self-representations that may indicate the identity of the user.
Informal knowledge about avatars as online self-representation was acquired by considering the broadest
possible categories of hand-drawn avatars among 210 participants aged 21 to 22 with no prior knowledge
of readily available online avatars. The participants were asked to draw a graphical image as a
self-representation, and each drawn avatar was visually inspected for its suitability to be scanned into a
digital image. The digital image is scaled to fit the display rectangle, and this scaled image can be examined
for hidden information. The digital image is then rasterised into a binary (black-and-white) bitmap, which
makes it easier for the human eye to locate the object in the image, and a filled area plot is generated for the
rasterised image. These operations are displayed through a graphical user interface called Avatar4pic.
Finally, the avatar is decomposed and reconstructed using the 2D Haar wavelet transformation to reduce
storage in a forensic information system.
Keywords: Avatars, hand-drawn avatars, online self-representation, forensics, hidden information.
INTRODUCTION
The affordances of the Internet include freedom, democracy, and communication with people around the
world. These in turn have generated anxieties concerning the potential of the Internet to corrupt vulnerable
minds and facilitate crime. Online activities such as instant messaging, chats, blogs, and forums often leave
trails of digital evidence. The process of investigating digital evidence related to an incident is commonly
referred to as Digital Forensic Investigation (DFI). Digital evidence of an incident is any digital data that
supports or refutes a hypothesis about the incident (Carrier and Spafford, 2004). The Digital Forensic
Research Workshop (DFRWS) model for digital forensic analysis (Palmer, 2001) defines the linear process
succinctly:
The use of scientifically derived and proven methods toward the preservation, collection, validation,
identification, analysis, interpretation, documentation and presentation of digital evidence derived from
digital sources for the purpose of facilitating or furthering the reconstruction of events found to be
criminal, or helping to anticipate unauthorised actions shown to be disruptive to planned operations.
Prosiding Seminar Hasil Penyelidikan Sektor Pengajian Tinggi 2013
This study is based on the DFRWS model and focuses on an empirical collection of hand-drawn
avatars. The avatars are scanned into digital images for processing using a MATLAB toolbox. This paper is
organised as follows. Section 2 reviews the background knowledge and related work. Section 3 describes
the collection of hand-drawn avatars, while Section 4 demonstrates the processing of hand-drawn avatars
for forensics. Section 5 presents the discussion and conclusion, with directions for further work.
BACKGROUND KNOWLEDGE
Most images are created, stored and distributed in a digital format that is fairly easy to edit and tamper
with. A manipulated image is often the result of applying several of the tools made available by image
processing software. Consequently, most of the work in digital image forensics has revolved around
proving the authenticity and integrity of digital images. Different forensic tools target different
manipulations: single and multiple compression (Farid, 2009; Lukáš and Fridrich, 2003); resampling
(Farid, 2009; Mahdian and Saic, 2008); cut-and-paste forgeries (Lin et al., 2009; Luo et al., 2007); and
copy-move forgeries (Amerini et al., 2011; Bayram et al., 2008). Each tool provides an output such as
image enhancement, image comparison, or image visualisation, which the Court can use conveniently in
interpreting what happened at the crime scene.
Compared with other evidence such as DNA or fingerprints, a question that frequently arises is
whether the images have been tampered with, given the artefacts that forensic image processing itself
might introduce. It is therefore hard to justify using an image processing method in Court if it has not been
validated beforehand.
Besides image processing, pattern recognition is often applied to forensic databases. Images can
be searched using the Query By Example (QBE) framework, which formulates similarity queries over the
images (Huijsmans and Smeulders, 1999). In QBE, a user formulates a query by providing an example of
an object similar to the object the user would like to retrieve. The search involves acquiring the image,
extracting features from the digitised pattern and storing them in the database. A matching algorithm then
compares the query with the stored features, and an authentication decision is made depending on the
degree of similarity.
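The QBE matching step can be illustrated with a minimal C++ sketch, assuming each stored image has already been reduced to a numeric feature vector. The `Record` type, the `euclidean` and `matchQuery` helpers, and the fixed distance threshold are hypothetical choices for illustration; the framework cited above does not prescribe Euclidean distance or this decision rule.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical database record: an image ID and its extracted feature vector.
struct Record {
    std::string id;
    std::vector<double> features;
};

// Euclidean distance between two feature vectors of equal length.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Return the records whose distance to the query example is within the
// threshold (the "authentication decision"), most similar first.
std::vector<Record> matchQuery(const std::vector<Record>& db,
                               const std::vector<double>& query,
                               double threshold) {
    std::vector<Record> hits;
    for (const Record& r : db)
        if (euclidean(r.features, query) <= threshold) hits.push_back(r);
    std::sort(hits.begin(), hits.end(), [&](const Record& x, const Record& y) {
        return euclidean(x.features, query) < euclidean(y.features, query);
    });
    return hits;
}
```

A smaller threshold makes the decision stricter; in practice the threshold would have to be calibrated on validated data.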
Four major approaches to pattern recognition have been identified (Geradts and Bijhold, 2001), namely
syntactic, structural, neural network, and statistical classifier approaches. In the syntactic approach, the
ridge patterns and minutiae are approximated by a string of primitives, whereas in the structural approach,
features based on minutiae are extracted and represented using a graph data structure, and matching is
done using the topology of the features. The neural network approach requires a feature vector to be
constructed and classified by a neural network classifier. In the statistical classifier approach, the feature
vector is classified by statistical classifiers instead.
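For the syntactic approach, one common way to compare two patterns encoded as strings of primitives is the Levenshtein edit distance. The sketch below is a minimal C++ illustration under that assumption, with each primitive encoded as a single character; the encoding and the `editDistance` name are hypothetical, and the code is not taken from the surveyed systems.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of primitive insertions,
// deletions and substitutions needed to turn string a into string b.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);  // the current row becomes the previous row
    }
    return prev[b.size()];
}
```

For example, under a hypothetical encoding with 'r' for a ridge ending and 'b' for a bifurcation, the strings "rrb" and "rbb" are at distance 1.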
In summary, most forensic image databases are primarily biometric (e.g. DNA, fingerprints, iris, facial
features). It is pertinent to note, however, that in crime investigation, profiling is also made possible
through traces of cyber activities such as the use of avatars as online self-representation.
ICT, Teknologi dan Kejuruteraan
COLLECTION OF EMPIRICAL AVATARS
Informal knowledge about avatars as online self-representation was acquired by considering the broadest
possible categories of hand-drawn avatars. A total of 210 participants aged 21 to 22 with no prior
knowledge of readily available online avatars were asked to draw a graphical image as a
self-representation for online use. Each participant was given a piece of A5 paper and a pencil to draw an
avatar. The avatars were visually inspected for their suitability to be scanned and stored as JPEG images.
Additional information pertaining to the gendering and motivation of the avatars was explored and
published in an earlier paper (Cheong and Fariza, 2012). The hand-drawn avatars were processed using
the MATLAB Toolbox.
PROCESSING HAND-DRAWN AVATARS
Each hand-drawn avatar is stored as a JPEG image with an ID in a database. An image can be retrieved
and displayed with the following MATLAB code.
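% Let the user pick a JPEG avatar, then display it in the first axes
[filename, pathname] = uigetfile('*.jpg', 'Select an Image');
if isequal(filename, 0), return; end    % dialog was cancelled
image1 = imread(fullfile(pathname, filename));
axes(handles.axes1);
imshow(image1);
handles.image1 = image1;                % share the image with other callbacks
guidata(hObject, handles);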
A graphical user interface called Avatar4pic was built in MATLAB to host these operations.
Hiding information
Figure 1 depicts four images (clockwise from top left): the digital avatar; the avatar scaled to fit a given
rectangle; a binary bitmap of the avatar; and a filled area plot of the avatar.
Figure 1: Image1 in Avatar4pic
The digital avatar is scaled to fit the display rectangle with the following code.
image1 = handles.image1;
axes(handles.axes2);
% imagesc stretches image1 to fill the axes rectangle;
% each element of image1 corresponds to a rectangular area in the image
imagesc(image1);
handles.image1 = image1;
guidata(hObject, handles);
This scaled image can be examined for hidden information, if any. The digital image is then
rasterised into a binary (black-and-white) bitmap by thresholding. A binary image contains pixels of only
two intensity values, one for background pixels and one for object pixels, so the contrast of the object is
the difference between those values. This makes it easier for the human eye to locate the object in the
image. The following code snippet performs the rasterisation, using graythresh to choose the threshold level.
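imageB = handles.image1;
if size(imageB, 3) == 3, imageB = rgb2gray(imageB); end  % work on intensity
level = graythresh(imageB);      % global threshold (Otsu's method), in [0, 1]
imageA = im2bw(imageB, level);   % 1 where brighter than level, 0 elsewhere
axes(handles.axes3);
imshow(imageA);
handles.imageA = imageA;
guidata(hObject, handles);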
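MATLAB's graythresh chooses a global threshold using Otsu's method, which picks the grey level that maximises the between-class variance of the two pixel classes, and im2bw then sets pixels brighter than the level to 1. A minimal C++ sketch of the same idea for an 8-bit greyscale image stored as a flat std::vector is shown below. The function names are hypothetical, and unlike graythresh this sketch returns the level on the 0 to 255 scale rather than normalised to [0, 1].

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method on an 8-bit greyscale image: returns the threshold t (0..255)
// that maximises the between-class variance of the classes {<= t} and {> t}.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;
    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int g = 0; g < 256; ++g) sumAll += g * hist[g];

    double wB = 0.0, sumB = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                  // pixel count at or below t
        if (wB == 0.0) continue;
        const double wF = total - wB;   // pixel count above t
        if (wF == 0.0) break;
        sumB += t * hist[t];
        const double mB = sumB / wB, mF = (sumAll - sumB) / wF;
        const double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > bestVar) { bestVar = between; best = t; }
    }
    return best;
}

// Binarise: 1 for pixels brighter than the threshold, 0 otherwise.
std::vector<std::uint8_t> binarise(const std::vector<std::uint8_t>& pixels, int t) {
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) out[i] = pixels[i] > t ? 1 : 0;
    return out;
}
```

Because the threshold is global, uneven lighting across a scanned sheet can still push faint strokes into the background class.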
A filled area plot is then generated for the rasterised image, as shown by the following code snippet.
b = handles.imageA;
axes(handles.axes4);
blk = double(b(1:256, 1:256));   % top-left 256 x 256 block, logical -> double
area(blk, 'DisplayName', 'b(1:256,1:256)');
figure(gcf);
The area function draws an area graph that displays the elements of Y as one or more curves and fills
the area beneath each curve. When Y is a matrix, the curves are stacked, showing the relative contribution
of each row element to the total height of the curve at each x interval.
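For a matrix Y, area plots each column as one curve against the row index and stacks the curves, so the top of the stack at row x equals the sum of row x. The hypothetical C++ helper below computes those stacked heights for a binary block; it reproduces the quantities being plotted, not MATLAB's rendering. Assuming the avatars are dark strokes on white paper, im2bw marks the paper as 1, so the top curve counts background pixels per row, which may partly explain why small added details barely change the plot.

```cpp
#include <cstddef>
#include <vector>

// Stacked heights of a MATLAB-style area plot for a block:
// heights[x][k] is the top of curve k (column k) at row x, i.e. the
// cumulative sum of row x over columns 0..k.
std::vector<std::vector<int>> stackedHeights(const std::vector<std::vector<int>>& block) {
    std::vector<std::vector<int>> heights(block.size());
    for (std::size_t x = 0; x < block.size(); ++x) {
        int running = 0;
        for (int v : block[x]) {
            running += v;
            heights[x].push_back(running);
        }
    }
    return heights;
}
```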
Figure 2 shows the avatar Image2, in which additional details were added to the original Image1.
Note that there is hardly any difference in the filled area plot: the additional information was not captured
by it. The artefacts of image processing were noticeable in the original and binary images.
Figure 2: Image2 with additional information
Storing avatars
It is important to store an image compactly while keeping its quality acceptable, and wavelets contribute
effective solutions to this problem. Wavelet approximation is non-linear: whereas a linear approximation
uses the same static set of basis functions for every input function, wavelet compression selects, for each
input, the basis functions that represent it most efficiently. Because the wavelet basis functions retained
are chosen according to the function being approximated, wavelets can provide a great deal of
compression in image processing.
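The non-linear selection described above can be sketched as hard thresholding: rather than always keeping a fixed subset of coefficients (for example, the first K), one keeps the K coefficients of largest magnitude for the particular input and sets the rest to zero. The minimal C++ sketch below assumes the wavelet coefficients are already available in a vector; the `keepLargest` name and the choice of K are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Keep the k largest-magnitude coefficients of c and zero the rest.
// Which coefficients survive depends on the input itself, which is what
// makes this approximation non-linear.
std::vector<double> keepLargest(const std::vector<double>& c, std::size_t k) {
    std::vector<std::size_t> idx(c.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min(k, c.size());
    // Order indices by decreasing coefficient magnitude (stable for ties).
    std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        return std::fabs(c[a]) > std::fabs(c[b]);
    });
    std::vector<double> out(c.size(), 0.0);
    for (std::size_t i = 0; i < k; ++i) out[idx[i]] = c[idx[i]];
    return out;
}
```

For example, keepLargest({0.5, -3.0, 0.1, 2.0}, 2) keeps -3.0 and 2.0 and zeroes the other two entries.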
The goal of true compression is to minimise the number of bits needed to represent the image. The
Haar transform is the simplest orthogonal wavelet transform. It is computed by iterating averages and
differences between the odd and even samples of the signal. Since the avatar is a 2D image, the
computation produces averages together with differences in the horizontal, vertical and diagonal
directions. To transform the input matrix, the 1D Haar transform is applied to each row; the 1D Haar
transform is then applied to each column of the resulting matrix, giving the final transformed matrix. The
following C++ function performs one level of the 1D Haar transform.
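#include <cmath>
#include <vector>

// One level of the orthonormal 1D Haar transform on the first w entries of vec
// (w even). Scaled pairwise sums go to vec[0..w/2) and scaled pairwise
// differences to vec[w/2..w); entries from index w onwards are left untouched.
// n is the full length of vec.
void haar1(float *vec, int n, int w)
{
    std::vector<float> vecp(n, 0.0f);
    const float root2 = std::sqrt(2.0f);
    w /= 2;
    for (int i = 0; i < w; i++)
    {
        vecp[i]     = (vec[2*i] + vec[2*i+1]) / root2;  // average (scaled sum)
        vecp[i + w] = (vec[2*i] - vec[2*i+1]) / root2;  // detail (scaled difference)
    }
    for (int i = 0; i < w * 2; i++)
        vec[i] = vecp[i];
}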
The algorithm takes the sums and differences of every pair of numbers in the input vector and
divides them by the square root of 2. The process is then repeated on the resulting vector of summed terms.
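The repetition described above amounts to a full multi-level 1D Haar decomposition. The self-contained C++ sketch below uses std::vector<double> rather than the float arrays of haar1, assumes the length is a power of two, and pairs the decomposition with its inverse; because each step is orthonormal, the inverse recovers the input up to rounding. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One orthonormal Haar step on the first w entries of v (w even):
// scaled sums go to v[0..w/2), scaled differences to v[w/2..w).
void haarStep(std::vector<double>& v, std::size_t w) {
    const double s = std::sqrt(2.0);
    std::vector<double> tmp(w);
    for (std::size_t i = 0; i < w / 2; ++i) {
        tmp[i] = (v[2 * i] + v[2 * i + 1]) / s;
        tmp[i + w / 2] = (v[2 * i] - v[2 * i + 1]) / s;
    }
    for (std::size_t i = 0; i < w; ++i) v[i] = tmp[i];
}

// Inverse of haarStep on the first w entries.
void inverseHaarStep(std::vector<double>& v, std::size_t w) {
    const double s = std::sqrt(2.0);
    std::vector<double> tmp(w);
    for (std::size_t i = 0; i < w / 2; ++i) {
        tmp[2 * i] = (v[i] + v[i + w / 2]) / s;
        tmp[2 * i + 1] = (v[i] - v[i + w / 2]) / s;
    }
    for (std::size_t i = 0; i < w; ++i) v[i] = tmp[i];
}

// Full decomposition: repeat the step on the shrinking block of sums.
void haarForward(std::vector<double>& v) {
    for (std::size_t w = v.size(); w > 1; w /= 2) haarStep(v, w);
}

// Full reconstruction: undo the steps from the coarsest level upwards.
void haarInverse(std::vector<double>& v) {
    for (std::size_t w = 2; w <= v.size(); w *= 2) inverseHaarStep(v, w);
}
```

For the input {4, 6, 10, 12}, haarForward produces {16, -6, -√2, -√2}; the sum of squares (296) is unchanged, and haarInverse restores the original values.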
The 2D Haar transform, which is used extensively in image compression, applies this step to the rows
and columns in turn, as the following code snippet shows.
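#include <vector>

// At each level, apply haar1 to the first w entries of the first h rows, then to
// the first h entries of the first w columns, and halve w and h. Repeats until
// the approximation block is 1 x 1.
// Assumes rows and cols are powers of two (pad the image otherwise).
void haar2(float **matrix, int rows, int cols)
{
    std::vector<float> temp_row(cols);
    std::vector<float> temp_col(rows);
    int w = cols, h = rows;
    while (w > 1 || h > 1)
    {
        if (w > 1)
        {
            // Row pass over the current top-left w x h block.
            for (int i = 0; i < h; i++)
            {
                for (int j = 0; j < cols; j++)
                    temp_row[j] = matrix[i][j];
                haar1(temp_row.data(), cols, w);
                for (int j = 0; j < cols; j++)
                    matrix[i][j] = temp_row[j];
            }
        }
        if (h > 1)
        {
            // Column pass over the same block.
            for (int i = 0; i < w; i++)
            {
                for (int j = 0; j < rows; j++)
                    temp_col[j] = matrix[j][i];
                haar1(temp_col.data(), rows, h);
                for (int j = 0; j < rows; j++)
                    matrix[j][i] = temp_col[j];
            }
        }
        if (w > 1)
            w /= 2;
        if (h > 1)
            h /= 2;
    }
}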
Figure 3 shows another avatar, Image3, in Avatar4pic. The 2D Haar transformation is used to
compress Image3.
Figure 3: Image3 in Avatar4pic
Figure 4 shows the decomposition (dwt) of Image3 to three levels, the reconstruction (idwt) as the
synthesised image, and the approximation at level 3. It can be seen that the level-3 approximation gives a
visibly cleaner image than the original.
Figure 4: Decomposition and Reconstruction of Image3
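One level of the 2D Haar decomposition and its reconstruction can be sketched directly on 2 x 2 blocks [a b; c d], each of which yields one approximation coefficient and three detail coefficients. The C++ sketch below assumes even image dimensions; the `Subbands` type and function names are hypothetical, and the labels follow the convention that horizontal details respond to changes between rows. MATLAB's dwt2 may order the bands or choose signs differently.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical container for one level of a 2D Haar decomposition.
struct Subbands {
    Matrix approx, horizontal, vertical, diagonal;
};

// One level of the orthonormal 2D Haar transform on 2x2 blocks [a b; c d].
// Assumes even numbers of rows and columns.
Subbands haarDecompose(const Matrix& img) {
    const std::size_t R = img.size() / 2, C = img[0].size() / 2;
    Subbands s{Matrix(R, std::vector<double>(C)), Matrix(R, std::vector<double>(C)),
               Matrix(R, std::vector<double>(C)), Matrix(R, std::vector<double>(C))};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) {
            const double a = img[2 * i][2 * j], b = img[2 * i][2 * j + 1];
            const double c = img[2 * i + 1][2 * j], d = img[2 * i + 1][2 * j + 1];
            s.approx[i][j]     = (a + b + c + d) / 2;  // scaled local average
            s.horizontal[i][j] = (a + b - c - d) / 2;  // change between rows
            s.vertical[i][j]   = (a - b + c - d) / 2;  // change between columns
            s.diagonal[i][j]   = (a - b - c + d) / 2;  // change along diagonals
        }
    return s;
}

// Exact inverse of haarDecompose.
Matrix haarReconstruct(const Subbands& s) {
    const std::size_t R = s.approx.size(), C = s.approx[0].size();
    Matrix img(2 * R, std::vector<double>(2 * C));
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) {
            const double A = s.approx[i][j], H = s.horizontal[i][j];
            const double V = s.vertical[i][j], D = s.diagonal[i][j];
            img[2 * i][2 * j]         = (A + H + V + D) / 2;
            img[2 * i][2 * j + 1]     = (A + H - V - D) / 2;
            img[2 * i + 1][2 * j]     = (A - H + V - D) / 2;
            img[2 * i + 1][2 * j + 1] = (A - H - V + D) / 2;
        }
    return img;
}
```

For the block {{1, 2}, {3, 4}}, the approximation is 5 and the horizontal, vertical and diagonal details are -2, -1 and 0; reconstruction returns the original block exactly. In a flat region all three details are zero, which is why detail images of line drawings are mostly zero or near zero.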
Figure 5 shows the 2D Haar transformation of Image3 over three iterations in tree view mode. Each
level of the wavelet transformation consists of four images. The image on the left of each level conserves
most of the energy and is thus an approximation of the entire image.
Figure 5: 2D Haar transformation of Image3 with 3 iterations
The remaining three images show the differences in the vertical, diagonal, and horizontal
directions, respectively, and consist primarily of values that are zero or near zero. Image3 is stored in a
matrix of 700 x 847 pixels. The horizontal details hold information about horizontal edges in the image: a
large value indicates a large horizontal change as we move down the image, and a small value indicates
little horizontal change. Similarly, the vertical details hold information about vertical edges in the image.
The diagonal details contain differences across both columns and rows and measure changes along
45-degree lines.
The 2D Haar transform is iterated three times, after which further iterations change the image little.
Because most detail coefficients are zero or near zero, the transformed image can be coded efficiently and
its storage space drastically reduced.
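The storage reduction relies on energy compaction: after the transform, a few coefficients carry most of the energy. The hypothetical C++ helper below computes a cumulative energy curve, assuming energy means the sum of squared values and ranking values by magnitude. Because the orthonormal Haar transform preserves total energy, the curve for the original pixels and the curve for the transformed coefficients can be compared directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Cumulative energy curve: result[p-1] is the fraction of the total energy
// (sum of squares) held by the p largest-magnitude values of x.
std::vector<double> energyCurve(const std::vector<double>& x) {
    std::vector<double> e(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) e[i] = x[i] * x[i];
    std::sort(e.begin(), e.end(), std::greater<double>());
    double total = 0.0;
    for (double v : e) total += v;
    std::vector<double> curve(e.size(), 0.0);
    double running = 0.0;
    for (std::size_t p = 0; p < e.size(); ++p) {
        running += e[p];
        curve[p] = total > 0.0 ? running / total : 0.0;
    }
    return curve;
}
```

For the values {4, 6, 10, 12}, the largest value holds about 49% of the energy (144 of 296), whereas after a full 1D Haar transform, giving {16, -6, -√2, -√2}, the largest coefficient holds about 86% (256 of 296).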
Figure 6 shows the energy distribution for the original image (global threshold) and for three
iterations of the 2D Haar transform (pink). The horizontal axis is the number of pixels. For a given pixel
count p, the height represents the percentage of energy stored in the largest p pixels of the image. Note that
the iterated 2D Haar transform reaches 100% of the energy much faster than the original image.
Figure 6: Energy distribution of Image3
DISCUSSION AND CONCLUSION
The hand-drawn avatars were visually checked for suitability to be scanned into digital images. Each
digital image was then rasterised into a bitmap and plotted as a filled area. The filled area plot of an image
is similar to that of its embellished form. Embellishment can be a way of hiding information, which may
serve as intangible evidence in forensics. Intangible evidence is non-specific and non-provable but derived
from wider analysis. It provides an understanding of the motives, relationships and actions of a suspect
rather than providing evidence per se. Hidden information can take many forms, such as coded text, icons,
lines, and geometric figures. This aspect requires further exploration.
Despite being cropped to remove the surrounding white space, some of the images were too
large to fit into the predetermined size of the filled area plot. This implies that avatar bitmaps are too
demanding on the storage space of a forensic information system. To circumvent the problem, an avatar
bitmap can be decomposed and reconstructed using the iterated 2D Haar wavelet transformation. The
image is thereby coded, and its storage requirement is much reduced, as indicated by the energy
distribution.
ACKNOWLEDGEMENTS
This exploratory study is part of an ongoing project about digital cultures on ageism, anonymity, and
gendering for a forensic information system. It is fully funded by the Ministry of Higher Education
Malaysia under the grant code 600-RMI/ERGS 5/3 (31/2012).
REFERENCES
AMERINI, I., BALLAN, L., CALDELLI, R., DEL BIMBO, A. & SERRA, G. 2011. A SIFT-Based Forensic
Method for Copy–Move Attack Detection and Transformation Recovery. Information Forensics and
Security, IEEE Transactions on, 6, 1099-1110.
BAYRAM, S., SENCAR, H. T. & MEMON, N. 2008. A survey of copy-move forgery detection techniques.
In: IEEE Western New York Image Processing Workshop. Citeseer, 538-542.
CARRIER, B. & SPAFFORD, E. H. 2004. An event-based digital forensic investigation framework. In:
Digital forensic research workshop.
FARID, H. 2009. Exposing digital forgeries from JPEG ghosts. Information Forensics and Security, IEEE
Transactions on, 4, 154-160.
GERADTS, Z. & BIJHOLD, J. 2001. New developments in forensic image processing and pattern
recognition. Science & Justice, 41, 159-166.
HUIJSMANS, N. & SMEULDERS, A. W. 1999. Visual Information and Information Systems: Third
International Conference, VISUAL’99, Amsterdam, The Netherlands, June 2-4, 1999, Proceedings,
Springer.
LIN, Z., HE, J., TANG, X. & TANG, C.-K. 2009. Fast, automatic and fine-grained tampered JPEG image
detection via DCT coefficient analysis. Pattern Recognition, 42, 2492-2501.
LUKÁŠ, J. & FRIDRICH, J. 2003. Estimation of primary quantization matrix in double compressed JPEG
images. In: Proc. Digital Forensic Research Workshop. 5-8.
LUO, W., QU, Z., HUANG, J. & QIU, G. 2007. A novel method for detecting cropped and recompressed
image block. In: Acoustics, Speech and Signal Processing. ICASSP 2007. IEEE, II-217-II-220.
MAHDIAN, B. & SAIC, S. 2008. Blind authentication using periodic properties of interpolation. Information
Forensics and Security, IEEE Transactions on, 3, 529-538.
PALMER, G. 2001. A road map for digital forensic research. In: First Digital Forensic Research Workshop.
Utica, New York, 27-30.
CHEONG, L. & FARIZA, H. 2012. Gendering of avatars for an online self-representation. In: Proceedings of
Seminar Penyelidikan KPT Vol. 3, 465-472.
Cheong Lee Mei
Institute of Forensic Science
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA Malaysia
Siti Salmah Yasiran
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA Malaysia