DETECTION OF HUMAN INTRUSION IN A SMART HOME
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
by
Beerdwinder Deep Kaur
FALL
2021
DETECTION OF HUMAN INTRUSION IN A SMART HOME
A Project
by
Beerdwinder Deep Kaur
Approved by:
__________________________________, Committee Chair Dr. Jun Dai
__________________________________, Second Reader Dr. Xuyu Wang
____________________________
Date
Student: Beerdwinder Deep Kaur
I certify that this student has met the requirements for format contained in the University format
manual, and this project is suitable for electronic submission to the library and credit is to be
awarded for the project.
__________________________, Graduate Coordinator ___________________
Dr. Jinsong Ouyang Date
Department of Computer Science
Abstract
of
DETECTION OF HUMAN INTRUSION IN A SMART HOME
by
Beerdwinder Deep Kaur
This project develops an AI-based system that helps detect human intrusion in a
smart home equipped with cameras and a network setup. The system is designed so
that once a camera detects a stranger holding a knife or gun, it captures a picture
of the stranger and directs all the cameras to detect and track that stranger anywhere
in the home, continually sending updates about the person to the user's mobile device.
Compared to traditional and similar systems, this project introduces a secondary layer
in which all the cameras cooperate to track the previously detected intruder.
In summary, the system is capable of face recognition and object detection
(weapons). This human intrusion detection system is developed using two models:
LBPH and Faster R-CNN.
_______________________, Committee Chair
Dr. Jinsong Ouyang
_______________________
Date
ACKNOWLEDGEMENTS
I am grateful to Dr. Jun Dai for giving me an opportunity to work on this project. His
in-depth technical inputs and guidance helped me shape this project. As a teacher,
he has taught me a great deal about AI/Machine Learning and also Software
Development Methodologies. The concepts I learned from the project have helped me
in my professional career. I would also like to thank Dr. Xuyu Wang for his
willingness to serve as a second reader for this project and provide valuable
feedback.
TABLE OF CONTENTS
Page
Acknowledgements ......................................................................................................... vi
List of Tables ................................................................................................................... ix
List of Figures .................................................................................................................. x
Chapter
1. INTRODUCTION ........................................................................................................ 1
1.1 Problem Statement .................................................................................................. 1
1.2 Existing Solution .................................................................................................... 2
1.3 Proposed Solution ................................................................................................... 3
2. BACKGROUND AND LITERATURE REVIEW ...................................................... 4
3. PROPOSED FRAMEWORK (DESIGN) .................................................................. 10
4. IMPLEMENTATION AND DEVELOPMENT ........................................................ 12
4.1 Data Collection: .................................................................................................... 12
4.2 Models used .......................................................................................................... 15
4.2.1 LBPH: ............................................................................................................ 15
4.2.2 Faster R-CNN ................................................................................................ 18
4.3 Implementing Face Detection and Recognition ................................................... 20
4.4 Implementation for Object Detection: .................................................................. 23
4.5 Primary Layer: ...................................................................................................... 27
4.6 Secondary Layer: .................................................................................................. 28
4.7 Impact Factor ........................................................................................................ 30
5. RESULTS AND EXPERIMENTATION .................................................................. 32
5.1 Results obtained for the first person: .................................................................... 34
5.2 Results obtained for the second person: ............................................................... 36
5.3 Results obtained from the third person. ................................................................ 39
6. FUTURE WORK ....................................................................................................... 42
7. CONCLUSION .......................................................................................................... 43
References: ..................................................................................................................... 44
LIST OF FIGURES
Figure Page
1: Functioning of the project from user's perspective .................................................... 10
2: Flowchart describing the working of the project ....................................................... 11
3: Different houses labelled in one image ........................................................ 14
4: Xml file generated after annotating an image. ........................................................... 14
5: Picture of knife held in hand ...................................................................................... 15
6: Working of the LBP operator ...................................................................... 17
7: Showing the working of LBP on an image ..................................................... 17
8: Architecture of Faster R-CNN ..................................................................... 19
9: Flowchart of face recognition..................................................................................... 21
10: Training faces ........................................................................................................... 22
11: Face recognition of a person .................................................................................... 22
12: Face recognition of unknown person ....................................................................... 23
13: Knife detection with 98% accuracy ......................................................................... 26
14: Knife detection with 100% accuracy ....................................................................... 27
15: Email sent to the user when an intruder is detected ................................................. 28
16: Code shows the working of secondary layer ............................................................ 30
17: User with the knife ................................................................................................... 32
18: Unknown user with the knife ................................................................................... 33
19: Email screenshot ...................................................................................................... 33
20: Screenshot sent along with the email ....................................................................... 34
21: First video tracked and sent in email ........................................................................ 34
22: Second video tracked in camera zone 2 ................................................................... 35
23: Third video tracked in camera zone 2 ...................................................................... 35
24: Fourth video tracked in camera zone 2 .................................................................... 35
25: Camera screen (zone 1), detection of thief ............................................................... 36
26: Image sent along with the email ............................................................................... 36
27: Same thief detected in camera zone 1 ...................................................................... 37
28: Same thief detected in camera zone 1 ...................................................................... 37
29: Thief moves to second location ................................................................................ 38
30: Tracking video of camera zone 2 ............................................................................. 38
31: Tracking video of camera zone 2 ............................................................................. 38
32: Tracking video of camera zone 2 ............................................................................. 39
33: Thief detected and image sent .................................................................................. 39
34: Thief tracked in camera zone 1 ................................................................................ 40
35: Thief tracked again in camera zone 1 ....................................................................... 40
36: Thief tracked in camera zone 1 ................................................................................ 41
Chapter 1. Introduction
1.1 Problem Statement
Home security is one of the main concerns for people, and numerous alternatives can
provide it. CCTV cameras, for instance, can capture and store video for the whole day.
However, additional effort is required to go through the video feed and check for any
malicious activity, and a CCTV system provides no alert mechanism; these are two major
drawbacks of that approach. Other systems, such as ADT, bundle burglary alarms, fire
alarms, and more. However, behind the scenes ADT links a protection agency to guard
against theft, and the system offers little privacy: every moment of daily life in the
house is recorded. Systems like Ring provide human-detection alerts outside the home,
but they will still raise an alert when the homeowner is detected outside the house.
Keeping all these concerns in mind, a better security system is required that can meet
a family's expectations. This motivation leads to building a system that:
• Distinguishes between the legitimate user and an unknown person.
• Sends an alert to the user whenever a thief is detected.
• After a thief/intruder is detected, forms a second layer in which all the cameras
track the thief.
• Involves no external agency, so the data stays secured with the user.
1.2 Existing Solution
Khan et al. [21] proposed a lightweight intrusion detection system that uses a
Raspberry Pi camera to detect a robbery: an attacker holding a knife or gun in
hand. An event-driven approach is utilized so that the camera sensors provide
better surveillance services [21]. Patterns are monitored for better results, and
activity is supervised within the field of view [21]. With the event-driven
approach, the cameras remain on standby [21], which yields an energy-efficient
framework that responds only when activity matching the target event is observed
[21]. The authors deployed a CNN model to solve this problem statement. The
framework integrates image processing, AI computer vision, and network
communication methods for real-time crime event detection [21]. The CNN model is
trained on images of knives and guns, with the data split in an 8:2 ratio between
training and testing.
Kepin Yu et al. proposed a framework that can recognize an object (knife, gun, or any
metal) carried by a moving person [22]. In the traditional system, people have to wait
for the security check at the airport gate [22], which results in bottlenecks and
congestion. Therefore, the paper proposed an AI-based W-band suspicious-object
detection system comprising two layers: a primary level and a secondary level. In
primary-level screening, a suspicious object is inspected using W-band radar if it is
within 15 meters [22]. In secondary-level screening, the person flagged as suspicious
in the primary layer is tracked and the suspicious object is recognized. A Recurrent
Convolutional Neural Network (R-CNN) methodology is used for image recognition and
classification [22]. Four types of suspicious objects (knife, scissors, fork, bottle)
are used to train and evaluate the CNN model, and blurred and noisy images are included
in training to reduce the false rate (FR) [22].
1.3 Proposed Solution
The proposed solution incorporates two layers: a primary layer and a secondary layer.
In the primary layer, face recognition and object detection take place to establish
whether the person is unknown and whether they are holding a knife. The user receives
an email alert, along with the intruder's image, when both conditions are satisfied.
For face recognition, an LBPH classifier is deployed; for object detection, a Faster
R-CNN model is trained on the data. Once the thief is identified, the secondary layer
is established, in which the thief is tracked: a model is trained with images of the
thief, and the face recognizer then looks for that same unknown person. If the thief
is detected in the secondary layer, a video is sent to the user through email, and the
system continues sending emails as long as the thief is present in any camera zone.
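The email-alert step described above can be sketched with Python's standard library. This is a minimal sketch, assuming nothing about the project's actual code: the addresses, subject line, and SMTP details are illustrative, not the system's real configuration.

```python
import smtplib
from email.message import EmailMessage

def build_alert(sender, recipient, image_bytes):
    """Build the intruder-alert email with the captured frame attached."""
    msg = EmailMessage()
    msg["Subject"] = "ALERT: intruder detected"   # illustrative subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("An unknown person holding a knife was detected.")
    # Attach the snapshot taken by the primary layer.
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename="intruder.jpg")
    return msg

if __name__ == "__main__":
    msg = build_alert("home@example.com", "owner@example.com", b"\xff\xd8fake")
    # Actually sending requires real credentials (e.g. an app password),
    # so the SMTP step is shown only as a comment:
    # with smtplib.SMTP_SSL("smtp.gmail.com", 465) as s:
    #     s.login("home@example.com", "app-password")
    #     s.send_message(msg)
```

For real delivery, an app-specific password or similar SMTP credential would be needed; the message-building part is independent of the mail provider.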
Chapter 2. Background and Literature Review
Home security has always been a major concern among people. With the help of an
intelligent system, however, a home can be secured using CCTV cameras, video
doorbells, smart locks, and more. Life would become much easier if a system could
predict a crime by processing CCTV footage [17]. The research in [17] focuses on an
image dataset for detecting any gun, knife, gun in hand, or blood. In that dataset,
guns were further classified as shotguns, machine guns, and revolvers, giving six
classes in total: knife, blood, shotgun, machine gun, revolver, and gun in hand. The
images in the dataset were 150x100 pixels. Max pooling reduced the original image to
75x50, and a second round of max pooling brought the dimensions to 35x27.
A multi-layered CNN model was used to detect the crime scene. To arrive at a
detection result, the model employed convolutional layers, the Rectified Linear Unit
(ReLU), a fully connected layer, and the CNN dropout function [17]. The CNN was
developed using TensorFlow (an open-source platform) to obtain the desired results
[17]. The proposed model achieved an accuracy of 90.2% on the test dataset [17].
There were, however, some test cases in which the model detected any color similar
to blood as blood. Moreover, shotguns and revolvers look quite similar; because they
were trained with the CNN model after the machine-gun class, the model had some
accuracy problems differentiating the gun types [17]. These were some of the flaws
noted in the research work of [17].
A weapon detection system should be robust in order to provide enhanced security;
the ability of an automated surveillance system to detect weapons raises the level of
security [18]. In [18], Deep Convolutional Neural Networks (DCNNs) are used in a
novel way to classify weapons. To initialize the weights of the architecture's
convolutional layers, the proposed approach employs the idea of transfer learning
[18], a form of learning in which what is learned in one setting is applied to help
learning in another [18]. In [18], the number of neurons in the fully connected
layer was varied, yielding two models that helped determine the effect of neuron
count on weapon-classification accuracy [18].
Experiments were carried out on three classes of images: knife, gun, and no-weapon
[18]. Various varieties of knives were retrieved from the internet, and some of the
images were taken with the lab's camera [18]. The gun class consisted of several
photos of pistols downloaded from the internet [18], while humans, automobiles,
chairs, and other objects made up the no-weapon class [18]. Because the dataset used
in [18] is small, some changes were made to ensure proper training: the number of
parameters was reduced and the dropout value was increased [18]. After building the
two models with increasing numbers of neurons, the analysis showed that true
positives do not increase with more neurons [18], and an increase in dropout may or
may not decrease true positives [18]. This suggests that simply adding neurons to a
convolutional neural network does not lead to better or more accurate results.
In-depth research on different algorithms for object detection has been done in
[19]. The algorithms are classified as non-deep-learning and deep-learning
algorithms [19]. Color segmentation, interest points, shapes, and edge detectors are
commonly used in non-deep-learning methods [19]. The dependence of these methods on
image quality and angle is their major drawback [19]: noise and occlusion in images
can be difficult to handle [19], and if the target object shares the same color as
its surrounding objects, color segmentation will struggle to separate it from the
image [19]. In non-deep algorithms, image quality is crucial [19]; poor-quality
images yield poor weapon-detection results.
On the other hand, deep-learning algorithms require no feature engineering, which
gives them the upper hand over non-deep-learning algorithms [19]. The study in [19]
compares the accuracy obtained by different deep-learning algorithms: R-CNN, Faster
R-CNN, YOLO, SSD, and OverFeat [19].
In the conclusion of [19], the comparison of results differs based on the dataset
used: some algorithms used the ImageNet dataset and some used custom datasets [19].
To resolve this, [19] takes the ImageNet dataset as the standard for generating
results, and under that assumption Faster R-CNN gives the best speed [19]. This
conclusion about Faster R-CNN providing fast and accurate results led to using this
model for object detection in this project.
The research work in [17], [18], and [19] detected objects and crime scenes based on
images in the testing dataset rather than a live video feed.
However, some research work has focused on both real-time and non-real-time
systems. In [20], a support system is built to provide security: face recognition
and anomaly detection are performed to recognize an individual's actions [20].
These two are the key aspects of the system suggested in [20], and as a result it
can, if necessary, be used to send alerts to the relevant authorities and family
members [20]. The proposed system does not take a dataset of the homeowner's face
[20].
The primary goal of that project is to develop an image-processing-based security
solution [20]. The system's flow begins with the capture of a face image, which is
then subjected to face recognition [20]. If the face is recognized, image capture is
carried out once more; otherwise, the body-tracking component is started [20]. If an
anomaly is discovered, an alert is sent to the cloud along with the location where
the anomaly was discovered [20]. Face recognition, anomaly detection, and
transferring data to the cloud are the three primary components of the system
software [20].
The first component employs a webcam to collect people's faces and predict their
identities [20]. A PCA approach is used by the application to identify a face from
the database [20]: the eigenfaces are discovered by projecting the principal
components onto the eigenspace [20], and the lowest Euclidean distance of the
projection determines the unknown faces [20].
If the detected person matches a face in the unknown dataset, anomaly detection is
carried out [20]. The anomaly detection focuses on activity patterns, based on an
unsupervised method, the Hidden Markov model [20]. The activity pattern is related
to burglary position: when such a pattern is recognized, a picture is taken and
saved in the cloud with the location of the camera, and a siren starts ringing to
alert the owner about the activity. There must be at least two anomaly detections
that pass above the configured threshold [20].
Whenever an anomaly is detected, its time and location are reported to ThingSpeak, a
cloud platform [20], and the image of the detected anomaly is also delivered to the
cloud as a 1-D array [20]. The image saved in the cloud database can later be
retrieved by the owner for further investigation.
The research carried out in [20] was tested on both real-time and non-real-time
datasets. The proposed model turned out to be more efficient and effective on the
non-real-time dataset, with an efficiency of 100%, but it was still able to achieve
an efficiency of 70% in real-time surveillance. [20] concludes that a larger dataset
will lead to more efficiency in real-time surveillance [20].
In conclusion to the research work in [17], [18], and [19]: the kind of weapon used
determines the nature and scope of a crime [18], and if an automated video
surveillance system can raise an early alarm, the effects can be reduced to the
greatest extent possible [18]. The effectiveness of video surveillance can be
considerably enhanced by a comprehensive weapon-categorization system, and an
automated surveillance system can also benefit from weapon classification [18]. It
has been observed that guns and/or knives are frequently employed in criminal
activity due to their portability [18]; as a result, the work in [18] focuses on
recognizing and classifying these two weapon types in any image [18].
The research work in [17] detected crime scenes based on images in the testing
dataset rather than a live video feed, and it is not always achievable for a person
to pay attention to all the video feeds recorded by many cameras on a single screen
[18].
Hence, there is a need for active surveillance systems that can send an alert to the
user whenever malicious activity is recognized.
Chapter 3. Proposed Framework (Design)
Design is necessary to lay out the project before taking a deep dive into it: it
helps in understanding the rudiments of the project and gives a clear picture of the
functionality.
The figure below describes the functionality of the project from the user's
perspective. If a person unknown to the system is detected and, moreover, he or she
is holding a knife, an alert is sent from the system to the user by email.
Furthermore, some pictures of the person are taken to keep track of them. The
cameras are placed at different locations; every camera is active, and detection of
the unknown person and the knife is done simultaneously. This gives a general idea
of the project.
Figure 1: Functioning of the project from user's perspective
The flowchart below provides the layout, or design, in more detail. As described in
the flowchart, an email alert is sent only if the person is unknown and has a knife;
an unknown person without a knife is not treated as a thief. Once the thief is
detected, tracking takes place. Before tracking begins, images of the thief are
stored and used to track that same unknown person.
Figure 2: Flowchart describing the working of the project
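The alert condition from the flowchart (unknown face AND knife present) can be sketched as a small decision helper. The function and label names here are illustrative, not taken from the project's code.

```python
def classify_event(face_known: bool, knife_detected: bool) -> str:
    """Primary-layer decision: alert only when BOTH conditions hold."""
    if not face_known and knife_detected:
        return "alert"        # unknown person with a knife -> email + start tracking
    if not face_known:
        return "ignore"       # unknown but unarmed: not treated as a thief
    return "authorized"       # known resident, regardless of what they hold
```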
Chapter 4. Implementation and Development
4.1 Data Collection:
A dataset plays a very important role in any AI project and system. The availability
of data has a significant impact on how the system is put together and which AI
approaches are deployed [5], and the amount and quality of available data affect the
final product's quality [5]. In this sense, one could argue that data availability
(whether data exists) and accessibility (whether data can be obtained) are the
primary motivators for the development of AI-based solutions [5].
ML is extremely reliant on data; without it, an "AI" is unable to train [6]. Data is
the most important factor that allows algorithm training to take place [6]; any AI
project will fail, regardless of how smart the AI team is, if the dataset isn't good
enough [6].
For this project, two types of datasets were required for carrying out face detection and
object detection.
1. Face detection and recognition: For face recognition, 120 images of the
legitimate user are used in the dataset to differentiate between user and thief.
To capture the 120 images, a Python program is written that takes pictures and
saves them in the dataset folder. The haarcascade_frontalface_default.xml
classifier is used for detecting the face, and the pictures are saved in
grayscale. The program takes roughly 20 seconds to capture 120 images, during
which the user tilts their face in the four directions; this helps improve the
accuracy of the face recognition.
2. Object detection: As mentioned before, data plays a very important role in this
project; the bigger the dataset, the more accurate the results. A large dataset
provides great benefit in research on object detection and recognition [1].
LabelMe, a web-based tool for annotating images by classifying them into
different classes [2], is used for building the dataset in this project. Such a
tool is necessary because current segmentation and interest-point techniques
cannot detect the outlines of many object parts, which are often small and
obscure in natural photos, so annotated data is required [2]; it is vital to have
a "center of attention" [2]. The objective of LabelMe is to contribute
high-quality labelling and diverse classes with detailed information such as
bounding boxes, polygons, and more [1].
Thus, the purpose of the annotation tool is to provide a drawing interface that
works across several platforms, is simple to use, and enables fast data sharing [1].
A picture appears when the user first visits the page [1]; the picture is part of a
much larger image database that includes hundreds of object types and a wide range
of situations [1]. By tapping control points around the perimeter of the object, the
user can label a new object [1]: the user starts labelling by drawing a shape that
connects the end points of the object. Here, a rectangle is used for the knife. Once
this is done, a dialog box pops up asking for the object name; after entering it,
the same image can be labelled again for any other object. This dataset comprises
one class, knife, with 1,346 images in total. The images are labelled one by one
along with the name of the object, and the labels are saved in an XML file, making
the format extensible [1].
Figure 3: Different houses labelled in one image [3].
For every labelled image, an XML file is created. Below is the XML file for one of
the images in the dataset.
Figure 4: XML file generated after annotating an image.
This XML file contains the position of the knife (xmin, xmax, ymin, ymax) in the
image, along with the image name, width, height, and depth, and the object name
'knife'. To make the system more accurate at detecting knives, images of a person
holding a knife in hand are also used in the dataset to train the model, since while
holding a knife, the knife might be covered up to 50% by the hand (as shown in the
picture below).
Figure 5: Picture of knife held in hand
To label this picture, the red bounding box is the correct representation for the
annotation. This ensures not only that the knife is detected, but also that it was
held by a human rather than simply lying on the floor, a desk, or somewhere else.
The dataset is divided into training and testing sets; 234 pictures are used for
testing.
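The annotation files described above follow the Pascal-VOC-style XML layout shown in Figure 4. A minimal sketch of reading a bounding box back with Python's standard library, assuming that tag layout (the sample XML below is illustrative, not a real file from the dataset):

```python
import xml.etree.ElementTree as ET

def read_annotation(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) boxes from a VOC-style XML file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")               # e.g. "knife"
        box = obj.find("bndbox")
        boxes.append((name,
                      int(box.findtext("xmin")), int(box.findtext("ymin")),
                      int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return boxes

# Illustrative annotation in the format Figure 4 shows.
SAMPLE = """
<annotation>
  <filename>knife_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>knife</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>310</ymax></bndbox>
  </object>
</annotation>
"""
```

A training script would call `read_annotation` once per XML file to pair each image with its knife box.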
4.2 Models used
4.2.1 LBPH:
The most difficult problem in computer vision during the last decade has been
automatic individual face identification [7]. However, law enforcement agencies are
still unable to effectively classify and observe a person using video surveillance
cameras [32]; lighting, blur, resolution, and illumination are all influential
issues in facial recognition [7], [32]. This is where the Local Binary Pattern
Histogram comes in, which is used here to handle human face identification [7],
[32].
LBPH is a face recognition algorithm based on the local binary operator [8]. It is a
widely employed algorithm thanks to its two main features: its discriminative power
and its computational simplicity [8]. Let's take a closer look at the LBPH method;
there are various steps to this approach.
The first is the parameters, which are divided into four categories: neighbors,
radius, grid X, and grid Y [9]. The radius is deployed to locate the image's center
and is set to 1 here. The number of neighbors cannot exceed 8 pixels, as that would
be too expensive to compute [9]. The cells of grid X run in the horizontal
direction, while those of grid Y run vertically; given the cost of computing power,
grid X and grid Y are both set to 8 [9]. The dataset must then be trained for the
method: each user has a distinct identity, and photographs are kept according to
that identification value [9].
Every image of size M x N is split into regions according to this grid.
The local binary operator is applied in each region. When used on pictures, this
operator compares a pixel to its eight closest neighbors [8]: if the intensity of a
neighboring pixel is greater than the intensity of the center pixel, the comparison
returns a value of '1'; a '0' is returned in all other cases [8]. When this
technique is applied to all eight neighbors, eight binary values are obtained [8],
which are then combined to generate an 8-bit binary number [8]. The resulting binary
number can be converted to a decimal value in the range 0-255, known as the pixel's
LBP value [8]. In the figure below, the operation is performed on each region [8].
Figure 6: Working of the LBP operator [8].
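The operator shown in Figure 6 can be sketched for a single 3x3 neighborhood in pure Python; a real implementation slides this window over every pixel of each region. The clockwise bit order starting at the top-left neighbor is one common convention, assumed here for illustration.

```python
def lbp_value(patch):
    """LBP code for a 3x3 patch: threshold the 8 neighbours against the
    centre pixel and read the results as an 8-bit binary number (0-255)."""
    c = patch[1][1]
    # Clockwise walk around the centre, starting at the top-left neighbour.
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    bits = ["1" if n >= c else "0" for n in neighbours]
    return int("".join(bits), 2)
```

For example, a patch whose neighbours are all at least as bright as the centre yields the maximum code 255; the per-region histogram then counts how often each of the 256 codes occurs.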
The current region's histogram is then constructed by counting the number of times each
LBP value appears in that region [8]. As a result, each region's histogram consists of 256
bins [8]. Then, by concatenating the histograms of all regions, we generate a new, larger
histogram [9]. For an 8x8 grid, the resulting histogram has a total of 8 x 8 x 256 = 16,384
bins [9]. This histogram represents the characteristics of the original image [9]. Face
recognition is the final phase, for which the algorithm has already been trained [9]. Each
image in the training dataset is represented by a histogram [9]. For each new image, all of
the previous steps are repeated to construct the histogram that represents it [9]. The
histogram generated for the new image is then compared with the stored histograms [9],
and the closest histogram is considered a match [9]. The image that corresponds to that
histogram is returned as the result of the algorithm [9].
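A minimal sketch of this histogram-matching step, using Euclidean distance between concatenated region histograms (the images, names, and grid size here are hypothetical):

```python
import numpy as np

def image_histogram(lbp_image, grid=(8, 8)):
    """Concatenate the 256-bin LBP histograms of each grid region."""
    rows = np.array_split(lbp_image, grid[0], axis=0)
    hists = []
    for row in rows:
        for region in np.array_split(row, grid[1], axis=1):
            hist, _ = np.histogram(region, bins=256, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists)  # 8 * 8 * 256 = 16,384 bins

# Hypothetical "training" images of LBP values, one per user.
rng = np.random.default_rng(0)
train = {name: rng.integers(0, 256, (64, 64)) for name in ["user1", "user2"]}
train_hists = {name: image_histogram(img) for name, img in train.items()}

# A query identical to user1's image matches user1 with distance 0.
query = train["user1"].copy()
distances = {name: np.linalg.norm(image_histogram(query) - h)
             for name, h in train_hists.items()}
best = min(distances, key=distances.get)  # → "user1"
```

The closest stored histogram wins; its distance is what the algorithm reports as the confidence value discussed next.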
Figure 7: Showing the working of LBP on Image
18
In addition, the algorithm returns a confidence value, which is equal to the calculated
distance [10]. A lower confidence value is preferable: the shorter the distance, the better
the accuracy of the algorithm [9]. The algorithm can then be given a threshold value, and
by comparing the confidence value against this threshold, it can determine whether it has
identified the picture or not [9].
So, essentially, the face recognizer checks the confidence value. If the confidence value is
greater than the threshold value, the face is not recognized, and the recognizer returns -1
or a negative value.
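The threshold check described above can be sketched as a small helper; the threshold value and labels here are hypothetical, not taken from the report:

```python
def recognize(label, confidence, threshold=70.0):
    """Return the predicted label if the distance-based confidence is
    below the threshold; otherwise return -1 (face not recognized)."""
    if confidence > threshold:
        return -1
    return label

# E.g. OpenCV's LBPH recognizer predict() returns (label, confidence);
# a low confidence (short distance) means a match.
assert recognize(3, 42.5) == 3      # close match: accepted
assert recognize(3, 95.0) == -1     # distance too large: rejected
```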
4.2.2 Faster R-CNN
Convolutional neural networks have great properties in the field of object detection [11].
Their weight-sharing network structure reduces the complexity of the network model.
The Fast R-CNN method is comparable to the R-CNN algorithm, but instead of feeding
region proposals to the CNN, the whole image is fed to the CNN to generate a
convolutional feature map [14]. The region proposals are then selected from this
convolutional feature map. These region proposals come in different sizes; to feed them
into the fully connected layer, an RoI pooling layer is used to reshape them into a fixed
size [14]. After being reshaped, a softmax layer is employed to predict the class and the
bounding-box offset values from the RoI feature vector [14]. However, the region
proposals used in Fast R-CNN become a bottleneck, as generating them slows down the
algorithm and ultimately affects performance [14].
In this project, Faster R-CNN is employed to detect objects [11]. Faster R-CNN combines
Fast R-CNN with an RPN for object detection.
The RPN is a fully convolutional deep learning network [13]. At each position in the input
image, the RPN can estimate the target area frame and a target score (the probability of an
actual target) [11], [12]. The RPN is used to build high-quality region proposal boxes; it
shares the full image's convolutional features with the detection network and eliminates
the speed problem of the original Selective Search [11], [12], considerably improving the
object detection effect [11].
Figure 8: Architecture of Faster RCNN [14]
These region proposals are of different sizes [30], which leads to CNN feature maps of
different sizes [30]. It would be a struggle to come up with an efficient structure that works
on feature maps of varying sizes [30]. To make these feature maps the same size, region of
interest (RoI) pooling is used [30]: the input feature map is divided into a fixed number of
nearly equal regions, and max pooling is then applied to every region [30]. Because the
number of regions is fixed, max pooling over them produces an output of fixed size
regardless of the input size [30].
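As an illustration (not the report's code), RoI pooling over hypothetical feature maps of different sizes can be sketched with NumPy:

```python
import numpy as np

def roi_pool(feature_map, out_size=(2, 2)):
    """Divide the map into a fixed out_size grid of nearly equal
    regions and max-pool each one, yielding a fixed-size output."""
    rows = np.array_split(feature_map, out_size[0], axis=0)
    pooled = [[region.max()
               for region in np.array_split(row, out_size[1], axis=1)]
              for row in rows]
    return np.array(pooled)

# Two proposals of different sizes both pool down to the same 2x2 shape.
small = np.arange(16).reshape(4, 4)   # 4x4 proposal
large = np.arange(49).reshape(7, 7)   # 7x7 proposal
print(roi_pool(small).shape, roi_pool(large).shape)  # → (2, 2) (2, 2)
```

Because the output grid is fixed, downstream fully connected layers always receive inputs of the same size.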
After passing through two fully connected layers, the features are sent into the sibling
classification and regression branches [31]. It is worth noting that these classification and
detection branches differ from the RPN's [31]. Classification scores estimate the class of
each proposal; they can be computed in two ways: either the probability of a proposal
belonging to each class is calculated, or the features are passed through a softmax layer
[31]. The regression layer coefficients are employed to refine the predicted bounding boxes
[31]. The regressor in this case is size-agnostic (unlike the RPN's), but it is unique to each
class [31]: in the regression layer, each class has its own regressor with four parameters
[31].
4.3 Implementing Face Detection and Recognition
For face detection, the OpenCV DNN module is used [15]. The detector is constructed on
the 'Single Shot Multi-Box Detector' (SSD) and is based on the 'ResNet-10' architecture
[15]. The Single Shot Multi-Box Detector is similar to the YOLO technique in that it uses
MultiBox to detect numerous items in a single shot [15]. It is a substantially faster object
detection system with good accuracy [15].
Below is the flowchart of the face recognition.
Figure 9: Flowchart of face recognition [7]
Face recognition: To recognize the face of the legitimate user, a dataset of user images is
required to train the model. A Python script is written to capture 120 images of the user; it
automatically starts taking pictures and saves them in a dataset folder. The number of users
can be increased according to the number of family members. Once the dataset has
acquired 120 images per user, the LBPH model is trained with the help of this dataset.
Below is the code for training on the user images using LBPH. Images taken from the
dataset are converted to grayscale. Here, 'id' is a distinct value assigned to each user; while
running the recognizer program, the id refers to an index of the names array, for instance,
arr[id] = "Beer".
Figure 10: Training faces
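As an illustration only (the filename pattern "User.&lt;id&gt;.&lt;count&gt;.jpg" is an assumption, not necessarily the report's), each user's distinct id can be recovered from the dataset filenames and mapped back to a name:

```python
def parse_user_id(filename):
    """Extract the numeric user id from names like 'User.1.42.jpg'."""
    return int(filename.split(".")[1])

# The recognizer later maps each id back to a display name via an array,
# e.g. arr[id] = "Beer" as described in the report.
names = ["", "Beer"]
dataset = ["User.1.1.jpg", "User.1.2.jpg"]
ids = [parse_user_id(f) for f in dataset]
print(names[ids[0]])  # → Beer
```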
Below are screenshots of the user and of an unknown person recognized after training the
model.
Figure 11: Face Recognition of a known person
Unknown user:
Figure 12: Face Recognition of unknown person
4.4 Implementation for Object Detection:
The foremost step is the construction of the object detection classifier [16]. This is
accomplished by training our own convolutional neural network for the object 'knife' [16].
Below are the steps followed to accomplish this purpose.
1. Install Anaconda: Anaconda is software that builds virtual Python environments,
which allows us to install and use Python libraries without fear of conflicting
with current installations [16]. Anaconda is a Python distribution and provides a
straightforward way to set up TensorFlow [16].
2. Set up the TensorFlow directory.
• The TensorFlow Object Detection API repository will be downloaded from
GitHub [16]. A folder named "tensorflow1" will be created under the C:
directory; all the training images, classifiers, and training data will be stored
in this directory [16]. The TensorFlow object detection download will be
extracted into the 'models' folder.
• In this step, the Faster-RCNN-Inception-V2-COCO model will be downloaded.
Faster R-CNN provides higher accuracy at the cost of slightly slower speed.
• Set up new folders: all the data (training and testing) will be put under this
directory. Further, directories for the inference graph and the label map will be
created.
• Download the packages described in the table below.
Table 1: Packages downloaded
NumPy: Used to compute numerous operations on arrays and multidimensional
objects [29].
TensorFlow: An artificial intelligence software library that can be used for a variety
of applications [27].
OpenCV-Python: OpenCV is a large open-source library with a lot of functionality;
most importantly, it can be used for image processing, computer vision, and machine
learning [28].
Matplotlib: A plotting library for the Python programming language [26].
Pandas: A library that is useful for analyzing data [25].
Contextlib2: Works with context managers for managing resources [23].
lxml: A Python module that works with XML and HTML files; it is also capable of
web scraping [24].
• Hereafter, we set up a virtual environment for tensorflow-gpu in Anaconda
[16]. Search for the Anaconda Prompt application in Windows' Start menu,
right-click it, and select "Run as Administrator" [16]. If Windows asks
whether to allow it to make modifications to your machine, select Yes [16].
A new virtual environment named "tensorflow1" is then created in the
command terminal [16].
• Configure the PYTHONPATH variable: a PYTHONPATH variable pointing
to \models and \models\research should be created [16]. This PYTHONPATH
is set in Anaconda after activating the tensorflow1 virtual environment.
• Compile Protobufs and run setup.py: this step includes commands to compile
the protobuf files; after compiling, these files can be used for training the
parameters [16]. Then the setup.py files are built and installed using
commands.
3. Generate training data: after collecting the data and annotating the images, all the
XML files are converted into a CSV file. This CSV file is used to generate TFRecords,
which serve as the input for the TensorFlow training model. Therefore, a tf_record file
is generated for both the training and the test images.
4. Configure training and the label map: the label map file is generated in this section.
The file will have the following code.
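As a reconstruction (the report's exact file is not reproduced here), a label map for the single 'knife' class typically looks like:

```
item {
  id: 1
  name: 'knife'
}
```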
5. Run training: last but not least, the object detection training pipeline must be set up
[16]. It specifies which model and parameters will be utilized in the training process
[16]. This is the final step before running the training.
6. Export the inference graph: after training has finished, the frozen inference graph
(.pb file) needs to be created [16]. This file packages the trained network's structure
and weights into a single graph that can later be loaded to run detection.
7. Test the object detection classifier: after completing all the steps to create the custom
object classifier, it was run to test the results. The picture below shows the classifier
being tested.
Figure 13: Knife Detection with 98% Accuracy
Figure 14:Knife Detection with 100% Accuracy
This object detection classifier was successfully able to detect the knife with good
accuracy.
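Step 3 of the setup above (converting XML annotations to CSV rows) can be sketched as follows; the Pascal VOC-style annotation content, filename, and helper name are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation, as produced by labeling tools.
xml_text = """
<annotation>
  <filename>knife_01.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>knife</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

def annotation_to_rows(text):
    """Flatten one annotation into CSV-style rows, one row per object."""
    root = ET.fromstring(text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([filename, obj.findtext("name")] +
                    [int(box.findtext(t))
                     for t in ("xmin", "ymin", "xmax", "ymax")])
    return rows

print(annotation_to_rows(xml_text))
# → [['knife_01.jpg', 'knife', 120, 80, 260, 300]]
```

Rows like these are what the CSV file collects before TFRecords are generated from it.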
4.5 Primary Layer:
In this layer, both face recognition and object detection are carried out to detect the thief.
If a thief is detected, a warning email is sent to the user's address attached to the system.
There are three main cases, based on which the system decides whether or not to send the
warning email:
• If the person is unknown to the system and is not holding a knife, no warning
will be sent.
• If the person is unknown to the system and is holding a knife, a warning will
be sent to the user.
• If the person is known to the system and is holding a knife, no warning will
be sent, as he/she is a legitimate user.
Here, the flag intruder21 is used for knife presence and intruder11 for unknown-person
presence. If a knife is detected, intruder21 is set to true; if an unknown person is detected,
intruder11 is set to true. If both intruder21 and intruder11 are true, the system sends the
warning email to the user along with the image of the unknown person (saved when the
unknown person was detected).
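The three cases above reduce to a single boolean condition. A minimal sketch, reusing the report's intruder11/intruder21 flag names (the helper function itself is hypothetical):

```python
def should_alert(intruder11, intruder21):
    """intruder11: unknown person detected; intruder21: knife detected.
    Alert only when an unknown person is holding a knife."""
    return intruder11 and intruder21

assert should_alert(True, True) is True        # unknown person with knife
assert should_alert(True, False) is False      # guest without a knife
assert should_alert(False, True) is False      # legitimate user with knife
```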
Figure 15: Email sent to the user when an intruder is detected
The primary layer's working is limited to detecting the thief and sending the warning to
the user along with the image. To keep tracking the thief afterwards, however, a secondary
layer is used.
4.6 Secondary Layer:
The secondary layer is used for tracking the thief. When the thief is first detected in the
primary layer, some pictures (nearly 20) are stored in a folder, and those images are then
used to train the model. A variable named 'lastSent' keeps track of whether an unknown
person was detected before; in the primary layer, it is set to a non-None value.
There are two cases after the thief is detected.
Case 1: Once training is done, if the intruder is detected along with the knife and lastSent
is not None, the person's face is checked against the previously stored unknown person's
face. If a match occurs, a five-second video is sent to the user, and such five-second videos
continue to be sent as long as the same thief is detected by the camera. The main reason
for using five-second videos is that if the system instead kept recording while checking
every frame, the thief could remain in one spot for a long time. For instance, if the thief is
detected and tracked at camera location 1 and stays in the same location for 7 minutes, the
video would be recorded for 7 minutes and only then sent to the user, which would be too
late to track the activity of the thief.
Case 2: Once training is done, what if the intruder detected with the knife does not match
the image of the previous thief? In this scenario, pictures of the second thief should be
taken and another face recognizer trained for the second thief. In the code, lastSent is set
to None, so that if a second thief is detected, the primary layer is run again to detect the
thief and store the images. The primary layer runs only when lastSent is None.
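A simplified sketch of the two cases (the helper function and the string-equality stand-in for face matching are hypothetical):

```python
def secondary_layer(face, known_thief, last_sent):
    """Decide what to do once a knife-holding intruder is seen again.
    Returns (action, new_last_sent)."""
    if last_sent is not None and face == known_thief:
        # Case 1: same thief re-detected -> send a five-second video.
        return "send_video", last_sent
    # Case 2: a different thief -> reset lastSent so the primary layer
    # re-runs, stores new pictures, and retrains the recognizer.
    return "retrain", None

assert secondary_layer("thief_A", "thief_A", "sent") == ("send_video", "sent")
assert secondary_layer("thief_B", "thief_A", "sent") == ("retrain", None)
```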
Figure 16: code shows the working of secondary layer
For all the other scenarios, no alert is sent: for instance, if a legitimate user with a knife is
recognized, no alert is sent, and if an unknown person without a knife in his hands is
detected, there is no alert either. The latter scenario is taken into consideration to make
sure a guest (a person unknown to the system and not holding a knife) is not flagged as a
thief.
4.7 Impact Factor
• If the thief has turned his back to the camera, the system is unable to determine
whether he is a thief. So, if there is an unknown person with a knife within camera
range who has his back towards the camera, the system will not be able to detect
any face. Face recognition requires a detected face to compare against the user's
face; if no face is found, the system cannot confirm that this person is a thief.
• If the person is too far from the camera (more than about 8.5 feet), it is hard for
the system to make the distinction, because smaller items may lose too much
signal during downsampling in the pooling layers of the convolutional network,
making detection from too far away difficult.
Chapter 5. Results and experimentation
To test this project, two cameras were deployed at different locations. The project is
extendable: more cameras can be added to the current module. To track the felon, a
five-second video is sent to the user. The main reason for employing a short video is that
if the thief were to stay in the same camera zone for 5 minutes straight, this would lead to
a 5-minute recording that is only sent after 5 minutes, which could be too late to deal with
or forestall any serious incident.
The picture below shows face detection along with the knife. In this scenario, an alert
will not be sent, as the recognized person is the user.
Figure 17: User with the knife
At the second camera spot, an unknown person is detected with a knife. Therefore, an alert
is sent to the user through email, along with the image as an attachment. The picture below
shows an unknown person detected with a knife.
Figure 18: Unknown user with the knife
The email alert sent to the user looks like the image in Fig. 19. The system uses
[email protected] to send alerts to the user. The most recent email
contains all the videos and images that were previously sent to the user for the same
detected thief.
Figure 19: Email screenshot
5.1 Results obtained for the first person:
Figure 20: Screenshot sent along with the email
An unknown person was found in camera zone 1 with a knife, and the system identified
the person as an intruder. Figure 20 shows the image of the intruder that was attached to
the alert email. After detection of the intruder, the secondary layer starts tracking the thief:
pictures of the unknown person holding the knife are taken and used for training, so that
the thief can be tracked to prevent any mishap. The intruder was then detected again in
camera zone 1; Figure 21 shows the video that tracks the thief when the thief is found in
camera zone 1.
Figure 21: First video tracked and sent in email
Figures 22, 23, and 24 contain the videos sent to the user as a warning while tracking the
same thief in camera zone 2.
Figure 22: Second video tracked in camera zone 2
Figure 23: Third video tracked in camera zone 2
Figure 24: Fourth video tracked in camera zone 2
5.2 Results obtained for the second person:
Figure 25 shows camera zone 1, where the criminal is recognized along with the knife.
After the thief is identified, an email alert is sent to the user along with the picture in
Figure 26.
Figure 25: Camera Screen (Zone 1), detection of thief
Figure 26: Image sent along with the email
The same criminal is found again in camera zone 1. Figures 27 and 28 show the videos
sent to the legitimate user's email address while tracking down the thief found in camera
zone 1.
Figure 27: Same thief detected in camera zone 1
Figure 28: Same thief detected in camera Zone 1
The intruder then changes location and moves to camera zone 2. The thief is detected
again, and the system starts tracking and sending alerts along with the videos for the same
thief. Figures 29, 30, 31, and 32 contain the videos for the same burglar found in camera
zone 2.
Figure 29: Thief moves to second location
Figure 30: Tracking video of camera zone 2
Figure 31: Tracking Video of camera zone 2
Figure 32: Tracking video of camera zone 2
5.3 Results obtained from the third person.
A different intruder is observed at location 1. Therefore, an alert is sent to the user along
with the image shown in Figure 33.
Figure 33: Thief detected and image sent
Furthermore, the same person with the knife is detected again at location 1. As a result,
the secondary layer starts tracking the intruder. Whenever the same person is detected at
any of the camera locations, a five-second tracking video is recorded and sent to the user.
These videos are stored in the same location as the project directory: a separate folder
named 'thief_vidoes' is created and used for storage. In addition, the thief in Figure 33 is
standing in the same spot at location 1, and videos are sent to the user as long as the thief
is in the camera zone, as shown in Figure 34.
Figure 34: Thief tracked in camera zone 1
The same intruder is spotted again at location 1, as shown in Figures 35 and 36. Figure 36
clearly shows the intruder moving out of the camera's view; therefore, tracking is finished
and no more videos are sent to the user.
Figure 35: Thief tracked again in camera zone 1
Chapter 6: Future work
1. A web application using HTML and CSS frameworks could be produced, providing
a user interface for adding family members (collecting images of their faces) and
the email address for alerts, as well as the ability to watch the live camera feed.
2. Instead of sending videos, the Flask framework could efficiently generate a website
link to share with the user. The link would act as a tracking link, which does not
require downloading a video; once the thief is out of the camera's view, that live
link can be converted into a video.
3. In addition, an external device such as a siren could be appended. The siren would
sound an audio alert, like a ring, in the house, which would further help in reducing
crime: the sound of the siren would trigger fear in intruders, so instead of carrying
out the crime, they would try to escape the house in the agitation of getting seized.
Chapter 7: Conclusion
A system is built that helps ensure the security of the home. With the help of video
surveillance, a thief can be tracked by this system to prevent future mishaps. The system
distinguishes between a legitimate user and an unknown person. Weapon detection in the
video surveillance helps provide alerts and further prevents crime; Faster R-CNN is used
for the object detection. The system accurately sends alerts to the user along with the
tracking videos. The system is a little slow, as the laptop used for this project does not
fulfil the GPU requirements; however, if it were run on a laptop with CUDA installed and
the GPU requirements met, it would be much faster at reading frames and detecting the
weapon. Furthermore, the system provides an accuracy of greater than 97% for object
detection (knife). Overall, when tested for detecting a thief, the system provided an
accuracy of 94.7%, which indicates that it can be usefully employed for smart home
security.
References:
1. Russell, B.C., Torralba, A., Murphy, K.P. et al. LabelMe: A Database and Web-
Based Tool for Image Annotation. Int J Comput Vis 77, 157–173 (2008).
https://doi.org/10.1007/s11263-007-0090-8
2. Russell, B.C., Torralba, A., Murphy, K.P. et al. LabelMe: A Database and Web-
Based Tool for Image Annotation. Int J Comput Vis 77, 157–173 (2008).
https://doi.org/10.1007/s11263-007-0090-8
3. Solawetz, J. (2020, December 9). Getting started with LabelMe - Computer Vision
Annotation. Roboflow Blog. Retrieved November 13, 2021, from
https://blog.roboflow.com/labelme/.
4. Yue Han, Guoliang Li, Haitao Yuan, Ji Sun, "An Autonomous Materialized View
Management System with Deep Reinforcement Learning", Data Engineering
(ICDE) 2021 IEEE 37th International Conference on, pp. 2159-2164, 2021.
5. Digital Curation Centre Trilateral Research, The role of data in AI - GPAI. [Online].
Available: https://www.gpai.ai/projects/data-governance/role-of-data-in-ai.pdf.
[Accessed: 26-Oct-2021].
6. A. Gonfalonieri, “How to Build A Data Set For Your Machine Learning Project,”
Medium, 14-Feb-2019. [Online]. Available: https://towardsdatascience.com/how-
to-build-a-data-set-for-your-machine-learning-project-5b3b871881ac. [Accessed:
26-Oct-2021].
7. A. Ahmed, J. Guo, F. Ali, F. Deeba and A. Ahmed, "LBPH based improved face
recognition at low resolution," 2018 International Conference on Artificial
Intelligence and Big Data (ICAIBD), 2018, pp. 144-147, doi:
10.1109/ICAIBD.2018.8396183.
8. N. Stekas and D. Van Den Heuvel, "Face Recognition Using Local Binary Patterns
Histograms (LBPH) on an FPGA-Based System on Chip (SoC)," 2016 IEEE
International Parallel and Distributed Processing Symposium Workshops
(IPDPSW), 2016, pp. 300-304, doi: 10.1109/IPDPSW.2016.67.
9. A. M. Jagtap, V. Kangale, K. Unune and P. Gosavi, "A Study of LBPH, Eigenface,
Fisherface and Haar-like features for Face recognition using OpenCV," 2019
International Conference on Intelligent Sustainable Systems (ICISS), 2019, pp.
219-224, doi: 10.1109/ISS1.2019.8907965.
10. Face Recognition: Understanding LBPH Algorithm, 2017, [online] Available:
https://towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b.
11. J. Zou and R. Song, "Microarray camera image segmentation with Faster-RCNN,"
2018 IEEE International Conference on Applied System Invention (ICASI), 2018,
pp. 86-89, doi: 10.1109/ICASI.2018.8394403.
12. JRR Uijlings, KEAVD Sande, T Gevers et al., "Selective Search for Object
Recognition[J]", International Journal of Computer Vision, vol. 104, no. 2, pp. 154-
171, 2013.
13. J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic
segmentation", 2015 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 3431-3440, 2015.
14. R. Gandhi, “R-CNN, fast R-CNN, Faster R-CNN, YOLO - object detection
algorithms,” Medium, 09-Jul-2018. [Online]. Available:
https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-
detection-algorithms-36d53571365e. [Accessed: 31-Oct-2021].
15. P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “SSDMNV2:
A real time DNN-based face mask detection system using single shot multibox
detector and mobilenetv2,” Sustainable cities and society, 31-Dec-2020. [Online].
Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775036/. [Accessed:
31-Oct-2021].
16. Evan, Lodeiro, M., & Zylinski, A. (2020, December 15). Tensorflow-object-
detection-API-tutorial-train-multiple-objects-windows-10/readme.md at master ·
EdjeElectronics/tensorflow-object-detection-API-tutorial-train-multiple-objects-
windows-10. GitHub. Retrieved November 13, 2021, from
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-
Train-Multiple-Objects-Windows-10/blob/master/README.md.
17. M. Nakib, R. T. Khan, M. S. Hasan and J. Uddin, "Crime Scene Prediction by
Detecting Threatening Objects Using Convolutional Neural Network," 2018
International Conference on Computer, Communication, Chemical, Material and
Electronic Engineering (IC4ME2), 2018, pp. 1-4, doi:
10.1109/IC4ME2.2018.8465583.
18. N. Dwivedi, D. K. Singh and D. S. Kushwaha, "Weapon Classification using Deep
Convolutional Neural Network," 2019 IEEE Conference on Information and
Communication Technology, 2019, pp. 1-5, doi:
10.1109/CICT48419.2019.9066227.
19. A. Warsi, M. Abdullah, M. N. Husen and M. Yahya, "Automatic Handgun and
Knife Detection Algorithms: A Review," 2020 14th International Conference on
Ubiquitous Information Management and Communication (IMCOM), 2020, pp. 1-
9, doi: 10.1109/IMCOM48794.2020.9001725.
20. K. Lashmi and A. S. Pillai, "Ambient Intelligence and IoT Based Decision Support
System for Intruder Detection," 2019 IEEE International Conference on Electrical,
Computer and Communication Technologies (ICECCT), 2019, pp. 1-4, doi:
10.1109/ICECCT.2019.8869327.
21. T. Sultana and K. A. Wahid, "IoT-Guard: Event-Driven Fog-Based Video
Surveillance System for Real-Time Security Management," in IEEE Access, vol.
7, pp. 134881-134894, 2019, doi: 10.1109/ACCESS.2019.2941978.
22. K. Yu et al., "Design and Performance Evaluation of an AI-Based W-Band
Suspicious Object Detection System for Moving Persons in the IoT Paradigm," in
IEEE Access, vol. 8, pp. 81378-81393, 2020, doi:
10.1109/ACCESS.2020.2991225.
23. Contextlib - Utilities for with-statement contexts. Python 2.7.10 Documentation.
(n.d.). Retrieved November 13, 2021, from
https://documentation.help/Python-2.7.10/contextlib.html.
24. Contributor, G. (2019, April 10). Introduction to the python LXML Library. Stack
Abuse. Retrieved November 13, 2021, from https://stackabuse.com/introduction-
to-the-python-lxml-library/.
25. Pandas: Python library - mode. Mode Resources. (2016, May 23). Retrieved
November 13, 2021, from https://mode.com/python-tutorial/libraries/pandas/.
26. Wikipedia. (2021, October 25). Matplotlib. Wikipedia. Retrieved November 13,
2021, from https://en.wikipedia.org/wiki/Matplotlib.
27. Wikipedia. (2021, November 12). Tensorflow. Wikipedia. Retrieved November 13,
2021, from https://en.wikipedia.org/wiki/TensorFlow.
28. GeeksForGeeks. (2021, March 28). OpenCV python tutorial. GeeksforGeeks.
Retrieved November 13, 2021, from https://www.geeksforgeeks.org/opencv-
python-tutorial/.
29. Wikimedia Foundation. (2021, October 27). NumPy. Wikipedia. Retrieved
November 13, 2021, from https://en.wikipedia.org/wiki/NumPy.
30. Gao, H. (2017, October 5). Faster R-CNN explained. Medium. Retrieved
November 13, 2021, from https://medium.com/@smallfishbigsea/faster-r-cnn-
explained-864d4fb7e3f8.
31. Ananth, S. (2020, October 1). Faster R-CNN for object detection. Medium.
Retrieved November 13, 2021, from https://towardsdatascience.com/faster-r-cnn-
for-object-detection-a-technical-summary-474c5b857b46.
32. IEEE Xplore. (n.d.). Retrieved November 30, 2021, from
https://ieeexplore.ieee.org/.