Naïve Bayes Human Activity Using Personalized Datasets

Paper:

Smartphone Naïve Bayes Human Activity Recognition Using Personalized Datasets

Moses L. Gadebe, Okuthe P. Kogeda, and Sunday O. Ojo
Department of Computer Science, Tshwane University of Technology

Private Bag X680, Pretoria 0001, South Africa
E-mail: {GadebeML, KogedaPO, OjoSO}@tut.ac.za

[Received July 10, 2018; accepted July 15, 2020]

Recognizing human activity in real time with a limited dataset is possible on a resource-constrained device. However, most classification algorithms such as Support Vector Machines, C4.5, and K Nearest Neighbor require a large dataset to accurately predict human activities. In this paper, we present a novel real-time human activity recognition model based on the Gaussian Naïve Bayes (GNB) algorithm using a personalized JavaScript object notation (JSON) dataset extracted from the publicly available Physical Activity Monitoring for Aging People dataset and University of Southern California Human Activity dataset. With the proposed method, the personalized JSON training dataset is extracted and compressed into a 12 × 8 multidimensional array of time-domain features extracted using a signal magnitude vector and tilt angles from tri-axial accelerometer sensor data. The algorithm is implemented on the Android platform using the Cordova cross-platform framework with HTML5 and JavaScript. Leave-one-activity-out cross validation is implemented as a testTrainer() function, the results of which are presented using a confusion matrix. The testTrainer() function leaves category K as the testing subset and the remaining K-1 as the training dataset to validate the proposed GNB algorithm. The proposed model is inexpensive in terms of memory and computational power owing to the use of a compressed small training dataset. Each K category was repeated five times, and the algorithm consistently produced the same result for each test.
The result of the simulation using the tilt angle features shows overall precision, recall, F-measure, and accuracy rates of 90%, 99.6%, 94.18%, and 89.51%, respectively, in comparison to rates of 36.9%, 75%, 42%, and 36.9% when the signal magnitude vector features were used. The results of the simulations confirmed that, when using the tilt angle dataset, the GNB algorithm is superior to the Support Vector Machines, C4.5, and K Nearest Neighbor algorithms.

Keywords: tilt angles, signal magnitude vector, real-time, Gaussian distribution function, personalized dataset

1. Introduction

Studies conducted on cardiovascular diseases, particularly strokes, have singled out physical activity as the main contributor to an improvement in glucose intolerance, cholesterol, body mass index (BMI), blood pressure, pulse rate, vital capacity, and oxygen intake [1–3]. A commitment of 30 min of moderate physical activity per day for 5 days a week can produce long-term health benefits, and reduce BMI, cholesterol, and obesity in people practicing such activity [1].

The personalization and monitoring of human activity is possible using inexpensive devices, including smartphones, that are available round-the-clock and in close proximity to their owners. Smartphones are generally 0 to 60 cm from their owners and are equipped with powerful motion sensors such as accelerometers, gyroscopes, ambient light sensors, temperature sensors, global positioning systems, barometers, and orientation sensors [4–6]. Smartphone accelerometer and gyroscope sensory data have significant potential for the recognition of human activities. Human activity recognition (HAR) can normally be divided into three steps: sensory data collection from accelerometer and gyroscope sensors and the labeling of the collected sensor data with human activities; feature extraction to reduce the classification features from a dataset, also known as pre-processing [7]; and classification based on machine learning algorithms such as K Nearest Neighbor, Support Vector Machine, C4.5, Naïve Bayes, and the Hidden Markov model [5, 7]. The availability of sensory datasets for the real-time benchmarking of HAR systems is still lagging owing to the shortage of personalized datasets [7, 8].

Researchers have recently collected and published their datasets online to allow other researchers to benchmark their HAR systems [9–12]. However, some of these datasets lack personal attributes (e.g., age, height, weight, and body mass index) and are in a high-dimensional space, thereby requiring high computational power and large memory [13]. The latter poses a challenge in providing HAR systems on resource-constrained platforms such as a smartphone. Recent HAR models are mindful of the resource constraints of smartphone devices with regard to the classification requirements [7, 14, 15]. Hence, in this paper, we present a real-time HAR algorithm based on a

Vol.24 No.5, 2020 Journal of Advanced Computational Intelligence and Intelligent Informatics 685

https://doi.org/10.20965/jaciii.2020.p0685

© Fuji Technology Press Ltd. Creative Commons CC BY-ND: This is an Open Access article distributed under the terms of the Creative Commons Attribution-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nd/4.0/).

Gadebe, M. L., Kogeda, O. P., and Ojo, S. O.

Naïve Bayes classification technique using a lightweight JavaScript object notation (JSON) personalized dataset generated using the personalization algorithm proposed in [16].

Our contributions in this study are three-fold. First, a novel use of compressed time-domain features extracted from the personalized JSON HAR dataset is used to detect human behavior. Our unique use of a compressed training dataset leads to the optimal application of a smartphone's limited resources, given that less computational power and memory are required to predict human activities. Second, the use of the tilt angles between vectors, as opposed to only the signal magnitude vector (SMV), might allow the use of a single HAR training dataset for multiple subjects to resolve the subject sensitivity phenomenon. Third, the implementation of the cross validation testTrainer() function as a part of the package makes a distinct contribution, the experiments of which are conducted in a resource-constrained environment.

The rest of this paper is structured as follows: In Section 2, we present the related work. In Section 3, the implementation of real-time human activity recognition is presented. The experiments on the real-time human activity algorithm and a validation of the results are presented in Section 4. The conclusion and future study are presented in Section 5.

2. Related Work

Although HAR has been studied for decades, there remains a limited number of publicly available HAR datasets for machine learning. In [9], the researchers investigated and established that there is a lack of benchmarking HAR datasets. To overcome this limitation, they collected and published a dataset called the Physical Activity Monitoring for Aging People dataset (PAMAP2), obtained from nine subjects. They used three Colibri wireless inertial measurement unit (IMU) devices attached to the dominant arm's wrist, the ankle, and the chest of each participant at a frequency rate of 100 Hz.

The researchers in [17] collected and published their dataset, called the University of Southern California Human Activity dataset (USC-HAD), from 14 subjects wearing a MotionNode device on the waist, which was connected to a laptop at a frequency similar to that used in [9]. In contrast, the authors in [10] and [12] found that public datasets have been incrementally introduced, although there is still a lack of personalized smartphone datasets. To address this problem, the researchers in [10] and [12] published datasets collected using a Samsung Galaxy S II smartphone from 30 subjects at a frequency rate of 50 Hz. However, their dataset lacks personal attributes (e.g., age, height, weight, and body mass index) and is limited by its high dimensionality; a large memory and high computational power are required to process such data, and therefore, this dataset is inappropriate for a resource-constrained smartphone platform [13].

The authors in [14] investigated how the selection of the frequency sampling rate and classification features affects the energy overhead introduced by a high computational power. Their investigation was influenced by [15], and based on their findings, they proposed an activity-sensitive strategy called the adaptive accelerometer-based activity recognition (A3R) strategy for the recognition of continuous activities. Based on the A3R strategy, the selection of the sampling rate and classification features is adapted in real time as individuals conduct daily activities. The authors validated their A3R strategy by annotating a human activity dataset collected from four selected subjects. To investigate whether A3R is naturalistic and lifestyle-driven, they employed two additional datasets collected from six users over a period of 6–8 weeks. To evaluate the potential energy savings of A3R, they monitored two additional users who were engaged in everyday lifestyle activities. However, owing to the limited dataset employed, A3R produced poor results.

The researchers in [7] found that, instead of generating a different model for each person, the normal HAR model often learns from accelerometer data collected from a single person and is distributed to other individuals for recognizing the same activities. They also found that such HAR models are restricted by an unstable accuracy, particularly across subjects (known as subject sensitivity). They proposed and implemented a personalization algorithm that selects self-assured samples from one person in real time to update the existing models of other individuals. However, their technique raises privacy and security issues owing to the use of a model generated from subject A and forwarded to subjects B and C. Moreover, the use of one self-assured sample negatively impacts the reliability of their results and can lead to poor performance in cross-person prediction.

The study in [15] is similar to our previous technique described in [16]. Here, the authors found that learning new activities to adapt to new user needs is challenging owing to the shortage of an annotated dataset. The researchers proposed feature- and attribute-based learning that leverages the relationship between existing and new activities to compensate for the shortage of annotated datasets. They used the radial basis function to feed the Support Vector Machine algorithm with 11 attributes for classification [15]. They evaluated their technique and found that it outperforms other traditional HAR models in recognizing new activities using a limited training dataset. However, the technique does not address the shortage of personalized datasets; it only detects new activities from an existing dataset. More recent, and similar to our technique, are the models proposed in [17] and [18] to monitor and prevent obesity. The hybrid obesity monitoring model and mobile health architecture allow the subject to enter height, weight, and BMI, as well as blood pressure, blood sugar, heart rate, and pedometer sensory data. However, both techniques employ a machine learning algorithm to filter social media data about obesity using height, weight, and BMI in order to recommend physical activity plans to the monitored person based on similar personal qualities from a social media dataset.


Fig. 1. Real-time HAR application architecture [6].

In this paper, we use the personalized JSON dataset proposed in [16] to address the shortage of personalized datasets, as well as to remove the curse of dimensionality. We further employ a Gaussian Naïve Bayes (GNB) algorithm to recognize human activities using compressed JSON features owing to its ability to use a small dataset to accurately detect patterns, and hence its appropriateness for the smartphone HAR platform. Moreover, GNB relies on a few time-domain features, such as the mean and standard deviation, and hence a compressed dataset is the most suitable for this technique in the prediction of human activity.

3. Methodology

In this section, we present the design and implementation of a real-time Naïve Bayes human activity recognition application following the HAR architecture depicted in Fig. 1.

As shown in Fig. 1, the HAR process is composed of JSON feature extraction, real-time feature acquisition, training, and the classification of activities conducted in real time on an Android smartphone. Whereas the training dataset is acquired during the registration of new users, the registration interface connects with our personalization algorithm proposed in [16] through a web controller interface (WCI). The personalization algorithm extracts a time-domain bag-of-features returned as a reduced JSON dataset using two HAR datasets (PAMAP2 and USC-HAD).

3.1. Training Dataset Acquisition and Processing

The process is implemented, and processing takes place, on the server side only once during the registration of a new user, with the aim of tailoring the JSON bag-of-feature training dataset based on personal traits. The PAMAP2 and USC-HAD datasets published in [9] and [19], respectively, were selected, as discussed in [16]. The dataset

Fig. 2. Validated JSON HAR training dataset.

published in [9] comprises nine subject files labeled subject101.dat through subject109.dat, whereas the dataset provided in [19] was collected from 14 subjects performing 5 trials of 12 low-level human activities stored in MATLAB files.

The datasets were selected because they contain personal attributes such as age, gender, weight, and height, as recommended by Lockhart and Weiss in [8]. During this process, the HAR dataset was reduced by extracting a bag-of-features using the K Nearest Neighbor (KNN) algorithm implemented in [16]. The extraction is based on the personal traits of the user, e.g., gender, age, height, and weight. The reduced bag-of-features is returned as a JSON dataset containing different categories of human activities, as shown in Fig. 2. The JSON file was validated using JSON Editor Online 3.7.5 [20].

The implemented KNN algorithm first computes the user's BMI using Eq. (1).

BMI = weight / height^2  . . . (1)

The Euclidean distance between a new user's BMI and the BMIs of the corpus subjects is then calculated using Eq. (2), after which the closest neighbors are determined using Eq. (3), both of which are implemented in [16]:

PD(user, subjects) = √( Σ_{j=1}^{J} (w_j − h_j)^2 ),  . . . (2)

PD(user, subject_j) ≤ K = {true, false},  . . . (3)

where J is the number of subjects from the training corpus, PD(user, subject_j) is the personalization distance based on the Euclidean distance between a new user and subject_j


Table 1. Eight time-domain features of the SMV and TAs.

SMV: Mean, SD | TA of cos α: Mean, SD | TA of cos β: Mean, SD | TA of cos γ: Mean, SD

from the corpus, w_j represents the new user's attributes (i.e., height, weight, and BMI), and h_j denotes the subject's variables. In addition, K is the threshold used to determine the closest neighbors. The records of all closest neighbors containing the weight, height, BMI, accelerometer and gyroscope tri-axial (X-, Y-, and Z-axes) data, and human activity labels, plus the frequently used SMV determined through Eq. (4) [21–24], are stored in the bag-of-features.
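The personalization step of Eqs. (1)–(3) can be sketched in JavaScript, the implementation language of this work; the function names (bmi, personalizationDistance, closestNeighbors) are illustrative, not those of the actual algorithm in [16]:

```javascript
// Eq. (1): BMI from weight [kg] and height [m].
function bmi(weightKg, heightM) {
  return weightKg / (heightM * heightM);
}

// Eq. (2): Euclidean personalization distance between a new user's
// attribute vector (height, weight, BMI) and one corpus subject's.
function personalizationDistance(user, subject) {
  const attrs = ["height", "weight", "bmi"];
  let sum = 0;
  for (const a of attrs) {
    const d = user[a] - subject[a];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Eq. (3): a subject is a closest neighbor when PD <= threshold K.
function closestNeighbors(user, subjects, K) {
  return subjects.filter(s => personalizationDistance(user, s) <= K);
}
```

The records of the subjects returned by closestNeighbors() would then populate the bag-of-features described above.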

SMV = √(x^2 + y^2 + z^2),  . . . (4)

where x, y, and z are the accelerometer tri-axial values. In addition, to accurately detect human activities and transitions, the tilt angles (TAs) α, β, and γ between the X-, Y-, and Z-axes are computed for a given accelerometer point using the directional cosine expressed in degrees based on Eqs. (5), (6), and (7):

TA of cos α = (x / √(x^2 + y^2 + z^2)) · (180/π),  . . . (5)

TA of cos β = (y / √(x^2 + y^2 + z^2)) · (180/π),  . . . (6)

TA of cos γ = (z / √(x^2 + y^2 + z^2)) · (180/π).  . . . (7)
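Eqs. (4)–(7) translate directly into JavaScript; the following sketch uses illustrative names (smv, tiltAngles) and follows the paper's formulation, in which each direction cosine is scaled by 180/π:

```javascript
// Eq. (4): signal magnitude vector of one tri-axial sample.
function smv(x, y, z) {
  return Math.sqrt(x * x + y * y + z * z);
}

// Eqs. (5)-(7): tilt angles of the three direction cosines, in degrees.
function tiltAngles(x, y, z) {
  const m = smv(x, y, z);
  const toDeg = 180 / Math.PI;
  return {
    alpha: (x / m) * toDeg,  // Eq. (5): TA of cos(alpha)
    beta:  (y / m) * toDeg,  // Eq. (6): TA of cos(beta)
    gamma: (z / m) * toDeg,  // Eq. (7): TA of cos(gamma)
  };
}
```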

Furthermore, the bag-of-features is reduced by extracting the time-domain features using the Apache Commons Math library class SynchronizedDescriptiveStatistics [25]. The standard deviation (SD) and mean are extracted based on the SMV and three TAs per 20 rows of grouped human activities. In total, 8 features were extracted from the 3 TAs and one SMV per 20 rows of the bag-of-features, as shown in Table 1.

Thereafter, the reduced bag-of-features containing the eight time-domain features and the class for each human activity is converted into a JSON file. Each category contains 152, 142, 127, or 77 records of the eight time-domain features, as shown in Fig. 2.

3.2. JSON Feature Extraction Process

This process is implemented in the resource-constrained environment of a smartphone for real-time human activity recognition. To meet the requirements of the implemented real-time GNB classification algorithm, the following process is followed: First, a JSON parser is implemented to read the input JSON file, acquired from the training dataset acquisition and pre-processing process, from the secure digital card inserted in the smartphone. Second, for each classified human activity category, we compressed all of its features into a single record of eight features by computing further time-domain features (the mean and SD portrayed in Table 1) based on the SMV and TAs of the 152, 127, 142, or 77 records. The results of

Fig. 3. Accelerometer tri-axial coordinate system.

the compression are stored in a multidimensional array called classificationTrainerSet with only 12 records based on 8 features. The multidimensional array trainer is used during the training phase to train the implemented classifier.
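The compression step can be sketched as follows; the record shape and the name compressCategory are assumptions for illustration, while the sample standard deviation matches Eq. (11):

```javascript
function mean(values) {
  return values.reduce((s, v) => s + v, 0) / values.length;
}

// Sample standard deviation (divisor k - 1), as in Eq. (11).
function sd(values) {
  const m = mean(values);
  const ss = values.reduce((s, v) => s + (v - m) * (v - m), 0);
  return Math.sqrt(ss / (values.length - 1));
}

// records: array of {smv, alpha, beta, gamma} rows for one activity
// category; returns one 8-value record (mean and SD of each feature).
function compressCategory(records) {
  const col = key => records.map(r => r[key]);
  const out = [];
  for (const key of ["smv", "alpha", "beta", "gamma"]) {
    out.push(mean(col(key)), sd(col(key)));  // 4 features x 2 stats = 8
  }
  return out;
}
```

Applying compressCategory() to each of the 12 activity categories yields the 12 × 8 classificationTrainerSet described above.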

3.3. Acquisition of Real-Time Feature Process

Most smartphones come with a built-in tri-axial accelerometer. The tri-axial coordinate system of the smartphone is defined relative to the screen similarly to the standard sensor coordinate system. The smartphone measures the acceleration along the X-, Y-, and Z-axes, where the X-axis is horizontal and points to the right (X+) and left (X−), whereas the Y-axis is vertical and points upward (Y+) and downward (Y−). Finally, the Z-axis points both outside (Z+) and inside (Z−) of the smartphone relative to the screen, as depicted in Fig. 3.

The acceleration is an orthogonal measurement of the tri-axial values at the rate an object moves in any of the axis directions, measured in [m/s²] units; continuous values are generated at a window size of [N/s] in hertz. The proposed algorithm collects real-time continuous values at a frequency rate of 50 Hz, which is 50 values per second. We employ our real-time value acquisition process using a function presented as Algorithm 1.

In the function, the accelerometer frequency rate is set to 50 Hz and the window size is set to a maximum of 50. During the collection, we test whether the loop counter has reached the set window size. When the window size has not been reached, we continue computing the SMV and the TAs α, β, and γ of each generated tri-axial point, which are computed using Eqs. (5), (6), and (7) as in the training data acquisition process, and store the results in temporary arrays. However, when the


counter reaches the window size, we compute the mean and standard deviation of the 50 stored SMV and TA values. Thereafter, we classify the collected values using the GNB algorithm implemented in the next process. At the end, the size counter is set to zero and the temporary arrays of the SMV and TAs are emptied.
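The windowing logic of Algorithm 1 can be sketched independently of the sensor API; in the app the samples arrive through Cordova's watchAcceleration() callback, whereas here feed() is called directly with each sample so the logic can run anywhere. The names (makeWindow, feed, onWindowFull) are illustrative:

```javascript
// One second of data at 50 Hz.
const WINDOW_SIZE = 50;

// Returns a feed(x, y, z) function; when 50 samples have been
// buffered, onWindowFull(buf) is invoked (the app would compute the
// mean/SD of each buffer and classify them), then the buffers and
// counter are reset for the next window.
function makeWindow(onWindowFull) {
  const buf = { smv: [], alpha: [], beta: [], gamma: [] };
  let count = 0;
  return function feed(x, y, z) {
    const m = Math.sqrt(x * x + y * y + z * z);  // Eq. (4)
    const toDeg = 180 / Math.PI;
    buf.smv.push(m);
    buf.alpha.push((x / m) * toDeg);             // Eq. (5)
    buf.beta.push((y / m) * toDeg);              // Eq. (6)
    buf.gamma.push((z / m) * toDeg);             // Eq. (7)
    count++;
    if (count === WINDOW_SIZE) {
      onWindowFull(buf);
      for (const k of Object.keys(buf)) buf[k].length = 0;
      count = 0;
    }
  };
}
```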

3.4. Implementation of Classification Process Using Naïve Bayes Classifier

In this subsection, we describe the implementation of our real-time classification algorithm to run on a resource-constrained smartphone. The GNB classification algorithm is implemented using a normal Gaussian distribution because it is accurate and performs optimally when using small training datasets [26–28]. Moreover, it is not resource intensive, is superior in terms of central processing unit and memory optimization, and takes less time to be trained and tested [27, 28]. Hence, we implemented the Naïve Bayes classification algorithm because it uses a small training dataset with missing values, requires less computational power to train and test, and is moreover more scalable. The implemented algorithm based on the normal distribution is presented in Algorithm 2.

The implemented algorithm receives the pre-processed classificationTrainerSet dataset and the real-time mean and standard deviation of the SMV and TAs as input from the JSON feature extraction and acquisition of real-time feature processes. The algorithm then computes the likelihood of each category in the classificationTrainerSet set using Eq. (8):

P(X | C) = X_i / Σ_{i=1}^{12} X_i,  . . . (8)

where X_i represents a specific feature (SMV and three TAs) in each class category set C, as shown in Fig. 2, and Σ_{i=1}^{12} X_i is the sum of all features in each class category C stored in classificationTrainerSet. The algorithm then computes the probability value using the Gaussian normal distribution in Eq. (9):

F(x | μ, σ^2) = (1 / √(2πσ^2)) e^(−(x−μ)^2 / (2σ^2)).  . . . (9)

Here, μ is the mean of all features in X_i for the specific class category, computed using Eq. (10):

μ = (1/k) Σ_{x=1}^{k} i_x,  . . . (10)

where x is either an SMV or a TA feature, and σ is the standard deviation computed using Eq. (11):

σ = √( (1/(k−1)) Σ_{x=1}^{k} (i_x − μ_x)^2 ).  . . . (11)

The values of μ and σ for each feature X_i are already computed and stored in classificationTrainerSet, as described in Section 3.2. We then compute and accumulate all test probabilities of the products in TestProbabilitySet to classify a new real-time instance r, given the μ and σ features for each class category C, as defined by Eq. (12):

P(r, μ, σ) = Π_{i=1}^{K} P(x_i | r).  . . . (12)

To determine the best matching category, we then rank all of the test probabilities in the TestProbabilitySet[] array in descending order. The class category with the highest test probability value is then selected as the class for the new instance r.
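A minimal sketch of the classification in Eqs. (9) and (12) follows; the trainer-row shape ({label, mu, sigma} per feature) is an assumption for illustration, since the paper stores the trained means and SDs in classificationTrainerSet:

```javascript
// Eq. (9): Gaussian (normal) probability density.
function gaussian(x, mu, sigma) {
  const d = x - mu;
  return Math.exp(-(d * d) / (2 * sigma * sigma)) /
         Math.sqrt(2 * Math.PI * sigma * sigma);
}

// features: the eight real-time values (mean/SD of SMV and TAs);
// trainer: one row per activity category with per-feature mu/sigma.
function classify(features, trainer) {
  const scored = trainer.map(row => {
    let p = 1;
    for (let i = 0; i < features.length; i++) {
      p *= gaussian(features[i], row.mu[i], row.sigma[i]);  // Eq. (12)
    }
    return { label: row.label, p };
  });
  scored.sort((a, b) => b.p - a.p);  // rank in descending order
  return scored[0].label;            // best matching category
}
```

The paper additionally normalizes the Gaussian values to avoid exponentially small products; summing log densities instead of multiplying raw densities would be an equivalent guard.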


Fig. 4. UML use case diagram.

Fig. 5. Apache Cordova framework [31].

3.5. Design and Implementation of Real-Time Human Activity Recognition

The use case diagram shown in Fig. 4 indicates the high-level conceptual design of our application [29, 30].

The use case depicted in Fig. 4 consists of the "register new user," "data personalization," "personalized JSON," and "perform daily activities" components and shows how a new mobile user interacts with the main application. The model is implemented using the Apache Cordova framework indicated in Fig. 5.

The foundational PhoneGap plug-ins (shown in the middle of the figure) of the Apache Cordova framework allow a developer to interact with the core components of different smartphones. Each plugin provides an interface to the operating system's application interface to manage and control the smartphone core components: (a) sensors, (b) graphics, (c) inputs, and (d) services. Developers add suitable Cordova plugins plus platforms (Android, iOS, Microsoft, or Blackberry) to develop a cross-platform mobile application. An accelerometer, geolocation, camera,

Fig. 6. New user registration sequence diagram.

files, network, and storage can be used to develop a context-aware smartphone application specific to the user to measure and store the user's position, location, and acceleration. Mobile application activities/views similar to those of a web application are created using HTML5 tags and JavaScript. To interact with remote server applications and obtain resources, a combination of JavaScript and JQuery can be used. In addition, the design of this application is demonstrated using sequence diagrams representing all behavioral components, including the interaction links, events, and methods of the implemented algorithm, given in Figs. 6–9; Fig. 10 shows the user interface of our real-time human activity application implemented in Cordova, HTML5, and JavaScript.

3.6. Implementation of Real-Time Human Activity Recognition

In this section, we describe the HTML5, JQuery, and Cordova events such as onLoad(), onclick(), onSubmit(), success(), failure(), watchAcceleration(), clearWatch(), $.Post(), and XMLHttpRequest() using a sequence diagram for each main component listed in the use case diagram in Fig. 4. All of these methods are implemented in native JavaScript to interact with the smartphone utilities and sensors through the Cordova platform.

The chain of events for registration, starting with the collection of the user's personal characteristics (weight, height, age, and gender) using the onWrite() method, is shown in Fig. 6. Upon clicking submit(), the application uses the JQuery $.Post() method to initiate communication with the remote server using the IP address and the requested resource. The remote server implements the personalization application proposed in [16], which generates and returns the tailored JSON bag-of-features using JsonToString(). The returned JSON text is stored on a secure digital card or in the phone cache. After registration, new users can detect their daily activities by clicking on "perform daily activities," shown in the main use case diagram in Fig. 4. The sequence of events runs from the HTML5 onLoad() event to the done() methods shown in Fig. 7.

Fig. 7. HAR activity OnLoad() sequence diagram.

Fig. 8. HAR activity OnStart() sequence diagram.

The onLoad() event of HTML5 is invoked, and the

JavaScript-implemented method onLoader() is called. The application reads the JSON file from the secure digital card using the JSON parser of the XMLHttpRequest object. The input JSON file containing the validated JSON HAR training dataset is presented in Fig. 2. The time-domain features (mean and standard deviation) are then compressed for each grouped human activity category in JSON using the onComputeStats() JavaScript-implemented method. The computed time-domain features are then stored in a multi-dimensional array called classificationTrainerSet using the store() method. The store() method uses the Cordova FileSystem plugin to write a comma separated values (CSV) file. Upon success, a message is displayed using the JavaScript alert() method, and the execution completes by calling the done() method.

The sequence of HAR process activities, from start() to displayResults(), is shown in Fig. 8. When the user clicks the start button, the application invokes the JavaScript start() method, which in turn invokes the Cordova device-motion plugin event watchAcceleration() to instantiate the accelerometer at a sampling frequency of 50 Hz.

Fig. 9. HAR activity OnStop() sequence diagram.

Fig. 10. Real-time human activity recognition application interface.

Gadebe, M. L., Kogeda, O. P., and Ojo, S. O.

Upon success, the accelerometer generates 50 tri-axial values (X, Y, Z), and each SMV and TA is computed in the success() method. Once 50 SMVs and TAs have been computed, the JavaScript-implemented method storeSMV() is invoked to compute the mean and standard deviation of the 50 collected values. The Gaussian distribution function computeGaussian() is then invoked to compute the normal density of the SMVs and TAs against the training values in the classificationTrainerSet trainer array from the previous sequence, for each human activity category. The Gaussian is then normalized using the normalizedGaussian() method to avoid exponentially small values. Finally, the posterior probability is computed for each category to predict the class of the new SMV and TA instances using Naïve Bayes. The category with the highest Gaussian density, after ranking in descending order, is taken as having the highest posterior probability and is displayed by the displayResults() method on the HTML5 View Activity. The sequence of events of the stop activity is shown in Fig. 9.
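A minimal sketch of this scoring step follows: the density of a new window's features is evaluated against each trained category, and the highest posterior wins. Log-probabilities are used here as a stand-in for the paper's normalizedGaussian() step, since summing logs is a common way to avoid exponentially small products; the paper's exact normalization is not reproduced.

```javascript
// Log of the normal pdf N(x; mu, sigma); working in log space avoids
// underflow when multiplying many small densities.
function logGaussian(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma * Math.sqrt(2 * Math.PI));
}

// trainerSet rows: { activity, meanSMV, stdSMV, meanTA, stdTA } as produced
// by the training step; sample carries the new window's mean features.
// With a uniform prior, the Naive Bayes log-posterior is the sum of the
// per-feature log-likelihoods.
function classify(trainerSet, sample) {
  let best = null;
  for (const row of trainerSet) {
    const score =
      logGaussian(sample.meanSMV, row.meanSMV, row.stdSMV) +
      logGaussian(sample.meanTA, row.meanTA, row.stdTA);
    if (best === null || score > best.score) {
      best = { activity: row.activity, score };
    }
  }
  return best.activity;
}
```

Ranking categories by log-posterior is equivalent to ranking by Gaussian density, so the argmax here matches the "highest Gaussian" selection described above.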

When an onClick() event is triggered, the application invokes the JavaScript-implemented method stop(), which calls the Cordova motion clearWatch() event to stop the generation of new tri-axial X, Y, and Z values. The onWriteOnCSV() method is then invoked to write all results of the previous activity to a comma-separated values (CSV) file, which is stored on the secure digital card using the Cordova FileSystem plugin. Screenshots of the fully implemented real-time HAR application are shown in Fig. 10.

HTML5, CSS, JavaScript, and JQuery were used to implement the design shown in the sequence diagrams in Figs. 6–9. Users install the real-time human activity application on their smartphones using an Android Package (APK) installer.

4. Evaluation of Real-Time HAR Algorithm

In this section, we present an experimental evaluation of the implemented real-time human activity algorithm. The two pre-processed datasets, PAMAP2 and USC-HAD, presented in [16] were used. Cross validation was applied to the real-time algorithm, and the results are presented using a confusion matrix portraying the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), together with precision using Eq. (12), recall using Eq. (13), accuracy using Eq. (14), and F-score using Eq. (15). The leave-one-activity-out (LOAO) technique is applied, where category K of the features is omitted as the test subset, and the remaining K−1 subset is used for training.

\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad (12) \]

\[ \text{Recall} = \frac{TP}{TP + FN}, \qquad (13) \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (14) \]

\[ \text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (15) \]
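The four metrics can be computed directly from the confusion counts. As a check, the Standing row of Table 3 (TP = 151, FP = 1, FN = 0, TN = 0) gives a precision of about 0.99 and a recall of 1, matching the reported 99% and 100%.

```javascript
// The four evaluation metrics as functions of the confusion counts.
function metrics({ tp, fp, fn, tn }) {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const accuracy = (tp + tn) / (tp + tn + fp + fn);
  const fScore = (2 * precision * recall) / (precision + recall);
  return { precision, recall, accuracy, fScore };
}
```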

The precision indicates the ratio of correctly predicted positive observations given the TP and FP counts only, whereas recall is the ratio of correctly predicted positive observations given the TP and FN counts. The accuracy is the simple ratio of correctly predicted observations to the total observations (TP, FP, TN, and FN), whereas the F-score is the weighted average of precision and recall and is more useful than accuracy in the presence of an uneven class distribution. We implemented the method testTrainer() to simulate the LOAO-based cross validation technique,

Table 2. K category threshold values.

K Value   Category
1         JSON.Stand
2         JSON.Walking
3         JSON.ascending Stairs
4         JSON.descending Stairs
5         JSON.Cycling
6         JSON.ironing
7         JSON.lying
8         JSON.Nordic Walking
9         JSON.running
10        JSON.rope jumping
11        JSON.sitting
12        JSON.vacuum cleaning

as part of our packaged application, presented in Algorithm 3.

The simulated LOAO takes the category-K activity as the test subset and the remaining K−1 categories as the training subset from the JSON HAR dataset presented in Fig. 2. The testTrainer() function prompts for a K value to be removed as the test subset K from the collection of all activities in the JSON HAR dataset. The test K value represents a specific category of the JSON dataset, as shown in Table 2.

The extracted category K and the remaining K−1 categories are input into our implemented GNB function. The K−1 subset is loaded into a training multi-dimensional array called classificationTrainerSet. testTrainer() then loops through the classificationTrainerSet to determine a class label for the selected category K using the maximum density. All test probabilities are stored in the TestProbabilitySet[] array. Thereafter, the test probabilities are ranked in descending order, and the highest density is taken as the class for the instances of category K. The process is repeated five times for each selected category K until all features in subset K and the training subset have been used. For each tested category K, a CSV file is generated and stored on the secure digital card of the smartphone using the Cordova file plugin and JavaScript. Each CSV file produced is then sent to a local PC over a USB cable for manual analysis (see Appendix A).
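The LOAO procedure implemented by testTrainer() can be sketched as follows; classifyFn stands in for the GNB classifier above, and the dataset layout is an assumption for illustration.

```javascript
// Hedged sketch of leave-one-activity-out: each category K in turn becomes
// the test subset, and the remaining K-1 categories form the training
// subset. dataset: { activity: [featureRows...] }.
function leaveOneActivityOut(dataset, classifyFn) {
  const results = {};
  for (const k of Object.keys(dataset)) {
    // Build the K-1 training subset by excluding the held-out category.
    const trainingSubset = {};
    for (const [activity, rows] of Object.entries(dataset)) {
      if (activity !== k) trainingSubset[activity] = rows;
    }
    // Classify every instance of the held-out category against the
    // remaining categories and record the predicted labels.
    results[k] = dataset[k].map((row) => classifyFn(trainingSubset, row));
  }
  return results;
}
```

Note that, as described in the text, the held-out category itself is absent from the training subset, so its instances are necessarily assigned to one of the K−1 remaining classes.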


Table 3. Confusion matrix using personalized dataset based on tilt angles.

(Rows: actual activity. Columns: predicted counts for Standing, Sitting, Laying, Walking, Ascend stairs, Descend stairs, Running, Cycling, Rope jumping, Ironing, Vacuum cleaning, Nordic walking; followed by Total, TP, FP, FN, TN, Precision, Recall, F-Measure, and Accuracy in %.)

Standing 151 0 0 0 0 1 0 0 0 0 0 0 152 151 1 0 0 99 100 100 99

Sitting 0 144 0 7 0 1 0 0 0 0 0 0 152 144 8 0 0 95 95 95 90

Laying 0 0 152 0 0 0 0 0 0 0 0 0 152 152 0 0 0 100 100 100 100

Walking 0 0 0 138 0 14 0 0 0 0 0 0 152 138 14 0 0 91 100 95 91

Ascend stairs 0 0 0 0 142 0 0 0 0 0 0 0 142 142 0 0 0 100 100 100 100

Descend stairs 0 0 0 1 0 98 0 0 0 28 0 0 127 98 29 0 0 77 100 87 77

Running 0 0 0 0 0 2 129 0 21 0 0 0 152 129 23 0 0 85 100 92 85

Cycling 0 0 0 0 0 0 0 152 0 0 0 0 152 152 0 0 0 100 100 100 100

Rope jumping 0 0 0 0 0 0 19 0 57 0 1 0 77 57 20 0 0 74 100 85 74

Ironing 0 0 0 0 0 20 0 0 0 122 0 10 152 122 30 0 0 80 100 89 80

Vacuum cleaning 0 0 0 0 0 0 0 0 1 0 151 0 152 151 1 0 0 99 100 100 99

Nordic walking 0 0 0 4 0 10 0 0 0 19 0 119 152 119 33 0 0 78 100 88 78

Overall 90 99.6 94.18 89.51
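The Overall row of Table 3 appears to be the unweighted mean of the per-activity percentages, which can be checked directly (values transcribed from the table):

```javascript
// Checking the Overall row of Table 3: the reported precision (90) and
// recall (99.6) match the unweighted means of the 12 per-activity
// percentages transcribed from the table.
const precisions = [99, 95, 100, 91, 100, 77, 85, 100, 74, 80, 99, 78];
const recalls = [100, 95, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100];
const avg = (xs) => xs.reduce((sum, v) => sum + v, 0) / xs.length;
```

avg(precisions) evaluates to about 89.8, which the table rounds to 90, and avg(recalls) to about 99.6.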

Table 4. Confusion matrix using personalized dataset based on signal magnitude vector.

(Rows: actual activity. Columns: predicted counts for Standing, Sitting, Laying, Walking, Ascend stairs, Descend stairs, Running, Cycling, Rope jumping, Ironing, Vacuum cleaning, Nordic walking; followed by Total, TP, FP, FN, TN, Precision, Recall, F-Measure, and Accuracy in %.)

Standing 138 0 14 0 0 0 0 0 0 0 0 0 152 138 14 0 0 91 100 95 91

Sitting 45 14 66 0 0 0 0 0 0 27 0 0 152 14 138 0 0 9 100 17 9

Laying 21 3 122 0 0 0 0 0 0 6 0 0 152 122 30 0 0 80 100 89 80

Walking 0 0 0 25 0 0 0 0 0 0 0 127 152 25 127 0 0 16 100 28 16

Ascend stairs 0 0 0 92 0 0 0 0 0 0 5 45 142 0 142 0 0 0 0 0 0

Descend stairs 0 0 0 14 0 0 24 1 0 0 1 87 127 0 127 0 0 0 0 0 0

Running 0 0 0 0 0 0 152 0 0 0 0 0 152 152 0 0 0 100 100 100 100

Cycling 0 0 0 1 0 0 0 28 0 0 123 0 152 28 124 0 0 18 100 31 18

Rope jumping 0 0 0 0 0 0 58 0 19 0 0 0 77 19 58 0 0 25 100 40 25

Ironing 0 0 0 0 0 0 0 0 0 0 152 0 152 0 152 0 0 0 0 0 0

Vacuum cleaning 0 0 0 0 0 0 0 2 0 0 150 0 152 150 2 0 0 99 100 99 99

Nordic walking 0 0 0 0 0 0 146 0 0 0 0 6 152 6 146 0 0 4 100 8 4

Overall 36.9 75 42 36.9

4.1. Results and Discussion

In this section, we present the comparison results of the implemented real-time GNB using our proposed personalized TA dataset against the commonly used SMV feature [21–23]. Both datasets were extracted from the USC-HAD and PAMAP2 datasets. We ran five sets of experimental trials for each selected category. The GNB algorithm consistently produced the same results for each category of human activity, as summarized in the confusion matrices in Tables 3 and 4.

The categories shaded in yellow in Tables 3 and 4 comprise a total of 152 activities each (with eight time-domain features per activity), whereas the green-shaded categories comprise totals of 142, 127, and 77 activities, respectively. As shown in Table 3, when the TA time-domain features are used, the GNB correctly detected the laying, cycling, and ascending stairs activities with precision, recall, accuracy, and F-measure rates of 100%. Table 3 also shows that the GNB scored a balanced precision, accuracy, and F-measure of 99%, 99%, 95%, 91%, 85%, and 80%, with lower false-positive rates, on the standing, vacuum cleaning, sitting, walking, running, and ironing activities. Further, the GNB scored lower precision, accuracy, and F-measure values of 74%, 77%, and 78% on rope jumping, descending stairs, and Nordic walking, because these have an overlapping gait similarity with activities such as walking, running, and standing, although with a reasonable number of FPs. Overall, the precision, recall, F-measure, and accuracy of the GNB using the TA-featured dataset are 90%, 99.6%, 94.18%, and 89.51%, respectively, compared with the lower rates of 36.9%, 75%, 42%, and 36.9%, respectively, when the GNB uses the commonly used SMV-featured dataset, as shown in Table 4. The three metrics of accuracy, precision, and F-score from Tables 3 and 4 are summarized in Fig. 11.


Fig. 11. Accuracy comparison of Gaussian Naïve Bayes on tilt angles against signal magnitude vector.

Table 5. Three randomly selected validation user profiles.

User   Height [cm]   Weight [kg]   BMI
A      175           68            22.20
B      169           70            24.51
C      170           72            24.91
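The BMI column in Table 5 follows the standard weight/height² formula (with height in metres), which also confirms that the heights in the table are recorded in centimetres:

```javascript
// BMI = weight [kg] / (height [m])^2. Heights in Table 5 are in cm, so
// they are converted to metres before squaring.
const profiles = [
  { user: "A", heightCm: 175, weightKg: 68 },
  { user: "B", heightCm: 169, weightKg: 70 },
  { user: "C", heightCm: 170, weightKg: 72 },
];
const bmi = (p) => p.weightKg / (p.heightCm / 100) ** 2;
```

This reproduces the table's 22.20, 24.51, and 24.91 to two decimal places.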

Figure 11 shows that, out of the 12 activities, the GNB with SMV features managed to correctly recognize only four (standing, running, laying, and vacuum cleaning, at accuracy rates of 80% or above), and incorrectly detected the remaining eight with low accuracy, precision, and F-measure values of between 36.5% and 42%. The GNB using the TA pre-processed featured dataset surpassed the GNB with the SMV featured dataset on all 12 activities, by margins of 16% to 84%. We are therefore confident that the implemented GNB algorithm with our proposed TA dataset produces satisfactory results compared with the GNB using the SMV dataset. The comparison results therefore unlock the potential for predicting human activities in a resource-constrained environment in real time using a reduced personalized dataset.

4.2. Validation of the Results

To validate the results of the experimentation, we randomly selected three user profiles and employed the personification algorithm we proposed in [16] to extract three TA personalized bag-of-features datasets from the PAMAP2 and USC-HAD datasets, as shown in Table 5.

For each user profile, we ran our personification algorithm [16] five times. The personification algorithm consistently produced five identical personalized bag-of-features datasets for each profile, containing 302 records for each of the 12 human activities, as shown in Table 6.

All human activity category records were capped at 302 rows, giving each category similarly balanced instances. Thereafter, we simulated four state-of-the-art classification algorithms (Table 7).

Table 6. Three personalized tilt angle extracted bag of features.

Person A Person B Person C

Table 7. Four algorithms selected for simulations.

Algorithm                R package used                                            Method
C4.5                     RWeka (the Weka package of R, which allows R to use       J48
                         Weka methods)
Naïve Bayes              Caret                                                     nb
Support Vector Machine   Caret                                                     svmLinear
K Nearest Neighbor       Caret                                                     knn

The four algorithms, namely a Support Vector Machine (SVM), K Nearest Neighbor (KNN), Naïve Bayes (NB), and C4.5, were simulated using the classification and regression training (Caret) package created by Kuhn [32] and RWeka in an R environment. RWeka is an R interface to Weka, used here for the implementation of the C4.5 algorithm, as shown in Table 7.

To meet the requirements of R, we converted all three extracted personalized bag-of-features files into proper CSV files. We then ran the simulation for each selected algorithm (SVM, C4.5, KNN, and NB) five times on each of the three personalized bag-of-features datasets. The results of the simulation are given in 12 confusion matrix tables, one for each simulated algorithm and profile, as described in Appendix B; the overall performance in terms of accuracy, precision, recall, and F-measure for each algorithm (SVM, C4.5, KNN, and NB) is summarized in Table 8.
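The JSON-to-CSV conversion is not detailed in the paper; a minimal sketch of one way to flatten uniform bag-of-features rows into CSV text for R is shown below (the field handling is an assumption, and no quoting or escaping of values is attempted):

```javascript
// Hedged sketch of flattening an array of uniform feature objects into CSV
// text. Real rows would carry the time-domain features and an activity
// label; the header is taken from the keys of the first row.
function toCSV(rows) {
  const header = Object.keys(rows[0]);
  const lines = rows.map((row) => header.map((h) => row[h]).join(","));
  return [header.join(","), ...lines].join("\n");
}
```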

Using the datasets extracted for profiles A, B, and C, the experimental results reveal that the KNN, SVM, and C4.5 models performed well below 65%, compared with the precision, accuracy, recall, and F-measure of above 90% achieved by the NB algorithm. For each category of human activity, the KNN recorded the highest numbers of incorrectly identified activities, at between 54 and 176,


Table 8. Overall SVM, C4.5, KNN, and NB performance.

Profile   Technique   Accuracy   Precision   Recall   F-measure
A         KNN         0.63       0.90        0.73     0.60
B         KNN         0.62       0.90        0.73     0.59
C         KNN         0.62       0.91        0.72     0.59
A         SVM         0.55       0.77        0.63     0.52
B         SVM         0.63       0.67        0.59     0.49
C         SVM         0.58       0.68        0.60     0.50
A         C4.5        0.71       1.00        0.82     0.71
B         C4.5        0.59       0.80        0.67     0.55
C         C4.5        0.62       0.87        0.71     0.58
A         NB          0.98       0.99        0.98     0.97
B         NB          0.97       0.98        0.98     0.96
C         NB          0.97       0.99        0.98     0.96

with FNs ranging from 3 to 60; it was followed by the SVM, which recorded FPs and FNs ranging from 48 to 149 and from 0 to 302, respectively, as described in Appendix B. C4.5 recorded FPs of between 33 and 152 out of 302 human activities for each category. The NB algorithm, by contrast, reflected a low number of false-positive activities, fewer than 26, and a high count of correctly identified (true positive) activities, above 290. C4.5 was second behind the NB method, having recorded zero FN activities out of 302 records for each category using the profile A dataset. The overall precision, accuracy, recall, and F-measure results in Table 8 are summarized in Fig. 12 for each classification algorithm (KNN, SVM, C4.5, and NB).

The results shown in Fig. 12 indicate that the NB outperforms all the other models, showing steady precision, accuracy, recall, and F-measure values above 90% for all user profiles, whereas the other models recorded F-measures of 0.71 or below. Overall, the NB model exhibits balanced accuracy, precision, and F-measure owing to its low FN and FP counts. The validation results also indicate that the NB model, as an approach related to our implemented GNB, is compatible with our proposed personalized TA dataset when compared with the other algorithms (SVM, C4.5, and KNN). This is because both the GNB and NB algorithms perform well on small datasets with missing values [27, 28, 33, 34]. The results validate and confirm that our proposed TA dataset is suitable for both the GNB and NB models. Moreover, the personalized reduced dataset can improve and optimize smartphone resource usage under limited memory, storage, and CPU constraints.

5. Conclusion and Future Work

In this study, we proposed and implemented a real-time GNB algorithm using HTML5, Cordova, and JavaScript. The algorithm utilizes a JSON personalized bag-of-features dataset (comprising 3D sensor data and human activity labels) extracted from the publicly available PAMAP2 and USC-HAD datasets to address

Fig. 12. Performance of the four algorithms on the A, B, and C user profiles.

the shortage of personalized datasets. Previous techniques have used only the SMV to eliminate orientation problems; in our approach, by contrast, we computed the TAs between the tri-axial (X, Y, Z) components of each accelerometer point using directional cosines in degrees to classify human activities and to reduce the gap between the lower-level human activities (laying, sitting, standing, and walking) and the advanced human activities (running, Nordic walking, ironing, vacuum cleaning, and ascending and descending stairs).

Moreover, we implemented LOAO cross validation as a testTrainer() real-time method for use on a smartphone. For the simulation and validation of the proposed GNB algorithm, we compared the performance of the GNB using features with TAs against features extracted using the SMV. The testTrainer() method produced a CSV of all test probability results for each selected category. We tested the implemented GNB using the testTrainer() method, where category K is selected as the test subset and the remaining K−1 categories are selected as the training subset. On each occasion, the test results of testTrainer() are stored in a TestProbabilitySet[] array, which is later ranked in descending order, and the test with the highest probability density is taken as the class of the new instances. The process was repeated five times for each category K, and the comparison results were summarized in a confusion matrix for each tested category, i.e., TP, FP, FN, TN, precision, recall, accuracy, and F-measure. The comparison results showed that when the GNB uses the time-domain features of the TA, it is more effective in detecting all 12 human activities, with higher precision, recall, F-measure, and accuracy, than when the SMV dataset is used. The results of the simulation conducted using Caret and RWeka in R proved that the GNB and Naïve Bayes algorithms are superior to the other algorithms (SVM, C4.5, and KNN) when using our personalized TA dataset. In the future, we intend to conduct experiments with real subjects to test our implemented algorithm using real-time accelerometer values. We also


plan to determine whether we can effectively identify the daily human activities of new users using the GNB algorithm in real time on a smartphone.

Acknowledgements
We would like to thank the Faculty of Information Communication Technology and the Department of Computer Science, Tshwane University of Technology, for the financial and other logistical support for this project. This project was ethically approved by the Faculty Committee of the Research Ethics, Tshwane University of Technology (FCRE ICT Ref#2016=06=001(2)=GadebeML).

References:
[1] H. Ellekjær, J. Holmen, E. Ellekjær, and L. Vatten, "Physical activity and stroke mortality in women: ten-year follow-up of the Nord-Trøndelag health survey, 1984–1986," Stroke, Vol.31, No.1, pp. 14-18, 2000.
[2] F. B. Hu, M. J. Stampfer, G. A. Colditz, A. Ascherio, K. M. Rexrode, W. C. Willett, and J. E. Manson, "Physical activity and risk of stroke in women," J. of the American Medical Association, Vol.283, No.22, pp. 2961-2967, 2000.
[3] I. M. Lee, K. M. Rexrode, N. R. Cook, J. E. Manson, and J. E. Buring, "Physical activity and coronary heart disease in women: Is "no pain, no gain" passé?," J. of the American Medical Association, Vol.285, No.11, pp. 1447-1454, 2001.
[4] S. Zhang, "Smartphone Based Activity Recognition System," Ph.D. dissertation, The Ohio State University, 2012.
[5] J. W. Lockhart, T. Pulickal, and G. M. Weiss, "Applications of mobile activity recognition," Proc. of the 2012 ACM Conf. on Ubiquitous Computing, pp. 1054-1058, 2012.
[6] X. Su, H. Tong, and P. Ji, "Activity recognition with smartphone sensors," Tsinghua Science and Technology, Vol.19, No.3, pp. 235-249, 2014.
[7] V. V. Quang, M. T. Hoang, and D. Choi, "Personalization in mobile activity recognition system using K-medoids clustering algorithm," Int. J. of Distributed Sensor Networks, Vol.9, No.7, 2013.
[8] J. W. Lockhart and G. M. Weiss, "Limitations with activity recognition methodology & data sets," Proc. of the 2014 ACM Int. Joint Conf. on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 747-756, 2014.
[9] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity monitoring," 2012 16th Int. Symp. on Wearable Computers, pp. 108-109, 2012.
[10] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," Proc. of the European Symp. on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 437-442, 2013.
[11] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas, A. Saez, and C. Villalonga, "mHealthDroid: a novel framework for agile development of mobile health applications," Int. Workshop on Ambient Assisted Living, Lecture Notes in Computer Science, Vol.8868, pp. 91-98, 2014.
[12] J.-L. Reyes-Ortiz, L. Oneto, A. Sama, X. Parra, and D. Anguita, "Transition-aware human activity recognition using smartphones," Neurocomputing, Vol.171, pp. 754-767, 2016.
[13] C. A. Martins, M. C. Monard, and E. T. Matsubara, "Reducing the dimensionality of bag-of-words text representation used by learning algorithms," Proc. of the 3rd IASTED Int. Conf. on Artificial Intelligence and Applications, pp. 228-233, 2003.
[14] Z. Yan, V. Subbaraju, D. Chakraborty, A. Misra, and K. Aberer, "Energy-efficient continuous activity recognition on mobile phones: An activity-adaptive approach," 2012 16th Int. Symp. on Wearable Computers, pp. 17-24, 2012.
[15] L. T. Nguyen, M. Zeng, P. Tague, and J. Zhang, "Recognizing new activities with limited training data," Proc. of the 2015 ACM Int. Symp. on Wearable Computers, pp. 67-74, 2015.
[16] M. L. Gadebe and O. P. Kogeda, "Personification of Bag-of-Features Dataset for Real Time Activity Recognition," 2016 3rd Int. Conf. on Soft Computing & Machine Intelligence (ISCMI), pp. 73-78, 2016.
[17] S. Harous, M. El Menshawy, M. Adel Serhani, and A. Benharref, "Mobile health architecture for obesity management using sensory and social data," Informatics in Medicine Unlocked, Vol.10, pp. 27-44, 2018.
[18] S. Harous, M. Adel Serhani, M. El Menshawy, and A. Benharref, "Hybrid obesity monitoring model using sensors and community engagement," 2017 13th Int. Wireless Communications and Mobile Computing Conf. (IWCMC), pp. 888-893, 2017.
[19] M. Zhang and A. A. Sawchuk, "USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors," Proc. of the 2012 ACM Conf. on Ubiquitous Computing, pp. 1036-1043, 2012.
[20] J. de Jong, "JSON Editor Online 3.7.5," http://www.jsoneditoronline.org/ [accessed June 15, 2019]
[21] H. Junker, P. Lukowicz, and G. Troster, "Sampling frequency, signal resolution and the accuracy of wearable context recognition systems," 8th Int. Symp. on Wearable Computers, Vol.1, pp. 176-177, 2004.
[22] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, "Preprocessing techniques for context recognition from accelerometer data," Personal and Ubiquitous Computing, Vol.14, No.7, pp. 645-662, 2010.
[23] F. Miao, Y. He, J. Liu, Y. Li, and I. Ayoola, "Identifying typical physical activity on smartphone with varying positions and orientations," Biomedical Engineering Online, Vol.14, Article No.32, 2015.
[24] C. Maher, J. Ryan, C. Ambrosi, and S. Edney, "Users' experiences of wearable activity trackers: a cross-sectional study," BioMed Central J. of Public Health, Vol.17, Article No.880, 2017.
[25] Apache Commons Math, 2016, http://commons.apache.org/proper/commons-math/userguide/stat.html [accessed June 15, 2019]
[26] D. J. Hand and K. Yu, "Idiot's Bayes: not so stupid after all?," Int. Statistical Review, Vol.69, No.3, pp. 385-398, 2001.
[27] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," Proc. of the 10th National Conf. on Artificial Intelligence, Vol.90, pp. 223-228, 1992.
[28] P. Langley and S. Sage, "Induction of selective Bayesian classifiers," Proc. of the 10th Int. Conf. on Uncertainty in Artificial Intelligence (UAI), pp. 399-406, 1994.
[29] V. N. Inukollu, D. D. Keshamoni, T. Kang, and M. Inukollu, "Factors influencing quality of mobile apps: Role of mobile app development life cycle," arXiv preprint, arXiv:1410.4537, 2014.
[30] A. G. Parada, T. A. Alves, and L. Brisolara, "Modeling Android applications using UML," Proc. of the 27th SIM – South Symp. on Microelectronics, pp. 1-4, 2012.
[31] J. Jain, "Apache Cordova: Powerful Framework for Hybrid Mobile App Development," 2016, http://www.codeproject.com/Articles/1069661/Apache-Cordova-Powerful-Framework-for-Hybrid-Mobil [accessed June 15, 2019]
[32] M. Kuhn, "A Short Introduction to the caret Package," R Found Stat Comput, pp. 1-10, 2015.
[33] J. Huang, J. Lu, and C. X. Ling, "Comparing naive Bayes, decision trees, and SVM with AUC and accuracy," 3rd IEEE Int. Conf. on Data Mining, pp. 553-556, 2003.
[34] I. Kononenko, "Machine learning for medical diagnosis: history, state of the art and perspective," Artificial Intelligence in Medicine, Vol.23, No.1, pp. 89-109, 2001.

Appendix A.

Figure 13 shows a summary of all real-time experimentation files.

Appendix B.

Tables 9–20 show the comparison results of KNN, C4.5, Support Vector Machines, and Naïve Bayes using the personalized datasets of profiles A, B, and C.


Fig. 13. Summarized human activity experiments files.

Table 9. K Nearest Neighbor prediction using person A dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    143  126  33   0    0.53       0.81    0.64       0.47
Cycling             302    177  75   50   0    0.70       0.78    0.74       0.59
Descending stairs   302    144  98   60   0    0.60       0.71    0.65       0.48
Ironing             302    160  165  0    23   0.49       1.00    0.66       0.53
Lying               302    206  98   0    2    0.68       1.00    0.81       0.68
Nordic walking      302    229  74   0    1    0.76       1.00    0.86       0.76
Rope jumping        302    232  54   16   0    0.81       0.94    0.87       0.77
Running             302    228  62   12   0    0.79       0.95    0.86       0.75
Sitting             302    131  176  0    5    0.43       1.00    0.60       0.44
Standing            302    79   173  50   0    0.31       0.61    0.41       0.26
Vacuum cleaning     302    239  97   0    34   0.71       1.00    0.83       0.74
Walking             302    211  88   3    0    0.71       0.99    0.82       0.70

Overall 0.63 0.90 0.73 0.60


Table 10. K Nearest Neighbor prediction using person B dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    113  147  42   0    0.43       0.73    0.54       0.37
Cycling             302    213  76   13   0    0.74       0.94    0.83       0.71
Descending stairs   302    129  89   84   0    0.59       0.61    0.60       0.43
Ironing             302    163  149  0    10   0.52       1.00    0.69       0.54
Lying               302    203  92   7    0    0.69       0.97    0.80       0.67
Nordic walking      302    197  109  0    4    0.64       1.00    0.78       0.65
Rope jumping        302    229  38   35   0    0.86       0.87    0.86       0.76
Running             302    249  60   0    7    0.81       1.00    0.89       0.81
Sitting             302    132  165  5    0    0.44       0.96    0.61       0.44
Standing            302    107  178  17   0    0.38       0.86    0.52       0.35
Vacuum cleaning     302    232  88   0    18   0.73       1.00    0.84       0.74
Walking             302    181  96   25   0    0.65       0.88    0.75       0.60

Overall 0.62 0.90 0.73 0.59

Table 11. K Nearest Neighbor prediction using person C dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    141  166  0    5    0.46       1.00    0.63       0.47
Cycling             302    224  55   23   0    0.80       0.91    0.85       0.74
Descending stairs   302    97   97   108  0    0.50       0.47    0.49       0.32
Ironing             302    135  173  0    6    0.44       1.00    0.61       0.45
Lying               302    205  101  0    4    0.67       1.00    0.80       0.67
Nordic walking      302    202  104  0    4    0.66       1.00    0.80       0.66
Rope jumping        302    247  24   31   0    0.91       0.89    0.90       0.82
Running             302    264  50   0    12   0.84       1.00    0.91       0.85
Sitting             302    136  170  0    4    0.44       1.00    0.62       0.45
Standing            302    110  167  25   0    0.40       0.81    0.53       0.36
Vacuum cleaning     302    234  90   0    22   0.72       1.00    0.84       0.74
Walking             302    167  104  31   0    0.62       0.84    0.71       0.55

Overall 0.62 0.91 0.72 0.59

Table 12. C4.5 prediction using person A dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    150  152  0    0    0.50       1.00    0.66       0.50
Cycling             302    206  96   0    0    0.68       1.00    0.81       0.68
Descending stairs   302    197  105  0    0    0.65       1.00    0.79       0.65
Ironing             302    207  95   0    0    0.69       1.00    0.81       0.69
Lying               302    234  68   0    0    0.77       1.00    0.87       0.77
Nordic walking      302    255  47   0    0    0.84       1.00    0.92       0.84
Rope jumping        302    261  41   0    0    0.86       1.00    0.93       0.86
Running             302    239  63   0    0    0.79       1.00    0.88       0.79
Sitting             302    216  86   0    0    0.72       1.00    0.83       0.72
Standing            302    104  198  0    0    0.34       1.00    0.51       0.34
Vacuum cleaning     302    269  33   0    0    0.89       1.00    0.94       0.89
Walking             302    246  56   0    0    0.81       1.00    0.90       0.81

Overall 0.71 1.00 0.82 0.71


Table 13. C4.5 prediction using person B dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    74   102  126  0    0.42       0.37    0.39       0.25
Cycling             302    211  71   20   0    0.75       0.91    0.82       0.70
Descending stairs   302    102  59   141  0    0.63       0.42    0.50       0.34
Ironing             302    193  215  0    106  0.47       1.00    0.64       0.58
Lying               302    194  126  0    18   0.61       1.00    0.75       0.63
Nordic walking      302    216  137  0    51   0.61       1.00    0.76       0.66
Rope jumping        302    223  45   34   0    0.83       0.87    0.85       0.74
Running             302    241  83   0    22   0.74       1.00    0.85       0.76
Sitting             302    116  137  49   0    0.46       0.70    0.56       0.38
Standing            302    56   116  130  0    0.33       0.30    0.31       0.19
Vacuum cleaning     302    235  136  0    69   0.63       1.00    0.78       0.69
Walking             302    192  116  0    6    0.62       1.00    0.77       0.63

Overall 0.59 0.80 0.77 0.55

Table 14. C4.5 prediction using person C dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    154  168  0    20   0.48       1.00    0.65       0.51
Cycling             302    229  62   11   0    0.79       0.95    0.86       0.76
Descending stairs   302    73   72   157  0    0.50       0.32    0.39       0.24
Ironing             302    125  173  4    0    0.42       0.97    0.59       0.41
Lying               302    175  77   50   0    0.69       0.78    0.73       0.58
Nordic walking      302    208  104  0    10   0.67       1.00    0.80       0.68
Rope jumping        302    241  15   46   0    0.94       0.84    0.89       0.80
Running             302    276  59   0    33   0.82       1.00    0.90       0.84
Sitting             302    160  184  0    42   0.47       1.00    0.63       0.52
Standing            302    101  164  37   0    0.38       0.73    0.50       0.33
Vacuum cleaning     302    253  121  0    72   0.68       1.00    0.81       0.73
Walking             302    184  98   20   0    0.65       0.90    0.76       0.61

Overall 0.62 0.87 0.71 0.58

Table 15. Support Vector Machine prediction using person A dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    153  149  0    0    0.51       1.00    0.67       0.51
Cycling             302    176  130  0    4    0.58       1.00    0.73       0.58
Descending stairs   302    144  134  24   0    0.52       0.86    0.65       0.48
Ironing             302    57   119  126  0    0.32       0.31    0.32       0.19
Lying               302    256  50   0    4    0.84       1.00    0.91       0.84
Nordic walking      302    220  120  0    38   0.65       1.00    0.79       0.68
Rope jumping        302    203  48   51   0    0.81       0.80    0.80       0.67
Running             302    218  71   13   0    0.75       0.94    0.84       0.72
Sitting             302    64   116  122  0    0.36       0.34    0.35       0.21
Standing            302    0    0    302  0    0.00       0.00    0.00       0.00
Vacuum cleaning     302    234  134  0    66   0.64       1.00    0.78       0.69
Walking             302    194  125  0    17   0.61       1.00    0.76       0.63

Overall 0.55 0.77 0.63 0.52


Table 16. Support Vector Machine prediction using person B dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    88   122  92   0    0.42       0.49    0.45       0.29
Cycling             302    216  85   1    0    0.72       1.00    0.83       0.72
Descending stairs   302    126  104  72   0    0.55       0.64    0.59       0.42
Ironing             302    36   64   202  0    0.36       0.15    0.21       0.12
Lying               302    270  50   0    18   0.84       1.00    0.92       0.85
Nordic walking      302    193  106  3    0    0.65       0.98    0.78       0.64
Rope jumping        302    231  32   39   0    0.88       0.86    0.87       0.76
Running             302    238  43   21   0    0.85       0.92    0.88       0.79
Sitting             302    3    41   258  0    0.07       0.01    0.02       0.01
Standing            302    1    0    301  0    1.00       0.00    0.01       0.00
Vacuum cleaning     302    208  119  0    25   0.64       1.00    0.78       0.66
Walking             302    184  135  0    17   0.58       1.00    0.73       0.60

Overall 0.63 0.67 0.59 0.49

Table 17. Support Vector Machine prediction using person C dataset.

Activity            Total  TP   FP   FN   TN   Precision  Recall  F-Measure  Accuracy
Ascending stairs    302    160  206  0    64   0.44       1.00    0.61       0.52
Cycling             302    227  66   9    0    0.77       0.96    0.86       0.75
Descending stairs   302    69   104  129  0    0.40       0.35    0.37       0.23
Ironing             302    33   64   205  0    0.34       0.14    0.20       0.11
Lying               302    244  50   0    8    0.83       1.00    0.91       0.83
Nordic walking      302    194  86   22   0    0.69       0.90    0.78       0.64
Rope jumping        302    235  12   55   0    0.95       0.81    0.88       0.78
Running             302    269  54   0    21   0.83       1.00    0.91       0.84
Sitting             302    26   89   187  0    0.23       0.12    0.16       0.09
Standing            302    3    11   288  0    0.21       0.01    0.02       0.01
Vacuum cleaning     302    217  97   0    12   0.69       1.00    0.82       0.70
Walking             302    154  130  18   0    0.54       0.90    0.68       0.51

Overall 0.58 0.68 0.60 0.50

Table 18. Naïve Bayes prediction using person A dataset.

Activity            Total   TP   FP   FN   TN  Precision  Recall  F-Measure  Accuracy
Ascending stairs      302  295    6    1    0       0.98    1.00       0.99      0.98
Cycling               302  295   10    0    3       0.97    1.00       0.98      0.97
Descending stairs     302  295    1    6    0       1.00    0.98       0.99      0.98
Ironing               302  296    4    2    0       0.99    0.99       0.99      0.98
Lying                 302  293    4    5    0       0.99    0.98       0.98      0.97
Nordic walking        302  295   14    0    7       0.95    1.00       0.98      0.96
Rope jumping          302  293    5    4    0       0.98    0.99       0.98      0.97
Running               302  293    3    6    0       0.99    0.98       0.98      0.97
Sitting               302  294    2    6    0       0.99    0.98       0.99      0.97
Standing              302  293    2    7    0       0.99    0.98       0.98      0.97
Vacuum cleaning       302  291   25    0   14       0.92    1.00       0.96      0.92
Walking               302  300    8    0    6       0.97    1.00       0.99      0.97
Overall                                             0.98    0.99       0.98      0.97



Table 19. Naïve Bayes prediction using person B dataset.

Activity            Total   TP   FP   FN   TN  Precision  Recall  F-Measure  Accuracy
Ascending stairs      302  296    2    4    0       0.99    0.99       0.99      0.98
Cycling               302  298    9    0    5       0.97    1.00       0.99      0.97
Descending stairs     302  298    6    0    2       0.98    1.00       0.99      0.98
Ironing               302  295   11    0    4       0.96    1.00       0.98      0.96
Lying                 302  280    6   16    0       0.98    0.95       0.96      0.93
Nordic walking        302  300   19    0   17       0.94    1.00       0.97      0.94
Rope jumping          302  282    4   16    0       0.99    0.95       0.97      0.93
Running               302  294    2    6    0       0.99    0.98       0.99      0.97
Sitting               302  292    1    9    0       1.00    0.97       0.98      0.97
Standing              302  289    3   10    0       0.99    0.97       0.98      0.96
Vacuum cleaning       302  296   31    0   25       0.91    1.00       0.95      0.91
Walking               302  298    6    0    2       0.98    1.00       0.99      0.98
Overall                                             0.97    0.98       0.98      0.96

Table 20. Naïve Bayes prediction using person C dataset.

Activity            Total   TP   FP   FN   TN  Precision  Recall  F-Measure  Accuracy
Ascending stairs      302  296   10    0    4       0.97    1.00       0.98      0.97
Cycling               302  291   18    0    7       0.94    1.00       0.97      0.94
Descending stairs     302  297    6    0    1       0.98    1.00       0.99      0.98
Ironing               302  293    5    4    0       0.98    0.99       0.98      0.97
Lying                 302  291    1   10    0       1.00    0.97       0.98      0.96
Nordic walking        302  300   20    0   18       0.94    1.00       0.97      0.94
Rope jumping          302  278    4   20    0       0.99    0.93       0.96      0.92
Running               302  292    5    5    0       0.98    0.98       0.98      0.97
Sitting               302  293    0    9    0       1.00    0.97       0.98      0.97
Standing              302  294    3    5    0       0.99    0.98       0.99      0.97
Vacuum cleaning       302  293   20    0   11       0.94    1.00       0.97      0.94
Walking               302  299    9    0    6       0.97    1.00       0.99      0.97
Overall                                             0.97    0.99       0.98      0.96

Name: Moses L. Gadebe

Affiliation: Lecturer, Tshwane University of Technology

Address: Department of Computer Science, Faculty of ICT, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa

Brief Biographical History:
2000-2003 Web Developer, Megawatt Park, Eskom
2003- Lecturer, Tshwane University of Technology

Main Works:
• "Discoverable Digital Clone Repository for Improved Knowledge Transfer and Productivity," Int. J. of Modeling and Optimization, doi: 10.7763/IJMO.2013.V3.329, 2013.
• "Personalized Real Time Human Activity Recognition," Proc. of IEEE 5th Int. Conf. on Soft Computing & Machine Intelligence (ISCMI 2018), 2018.

Membership in Academic Societies:
• Southern African Association for Institutional Research (SAAIR)

Name: Okuthe P. Kogeda

Affiliation: Associate Professor, Department of Computer Science & Informatics, University of the Free State

Address: P.O. Box 339, Bloemfontein 9300, South Africa

Brief Biographical History:
1999-2000 Lecturer, University of Nairobi
2004-2009 Lecturer, University of the Western Cape
2009-2011 Senior Lecturer, University of Fort Hare
2011-2019 Senior Lecturer, Tshwane University of Technology
2019- Associate Professor, University of the Free State

Main Works:
• "Modelling of Reliable Service Based Operations Support System (MORSBOSS)," LAP Lambert Academic Publishing GmbH & Co. KG, 2008.

Membership in Academic Societies:
• The Institute of Information Technology Professionals South Africa (IITPSA)
• International Association of Engineers (IAENG)
• The Institute of Electrical and Electronics Engineers (IEEE)

Name: Sunday O. Ojo

Affiliation: Tshwane University of Technology

Address: State Artillery Road, Pretoria West, Pretoria 0001, South Africa

Brief Biographical History:
1980- University of Ibadan
1985- Ph.D. in Computing Science from University of Glasgow
1991- University of Botswana
2008- Tshwane University of Technology

Main Works:
• "Context-Driven Corpus-Based Model for Automatic Text Segmentation and Part of Speech Tagging in Setswana Using OpenNLP Tool," G. Bella and P. Bouquet (Eds.), "Modeling and Using Context," CONTEXT 2019, Lecture Notes in Computer Science, Vol.11939, 2019.
• "A Serendipitous Research Paper Recommender System," Int. J. of Business and Management Studies, Vol.11, No.1, pp. 38-53, 2019.
• "Hybrid Service Discovery Approach for Peer-to-peer Web Service Systems," Proc. of SAI Intelligent Systems Conf., Lecture Notes in Networks and Systems, Vol.16, pp. 281-293, 2016.

Membership in Academic Societies:
• Association for Computing Machinery (ACM)
• The Computer Society of South Africa (CSSA)
