Investigation of Using CAPTCHA Keystroke Dynamics ... - MDPI

21
Citation: Alamri, E.K.; Alnajim, A.M.; Alsuhibany, S.A. Investigation of Using CAPTCHA Keystroke Dynamics to Enhance the Prevention of Phishing Attacks. Future Internet 2022, 14, 82. https://doi.org/ 10.3390/fi14030082 Academic Editor: Cheng-Chi Lee Received: 24 February 2022 Accepted: 8 March 2022 Published: 10 March 2022 Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affil- iations. Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/). future internet Article Investigation of Using CAPTCHA Keystroke Dynamics to Enhance the Prevention of Phishing Attacks Emtethal K. Alamri 1, * , Abdullah M. Alnajim 1 and Suliman A. Alsuhibany 2 1 Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; [email protected] 2 Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; [email protected] * Correspondence: [email protected] Abstract: Phishing is a cybercrime that is increasing exponentially day by day. In phishing, a phisher employs social engineering and technology to misdirect victims towards revealing their personal information, which can then be exploited. Despite ongoing research to find effective anti- phishing solutions, phishing remains a serious security problem for Internet users. In this paper, an investigation of using CAPTCHA keystroke dynamics to enhance the prevention of phishing attacks was presented. A controlled laboratory experiment was conducted, with the results indicating the proposed approach as highly effective in protecting online services from phishing attacks. The results showed a 0% false-positive rate and 17.8% false-negative rate. Overall, the proposed solution provided a practical and effective way of preventing phishing attacks. Keywords: phishing attacks; keystroke dynamics; text-based CAPTCHA; authentication 1. Introduction Due to the increasing use of the Internet in many aspects of modern life, the number and complexity of attacks on cyber-security have also been increasing exponentially day by day, making it difficult to identify, analyse, and regulate the relevant risk events [1]. Cyber-attacks are defined as any digital attempt to steal, disrupt, or gain unauthorised access to the computing environment/infrastructure so that controlled information can be stolen [2]. Attempts of this kind always involve unauthorised access to sensitive data, whether personal or organisational, thereby violating the confidentiality, integrity, and availability of that information. Consequently, companies and institutions are obliged to focus on ensuring the security of their own online services because an attack of any kind can have long-term effects, giving rise to severe financial losses, as well as the loss of customers’ trust. Phishing, which is considered to be one of the main types of cyber-attacks faced by online service users, is a dangerous and increasingly common phenomenon. Lastdrager has defined phishing as “a scalable act of deception whereby impersonation is used to obtain information from a target” [3]. The term was coined as a serious cyber threat in 1996, when phishers stole information about the credentials of America Online (AOL) users [4]. Since this attack on AOL, phishers have continued to change and develop their methods of attacking higher-value targets. In a phishing attack, the user is usually asked to log onto a fake website—which mimics a legitimate website—by opening a malicious email attachment. When a user fails to recognise this as a phishing attempt and inputs his or her log-in information, the phisher captures the log-in credentials, credit card information, etc. for the user’s account. In July of 2021, the Anti-Phishing Working Group (APWG) reported 260,642 phishing websites [5]. According to the APWG, this is the highest monthly total in its reporting Future Internet 2022, 14, 82. https://doi.org/10.3390/fi14030082 https://www.mdpi.com/journal/futureinternet

Transcript of Investigation of Using CAPTCHA Keystroke Dynamics ... - MDPI

�����������������

Citation: Alamri, E.K.; Alnajim, A.M.;

Alsuhibany, S.A. Investigation of

Using CAPTCHA Keystroke

Dynamics to Enhance the Prevention

of Phishing Attacks. Future Internet

2022, 14, 82. https://doi.org/

10.3390/fi14030082

Academic Editor: Cheng-Chi Lee

Received: 24 February 2022

Accepted: 8 March 2022

Published: 10 March 2022

Publisher’s Note: MDPI stays neutral

with regard to jurisdictional claims in

published maps and institutional affil-

iations.

Copyright: © 2022 by the authors.

Licensee MDPI, Basel, Switzerland.

This article is an open access article

distributed under the terms and

conditions of the Creative Commons

Attribution (CC BY) license (https://

creativecommons.org/licenses/by/

4.0/).

future internet

Article

Investigation of Using CAPTCHA Keystroke Dynamics toEnhance the Prevention of Phishing AttacksEmtethal K. Alamri 1,* , Abdullah M. Alnajim 1 and Suliman A. Alsuhibany 2

1 Department of Information Technology, College of Computer, Qassim University,Buraydah 51452, Saudi Arabia; [email protected]

2 Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia;[email protected]

* Correspondence: [email protected]

Abstract: Phishing is a cybercrime that is increasing exponentially day by day. In phishing, aphisher employs social engineering and technology to misdirect victims towards revealing theirpersonal information, which can then be exploited. Despite ongoing research to find effective anti-phishing solutions, phishing remains a serious security problem for Internet users. In this paper,an investigation of using CAPTCHA keystroke dynamics to enhance the prevention of phishingattacks was presented. A controlled laboratory experiment was conducted, with the results indicatingthe proposed approach as highly effective in protecting online services from phishing attacks. Theresults showed a 0% false-positive rate and 17.8% false-negative rate. Overall, the proposed solutionprovided a practical and effective way of preventing phishing attacks.

Keywords: phishing attacks; keystroke dynamics; text-based CAPTCHA; authentication

1. Introduction

Due to the increasing use of the Internet in many aspects of modern life, the numberand complexity of attacks on cyber-security have also been increasing exponentially dayby day, making it difficult to identify, analyse, and regulate the relevant risk events [1].Cyber-attacks are defined as any digital attempt to steal, disrupt, or gain unauthorisedaccess to the computing environment/infrastructure so that controlled information canbe stolen [2]. Attempts of this kind always involve unauthorised access to sensitive data,whether personal or organisational, thereby violating the confidentiality, integrity, andavailability of that information. Consequently, companies and institutions are obligedto focus on ensuring the security of their own online services because an attack of anykind can have long-term effects, giving rise to severe financial losses, as well as the loss ofcustomers’ trust.

Phishing, which is considered to be one of the main types of cyber-attacks faced byonline service users, is a dangerous and increasingly common phenomenon. Lastdragerhas defined phishing as “a scalable act of deception whereby impersonation is used toobtain information from a target” [3]. The term was coined as a serious cyber threat in 1996,when phishers stole information about the credentials of America Online (AOL) users [4].Since this attack on AOL, phishers have continued to change and develop their methods ofattacking higher-value targets.

In a phishing attack, the user is usually asked to log onto a fake website—whichmimics a legitimate website—by opening a malicious email attachment. When a user failsto recognise this as a phishing attempt and inputs his or her log-in information, the phishercaptures the log-in credentials, credit card information, etc. for the user’s account.

In July of 2021, the Anti-Phishing Working Group (APWG) reported 260,642 phishingwebsites [5]. According to the APWG, this is the highest monthly total in its reporting

Future Internet 2022, 14, 82. https://doi.org/10.3390/fi14030082 https://www.mdpi.com/journal/futureinternet

Future Internet 2022, 14, 82 2 of 21

history. Figure 1 shows the number of phishing occurrences from the first quarter of 2021to the third quarter of 2021.

Future Internet 2022, 14, x FOR PEER REVIEW 2 of 21

history. Figure 1 shows the number of phishing occurrences from the first quarter of 2021

to the third quarter of 2021.

Figure 1. Detection of unique phishing sites [5].

Statistics indicate that the incidence of phishing scams has doubled in recent years

due to the 2019 coronavirus (COVID-19) pandemic, which suggests that phishers are seek-

ing opportunities to exploit current events. As stated by the World Health Organization

(WHO), COVID-19 has established an ‘infodemic’ that actually benefits phishers [6]. In

addition, the FBI claims that over 11 times as many phishing complaints were logged in

2020 as in 2016 [7] since phishing attacks often tailor their campaigns to current events.

Nonetheless, the number of phishing scams declined during the first quarter of 2021, as

shown in Figure 1, with users becoming more aware of COVID-19 phishing scams. This

resulted in phishers being less successful with emails related to the pandemic. However,

phishers continued to exploit the pandemic, especially with the COVID-19 vaccination

rollout [8]. In addition, phishers frequently impersonate leading brands in a bid to steal

confidential information from users, such as their payment credentials. New reports show

that the most frequently imitated brands in global phishing attempts are Microsoft and

the DHL delivery service provider [9]. All previous statistical explanations refer to the risk

of phishing attacks. Phishing is considered as a lucrative criminal activity, which targets

individuals and organisations and incurs millions of dollars in losses every day. Moreo-

ver, it is seldom prosecuted. According to a recent study conducted by the Ponemon In-

stitute, the annual cost of phishing attacks in the US has increased significantly in the last

six years to the point where large US companies are now paying out $15 million each year.

This amounts to nearly $1500 per employee annually [10]. To date, anti-phishing tech-

niques have not been sufficiently effective to reduce the risk of phishing attacks. Due to

phishers seeking to identify the weaknesses and vulnerabilities of a given solution so that

these can be exploited to carry out a successful attack, it is essential to protect users’ data

from phishing attacks by educating them in the correct course of action and reporting

methods if a phishing email is received. For example, Qassim University is about to launch

a new awareness programme to mitigate phishing attacks. This programme is similar to

one that was implemented in Stanford University, which now has a phishing awareness

service [11]. Moreover, online services need to adopt a reliable phishing-prevention mech-

anism to ensure that only genuine users gain access to their systems. Consequently, many

organisations currently seek to protect their systems by implementing stronger authenti-

cation requirements as a means of preventing unauthorised access. For example, online

24

5,7

71

15

8,8

98

20

7,2

08

20

4,0

50

19

0,7

62

22

2,1

27

26

0,6

42

25

5,3

85

21

4,3

45

17

2,7

93

11

2,3

69

39

,91

8

11

,40

0

92

39

96

69

11

,38

4

10

,71

6 64

,23

3

NU

MB

ER O

F A

TTA

CK

S

ATTACK IN MONTH

UNIQUE PHISHING SITES DETECTED, Q1 2021-Q3 2021

Website

Email

Figure 1. Detection of unique phishing sites [5].

Statistics indicate that the incidence of phishing scams has doubled in recent years dueto the 2019 coronavirus (COVID-19) pandemic, which suggests that phishers are seekingopportunities to exploit current events. As stated by the World Health Organization (WHO),COVID-19 has established an ‘infodemic’ that actually benefits phishers [6]. In addition,the FBI claims that over 11 times as many phishing complaints were logged in 2020 as in2016 [7] since phishing attacks often tailor their campaigns to current events. Nonetheless,the number of phishing scams declined during the first quarter of 2021, as shown inFigure 1, with users becoming more aware of COVID-19 phishing scams. This resultedin phishers being less successful with emails related to the pandemic. However, phisherscontinued to exploit the pandemic, especially with the COVID-19 vaccination rollout [8].In addition, phishers frequently impersonate leading brands in a bid to steal confidentialinformation from users, such as their payment credentials. New reports show that the mostfrequently imitated brands in global phishing attempts are Microsoft and the DHL deliveryservice provider [9]. All previous statistical explanations refer to the risk of phishingattacks. Phishing is considered as a lucrative criminal activity, which targets individualsand organisations and incurs millions of dollars in losses every day. Moreover, it is seldomprosecuted. According to a recent study conducted by the Ponemon Institute, the annualcost of phishing attacks in the US has increased significantly in the last six years to thepoint where large US companies are now paying out $15 million each year. This amountsto nearly $1500 per employee annually [10]. To date, anti-phishing techniques have notbeen sufficiently effective to reduce the risk of phishing attacks. Due to phishers seeking toidentify the weaknesses and vulnerabilities of a given solution so that these can be exploitedto carry out a successful attack, it is essential to protect users’ data from phishing attacks byeducating them in the correct course of action and reporting methods if a phishing email isreceived. For example, Qassim University is about to launch a new awareness programmeto mitigate phishing attacks. This programme is similar to one that was implementedin Stanford University, which now has a phishing awareness service [11]. Moreover,online services need to adopt a reliable phishing-prevention mechanism to ensure that onlygenuine users gain access to their systems. Consequently, many organisations currently seekto protect their systems by implementing stronger authentication requirements as a meansof preventing unauthorised access. For example, online banking services exclusively use

Future Internet 2022, 14, 82 3 of 21

one-time passwords (OTP) to prevent identity theft, wherein a new password is generatedand required to enter for each log-in attempt. Furthermore, biometric authentication isanother promising trend for combatting phishing attacks. There are a number of differentsystems that apply biometric information as a means of identifying people, as in the caseof civil, government, and healthcare identification. Biometric authentication schemeshave been gaining popularity over other types of authentication in recent years since theyprovide high security to protect people’s identities and are easily combined with traditionalauthentication techniques.

Biometric information may be divided approximately into physiological and be-havioural characteristics. The biometric information used in physiological authenticationtechniques is derived from an individual’s physical traits, such as fingerprint and facerecognition. However, the measurement of these characteristics is very costly to deploy,as is the accompanying hardware. Conversely, behavioural characteristics are based onwhat users have learned or acquired that differentiates them from others. These includekeystroke and mouse dynamics. Out of the many possible biometric traits, keystrokedynamics are the most popular and have been extensively studied for recognition pur-poses [12]. Recent research has investigated the effectiveness of keystroke dynamics inorder to increase the level of security in authentication systems. These studies have variedin their approach, adopting different classes of keystroke dynamics (for example, free- andfixed-text), pattern classification techniques (such as statistical and machine-learning), andexperimental environments (controlling or non-controlling).

All have yielded promising results, but the results obtained with free text are un-doubtedly more secure than those produced with fixed text. Moreover, numerous studieshave sought to explore the intrinsic benefits of free-text keystroke dynamics in providingcontinuous and non-intrusive authentication. Therefore, this current study investigates theeffectiveness of incorporating free-text keystroke dynamics into completely automated pub-lic Turing test sentences (CAPTCHAs) in order to be able to distinguish between computersand humans (CAPTCHAs), thereby preventing phishing attacks.

CAPTCHA technology has played a significant role as a defence mechanism, pro-tecting Web security from malicious bot programmers across the Internet. It is one of therecognised shields used to distinguish between humans and computer programs (bots).CAPTCHA technology generates simple tests based on problems that humans can solvewith ease but that are difficult for computers (i.e., artificial intelligence [AI]) [13]. Whenthe right answer is received, it is, consequently, assumed that it was entered by a human,so the user is given access to the system [14]. CAPTCHAs exist on most websites and aremainly classified into four types: image-, audio-, video-, and text-based.

Text-based CAPTCHAs are one of the most widely used CAPTCHA schemes, requiringusers to read distorted text (digits/letters) that is presented in an image in registrationor log-in forms. Users must recognise and write the text in the input text box to obtainvalidation. Only then will they be granted access to the site, provided that the input textmatches the CAPTCHA characters and/or digits. This task supposedly cannot be solvedby AI programs. Popular platforms, such as Microsoft, Google, eBay, and Yahoo, have usedthis scheme as a security arrangement to authenticate users and enhance website security.In this study, the term ‘CAPTCHA’ refers solely to text-based CAPTCHAs. The followingis a summary of this study:

− Designed and implemented an effective and secure approach to investigate the ef-fectiveness of using CAPTCHA keystroke dynamics in enhancing the prevention ofphishing attacks.

− Analysed the existing schemes to design effective CAPTCHA, which helps to takeadvantage of keystroke dynamics to prevent phishing attacks.

− Significant time features were selected, representing users’ typing behaviour, andmeasured according to the existing literature. To the best of our knowledge, thesefeatures have not been used before in preventing phishing attacks.

− Appropriate similarity threshold was determined to produce excellent results.

Future Internet 2022, 14, 82 4 of 21

− Collected a large number of participants compared with previous works.− A controlled laboratory experiment was conducted in order to practically evaluate the

approach applied.

The structure of this paper is as follows: Section 2 reviews related work and thebackground of the study, while Section 3 outlines the proposed work, and Section 4 presentsthe methodology. Section 5 then describes the experimental study, and Section 6 includesthe evaluation metrics, while Section 7 presents and discusses the results of the experiment.Section 8 concludes the paper and provides some direction for future work.

2. Related Work and Background

To address the issue of phishing attacks, several solutions have been proposed tomitigate their effect. These solutions may be classified as technical (using software), non-technical (educational), or a combination of the two. Technical solutions can be furthercategorised as phishing detection and phishing prevention. Phishing detection involvesanalysing a website’s media content. It can use an image or URL to verify whether thewebsite is genuine [15]. Several anti-phishing detection techniques have been developedto mitigate the impact of phishing attacks, such as security toolbars, search engines [16],black/white lists [17], machine learning [18], and visual similarity-based techniques [19].Phishing prevention solutions attempt to prevent phishing attacks by enhancing websitesecurity. This is endeavoured through the implementation of unique methods of securingauthentication schemes and platforms for user interaction [20]. These solutions haveachieved widespread success in recent years, especially with the increase in identity theftand data leakage resulting from diverse means of attack, such as phishing, email spoofing,and malware. Therefore, companies, including banks, have focused on integrating strongauthentication methods to protect their systems based on the premise that, even if phishersmanage to steal victims’ information, they cannot use it to achieve their goals. For instance,in one study [21], an OTP and authentication tokens were used to construct a systemof two-factor authentication to prevent phishing attacks. In another study [22], a novelauthentication system was developed for online banking, which depended on integratedOTP and QR-codes to provide greater security and convenience for users. The authors inRef. [23] have proposed a two-factor authentication/verification scheme using fingerprintand code verification with a pattern matrix to prevent phishing attacks. In the above study,it was claimed that the scheme would be very useful in the financial sector to provide ahigh level of security, with a phisher being unable to bypass two levels of authenticationand verification. Moreover, Chetalam introduced a multi-factor authentication schemeto improve the security of a mobile money transfer system (the Mpesa app) in Kenya,integrating username and password, phone number, and voice biometrics to preventmobile money fraud, including phishing attacks [24].

In a recent study [25], another approach to protecting electronic payment systemswas developed, combining password, fingerprint, and OTP verification. This combinedapproach was intended to provide more reliable user authentication, consisting of multi-factor authentication in three main phases: (1) Registration: with various types of databeing required (i.e., the user’s biodata, password, and fingerprint), followed by verificationof accuracy and storage in the database, (2) Authentication: with the user required toenter a password and biometric fingerprint so that the latter could be checked against thefingerprint stored in the database; if a match was verified, access would be granted, and(3) Transaction: with the user selecting the amount for transfer and being authenticated byscanning their biometric fingerprint. If a match was verified, the system would create anOTP and transmit it to the user’s registered phone number. The user would then have toenter the correct OTP to complete the money transfer.

In other studies [26,27], a novel approach was introduced for the detection of phishingwebsites and authentication of registered users through CAPTCHA-based validation. Thefirst study used visual cryptography to protect the privacy of an image-based CAPTCHAby dividing the original image CAPTCHA into two shares. These were maintained on

Future Internet 2022, 14, 82 5 of 21

separate database servers (the user’s device and the server). As a result, the originalimage CAPTCHA was only exposed if both CAPTCHAs were simultaneously available.The proposed methodology was implemented by using Matlab without investigatingthe effectiveness of the approach in a real experiment. In contrast, the other study useda cropped image CAPTCHA (CIC) algorithm, where the user undertook two phases:registration and log-in. In the registration phase, the user was required to enter all details,select two images from among 80 images that were randomly generated by the server, andperform a cropping process based on predefined constraints. The system subsequentlycreated a user profile containing registration details and the cropped pieces, which wouldthen be sent server-side for storage in a database. During the log-in phase, after enteringtheir details, the users could determine whether a suspicious webpage was legitimate ora phishing website according to the image received in this phase. If the user receivedany of the images selected in the registration phase, along with nine random images, thesuspicious webpage could be identified as legitimate. Otherwise, it would be recognised asa phishing website. If a webpage was determined as legitimate, the user would have toperform the same cropping procedure as for registration. To evaluate the proposed system,the authors recruited 114 users, all familiar with the Internet and computers. The authorsused three verification keys, which were password, PIN, and the proposed method, toevaluate the user performance over three months. The results indicate that the CIC successrate was higher than a password at 9% and higher than a PIN at 1% in the third month. Inaddition, some demographic data were collected in the survey to evaluate the proposedapproach in terms of ease to remember and response time. According to the results, 82%of the participants believed CIC was easier to use, while 11% believed CIC was slow tosolve. Moreover, the study by the authors of Ref. [28] proposed a method to distinguishbetween authorized and unauthorized users by combining CAPTCHA attributes andhuman capabilities called bio-detection functionalities (BDF). The authors concluded thata CAPTCHA could help protect against third-party attacks when it is embedded withmechanisms (i.e., a BDF mechanism) that help identify the user.

Furthermore, in Ref. [29], it was highlighted how behavioural biometric authenticationcould be used to prevent phishing. The above author discussed how keystroke dynamicscould be used to prevent phishing attacks, as well as ensuring that phishers do not imper-sonate users. However, this idea was merely underlined without proving the concept in anexperimental study.

The use of the keystroke dynamics technique is a promising trend in identifying usersfrom typing patterns. In the early 1980s, both the National Science Foundation (NSF)and National Institute of Standards and Technology (NIST) in the US found in studiesthat typing patterns were unique and recognisable [30]. These initial promising resultsprompted a large number of researchers to explore keystroke dynamics, whereupon newmethods were developed to improve the efficiency of using keystroke dynamics in userauthentication, achieved by measuring and testing users’ typing patterns. In fact, allresearch in this area has sought to provide efficient solutions that will provide a robust andinexpensive authentication mechanism. In particular, a study [31] examined the use of freetext, applying Euclidean principles to obtain the distances between key pairs according totheir position on the keyboard. An interesting finding of this work, which relates closelyto the present research, is that flight time is more relevant than dwell time. This findingis evaluated in the current study. The best results were obtained with the features, down-down (DD), up-down (UD), and up-up (UU), producing a 21% false acceptance rate (FAR)and 17% false rejection rate (FRR).

In addition, two studies [32,33] investigated a free-text keystroke dynamic authentica-tion approach for English and Arabic input. However, the timing features and classificationalgorithms used in both the above studies differed. For example, in the first, five extractedtiming features were used for each key-pair to be included in the user feature vector. Theexperiment was conducted on a sample of 21 participants. The researchers used decisiontrees (DTs) and support vector machines (SVMs) to classify users based on the proposed

Future Internet 2022, 14, 82 6 of 21

timing features. The results for the English input revealed FAR = 0.245 and FRR = 0.613 forthe SVM classifier. Meanwhile, the DT results showed that FAR = 0.281 and FRR = 0.702.In contrast, the other study used keystroke duration, di-graph duration, and latency, com-bined as a single feature to distinguish among and calculated to classify the user samples.The results for the English input indicated that FAR = 0.22 and FRR = 0. However, in bothstudies, only the English input results were reviewed since they related to the proposedstudy. In this regard, a study in Ref. [33] achieved more promising results than a studyin Ref. [32]. Table 1 provides a simple comparison of the most relevant studies with thecurrent work, clearly indicating the contributions.

Table 1. Comparison between the current and relevant previous studies.

Reference Contribution Results

[24]Proposed a multi-factor authentication scheme using voicetraits as behavioural biometrics to protect a mobile banking

system (Mpesa) in Kenya.

The results demonstrated that voice biometrics havenumerous potential advantages and benefits in

terms of reducing risk for mobile financial systems.

[33]

Introduced a new approach involving free-text keystrokedynamics authentication to provide a high degree of

usability and security. Arabic and English language typingtext was used to compare the performance of Arabic inputwith another input. Keystroke duration, di-graph duration,

and latency were combined as a feature to distinguishbetween samples of authenticated users and impostors.

The results showed that the proposed approachachieved excellent results with FAR = 0.2 and FRR =

0.0.

Currentwork

Investigate the effectiveness of using CAPTCHA keystrokedynamics in enhancing the prevention of phishing attacks.

The results are promising in terms of providing apractical and cost-effective solution to prevent

phishing attacks.

3. Proposed Work

To precisely define the proposed work, this section describes a system interpretationapproach to the prevention of phishing attacks as a means of protecting the online services.This system deploys keystroke timing data to prevent phishing attacks, as well as providingsecure authentication, using log-in credentials (username and password) and a text-basedCAPTCHA, which integrates keystroke dynamics into a single system to prevent phishingattacks. Keystroke dynamics capture a user’s typing patterns in order to identify thatuser as it is difficult to reproduce a user’s typing pattern. The keystroke dynamics systemevaluates a user’s typing pattern in milliseconds (ms). In general, the system verifies userswho are requesting to access an online service. If a phisher requests access, they will eitherbe denied or the request will be accepted.

The approach illustrated in Figure 2 provides the basis of the experimental studypresented in the next section. In the system model, (1) a text-based CAPTCHA is shown onthe website’s log-in page. The user then enters their credentials and solves the CAPTCHA;(2) the request, along with the keystroke timing data, is sent to the authentication server;(3) the server checks the user’s details and compares them with the user’s profile in thedatabase; and (4) the server grants access or rejects the user’s request. The following sectiondetails the proposed methodology for this study.

Future Internet 2022, 14, 82 7 of 21Future Internet 2022, 14, x FOR PEER REVIEW 7 of 21

Figure 2. Proposed system model.

4. Methodology

This section presents the proposed biometric user keystroke dynamics authentication

system and its components. Figure 2 depicts the proposed system approach and how it

serves to prevent phishing attacks. According to Figure 2, the main implementation steps

involve identifying time features that represent users’ typing behaviour and extracting

timing vectors. Moreover, the methods of measuring distance and classifying users are

explained, and the typing text used in this experiment is identified.

4.1. Definition of Features

Keystroke dynamics are mainly based on features of time, but some research has been

grounded on other features, such as pressure, the sequence of special action keys (i.e., left-

right Alt, Shift, Ctrl), and speed typing. All these features have been calculated in milli-

seconds. The present work applied time features obtained from two keyboard actions:

depression (Dn) and release (Un) for each key typed, wherein n indicates the key, and time

is recorded in milliseconds. In this study, three timing features were extracted, as sug-

gested by [33]:

• Keystroke duration or hold time: the interval between a key being pressed and re-

leased, which may be computed according to the following formula:

HK1 = Uk1 − Dk1

• Keystroke latencies (also called press-press or DD time): time taken by the user to

press two consecutive keys, which can be calculated with the following formula:

DD = Dk2 − Dk1

• Di-graph duration: time difference between releasing one key and pressing another,

computed with the following formula:

UD = Dk2 − Uk1

Figure 3 presents an example of these time features extracted for two keys. Based on

this example, the hold time for key ‘B’ = 300 – 000 = 300 ms, and for key ‘I’= 750 – 400 = 350

ms. In addition, the DD time (for keys ‘B’ and ‘I’) = 400 – 000 = 400 ms and the UD time =

400 – 300 = 100 ms, respectively.

Figure 2. Proposed system model.

4. Methodology

This section presents the proposed biometric user keystroke dynamics authenticationsystem and its components. Figure 2 depicts the proposed system approach and how itserves to prevent phishing attacks. According to Figure 2, the main implementation stepsinvolve identifying time features that represent users’ typing behaviour and extractingtiming vectors. Moreover, the methods of measuring distance and classifying users areexplained, and the typing text used in this experiment is identified.

4.1. Definition of Features

Keystroke dynamics are mainly based on features of time, but some research has beengrounded on other features, such as pressure, the sequence of special action keys (i.e.,left-right Alt, Shift, Ctrl), and speed typing. All these features have been calculated inmilliseconds. The present work applied time features obtained from two keyboard actions:depression (Dn) and release (Un) for each key typed, wherein n indicates the key, and timeis recorded in milliseconds. In this study, three timing features were extracted, as suggestedby [33]:

• Keystroke duration or hold time: the interval between a key being pressed andreleased, which may be computed according to the following formula:

HK1 = Uk1 − Dk1

• Keystroke latencies (also called press-press or DD time): time taken by the user topress two consecutive keys, which can be calculated with the following formula:

DD = Dk2 − Dk1

• Di-graph duration: time difference between releasing one key and pressing another,computed with the following formula:

UD = Dk2 − Uk1

Figure 3 presents an example of these time features extracted for two keys. Based onthis example, the hold time for key ‘B’ = 300 − 000 = 300 ms, and for key ‘I’= 750 − 400 =350 ms. In addition, the DD time (for keys ‘B’ and ‘I’) = 400 − 000 = 400 ms and the UDtime = 400 − 300 = 100 ms, respectively.

Future Internet 2022, 14, 82 8 of 21Future Internet 2022, 14, x FOR PEER REVIEW 8 of 21

Figure 3. Example of extracting time features of keystroke dynamics.

4.2. Extraction of Timing Vectors

After extracting time features, the collected data were pre-processed to remove out-

liers and noisy data (i.e., the large amounts of data generated when a user presses two

keys simultaneously in error). The server then calculated the mean values for each time

feature (hold time, UD time, and DD time) to build the user’s profile, as in Ref. [33]. The

system also assigned each participant a unique ID for their identification, which could

also be used to label the data. Thus, the time vectors were categorised based on the user’s

label data. In addition, the system provided a fake IP address for each participant because

the experiment was conducted on a single laptop.

4.3. Finding the Distance and Classification Methods

To determine how much a user’s test data matched their profile, a Euclidean distance

measure was used. Thus, the system measured the distance between two vectors based

on a Euclidean distance equation in three-dimensional space [34]:

d(x, y) = √(𝑥1 − 𝑦1)2 + (𝑥2 − 𝑦2)2 + (𝑥3 − 𝑦3)2 (1)

where x and y are two timing vectors. In this study, the user login was x and the user

profile vector was y. Moreover, d(x, y) ≥ 0 [34]. Algorithm 1 indicates how Euclidean dis-

tance was computed in the proposed work.

Algorithm 1: Euclidean distance (ED)

1: begin

2: Compute the different values between two timing vectors.

3: Calculate Square value

4: Sum the values of step 3

5: Take the square root

6: end

To identify the users, standard deviation (SD) was applied as a threshold for Euclid-

ean distance in an approach inspired by the work in Ref. [33]. In the proposed system,

Euclidean distance was compared to the SD. If the Euclidean distance was less than the

half rows of SD values stored in the database, it was considered as a similarity threshold,

with the new time vector belonging to the same user as the profile being compared. Hence,

the new vector would be stored in the database. Otherwise, the system would give the

user six attempts. If this number of attempts was exceeded, the user would be classified

as a phisher.

SD = √∑ (𝑥𝑖

𝑁𝑖 =1 −��)2

𝑁−1 (2)

Figure 3. Example of extracting time features of keystroke dynamics.

4.2. Extraction of Timing Vectors

After extracting time features, the collected data were pre-processed to remove outliersand noisy data (i.e., the large amounts of data generated when a user presses two keyssimultaneously in error). The server then calculated the mean values for each time feature(hold time, UD time, and DD time) to build the user’s profile, as in Ref. [33]. The systemalso assigned each participant a unique ID for their identification, which could also beused to label the data. Thus, the time vectors were categorised based on the user’s labeldata. In addition, the system provided a fake IP address for each participant because theexperiment was conducted on a single laptop.

4.3. Finding the Distance and Classification Methods

To determine how much a user’s test data matched their profile, a Euclidean distancemeasure was used. Thus, the system measured the distance between two vectors based ona Euclidean distance equation in three-dimensional space [34]:

d(x, y) =√(x1 − y1)

2 + (x2 − y2)2 + (x3 − y3)

2 (1)

where x and y are two timing vectors. In this study, the user login was x and the user profilevector was y. Moreover, d(x, y) ≥ 0 [34]. Algorithm 1 indicates how Euclidean distance wascomputed in the proposed work.

Algorithm 1: Euclidean distance (ED)

1: begin2: Compute the different values between two timing vectors.3: Calculate Square value4: Sum the values of step 35: Take the square root6: end

To identify the users, standard deviation (SD) was applied as a threshold for Euclideandistance in an approach inspired by the work in Ref. [33]. In the proposed system, Euclideandistance was compared to the SD. If the Euclidean distance was less than the half rowsof SD values stored in the database, it was considered as a similarity threshold, with thenew time vector belonging to the same user as the profile being compared. Hence, thenew vector would be stored in the database. Otherwise, the system would give the usersix attempts. If this number of attempts was exceeded, the user would be classified asa phisher.

SD =

√∑N

i=1 (xi − x)2

N − 1(2)

Future Internet 2022, 14, 82 9 of 21

where xi = {x1, x2, . . . ,xi} represents the values of the time features, these being the meanvalues for hold, UD, and DD times. Meanwhile, X represents the mean value of all featuresused, and N represents the number of features [35]. Algorithm 2 illustrates how to computethe SD in the proposed work.

Algorithm 2: Standard Deviation (SD)

1: begin2: Compute the mean values for each feature3: Calculate Square value4: Sum the value of step 35: Divide by 26: Take the square root7: end

4.4. Typing Text

There are two main phases in this study, as in all keystroke dynamics systems: en-rolment and verification. The proposed study deals with free text because text-basedCAPTCHA provides completely different text each time. Thus, it does not require the userto memorise any text. For the purpose of usability, the system generates a CAPTCHA thatcombines lowercase letters (a–z) and digits (1–9). The proposed solution targets stringswith lowercase letters to exclude the use of shift and caps lock keys because the systemneed only focus on collecting key events for the character keys, thereby avoiding any otherkeystroke sequences.

Informed by the pilot experiment, some confusable letters were removed to increasethe usability and accuracy of the proposed solution. For example, the numerical digit ‘0’was removed because it is often confused with the letter ‘O’, and the capital letter ‘I’ wasremoved because it is often confused with the lowercase letter ‘l’. The lowercase letters‘l’, ‘s’, and ‘g’ were also removed because they are often mistaken for the numbers ‘1’, ‘5’,and ‘9’, respectively, as suggested in Ref. [36]. In addition, the font size was increased to48 points to ensure that all the characters could be clearly read by the users. The Verdanafont was selected because it is stated in the literature that users solve CAPTCHAs moreaccurately when using the Arial or Verdana fonts [37]. Figure 4 shows a generated sample.

Future Internet 2022, 14, x FOR PEER REVIEW 9 of 21

where xi = {x1, x2, …,xi} represents the values of the time features, these being the mean

values for hold, UD, and DD times. Meanwhile, �� represents the mean value of all fea-

tures used, and N represents the number of features [35]. Algorithm 2 illustrates how to

compute the SD in the proposed work.

Algorithm 2: Standard Deviation (SD)

1: begin

2: Compute the mean values for each feature

3: Calculate Square value

4: Sum the value of step 3

5: Divide by 2

6: Take the square root

7: end

4.4. Typing Text

There are two main phases in this study, as in all keystroke dynamics systems: enrol-

ment and verification. The proposed study deals with free text because text-based CAP-

TCHA provides completely different text each time. Thus, it does not require the user to

memorise any text. For the purpose of usability, the system generates a CAPTCHA that

combines lowercase letters (a–z) and digits (1–9). The proposed solution targets strings

with lowercase letters to exclude the use of shift and caps lock keys because the system

need only focus on collecting key events for the character keys, thereby avoiding any other

keystroke sequences.

Informed by the pilot experiment, some confusable letters were removed to increase

the usability and accuracy of the proposed solution. For example, the numerical digit ‘0’

was removed because it is often confused with the letter ‘O’, and the capital letter ‘I’ was

removed because it is often confused with the lowercase letter ‘l’. The lowercase letters ‘l’,

‘s’, and ‘g’ were also removed because they are often mistaken for the numbers ‘1’, ‘5’, and

‘9’, respectively, as suggested in Ref. [36]. In addition, the font size was increased to 48

points to ensure that all the characters could be clearly read by the users. The Verdana

font was selected because it is stated in the literature that users solve CAPTCHAs more

accurately when using the Arial or Verdana fonts [37]. Figure 4 shows a generated sample.

Figure 4. A generated sample.

Previous studies have focused on using a long-text system to obtain a large amount

of timing information. However, this method has a longer training phase and is not user-

friendly. In addition, the accuracy of this technique is not high because the user must

pause frequently to look at the text during the copy task, which can lead to inconsistencies

in the collected data [38]. Therefore, in the proposed solution, a text length of 10 characters

was adopted to authenticate keystroke dynamics. This text length was selected based on

Ref. [39], in which the authors investigated a number of studies on anomaly detection

using authentication via keystroke dynamics, wherein they observed that a text length of

10 characters was typical in keystroke dynamics authentication. This text length has

Figure 4. A generated sample.

Previous studies have focused on using a long-text system to obtain a large amountof timing information. However, this method has a longer training phase and is not user-friendly. In addition, the accuracy of this technique is not high because the user must pausefrequently to look at the text during the copy task, which can lead to inconsistencies inthe collected data [38]. Therefore, in the proposed solution, a text length of 10 characterswas adopted to authenticate keystroke dynamics. This text length was selected based onRef. [39], in which the authors investigated a number of studies on anomaly detectionusing authentication via keystroke dynamics, wherein they observed that a text lengthof 10 characters was typical in keystroke dynamics authentication. This text length hasproved to be effective when applied with a text-based CAPTCHA to detect attacks fromhumans [40]. However, the proposed system asks the user to solve a CAPTCHA seven

Future Internet 2022, 14, 82 10 of 21

times. This was inspired by Ref. [41], in which the authors achieved the best performance,consisting of 0.00% FAR and 0.00% FRR. Moreover, a large sample can help ensure anaccurate and conclusive test result.

In this study, the generated CAPTCHA word was presented on a grey backgroundwith no background lines or noise. The main aim was to prove the effectiveness of theproposed solution and increase acceptance of the submitted idea. Finally, the intentionwas to display the CAPTCHA to the user on the signup and log-in webpage. A ‘refreshCAPTCHA’ button was also included, which would allow the user to view a new problem.In addition, instructions were provided to clarify that all characters were lowercase withno spaces between them.

• Typed Text in the Sign-up (Enrolment) Phase

During the enrolment phase of this experiment, the system began to create a biometrictemplate for each user by asking them to enter their credentials (username, email, andpassword) and to solve a CAPTCHA seven times. Moreover, guidelines appeared in thesign-up and log-in pages, explaining that all characters of the CAPTCHA were lowercaseletters with no spaces between them. In addition, the participants were informed that anyinformation entered would only be used for the purposes of the research. Figure 5 depictsa screenshot of the sign-up page.

Future Internet 2022, 14, x FOR PEER REVIEW 10 of 21

proved to be effective when applied with a text-based CAPTCHA to detect attacks from

humans [40]. However, the proposed system asks the user to solve a CAPTCHA seven

times. This was inspired by Ref. [41], in which the authors achieved the best performance,

consisting of 0.00% FAR and 0.00% FRR. Moreover, a large sample can help ensure an

accurate and conclusive test result.

In this study, the generated CAPTCHA word was presented on a grey background

with no background lines or noise. The main aim was to prove the effectiveness of the

proposed solution and increase acceptance of the submitted idea. Finally, the intention

was to display the CAPTCHA to the user on the signup and log-in webpage. A ‘refresh

CAPTCHA’ button was also included, which would allow the user to view a new prob-

lem. In addition, instructions were provided to clarify that all characters were lowercase

with no spaces between them.

• Typed Text in the Sign-up (Enrolment) Phase

During the enrolment phase of this experiment, the system began to create a bio-

metric template for each user by asking them to enter their credentials (username, email,

and password) and to solve a CAPTCHA seven times. Moreover, guidelines appeared in

the sign-up and log-in pages, explaining that all characters of the CAPTCHA were lower-

case letters with no spaces between them. In addition, the participants were informed that

any information entered would only be used for the purposes of the research. Figure 5

depicts a screenshot of the sign-up page.

Figure 5. Sign-up page.

• Text in the Log-in (Verification) Phase

During this phase, the participants were required to enter their username and pass-

word, and to solve a CAPTCHA once on the log-in page. The features of the captured

typing pattern were then extracted from the CAPTCHA solution and compared with

those stored in the profile associated with the corresponding username and password in

the database. Figure 6 depicts a screenshot of the log-in page.

Figure 5. Sign-up page.

• Text in the Log-in (Verification) Phase

During this phase, the participants were required to enter their username and pass-word, and to solve a CAPTCHA once on the log-in page. The features of the capturedtyping pattern were then extracted from the CAPTCHA solution and compared with thosestored in the profile associated with the corresponding username and password in thedatabase. Figure 6 depicts a screenshot of the log-in page.

Future Internet 2022, 14, 82 11 of 21Future Internet 2022, 14, x FOR PEER REVIEW 11 of 21

Figure 6. Log-in page.

5. Experimental Evaluation of the Proposed System

This section discusses the experiments conducted to evaluate the effectiveness of the

proposed solution in preventing phishing attacks. As mentioned previously, the proposed

work is based on calculating users’ hold time, latencies, and di-graph duration when solv-

ing a CAPTCHA test. To collect the features of keystroke dynamics, a toolkit was required.

In turn, this necessitated the selection of an appropriate development platform. Although

the project included both mobile and Web platforms, the focus was on Web development.

The language chosen for developing the data acquisition was JavaScript because this is

one of the most commonly used languages for Web and mobile applications, offering the

benefits of low cost and high performance. In addition, HTML, Bootstrap, and JQuery

AJAX were used. For the backend portion of this project, Python was adopted as the lan-

guage and Flask as the framework. The system was developed and tested on a Windows

laptop, using SQLite as the database system. Table 2 presents a brief summary of the com-

ponents used to perform this experimental study. In order to evaluate the proposed sys-

tem and increase the chance of generating clear results, two controlled laboratory experi-

ments were conducted. The following subsections briefly describe each of these experi-

ments:

Table 2. Components of the proposed system.

Components Description

Participants 75

Genuine group 30

Phishing group 45

Gender Female

User’s profession Students at Qassim University

Age range Between 19 and 27 years

Language Python

Recording of typing rhythms JavaScript

Timing function Date.getTime()

Keyboard QWERTY (laptop)

Acquisition platform Windows

Controlled environment Yes

Typing text English language free text

Training sample 7 samples

Testing sample One-time sample

Figure 6. Log-in page.

5. Experimental Evaluation of the Proposed System

This section discusses the experiments conducted to evaluate the effectiveness of theproposed solution in preventing phishing attacks. As mentioned previously, the proposedwork is based on calculating users’ hold time, latencies, and di-graph duration when solvinga CAPTCHA test. To collect the features of keystroke dynamics, a toolkit was required. Inturn, this necessitated the selection of an appropriate development platform. Although theproject included both mobile and Web platforms, the focus was on Web development. Thelanguage chosen for developing the data acquisition was JavaScript because this is one ofthe most commonly used languages for Web and mobile applications, offering the benefitsof low cost and high performance. In addition, HTML, Bootstrap, and JQuery AJAX wereused. For the backend portion of this project, Python was adopted as the language andFlask as the framework. The system was developed and tested on a Windows laptop, usingSQLite as the database system. Table 2 presents a brief summary of the components used toperform this experimental study. In order to evaluate the proposed system and increase thechance of generating clear results, two controlled laboratory experiments were conducted.The following subsections briefly describe each of these experiments:

Table 2. Components of the proposed system.

Components Description

Participants 75

Genuine group 30

Phishing group 45

Gender Female

User’s profession Students at Qassim University

Age range Between 19 and 27 years

Language Python

Recording of typing rhythms JavaScript

Timing function Date.getTime()

Keyboard QWERTY (laptop)

Acquisition platform Windows

Controlled environment Yes

Typing text English language free text

Training sample 7 samples

Testing sample One-time sample

Future Internet 2022, 14, 82 12 of 21

5.1. Pilot Experiment

Before initiating the main experiment, a simple preliminary experiment was conductedon a small sample to examine the system’s performance. This pilot experiment wouldalso determine the appropriate similarity threshold for authenticating genuine users andexcluding phishers from the main experiment. In this pilot, the similarity threshold wasdetermined as five (5), this being a randomly selected number, meaning that if the Euclideandistance value was less than or equal to five SD values stored in the user database, the userwould be considered genuine and granted access to the system. Otherwise, the user wouldbe identified as a phisher and prevented from accessing the system. The pilot experimentwas conducted with five participants acting as phishers. The participants were asked to loginto the website using the information provided, repeating their login attempts until thesixth attempt, which is when they would be stopped by the system. This plugin was used inthe system to block the user’s Internet address from further attempts once a specified retrylimit had been reached. The proposed system was inspired by Google’s six attempts per IPaddress. Moreover, Microsoft recommends a minimum of four attempts and maximumof 10. Table 3 presents the IP address of each participant and the time taken by eachto complete the task. The pilot experiment demonstrated that the system was workingproperly and capable of preventing all phishers. Moreover, the similarity threshold wasobserved to be the total number of SD in similar profiles divided by two, where the numberof user profiles will be increased after each successfully authenticated attempt, meaningthat identifying a specific number might not be effective after a certain period of time aswell as it would increase the FRR rate. As shown in Table 3, the number of similar profilesdid not exceed four, indicating that the determined threshold will be capable of producingexcellent results.

Table 3. Details of participants in the pilot experiment.

IP Address SuccessfulAttempts Failed Attempts Number of

Similar ProfilesTime Consumed

(In Minutes)

192.168.0.1

0 6

4

Most users did notexceed two minutes

192.168.0.2 2192.168.0.3 0192.168.0.4 0192.168.0.5 1

5.2. Main Experiment

Seventy-five participants participated in a controlled laboratory experiment to evalu-ate the proposed system. The participants comprised undergraduate students studyingdifferent subjects at Qassim University, all with different levels of typing skill. In addition,all were familiar with text-based CAPTCHAs. These participants were divided into twogroups: genuine users and phishers. The group of genuine users consisted of 30 partic-ipants, while the phishing group contained 45 participants. The number of participantsand their division into groups was very similar to the method adopted in most of theprevious studies, for example, study of Ref. [40]. Thus, the results of these studies wereall considered to be equally credible. Nonetheless, a higher number of participants wassampled in this current work compared to previous studies. The following subsectionsexplain the actual experimental setup and procedure.

5.2.1. Experimental Setup

In this step, the system was prepared by deleting all data from the database and makingthe necessary changes identified in the pilot experiment. After preparing an appropriateplace to conduct the experiment, the experimental procedure was explained to the usersin an information sheet. The procedure was then re-explained to the users immediatelybefore starting the main experiment. The experiment began with the group of genuineusers. The participants in this group were registered in the system as genuine users, having

Future Internet 2022, 14, 82 13 of 21

entered their usernames, email addresses, and passwords, and having solved a text-basedCAPTCHA seven times to collect sufficient keystroke timing data. The user subsequentlyneeded to log into the system to gain access to the results page. If a user managed togain access to the system, it would mean that the task was completed successfully, and allattempts were stored in the database as time vectors (user profile). Figure 7 illustrates thetime vector of one genuine user, which was stored in the database. From Figure 7, it can beseen that eight samples from the user are present.

Future Internet 2022, 14, x FOR PEER REVIEW 13 of 21

users in an information sheet. The procedure was then re-explained to the users immedi-

ately before starting the main experiment. The experiment began with the group of genu-

ine users. The participants in this group were registered in the system as genuine users,

having entered their usernames, email addresses, and passwords, and having solved a

text-based CAPTCHA seven times to collect sufficient keystroke timing data. The user

subsequently needed to log into the system to gain access to the results page. If a user

managed to gain access to the system, it would mean that the task was completed success-

fully, and all attempts were stored in the database as time vectors (user profile). Figure 7

illustrates the time vector of one genuine user, which was stored in the database. From

Figure 7, it can be seen that eight samples from the user are present.

Figure 7. Time vector of one genuine user.

Once the information for the group of genuine participants had been collected, nine

credentials of different users from various specialties were selected. This information was

then offered to the phishing group so that they could gain illegal access to the system.

These phishers were permitted six log-in attempts per IP address. The system would then

block any further log-in attempts from that IP address. Figure 8 shows a phisher’s at-

tempts to gain illegal access to the account of a genuine user, whose data are presented

above in Figure 7.

Figure 8. Example of time vector from one phisher’s attempts.

5.2.2. The Experimental Procedure

A controlled laboratory environment was used as the experimental setting to avoid

any interruption while the text-based CAPTCHA was being solved. This meant that all

phones needed to be switched off (or put on silent) and any chatting with friends had to

be avoided. All the participants were instructed that they needed to sign up, first by en-

tering their username, email address, and password, and then by solving the CAPTCHA

Figure 7. Time vector of one genuine user.

Once the information for the group of genuine participants had been collected, ninecredentials of different users from various specialties were selected. This information wasthen offered to the phishing group so that they could gain illegal access to the system.These phishers were permitted six log-in attempts per IP address. The system would thenblock any further log-in attempts from that IP address. Figure 8 shows a phisher’s attemptsto gain illegal access to the account of a genuine user, whose data are presented above inFigure 7.

Future Internet 2022, 14, x FOR PEER REVIEW 13 of 21

users in an information sheet. The procedure was then re-explained to the users immedi-

ately before starting the main experiment. The experiment began with the group of genu-

ine users. The participants in this group were registered in the system as genuine users,

having entered their usernames, email addresses, and passwords, and having solved a

text-based CAPTCHA seven times to collect sufficient keystroke timing data. The user

subsequently needed to log into the system to gain access to the results page. If a user

managed to gain access to the system, it would mean that the task was completed success-

fully, and all attempts were stored in the database as time vectors (user profile). Figure 7

illustrates the time vector of one genuine user, which was stored in the database. From

Figure 7, it can be seen that eight samples from the user are present.

Figure 7. Time vector of one genuine user.

Once the information for the group of genuine participants had been collected, nine

credentials of different users from various specialties were selected. This information was

then offered to the phishing group so that they could gain illegal access to the system.

These phishers were permitted six log-in attempts per IP address. The system would then

block any further log-in attempts from that IP address. Figure 8 shows a phisher’s at-

tempts to gain illegal access to the account of a genuine user, whose data are presented

above in Figure 7.

Figure 8. Example of time vector from one phisher’s attempts.

5.2.2. The Experimental Procedure

A controlled laboratory environment was used as the experimental setting to avoid

any interruption while the text-based CAPTCHA was being solved. This meant that all

phones needed to be switched off (or put on silent) and any chatting with friends had to

be avoided. All the participants were instructed that they needed to sign up, first by en-

tering their username, email address, and password, and then by solving the CAPTCHA

Figure 8. Example of time vector from one phisher’s attempts.

5.2.2. The Experimental Procedure

A controlled laboratory environment was used as the experimental setting to avoidany interruption while the text-based CAPTCHA was being solved. This meant that allphones needed to be switched off (or put on silent) and any chatting with friends hadto be avoided. All the participants were instructed that they needed to sign up, first byentering their username, email address, and password, and then by solving the CAPTCHAseven times. Guidelines for solving the CAPTCHA appeared in the sign-up and log-inpages, clarifying that all characters of the CAPTCHA were lowercase letters, with no spacesbetween them. Moreover, the system included a button to refresh the CAPTCHA so that a

Future Internet 2022, 14, 82 14 of 21

user could receive readable text and re-enter a solution. In addition, the participants wereinformed that any information entered would only be used for the purposes of the research.Following the sign-up phase, the participants were instructed to log into the system withthe same email address and password that were entered in the sign-up phase, and to solvethe CAPTCHA once. The participants were permitted to use backspace keys if requiredwhile typing. Finally, a welcome page with the corresponding username appeared to notifythe participants that the experiment had ended. Once the genuine group’s information hadbeen collected, the collection of the phishing group’s information began. For the phishinggroup, the same steps were undertaken as for the genuine group, except that the phishersdid not need to sign up as they were being provided with other people’s information. Theywere to use this information to try and log in. All the participants received an informationsheet explaining the primary goal of the experiment and how it would be conducted. Thisexplanation was reiterated for each user individually before starting the experiment inorder to ensure accurate understanding.

6. Evaluation Metrics

There are several metrics that were used to evaluate the effectiveness of the model.The false positive (FP), false negative (FN), true positive (TP), and true negative (TN) areparameters often used by any phishing solution researchers to judge the performance ofsolutions. Let true positive (TP) indicate the number of phishers correctly classified asphishing attackers; true negative (TN) indicates the number of genuine users correctlyclassified as genuine, false positive (FP) indicates the number of genuine users who areincorrectly classified as phishers, and false negative (FN) indicates the number of phisherswho are incorrectly classified as genuine users. This study employs five different metricsbased on these parameters, as follows:

True positive rate (TPR): it is the rate of phishers who are correctly classified as phishersof the total phishers. The equation of the TPR is shown in Equation (3):

TPR =TP

TP + FN(3)

True negative rate (TNR): it is the rate of genuine users who are correctly classifiedas genuine users of the total genuine users; the equation of calculating TNR is shown inEquation (4):

TNR =TN

TN + FP(4)

False positive rate (FPR): it is the rate of genuine users who are incorrectly classifiedas phishers of the total genuine users. The Equation (5) defined a FPR equation:

FPR =FP

FP + TN(5)

False negative rate (FNR): it is the rate of phishers who are incorrectly classified asgenuine users of the total phishers. Equation (6) shows how to compute FNR:

FNR =FN

FN + TP. (6)

Accuracy refers to the total number of correctly classified attempts (true accept/truereject) in relation to the total number of all users’ completed attempts, and computes asshown in Equation (7):

Accuracy =TP + TN

TP + TN + FP + FN(7)

Future Internet 2022, 14, 82 15 of 21

7. Results and Discussion

This section presents and discusses the results of the real experimental study for theproposed system. All the data were obtained from the actual experiment, and it wasverified that all the participants successfully completed the given tasks. Details of theseresults are displayed and discussed in the following subsections.

7.1. Time Taken for Each Genuine User to Register in the System

Despite the length of the CAPTCHA and its sevenfold repetition in the registra-tion phase, as well as having to complete the log-in phase, most of the users solved theCAPTCHA within 3–5 min, as shown in Figure 9. This indicated that the proposed systemwas not overly complicated or laborious.

Future Internet 2022, 14, x FOR PEER REVIEW 15 of 21

7. Results and Discussion

This section presents and discusses the results of the real experimental study for the

proposed system. All the data were obtained from the actual experiment, and it was ver-

ified that all the participants successfully completed the given tasks. Details of these re-

sults are displayed and discussed in the following subsections.

7.1. Time Taken for Each Genuine User to Register in the System

Despite the length of the CAPTCHA and its sevenfold repetition in the registration

phase, as well as having to complete the log-in phase, most of the users solved the CAP-

TCHA within 3–5 min, as shown in Figure 9. This indicated that the proposed system was

not overly complicated or laborious.

Figure 9. Time consumed by each genuine user.

7.2. Results for Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for

Each Genuine User

Figure 10 illustrates the average of all features used in the proposed system (hold

time, UD time, and DD time) for all the successful CAPTCHA answers typed by each

genuine user. It should be noted that the average hold time was more constant, whereas

the di-graph features were less constant between users. However, the system appeared to

be effective in preventing phishers.

Figure 10. Average hold time, UD time, and DD time for each genuine user.

0

1

2

3

4

5

6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Co

nsu

med

Tim

e(i

n M

inu

tes)

Users

Time Consumed by Each Genuine User to Register in the System

0.0

500.0

1000.0

1500.0

2000.0

2500.0

3000.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Tim

e (i

n m

illi

seco

nd

s)

Users

Average Hold Time, Up-down (UD) Time, and Down-down (DD)

time for Each Genuine User (in Milliseconds)

Average of Hold Time

Average of Up-DownTime

Average of Down-DownTime

Figure 9. Time consumed by each genuine user.

7.2. Results for Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for EachGenuine User

Figure 10 illustrates the average of all features used in the proposed system (holdtime, UD time, and DD time) for all the successful CAPTCHA answers typed by eachgenuine user. It should be noted that the average hold time was more constant, whereasthe di-graph features were less constant between users. However, the system appeared tobe effective in preventing phishers.

Future Internet 2022, 14, x FOR PEER REVIEW 15 of 21

7. Results and Discussion

This section presents and discusses the results of the real experimental study for the

proposed system. All the data were obtained from the actual experiment, and it was ver-

ified that all the participants successfully completed the given tasks. Details of these re-

sults are displayed and discussed in the following subsections.

7.1. Time Taken for Each Genuine User to Register in the System

Despite the length of the CAPTCHA and its sevenfold repetition in the registration

phase, as well as having to complete the log-in phase, most of the users solved the CAP-

TCHA within 3–5 min, as shown in Figure 9. This indicated that the proposed system was

not overly complicated or laborious.

Figure 9. Time consumed by each genuine user.

7.2. Results for Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for

Each Genuine User

Figure 10 illustrates the average of all features used in the proposed system (hold

time, UD time, and DD time) for all the successful CAPTCHA answers typed by each

genuine user. It should be noted that the average hold time was more constant, whereas

the di-graph features were less constant between users. However, the system appeared to

be effective in preventing phishers.

Figure 10. Average hold time, UD time, and DD time for each genuine user.

0

1

2

3

4

5

6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Co

nsu

med

Tim

e(i

n M

inu

tes)

Users

Time Consumed by Each Genuine User to Register in the System

0.0

500.0

1000.0

1500.0

2000.0

2500.0

3000.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Tim

e (i

n m

illi

seco

nd

s)

Users

Average Hold Time, Up-down (UD) Time, and Down-down (DD)

time for Each Genuine User (in Milliseconds)

Average of Hold Time

Average of Up-DownTime

Average of Down-DownTime

Figure 10. Average hold time, UD time, and DD time for each genuine user.

Future Internet 2022, 14, 82 16 of 21

7.3. Number of Attempts Made by Attackers to Gain Unauthorised Access to the System

The number of attempts to obtain access to the system was limited to six for eachIP address. If this maximum number was exceeded, the IP address would be blocked.Figure 11 presents the number of attempts made by attackers who gained successful accessto the system. Conversely, Table 4 displays the IP addresses that were blocked because themaximum number of attempts was exceeded.

Future Internet 2022, 14, x FOR PEER REVIEW 16 of 21

7.3. Number of Attempts Made by Attackers to Gain Unauthorised Access to the System

The number of attempts to obtain access to the system was limited to six for each IP

address. If this maximum number was exceeded, the IP address would be blocked. Figure

11 presents the number of attempts made by attackers who gained successful access to the

system. Conversely, Table 4 displays the IP addresses that were blocked because the max-

imum number of attempts was exceeded.

Figure 11. Number of attempts made by attackers to gain access to the system.

Table 4. Blocked IP addresses.

ID IP Address

1 192.168.0.1

2 192.168.0.2

3 192.168.0.3

4 192.168.0.4

5 192.168.0.5

6 192.168.0.6

9 192.168.0.9

11 192.168.0.11

12 192.168.0.12

15 192.168.0.15

18 192.168.0.18

32 192.168.0.32

34 192.168.0.34

35 192.168.0.35

38 192.168.0.38

40 192.168.0.40

41 192.168.0.41

7.4. Results of Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for Each

Phisher

Figure 12 shows the average hold time, UD time, and DD time for all phishing at-

tempts to gain unauthorised access to the system. It should be noted that the system rec-

orded the timestamp in milliseconds, as mentioned previously. The results prove the ef-

fectiveness of the proposed system for preventing phishing attacks.

6

1

2

1

3

4

1 1

6

3

4

1 1 1 1

2

1 1 1 1

2

1 1

2

1 1 1

4

0

1

2

3

4

5

6

7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

Nu

mb

er o

f at

tem

pts

Phishers

Number of Access Attempts Made by Attackers

Figure 11. Number of attempts made by attackers to gain access to the system.

Table 4. Blocked IP addresses.

ID IP Address

1 192.168.0.12 192.168.0.23 192.168.0.34 192.168.0.45 192.168.0.56 192.168.0.69 192.168.0.911 192.168.0.1112 192.168.0.1215 192.168.0.1518 192.168.0.1832 192.168.0.3234 192.168.0.3435 192.168.0.3538 192.168.0.3840 192.168.0.4041 192.168.0.41

7.4. Results of Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time forEach Phisher

Figure 12 shows the average hold time, UD time, and DD time for all phishing attemptsto gain unauthorised access to the system. It should be noted that the system recorded thetimestamp in milliseconds, as mentioned previously. The results prove the effectiveness ofthe proposed system for preventing phishing attacks.

Future Internet 2022, 14, 82 17 of 21Future Internet 2022, 14, x FOR PEER REVIEW 17 of 21

Figure 12. Average hold time, UD time, and DD time for each phisher.

Figures 10 and 12 illustrate the unique rhythm of each participant (genuine users and

phishers) generated when solving the text-based CAPTCHA. Although the values were

close, there were no duplicates. Keystroke dynamics were, therefore, found to be effective

in preventing phishing attacks. In addition, Figure 11 shows that some phishers made

more than one attempt to gain access to the system. These repeated attempts indicate that,

although the phishers obtained information about a genuine user’s credentials, it was dif-

ficult to mimic the genuine user’s typing dynamics when solving the CAPTCHA because

the participants’ typing rhythms were recorded in milliseconds. Moreover, the proposed

system appeared to have many usability advantages over traditional systems in terms of

its ability to operate in stealth mode, together with its low cost, lack of additional hard-

ware, user acceptance, and ease of integration into existing security systems. However,

keystroke dynamics have two disadvantages: lower accuracy (they are affected by exter-

nal factors, such as fatigue or stress) and lower permanence (a user’s typing pattern may

change over time). The proposed system overcame these disadvantages as the experiment

was conducted in a controlled environment to prove that each user’s typing pattern dif-

fered. In addition, the system stored new user profiles with each successful attempt to

update the user profiles stored in the database. From Table 5, it shall be inferred that the

proposed model is capable of preventing phishing attacks through identifying each user

from their own typing pattern as well, with promising results.

Table 5. Performance of the proposed system.

Approach TPR TNR FPR FNR Accuracy

The proposed approach 82.2% 100% 0% 17.8% 85%

Furthermore, the proposed approach was compared with those of previous studies,

such as Refs. [21–23], in terms of ease of use, cost-effectiveness, observed popularity, and

general security. Each of these terms is briefly explained below, as defined in Ref. [42],

while the comparison is shown in Table 6.

▪ Ease of use is a basic concept referring to the facility of the authentication method

adopted in terms of the level of user acceptance and system availability.

▪ Cost-effectiveness refers to an authentication method that provides excellent results

without requiring high expenditure.

▪ Observed popularity indicates the percentage popularity of the method used as com-

pared to other types of authentication.

0.0

500.0

1000.0

1500.0

2000.0

2500.0

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45

Tim

e (i

n m

illi

seco

nd

s)

Phisher

Average Hold Time, UD time, and DD time for Each Phisher

Average of Hold Time

Average of Up-Down Time

Average of Down-Down Time

Figure 12. Average hold time, UD time, and DD time for each phisher.

Figures 10 and 12 illustrate the unique rhythm of each participant (genuine users andphishers) generated when solving the text-based CAPTCHA. Although the values wereclose, there were no duplicates. Keystroke dynamics were, therefore, found to be effectivein preventing phishing attacks. In addition, Figure 11 shows that some phishers mademore than one attempt to gain access to the system. These repeated attempts indicatethat, although the phishers obtained information about a genuine user’s credentials, it wasdifficult to mimic the genuine user’s typing dynamics when solving the CAPTCHA becausethe participants’ typing rhythms were recorded in milliseconds. Moreover, the proposedsystem appeared to have many usability advantages over traditional systems in terms of itsability to operate in stealth mode, together with its low cost, lack of additional hardware,user acceptance, and ease of integration into existing security systems. However, keystrokedynamics have two disadvantages: lower accuracy (they are affected by external factors,such as fatigue or stress) and lower permanence (a user’s typing pattern may changeover time). The proposed system overcame these disadvantages as the experiment wasconducted in a controlled environment to prove that each user’s typing pattern differed.In addition, the system stored new user profiles with each successful attempt to updatethe user profiles stored in the database. From Table 5, it shall be inferred that the proposedmodel is capable of preventing phishing attacks through identifying each user from theirown typing pattern as well, with promising results.

Table 5. Performance of the proposed system.

Approach TPR TNR FPR FNR Accuracy

The proposed approach 82.2% 100% 0% 17.8% 85%

Furthermore, the proposed approach was compared with those of previous studies,such as Refs. [21–23], in terms of ease of use, cost-effectiveness, observed popularity, andgeneral security. Each of these terms is briefly explained below, as defined in Ref. [42],while the comparison is shown in Table 6.

n Ease of use is a basic concept referring to the facility of the authentication methodadopted in terms of the level of user acceptance and system availability.

n Cost-effectiveness refers to an authentication method that provides excellent resultswithout requiring high expenditure.

n Observed popularity indicates the percentage popularity of the method used ascompared to other types of authentication.

Future Internet 2022, 14, 82 18 of 21

n General security refers to an evaluation of the safety provided by the authenticationmethod used.

Table 6. Comparison between some of the related research and proposed work.

Approach Ease of Use Cost-Effectiveness

ObservedPopularity

GeneralSecurity

[21]**

** *****

** **

[22] **** *** ******

[23]***

** ****

** ***

Currentapproach

*** *** *** ****** *** *** **

In addition, the proposed study provides a high level of security to protect againsttraditional attacks, such as brute force, shoulder surfing, guessing, and dictionary attacks,as explained in the following:

n Brute force: the attackers attempt to try all possible combinations of characters in thehopes to find the username and password.

− If the attacker finds the username and password, they must know the typingrhythm of the user when solving the CAPTCHA.

n Shoulder surfing: the attacker observes the typing pattern of the victim when solvingthe CAPTCHA to try mimicking typing rhythms.

− Although it is possible to mimic a user’s typing pattern in fixed-text systems, itis more difficult in free-text systems because it requires the attacker to observethe victim’s behaviour for the duration of their logged-in session. Therefore, it isquite rare for an attacker to be able to replicate all typing rhythms of users.

n Guessing: the attacker tries to guess the correct password by using the most commonwords that they expect all the users used.

− The attacker needs to obtain the typing pattern of the user to pass the CAPTCHAtest.

n Dictionary attacks: the attacker attempts to defeat the authentication mechanism bydetermining the correct password from a large number of possibilities.

− If the attacker finds the correct password, they must know the typing rhythm ofthe user when solving the CAPTCHA

However, several previous studies [26,27] used an image-based CAPTCHA to preventphishing attacks, whereas the proposed system used a text-based CAPTCHA. Thus, thepresent results cannot be compared to this existing work. Besides, research by the authorsof [24] used voice biometrics to protect Mpesa (a Kenyan mobile banking system) fromfraud in a mobile money-transfer app. A voice trait, considered as a behaviour biometricin some cases, was adopted in the above study. However, this is incompatible with theproposed approach, which involves typing patterns. Moreover, the Mpesa approach relatedto a mobile platform, unlike the current study, which was conducted on a Web platform.

In contrast, other studies [31–33] used free-text keystroke dynamics for English lan-guage input, and time features were applied to distinguish between user samples. It is,therefore, very tempting to compare the proposed work with previous research in the samedomain, but all the earlier studies differ in the number of participants, extracted features,environment in which the experiments were conducted, and the classification methodsapplied. Hence, it is not possible to rely on the credibility of these comparisons. That said,the three above-mentioned studies are similar to this current work in some respects, such as

Future Internet 2022, 14, 82 19 of 21

in controlling the environment, the use of English language free text, and the application ofEuclidean distance as a classification method. Therefore, a simple comparison may be madeto identify limitations in the benchmarking of these present results against those reportedin Refs. [31–33], as illustrated in Table 7. However, the proposed approach outperformsthese previous studies in terms of sample size and in determining the appropriate thresholdfor enhancing system performance. Note that equal error rate (EER) is a value of FAR/FRRwhere FAR equals FRR.

Table 7. Summary of comparison results.

Study Sample Size FAR FRR EER

[31] 15 0.21 0.17 0.19

[33] 30 0.22 0 0.11

[32] 21 0.245 (SVM) 0.281 (DT) 0.613 (SVM) 0.702 (DT) 0.429 (SVM) 0.491 (DT)

Current study 75 0.17 0.0 0.08

8. Conclusions and Future Work

Phishing is a constant and complex issue, with the capacity to do extensive damage tothe targeted party and maximise gains for the attackers. It has already led to injurious lossesin the business, government, and technology sectors. Nevertheless, to date, the mechanismsfor preventing phishing attacks have proved insufficient, leaving practical challenges to beovercome. Therefore, this paper proposes an approach to preventing phishing attacks usingCAPTCHA keystroke dynamics. In particular, a combination of three timing features wasdeployed to distinguish between samples of authenticated users and phishers: keystrokeduration, di-graph duration, and latency. In total, 75 users participated in the experimentover a period of six weeks, and Euclidean distance was used to verify the user samples.

The results offer sufficient evidence of the effectiveness of capturing keystroke dynam-ics to prevent phishing, indicating that this approach is worthy of further study. Moreover,the proposed approach outperforms all previous work in the literature in terms of speed, ac-ceptance, cost-effectiveness, and ease of integration with other systems. Also demonstratedwas the benefit of collecting adequate samples in the enrolment phase, where the userswere identified. Additionally, the experiment proved that the performance of keystrokedynamics could be improved by determining the appropriate threshold. This will be thefocus of the author’s research going forward.

In future work, experiments will be conducted in an online environment with alarger number of participants in order to evaluate the proposed system in a real-worldenvironment and obtain more accurate results. In addition, it will also be interesting toinvestigate other attacks that may face the proposed system when applied in the real world,such as replay attacks. Moreover, further features will be included to improve the results,and additional classifiers and more advanced methodologies will be applied.

Author Contributions: Conceptualization, E.K.A., A.M.A. and S.A.A.; Data curation, E.K.A.; Formalanalysis, E.K.A.; Funding acquisition, E.K.A.; Investigation, A.M.A. and S.A.A.; Methodology, E.K.A.and A.M.A.; Project administration, A.M.A. and S.A.A.; Software, E.K.A.; Supervision, A.M.A. andS.A.A.; Validation, E.K.A., A.M.A. and S.A.A.; Writing—original draft, E.K.A.; Writing—review& editing, A.M.A. and S.A.A. All authors have read and agreed to the published version of themanuscript.

Funding: This research received no external funding.

Data Availability Statement: Not applicable, the study does not report any data.

Acknowledgments: The researchers would like to thank the Deanship of Scientific Research, QassimUniversity for funding the publication of this project.

Conflicts of Interest: The authors declare no conflict of interest.

Future Internet 2022, 14, 82 20 of 21

References1. Basit, A.; Zafar, M.; Liu, X.; Javed, A.R.; Jalil, Z.; Kifayat, K. A comprehensive survey of AI-enabled phishing attacks detection

techniques. Telecommun. Syst. 2021, 76, 139–154. [CrossRef] [PubMed]2. Uma, M.; Padmavathi, G. A survey on various cyber attacks and their classification. Int. J. Netw. Secur. 2013, 15, 390–396.3. Lastdrager, E.E.H. Achieving a consensual definition of phishing based on a systematic review of the literature. Crime Sci. 2014, 3,

1–10. [CrossRef]4. Jakobsson, M.; Myers, S. Phishing and Countermeasures: Understanding the Increasing Problem of Electronic Identity Thef ; John Wiley &

Sons: Hoboken, NJ, USA, 2006.5. APWG. Phishing Activity Trends Report: 3rd Quarter 2021. 2021. Available online: https://docs.apwg.org/reports/apwg_

trends_report_q3_2021.pdf?_ga=2.147528119.149518382.1644108193-680326765.1644108193&_gl=1*cr9iea*_ga*NjgwMzI2NzY1LjE2NDQxMDgxOTM.*_ga_55RF0RHXSR*MTY0NDEwODE5My4xLjAuMTY0NDEwODE5My4w (accessed on 23 February2022).

6. Hewage, C. Coronavirus pandemic has unleashed a wave of cyber attacks-here’s how to protect yourself. Conversation 2020, 31.Available online: https://theconversation.com/coronavirus-pandemic-has-unleashed-a-wave-of-cyber-attacks-heres-how-to-protect-yourself-135057 (accessed on 23 February 2022).

7. Federal Bureau of Investigation-Internet Crime Complaint Center (IC3). 2020 Internet Crime Report. 2021. Available online:https://www.ic3.gov/Media/PDF/AnnualReport/2020_IC3Report.pdf (accessed on 23 February 2022).

8. Kulikova, T.; Shcherbakova, T.; Sidorina, T. Spam and phishing in Q1 2021. Available online: https://securelist.com/spam-and-phishing-in-q1-2021/102018/ (accessed on 23 February 2022).

9. Kulikova, T.; Shcherbakova, T.; Sidorina, T. Spam and phishing in 2020. Secur. Kapersky 2021. Available online: https://securelist.com/spam-and-phishing-in-2020/100512/ (accessed on 23 February 2022).

10. Ponemon, L. The 2021 Cost of Phishing Study. 2021. Available online: https://www.proofpoint.com/us/resources/analyst-reports/ponemon-cost-of-phishing-study (accessed on 23 February 2022).

11. Stanford University IT. University IT Launches Phishing Awareness Service. 2016. Available online: https://uit.stanford.edu/news/university-it-launches-phishing-awareness-service (accessed on 23 February 2022).

12. Buza, K. Person identification based on keystroke dynamics: Demo and open challenge. CEUR Workshop Proc. 2016, 1612, 161–168.13. Brodic, D.; Amelio, A. The CAPTCHA—Perspectives and Challenges Perspectives and Challenges; Springer Nature: Cham, Switzerland,

2020.14. Ahn, L.V.; Blum, M.; Hopper, N.J.; Langford, J. CAPTCHA: Using Hard AI Problems for Security; Lecture Notes in Computer Science;

Springer Nature: Berlin/Heidelberg, Germany, 2003; Volume 2656, pp. 294–311. [CrossRef]15. Varshney, G.; Misra, M.; Atrey, P.K. A survey and classification of web phishing detection schemes. Secur. Commun. Networks

2016, 9, 6266–6284. [CrossRef]16. Masri, R.; Aldwairi, M. Automated Malicious Advertisement Detection using VirusTotal, URLVoid, and TrendMicro. In

Proceedings of the 2017 8th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6April 2017; pp. 336–341.

17. Jain, A.K.; Gupta, B.B. A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIPJ. Inf. Secur. 2016, 2016, 1–11. [CrossRef]

18. Kumar, A.; Gupta, J.B.B. Towards detection of phishing websites on client-side using machine learning based approach. Telecom-mun. Syst. 2017, 68, 687–700. [CrossRef]

19. Mao, J.; Li, P.; Li, K.; Wei, T.; Liang, Z. BaitAlarm: Detecting phishing sites using similarity in fundamental visual features. InProceedings of the 2013 5th International Conference on Intelligent Networking and Collaborative Systems, Xi’an, China, 9–11September 2013; pp. 790–795. [CrossRef]

20. Tirfe, D.; Anand, V.K. A survey on trends of two-factor authentication. In Contemporary Issues in Communication, Cloud and BigData Analytics; Springer: Singapore, 2022; pp. 285–296. [CrossRef]

21. Khan, A.A. Preventing Phishing Attacks using One Time Password and User Machine Identification. Int. J. Comput. Appl. 2013,68, 7–11.

22. Lee, Y.S.; Kim, N.H.; Lim, H.; Jo, H.K.; Lee, H.J. Online Banking Authentication system using Mobile-OTP with QR-code. InProceedings of the 5th International Conference on Computer Sciences and Convergence Information Technology ICCIT 2010,Seoul, Korea, 30 November–2 December 2010; pp. 644–648. [CrossRef]

23. Patel, Y.; Diana, M.S.C. Fingerprint authentication technique to prevent phishing using pattern matrix. Int. J. Eng. Res. Dev. 2013,6, 88–92.

24. Jepkemboi, C.L. Enhancing Security of Mpesa Transactions by Use of Voice Biometrics. Ph.D. Thesis, United States InternationalUniversity-Africa, Nairobi, Kenya, May 2018.

25. Hassan, M.A.; Shukur, Z. A secure multi factor user authentication framework for electronic payment system. In Proceedings ofthe 2021 3rd International Cyber Resilience Conference (CRC) 2021, Langkawi Island, Malaysia, 29–31 January 2021. [CrossRef]

26. James, D.; Philip, M. A novel anti phishing framework based on visual cryptography. In Proceedings of the 2012 InternationalConference on Power, Signals, Controls and Computation, Thrissur, India, 3–6 January 2012; pp. 207–218.

27. Krishnamoorthy, S.K.; Thankappan, S. A novel method to authenticate in website using CAPTCHA-based validation. Secur.Commun. Netw. 2016, 9, 5934–5942. [CrossRef]

Future Internet 2022, 14, 82 21 of 21

28. Nanglae, N.; Bhattarakosol, P. A study of human bio-detection function under text-based CAPTCHA system. In Proceedings ofthe 11th IEEE/ACIS International Conference on Computer and Information Science, Shanghai, China, 30 May–1 June 2012; pp.139–144. [CrossRef]

29. Costigan, N. The growing pain of phishing: Is biometrics the cure? Biom. Technol. Today 2016, 2016, 8–11. [CrossRef]30. Karnan, M.; Akila, M.; Krishnaraj, N. Biometric personal authentication using keystroke dynamics: A review. Appl. Soft Comput. J.

2011, 11, 1565–1573. [CrossRef]31. Alsultan, A.; Warwick, K. User-friendly free-text keystroke dynamics authentication for practical applications. In Proceedings of

the 2013 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2013, Washington, DC, USA, 13–16 October 2013;pp. 4658–4663. [CrossRef]

32. Alsultan, A.; Warwick, K.; Wei, H. Free-text keystroke dynamics authentication for Arabic language. IET Biom. 2016, 5, 164–169.[CrossRef]

33. Alsuhibany, S.A.; Almushyti, M.; Alghasham, N.; Alkhudier, F. Analysis of free-Text keystroke dynamics for Arabic languageusing Euclidean distance. In Proceedings of the 2016 12th International Conference on Innovations in Information Technology, IIT2016, Al-Ain, United Arab Emirates, 28–30 November 2016; pp. 185–190. [CrossRef]

34. Garrett, P.B. Linear algebra I: Dimension. In Number Theory, Trace Formulas and Discrete Groups; Academic Press: Cambridge, MA,USA, 2014.

35. Rouaud, M. Probability, Statistics and Estimation: Propagation of Uncertainties, p.191. 865 Creative Commons. 2013. Availableonline: http://www.incertitudes.fr/book.pdf (accessed on 23 February 2022).

36. Alsuhibany, S.A. Optimising CAPTCHA generation. In Proceedings of the 2011 Sixth International Conference on Availability,Reliability and Security, Washingot, DC, USA, 22–26 August 2011. [CrossRef]

37. Bursztein, E.; Moscicki, A.; Fabry, C.; Bethard, S.; Mitchell, J.C.; Jurafsky, D. Easy does it: More usable CAPTCHAs. In Proceedingsof the Conference on Human Factors in Computing Systems-Proceedings, Toronto, CA, USA, 26 April–1 May 2014; pp. 2637–2646.[CrossRef]

38. Alsultan, A.; Warwick, K. Keystroke Dynamics Authentication: A Survey of Free-text Methods. Int. J. Comput. Sci. 2013, 10, 1–10.Available online: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=6B582DD715E9CD8F474394CED80C2A56?doi=10.1.1.412.2833&rep=rep1&type=pdf%5Cnhttp://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.412.2833 (accessed on 23February 2022).

39. Killourhy, K.S.; Maxion, R.A. Comparing anomaly-detection algorithms for keystroke dynamics. In Proceedings of the Interna-tional Conference on Dependable Systems and Networks (DSN), Lisbon, Portugal, 29 June 2009; pp. 125–134. [CrossRef]

40. Alsuhibany, S.A.; Alreshoodi, L.A. Detecting human attacks on text-based CAPTCHAs using the keystroke dynamic approach.IET Inf. Secur. 2021, 15, 191–204. [CrossRef]

41. Alsultan, A.; Warwick, K.; Wei, H. Improving the performance of free-text keystroke dynamics authentication by fusion. InProceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems & Networks, Lisbon, Portugal, 29 June–2July 2009; pp. 1024–1033.

42. Idrus, S.Z.S.; Cherrier, E.; Rosenberger, C.; Schwartzmann, J.J. A Review on Authentication Methods. Aust. J. Basic Appl. Sci. 2013,7, 95–107.