Anti-bot Strategies Based on Human Interactive Proofs




Alessandro Basso and Francesco Bergadano


Contents

15.1 Automated Tools . . . . . . . . . . . . . . . . . . . . . . . . . 273

15.2 Human Interactive Proof . . . . . . . . . . . . . . . . . 275

15.3 Text-Based HIPs . . . . . . . . . . . . . . . . . . . . . . . . . 276

15.4 Audio-Based HIPs . . . . . . . . . . . . . . . . . . . . . . . 278

15.5 Image-Based HIPs . . . . . . . . . . . . . . . . . . . . . . . 279
15.5.1 ESP-PIX . . . . . . . . . . . . . . . . . . . . . . . . . 279
15.5.2 Bongo . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
15.5.3 KittenAuth . . . . . . . . . . . . . . . . . . . . . . . 280
15.5.4 Asirra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
15.5.5 IMAGINATION . . . . . . . . . . . . . . . . . . 282
15.5.6 ARTiFACIAL . . . . . . . . . . . . . . . . . . . . . 282
15.5.7 EasyPIC and MosaHIP . . . . . . . . . . . . . 283
15.5.8 Issues of Image-Based HIPs . . . . . . . . . 287

15.6 Usability and Accessibility . . . . . . . . . . . . . . . . 288

15.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

The Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

Human Interactive Proofs (HIPs) are a class of tests used to counter automated tools. HIPs are based on the discrimination between actions executed by humans and activities undertaken by computers. Several types of HIPs have been proposed, based on hard-to-solve Artificial Intelligence problems, and they can be classified into three major categories: text-based, audio-based and image-based. In this chapter, we give a detailed overview of the currently used anti-bot strategies relying on HIPs. We present their main properties, advantages, limits and effectiveness.

15.1 Automated Tools

The rapid and extremely large growth of the Internet has made it necessary to automate several web-related activities by means of properly devised tools. Some of these programs are created with the purpose of supporting humans in carrying out time-consuming and tedious operations. Others, instead, are developed with the aim of undertaking activities which are considered illegal, or inappropriate with respect to commonly accepted rules and habits of web usage [15.1]. Being a serious threat to the security and data integrity of web applications and Internet sites, automated tools have been constantly fought by the Internet community through the use of several, more or less effective, defense strategies.

An automated tool, also known as a robot (bot) or scanner, is a computer program that executes a sequence of operations continuously, without the need for human interaction [15.1]. A typical example of a web robot is a mirroring tool, a program that automatically makes a copy of a web site by downloading all its resources. It must traverse the hypertext structure of each retrieved document and recursively fetch all the referenced documents. Another common name for such a program is "spider". However, it should be noted that this term may be misleading, since the word "spider" gives the erroneous impression that the robot itself moves through the Internet. In reality, robots are implemented as a single software system that retrieves information from remote sites using standard web protocols [15.2].
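The traversal just described is, at its core, a breadth-first crawl: extract the links from each retrieved page and enqueue those not yet seen. The sketch below illustrates only that core loop; the pages are supplied as an in-memory dictionary standing in for real HTTP fetches, so all names here are illustrative.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href targets of all anchor tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def mirror(start, fetch):
    """Breadth-first traversal; `fetch(url)` returns HTML or None."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html                 # "download" the resource
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:         # recurse into referenced documents
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Hypothetical three-page site used in place of real HTTP requests.
site = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/index.html">home</a>',
    "/b.html": "no links here",
}
mirrored = mirror("/index.html", site.get)
```

A real mirroring tool would add politeness delays, robots.txt handling and URL normalization on top of this loop.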

© Springer 2010. Peter Stavroulakis, Mark Stamp (Eds.), Handbook of Information and Communication Security.

The increasing complexity of Internet services and the lack of information regarding secure web application development are among the reasons that motivate the existence of bots. A web bot is generally devised to exploit common weaknesses and vulnerabilities found in web applications, with the purpose of harming the integrity and security of online services. In some cases, however, it limits its actions to the mere gathering of various pieces of information, which are later used for different purposes. An interesting example is the email harvester, a bot with the specific task of collecting email addresses from web pages, which are later used for spamming activities (i.e., sending unsolicited email messages).
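A minimal email harvester of the kind just mentioned is little more than a regular-expression scan over downloaded pages, which is why the technique is so cheap for attackers. The pattern below is a deliberately simplified approximation of an address, not a full RFC 5322 matcher.

```python
import re

# Simplified address pattern: local part, "@", domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(html):
    """Return all strings in the page that look like email addresses."""
    return set(EMAIL_RE.findall(html))

page = '<p>Contact <a href="mailto:info@example.org">info@example.org</a></p>'
addresses = harvest(page)   # {'info@example.org'}
```

The ease of this scan is precisely what motivates obfuscation tricks such as writing "info at example dot org" on public pages.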

Except for operations which are driven by legitimate purposes, such as web site indexing, vulnerability assessment, web automation, online auctions and chat management, there exist other activities carried out by automated tools which are considered a security threat. Among the most common ones, we can identify [15.3]:

• Content mirroring, i.e., automatic download of all resources from a web site with the purpose of duplicating its content
• Extraction of specific information by means of agents which systematically mine databases
• Automatic registration of email accounts used for illegal activities
• Harvesting of email addresses from web pages
• Performing fraud against web auction systems
• Flooding web blogs, forums or wikis with unsolicited messages
• Execution of Denial of Service attacks
• Brute-force guessing of web service authentication credentials
• Alteration of data in web applications, like falsification of product ratings, modification of polls and click-fraud activity.

Together with the evolution of web applications, automated tools have also experienced continuous improvement, with the purpose of bypassing security measures adopted in web development. Basically, we can identify three generations of automated programs [15.1]:

• The 1st generation includes tools able to automatically retrieve a set of predefined resources, without trying to interpret the content of the downloaded files.
• The 2nd generation includes tools able to analyze HTML pages searching for links to other resources and to process simple client-side code. In some cases, they can also record and repeat a series of user clicks and data insertions, to mimic the user's behavior during login procedures or repetitive web tasks.
• The 3rd generation includes tools able to fully interpret client-side languages and understand web contents in a fashion closer to what human beings do, being endowed with a form of "intelligence".

While automated tools of the 1st and 2nd generations are widespread, programs belonging to the 3rd generation are uncommon. However, since web bots are becoming ever more intelligent and can mimic human actions with a high degree of fidelity, the future of automated tools is quickly moving towards the development of more complex and sophisticated programs [15.1]. Therefore, proper countermeasures are required to prevent the activity of such intelligent agents.

However, stopping bot activity is not a simple goal to achieve, due to several constraints which characterize the web context [15.4]. In particular:

(1) Web browsers cannot be modified by installing add-ons.
(2) Schemes based on locking the IP address of a possible attacker are often not effective, since clients are likely to share/reuse the same IP address due to Network Address Translation (NAT) nodes, proxies, user mobility and local address assignment (DHCP).
(3) It is often impossible to authenticate users, because they are unlikely to own cryptographic tokens and are not given passwords due to practical limitations imposed in several circumstances.
(4) Techniques based on keystroke dynamics or biometrics are not always applicable and are often unreliable.

The main property of automated tools is their ability to imitate human behavior in performing web-related activities. For this reason, web applications are often unable to effectively tell human activities and computer actions apart. Several anti-bot strategies relying on the HTTP protocol have been proposed, but none can be considered secure from a practical and theoretical point of view [15.3]. Currently, the only effective way to stop automated tools is through the use of Human Interactive Proofs (HIPs), i.e., a specific class of tests devised to tell humans and computers apart [15.1]. Therefore, human-computer discrimination can be considered a key factor in preventing automated access to web resources.

The rest of this work is organized as follows: in Sect. 15.2, we describe the concept of HIP and its properties. In Sects. 15.3–15.5 we give a detailed explanation of the most common categories of existing HIPs: text-based, audio-based and image-based, respectively. In Sect. 15.6, we discuss the usability issues related to the use of HIPs. Finally, in Sect. 15.7, we give our conclusions and point to directions for future research.

15.2 Human Interactive Proof

Human Interactive Proofs (HIPs) are a class of tests based on the discrimination between actions executed by humans and activities undertaken by computers [15.5]. Such tests have been defined as a proof that a human being can devise without special equipment, whereas a computer cannot easily create [15.6, 7]. More generally, HIPs are a specific class of challenge-response protocols which allow a human being to perform a secure authentication process in order to be recognized as a member of a group, without requiring a password, biometric data, an electronic key, or any other physical evidence [15.8].

Another common term used to refer to the concept of a HIP is Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) [15.9]. Basically, a CAPTCHA is a HIP whose purpose is to distinguish between humans and computers. Another way to refer to such a category of tests is Automated Turing Test (ATT) [15.10]. The idea of the Turing Test dates back to 1950, when Turing proposed a test to determine whether a machine can be considered intelligent [15.11]. In its original definition, a human judge asks a set of questions to both a computer and a human, without previously knowing the identity of either converser. If there is no way for the judge to decide which of them is human based on their answers, the machine can be considered intelligent. Even after more than 50 years of research in the field of Artificial Intelligence (AI), no machine is currently able to pass the Turing Test. This fact implies the existence of a considerable intelligence gap between human and machine, which can possibly be used to devise effective methods to tell them apart. Unlike the traditional Turing Test, an ATT requires a computer to decide whether one is human or not, rather than requiring one to prove one's humanity to another human.

The idea of a HIP was initially introduced by Naor in an unpublished work dating back to 1996 [15.12], and it was implemented for the first time by AltaVista in 1998, with the purpose of preventing automatic submission of URLs to its search engine [15.13]. In 2000, the concept of a CAPTCHA was formalized by von Ahn, Blum and Langford, after being challenged by Yahoo's chief scientist Udi Manber to devise a method able to keep bots out of Yahoo's web chat [15.9]. According to the researchers, a CAPTCHA is "a cryptographic protocol whose underlying hardness assumption is based on an AI problem"; more formally [15.9]:

Definition 1. A test C is (α, β)-human executable if at least an α portion of the human population has success greater than β over C.

Such a statement, however, can only be proven empirically, and success in passing the test depends on specific characteristics of the human population, such as the spoken language or sensory disabilities.
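In our own notation (not the authors' original formalism), the definition can be written as follows, where H is the human population and "passes" means answering the challenge correctly:

```latex
% A test C is (\alpha,\beta)-human executable if, for humans h drawn
% from the population H, at least an \alpha fraction succeeds with
% probability greater than \beta:
\Pr_{h \in H}\bigl[\Pr[\, h \text{ passes } C \,] > \beta \bigr] \ge \alpha
```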

The main concept behind a CAPTCHA is the attempt to exploit an existing gap between human and computer abilities with respect to some problem which is easy to solve for humans but not yet solvable by computers. Generally, such a gap is identified in a specific area of AI, such as computer vision or pattern and speech recognition. It is not important how large the gap is: as long as the gap exists, it can potentially be used as a primitive for security [15.10]. In addition, if an adversary improves her skills and eventually finds an effective way to solve the underlying problem, then a different set of hard AI problems can be used to devise a new test [15.9]. An intrinsic side effect of using CAPTCHAs is the possibility of inducing advances in the field of AI. Von Ahn and Blum define this advantage as a win-win situation: if attackers cannot defeat a CAPTCHA, it can be an effective anti-bot protection; otherwise, a hard problem is solved, advancing AI research [15.9].

In the last few years, several types of CAPTCHAs have been proposed, based on hard-to-solve AI problems [15.10]. A problem is defined as hard when the general consensus among the community working on it is that there is no effective way to solve it. This idea is similar to the concept of security in modern cryptosystems, where a cryptographic algorithm is considered secure because cryptographers agree that the problem at the basis of the algorithm (e.g., the factorization of a 1024-bit number) is not solvable [15.9].

Not all hard AI problems are suitable for CAPTCHAs. According to [15.10], the following properties are required to create an effective CAPTCHA:

• There should be an automatic way to generate the test as well as to check the solution.
• The test should be taken quickly and easily by human users.
• The test should avoid user discrimination, accepting all human users with high reliability.
• Virtually no machine should be able to solve it.
• It should resist attacks even if the algorithm and its data are in the public domain.

Originally, the Turing Test required a conversational interaction between a human judge and two players, one of which was a computer. A CAPTCHA differs from the Turing Test in terms of the sensory abilities involved. In particular, we can identify three major categories of HIPs: text-based, audio-based and image-based, depending on the specific type of task required [15.14].

15.3 Text-Based HIPs

Text-based HIPs are currently the most widespread and well-known anti-bot tests in the Internet community. They exploit the gap between computer programs' and human beings' ability to extract distorted and corrupted text from pictures. These images are generally easily readable for human beings, but their content is supposed to be illegible even to the best OCR software [15.15].

Text-based CAPTCHAs are generated by randomly choosing a sequence of letters, or words from a dictionary, and rendering a transformed image containing these data. Several transformations can be applied to the image, such as blur filters, the addition of grids and gradients, the generation of background noise, or random distortions. The user is required to recognize the text displayed in the challenge image and type it into an input field next to the picture. If the letters typed are wrong, a new CAPTCHA is presented; otherwise the user is recognized as a member of the human group.
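The server-side flow just described, issue a random challenge, check the typed answer, and present a fresh challenge on failure, can be sketched as follows. Image rendering and distortion are omitted here; the challenge string stands in for the distorted picture, and the helper names are our own.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def new_challenge(length=6):
    """Randomly choose the letters that would be rendered and distorted."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def verify(expected, typed):
    """Case-insensitive, timing-safe comparison of the user's answer."""
    return hmac.compare_digest(expected.upper(), typed.strip().upper())

challenge = new_challenge()
if not verify(challenge, "some wrong answer"):
    challenge = new_challenge()     # on failure, a new CAPTCHA is presented
```

Using `secrets` rather than `random` matters here: the challenge must be unpredictable to an attacker who knows the generation algorithm.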

Fig. 15.1a–c CAPTCHAs developed at CMU: (a) examples taken from EZ-Gimpy, (b) examples taken from Gimpy, (c) an example of reCAPTCHA

Well-known examples of such tests are EZ-Gimpy and the improved version Gimpy, both shown in Fig. 15.1. Devised at CMU in 2000 [15.10], they were extensively adopted on many Internet web sites until, some years later, Greg Mori and Jitendra Malik from the UC Berkeley Computer Vision Group found an effective method to break them. Using computer vision techniques originally developed to solve generic object recognition problems, they were able to correctly identify the words of EZ-Gimpy 92% of the time, whereas the harder Gimpy was beaten 33% of the time [15.16].

Recently, CMU’s CAPTCHAs have been im-proved with the introduction of a new type oftext-based HIP, called reCAPTCHA [15.17]. Thistest is based on two different words, taken fromdigitalized texts: one is not recognizable by OCRsoftware, whilst for the other the answer is known.Both words are inserted in an image which is thenvisually distorted. The user is required to typeboth the words in an input field. If she solves theone for which the answer is known, the systemassumes that the other answer is correct. The sameimage is then presented to other users in differ-

15.3 Text-Based HIPs 277

ent CAPTCHAs to increase the confidence of thecorrect answer.
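The verification logic described above can be sketched as follows: the user must match the control word, and answers for the unknown word are pooled across users until one transcription reaches an agreement threshold. The class name and the threshold value are our assumptions, not reCAPTCHA's actual parameters.

```python
from collections import Counter

class TwoWordVerifier:
    """Accepts a user if the control word matches; pools votes on the other."""
    def __init__(self, control, agreement=3):
        self.control = control
        self.votes = Counter()          # transcriptions of the unknown word
        self.agreement = agreement

    def submit(self, control_answer, unknown_answer):
        if control_answer.lower() != self.control.lower():
            return False                # failed the word with a known answer
        self.votes[unknown_answer.lower()] += 1
        return True

    def accepted_transcription(self):
        """The unknown word's answer, once enough users agree on it."""
        if self.votes:
            word, count = self.votes.most_common(1)[0]
            if count >= self.agreement:
                return word
        return None

v = TwoWordVerifier(control="upon")
for answer in ["morning", "morning", "mourning", "morning"]:
    v.submit("upon", answer)
# After three matching votes, "morning" is taken as the digitization.
```

This is how the anti-bot check and the book-digitization side effect share a single challenge: the control word gates access, the pooled word produces data.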

The human effort spent in solving CAPTCHAs can thus be used to improve the digitization of physical books and other printed content. This idea is at the basis of the more general concept of Human Computation, in which "wasted" human processing power is used to solve problems that computers cannot yet solve. Other examples of Human Computation have been proposed to label images or tag areas of an image, or even to determine the structure of a given protein by using a computer game [15.18].

Other interesting CAPTCHAs are known as "BaffleText" and "ScatterType". The former is a test based on the psychophysics of human reading and uses nonsense English-like "pronounceable words", which are pseudorandomly generated with the purpose of defending against dictionary attacks. In BaffleText, the word is typeset using a randomly chosen typeface and a Gestalt-motivated degradation mask is applied to it, to defend against image restoration attacks [15.8]. In ScatterType, no physics-based image degradations, occlusions or extraneous patterns are applied. Instead, the characters of the image are fragmented using horizontal and vertical cuts, and the fragments are pseudorandomly scattered by horizontal and vertical displacement [15.19]. The main problem with both these CAPTCHAs is their human legibility, which in the case of ScatterType is only around 53%. Their security level, however, can be considered rather high. An example of both tests is shown in Fig. 15.2.
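BaffleText's pronounceable nonwords can be loosely approximated with a simple consonant–vowel alternation. The real generator is grounded in English phonotactics; this sketch is only a stand-in that shows why such words defeat dictionary attacks: they are readable yet absent from any word list.

```python
import random

CONSONANTS = "bcdfghjklmnprstvwz"
VOWELS = "aeiou"

def nonsense_word(syllables=3, rng=None):
    """Build a pronounceable, dictionary-free word from CV syllables."""
    rng = rng or random.Random()
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))

word = nonsense_word(3, random.Random(42))   # e.g. something like "kodazu"
```

Because the output is not an English word, an attacker cannot shortlist candidate answers from a lexicon and must actually read the degraded image.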

Fig. 15.2a,b HIP examples of BaffleText (a) and ScatterType (b) CAPTCHAs

In recent years, new versions of text-based CAPTCHAs have been proposed as alternatives to the less secure and broken ones. In particular, the Yahoo, Google and Microsoft web sites are currently using harder tests, which are supposed to grant a higher level of security. Some examples of such HIPs are shown in Fig. 15.3. They have been devised to be resistant to a class of attacks known as "segment-then-recognize" [15.20]. This term refers to a way of bypassing a HIP which is based on two phases: initially, the word is segmented into individual characters by using a proper algorithm. Then, each symbol is compared to a set of previously classified letters, with the purpose of finding a match.
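The two phases can be sketched on a toy binary bitmap: split the image into ink-bearing regions (segmentation), then score each region against stored glyph templates (recognition). Real attacks use far more robust segmentation and trained classifiers; this only illustrates the structure of the attack, with made-up 3x3 glyphs.

```python
def segment(image):
    """Split a bitmap (list of equal-length rows) at empty columns."""
    width = len(image[0])
    inked = [any(row[x] == "#" for row in image) for x in range(width)]
    pieces, start = [], None
    for x, has_ink in enumerate(inked + [False]):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            pieces.append([row[start:x] for row in image])
            start = None
    return pieces

def recognize(piece, templates):
    """Pick the template glyph with the most matching pixels."""
    def score(tpl):
        return sum(a == b for pr, tr in zip(piece, tpl)
                   for a, b in zip(pr, tr))
    return max(templates, key=lambda ch: score(templates[ch]))

# Hypothetical 3x3 glyph templates and a two-letter challenge image.
TEMPLATES = {
    "I": ["###", ".#.", "###"],
    "L": ["#..", "#..", "###"],
}
challenge = ["###.#..",
             ".#..#..",
             "###.###"]
word = "".join(recognize(p, TEMPLATES) for p in segment(challenge))  # "IL"
```

The countermeasures discussed in this section (connecting lines, clutter, character crowding) all target the `segment` step, because once characters are isolated the recognition step is easy for machines.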

By exploiting properly devised segment-then-recognize attacks, many CAPTCHAs used on the World Wide Web have been broken [15.21], as has also been reported by specific projects like "PWNtcha" [15.22] and "aiCaptcha" [15.23]. In addition, some recent anti-CAPTCHA algorithms have been able to solve even the text-based HIPs displayed in Fig. 15.3. This happened despite all the countermeasures taken to prevent text segmentation, such as controlled distortions applied to the text, lines striking through the text and background clutter in the form of various segmented lines [15.24–27].

Fig. 15.3a–c Some well-known "second generation" HIPs: (a) Microsoft's HIP, (b) Google's HIP, and (c) Yahoo's HIP

At the moment, the only text-based HIP which can be considered unbroken is reCAPTCHA. The reason lies in the nature of the text presented in the challenge. Unlike conventional text-based HIPs, which generate their own distorted characters, reCAPTCHA exploits the natural distortions that result from book text having faded through time. In addition, the introduction of noise in the scanning process further complicates text recognition. Finally, the application of artificial transformations, similar to those used by standard CAPTCHAs, makes the challenge difficult for computer programs even when improved OCR tools are used [15.18].

Due to recent improvements of OCR software in text recognition and the above-mentioned specific attacks, textual HIPs have become less effective in discriminating between humans and computers [15.28–30]. It is reasonable to suppose that as computer vision and pattern recognition research advances further, the existing gap between humans and machines in understanding corrupted and distorted text will inevitably decrease in the coming years. Since it is unlikely that humans will improve their ability to solve HIPs in the same timeframe, text-based HIPs will soon become useless and new, more effective challenges will be required [15.31].

15.4 Audio-Based HIPs

CAPTCHAs belonging to this category focus on computers' difficulties in understanding spoken language in the presence of distortion and background noise. Basically, audio HIPs exploit the superior ability of the human brain to distinguish sounds, even if they are similar in frequency and volume. An example of such an ability is the "cocktail party" effect: at a noisy party, one can understand what a person is saying, even if sounds come from across the room. On the contrary, a computer is unlikely to be able to accomplish such a task, being "confused" by too many similar sounds [15.32].

The test works by randomly selecting a word or a sequence of digits, which are then rendered at randomly spaced intervals in an audio clip. Both male and female voices are used in common audio CAPTCHAs. The user listens to the clip and is asked to report its content in a text field. If she correctly recognizes all the spoken letters or digits, the CAPTCHA is considered passed [15.10]. To make automatic recognition of the audio content harder, in particular when Automatic Speech Recognition (ASR) systems are used, appropriate countermeasures must be taken. In particular, the clip may be altered by means of sound effects, degraded and/or distorted [15.33].

Listening to some of the most used audio CAPTCHAs, like those of Facebook and Google, reveals that several different kinds of effects and distortions can be exploited to create a sufficiently large gap between human and computer sound recognition ability. For example, common effects are: reverberation, tempo and pitch changes, the addition of noise in the form of spikes of sound as loud as the valid spoken data, the removal of parts of the audio signal or their substitution with white noise (usually, 60 ms every 100 ms), or the combination of different male and female voices [15.32]. In addition, recent studies emphasize the use of background noise in the form of human speech, similar in frequency to but different in language from the sequence of spoken digits [15.32]. Typical examples are sounds recorded in crowded public places, music, or words spelled in languages other than English. Theoretically, a human being should easily distinguish such noise from the valid data. However, according to [15.34], adding background noise also increases the difficulty for human users. This can be explained by considering that an audio CAPTCHA imposes a cognitive overload on a human user, i.e., it requires a higher cognitive load than the effort usually necessary to understand spoken language.
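The construction described above, spoken digits placed at random intervals with noise mixed in, can be sketched with synthetic tones standing in for recorded voices. Each digit becomes a short sine burst at a digit-specific frequency; real systems splice recorded speech instead, and the rates and amplitudes below are arbitrary choices of ours.

```python
import math
import random

RATE = 8000  # samples per second

def tone(freq, seconds):
    """A sine burst standing in for one spoken digit."""
    n = int(RATE * seconds)
    return [math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]

def silence(seconds):
    return [0.0] * int(RATE * seconds)

def audio_captcha(digits, rng):
    """Digits at randomly spaced intervals, with low-level noise mixed in."""
    clip = []
    for d in digits:
        clip += silence(rng.uniform(0.1, 0.4))          # random gap
        clip += tone(300 + 50 * d, 0.3)                 # digit-specific burst
    return [s + rng.uniform(-0.05, 0.05) for s in clip]  # background noise

rng = random.Random(7)
clip = audio_captcha([4, 1, 9], rng)
duration = len(clip) / RATE   # roughly between 1.2 and 2.1 seconds here
```

The random spacing is what frustrates naive attacks that assume digits appear at fixed offsets; the noise floor targets the ASR front end.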

The first audio-based test was implemented by Nancy Chan and afterwards presented in [15.35]. Further research was conducted by Lopresti, Shih, and Kochanski, who exploited the known limitations of ASR systems to generate sounds that humans can understand but automatic programs fail to recognize [15.33]. In the following years, some works studied the use of audio HIPs as a possible solution to avoid discrimination against visually impaired users [15.36–38], and several web sites adopted simple audio CAPTCHAs as a possible alternative to text-based ones, to grant accessibility. Unfortunately, some of them have been proven to be vulnerable to automated attacks, which take advantage of their low robustness against ASR systems [15.39].

Current research in the field of audio CAPTCHAs is still incomplete, and improvements from the viewpoint of security and usability are still necessary. Recently, some works have started to explore the possible vulnerabilities of existing audio HIPs, suggesting how to improve them [15.32, 34]. In particular, it has been proven that traditional audio CAPTCHAs based on distorted letters or digits can be broken by using machine learning algorithms [15.40]. Due to this fact, reCAPTCHA has recently been updated with a new, more robust version of its audio HIP. The basic idea behind the new CAPTCHA relies on using old radio recordings featuring different voices within a single clip, which are known to be difficult for an ASR system to process [15.34]. A positive side effect of using such a test is the help in digitizing old audio recordings. On the other hand, many old audio clips are quite hard to solve and require several attempts to be correctly passed by a human user. Due to this problem, the test allows a certain amount of misspellings and other possible mistakes. Nevertheless, reCAPTCHA is currently the only audio-based HIP which can still be considered secure.

Audio CAPTCHAs have not received the same level of attention as their visual counterparts, mostly because they are intrinsically more difficult, hard to internationalize, and require specific hardware to play the sound. In addition, similarly to visual HIPs, an audio CAPTCHA cannot provide full accessibility, since it represents a barrier for hearing-impaired users. Therefore, it is currently used mainly as a measure to grant accessibility where visual tests cannot, i.e., as a companion to text-based CAPTCHAs, to avoid discriminating against visually impaired people who would otherwise be prevented from accessing web contents [15.13]. Due to this fact, its security level must be the same as that of its visual counterpart, otherwise it could potentially weaken the protection. Currently, this is the main issue affecting hybrid HIP systems.
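The weakest-link concern can be made concrete: an attacker facing a text/audio pair simply picks whichever channel it solves more reliably, so the per-challenge bypass rate is the maximum of the two solver accuracies. The figures below are illustrative, not measurements.

```python
def hybrid_bypass_rate(p_text, p_audio):
    """An attacker chooses whichever channel it breaks more often."""
    return max(p_text, p_audio)

# Hypothetical solver accuracies: a strong text HIP paired with a weak audio one.
rate = hybrid_bypass_rate(p_text=0.01, p_audio=0.40)   # the audio channel dominates
```

With these numbers, pairing a 1%-breakable text HIP with a 40%-breakable audio HIP yields a 40%-breakable system, which is why the two channels must be comparably hard.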

15.5 Image-Based HIPs

Modern computers are still unable to accomplish many vision-related processes that humans usually consider easy tasks, like understanding a specific concept expressed by an image, or seeing and perceiving the world around them. As pointed out by Marvin Minsky, "right now there's no machine that can look around and tell a dog from a cat" [15.41]. The statement dates back to 1998 and it can still be considered valid [15.42]. In general, performing pattern recognition when the concepts to recognize are taken from a large domain or appear against a complex background, and segmenting an image in order to extract its internal components, are some of the most difficult challenges for a computer program. On the contrary, those tasks are moderately easy for human beings [15.1].

Several image-based HIPs have been proposed in recent years; the most interesting ones are reported in the rest of this section.

15.5.1 ESP-PIX

Exploiting the still large gap between computers and humans in visual concept detection, image-based CAPTCHAs require the user to solve a visual pattern recognition problem or to understand concepts expressed by images. A typical example of such a test was initially proposed under the name "PIX" by von Ahn et al. on the original CAPTCHA Project's web site, and it is now known as "ESP-Pix" [15.17]. This program has a large database of labeled pictures representing real objects. The test is built by randomly selecting a specific object and picking four pictures of it from the database, which are then randomly distorted and presented to the user. The test then asks the question "what are these pictures of?", offering several options to choose from. The fundamental step in making ESP-Pix a CAPTCHA is modifying the image prior to sending it to the user. In fact, since the database is publicly available, a simple database search for the images shown would reveal their correct label, thus allowing an attacker to solve the challenge. By applying image transformations (e.g., a random distortion), it becomes difficult for a computer program to find a match by searching the database [15.1]. An example is shown in Fig. 15.4: four pictures showing cats are displayed and, to pass the test, a user has to select the option "cat" from the drop-down menu.
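The ESP-Pix construction reduces to: pick a target label, sample pictures carrying that label, distort each one, and offer the label among distractor options. A sketch follows, with the distortion reduced to a placeholder transform and a made-up database of picture ids; real deployments perturb the actual pixels.

```python
import random

def distort(picture_id, rng):
    """Placeholder for the random image transformation (defeats DB lookup)."""
    return (picture_id, rng.random())    # a "new" image each time

def build_challenge(database, num_pictures=4, num_options=6, rng=None):
    """Pick a target label, sample its pictures, distort them, add distractors."""
    rng = rng or random.Random()
    target = rng.choice(sorted(database))
    pictures = rng.sample(database[target], num_pictures)
    distorted = [distort(p, rng) for p in pictures]
    options = rng.sample([l for l in sorted(database) if l != target],
                         num_options - 1) + [target]
    rng.shuffle(options)
    return distorted, options, target

# Hypothetical labeled database (picture ids only).
db = {label: list(range(10))
      for label in ["cat", "dog", "car", "tree", "boat", "lamp", "bird"]}
pics, options, answer = build_challenge(db, rng=random.Random(1))
```

Note that without the `distort` step, the shown pictures could be matched byte-for-byte against the public database, which is exactly the attack the section describes.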

15.5.2 Bongo

Another image-based HIP originally proposed byvon Ahn et al. is “Bongo” [15.10], which asks theuser to solve a pattern recognition problem in theform of two pictures containing two series of blocks.The blocks on the left series differ from those on theright and the user is required to identify the differ-ence (Fig. 15.5). Such a CAPTCHA is based on thefact that most humans should know a common baseof factual knowledge which most computers cannotlearn yet. However, this assumption cannot be con-sidered valid in general. Moreover, this CAPTCHAis vulnerable to the random guessing attack, where


Fig. 15.4 The ESP-Pix CAPTCHA

Fig. 15.5 The Bongo CAPTCHA

an adversary randomly chooses either the left or the right series. Indeed, in the official Bongo example, a random guess would be successful 50% of the time, making this HIP highly insecure [15.1].

15.5.3 KittenAuth

A well-known example of an image-based HIP similar to ESP-Pix is named “KittenAuth”, which challenges a visitor to select all animals of a specific species among the proposed pictures. An example is given in Fig. 15.6, where the HIP shows nine pictures of different animals and asks the user to select all “lambs” in order to authenticate. A simple random guess has only 1 chance in 84 of succeeding. An automated tool can easily pass the test by executing a random guess attack, if enough time is available and no other countermeasures are taken. The security of KittenAuth may be increased by adding more pictures or requiring the user to choose more images. However, unless a large and dynamic database of labeled pictures is used, an attacker can obtain a copy of it, which can then be manually classified by a human being. This fact, together with the absence of any image transformation, makes KittenAuth scarcely effective.
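The “1 in 84” figure is consistent with the usual KittenAuth layout, in which the user must pick the 3 pictures of the target species out of the 9 displayed: there are C(9, 3) = 84 possible selections. A quick check:

```python
from math import comb

def random_guess_odds(shown, targets):
    """Chance that a bot picking `targets` pictures uniformly at random
    out of `shown` selects exactly the right subset: 1 / C(shown, targets)."""
    return 1 / comb(shown, targets)

assert comb(9, 3) == 84          # 9 pictures, 3 of the requested species
print(random_guess_odds(9, 3))   # ≈ 0.0119, i.e. 1 chance in 84
```

Asking the user to select more images (raising `targets`) or showing more pictures (raising `shown`) both grow the denominator combinatorially, which is the security lever mentioned above.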

15.5.4 Asirra

An improved version of KittenAuth was proposed by Microsoft and is known as Asirra, an acronym for “Animal Species Image Recognition for Restricting Access” [15.42]. This HIP asks the user to recognize all pictures showing cats within a set of 12 images


Fig. 15.6 The KittenAuth HIP

of cats and dogs (Fig. 15.7). These pictures are randomly selected from a database of more than 3 million pictures obtained by means of a partnership with “Petfinder”. It should be noted that the labeled database of pictures is not fully public, since only 10% of it is openly accessible through Petfinder’s public interface. No particular transformations are therefore executed on the images, which improves usability but exposes the HIP to possible attacks. Indeed, Asirra has been devised on the conjecture that it is difficult to achieve a classification accuracy better than 60% in distinguishing cats from dogs without a significant advance in the state of the art of computer vision techniques. Under this assumption,

the probability of solving an Asirra HIP with the random guess attack is about 0.02% (1 in 4,096) [15.42]. However, a recent study on the security of Asirra pointed out that an improved classifier, with a per-image accuracy of 82.7%, can solve the HIP with a probability of 10.3% [15.43]. Such a value is considerably higher than the probability achieved by the random guess attack and poses a serious threat to Asirra.

Further issues arise with the “Partial Credit Algorithm” (PCA), which is proposed in [15.42] to increase the usability of Asirra for human users. PCA allows a user who succeeds in recognizing 11 of the 12 images to be granted a second chance by taking another test. If the user solves the second HIP,


Fig. 15.7 The Asirra HIP

she is recognized as human. However, while this scheme effectively helps humans to solve Asirra, it also considerably increases the probability of success of the classifier presented in [15.43], to 38%. This fact discourages the use of PCA, since it considerably weakens Asirra. Despite such issues, Asirra can be considered relatively effective in performing human–machine discrimination.

15.5.5 IMAGINATION

IMAGINATION (IMAge Generation for INternet AuthenticaTION) is an interesting image-based CAPTCHA composed of two different tests (Fig. 15.8). The system uses a database of images of simple concepts and a two-step user interface which allows quick testing for humans while being expensive for machines [15.44]. In the first step, a composite picture containing a sequence of 8 sub-images of different sizes is presented to the user, who is asked to click in the center of one of the sub-images. This task is generally simple for humans, whereas it becomes challenging for machines, since special effects (e.g., Floyd–Steinberg dithering) are applied to the image with the purpose of softening the boundaries of the sub-images. In the second step, only the selected image is displayed, enlarged. Controlled composite distortions are applied to it, attempting to maintain visual clarity for recognition by humans while making the same recognition difficult for automated systems. The user is then asked to select the word representing the concept expressed by the image from a list of possible candidates, in a similar manner to ESP-Pix. According to the results in [15.44], the CAPTCHA appears quite effective. However, it requires a human user to solve two distinct tests, with a consequent increase in effort and probability of failure. In addition, from the given examples, it turns out that the overall screen space required to display the image is very large (800 × 600 pixels) [15.1].

Fig. 15.8 The IMAGINATION CAPTCHA
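The first step (clicking near the centre of one of the composite’s sub-images) can be modelled with a small sketch. The tile layout, the `clicked_tile` helper and the tolerance value are illustrative assumptions, not part of the published system:

```python
# Hypothetical tiling: each sub-image is (x, y, width, height) in the composite.
def clicked_tile(tiles, click, tolerance=0.25):
    """Return the index of the tile whose centre the click falls near,
    or None. `tolerance` is the accepted offset as a fraction of the
    tile's size, mimicking the 'click the centre' rule of step one."""
    cx, cy = click
    for i, (x, y, w, h) in enumerate(tiles):
        if abs(cx - (x + w / 2)) <= w * tolerance and abs(cy - (y + h / 2)) <= h * tolerance:
            return i
    return None

tiles = [(0, 0, 200, 150), (200, 0, 120, 150), (0, 150, 150, 100)]
print(clicked_tile(tiles, (100, 75)))   # exact centre of tile 0 -> 0
print(clicked_tile(tiles, (195, 10)))   # near a tile corner -> None
```

The security of the real test rests on the dithering making the tile boundaries, and hence the centres, hard for a machine to locate in the first place; this sketch only covers the server-side verification of the click.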

15.5.6 ARTiFACIAL

A further image-based HIP is ARTiFACIAL (Automated Reverse Turing test using FACIAL features) [15.45]. It exploits the superior ability of human beings in recognizing human faces in images, even when extreme lighting conditions, shading, distortions, occlusions or a cluttered background affect the picture. Face recognition is still a difficult task for an automated face detector under the above-mentioned conditions. The HIP challenges the user with a complex image containing an automatically synthesized and distorted human face embedded in a cluttered


Fig. 15.9 The ARTiFACIAL CAPTCHA

background, as shown in Fig. 15.9. The test is passed if the user correctly clicks on 6 specific points of the image, corresponding to the 4 eye corners and the 2 mouth corners of the human face. Several different effects are applied to the basic face image, which is generated from a 3D wire model of a generic head: translation, rotation and scaling of the head, local facial feature deformation, and generation of a confusion texture map by moving facial features, i.e., eyes, nose and mouth, to different places in the image.

The usability test conducted in [15.45] reports that 99.7% of users could complete the test in an average of 14 s. However, the requirement to identify 6 different points on the image increases the probability of failure.

15.5.7 EasyPIC and MosaHIP

“EasyPic” and “MosaHIP” are two HIPs based on the human ability to recognize a generic object displayed in a picture. Both schemes require a user to pass a CAPTCHA at each attempt to access a resource. Therefore, any attack which demands the automatic download of n resources requires the adversary to correctly answer n CAPTCHAs. Compared to common textual HIPs, which involve an occasional check, these two approaches allow a finer control over the entire browsing process and a consequently higher level of security. However, the increased overhead caused by the higher number of tests submitted to the user might annoy a human being, or might simply not be sustainable. Therefore, usability aspects of both solutions have been taken into particular consideration, in order to relieve the user from the discomfort of typing text before accessing web content [15.1].

EasyPIC derives from the CAPTCHA known as ESP-Pix, which can be considered stronger than text-based HIPs, but also less user-friendly than the most common HIPs. Therefore, the main contribution of EasyPic is the attempt to devise a protection scheme which improves security and provides a simple and intuitive user interface. Indeed, a practical HIP must not only be secure but also human-friendly. This comprises the visual appeal and annoyance factor of a HIP, but also how well it exploits the different ability of humans and machines at solving difficult vision-related problems, such as segmentation and recognition tasks [15.1].

EasyPic is based on a drag-and-drop approach. The user has to drag the name of the specific resource she wants to download, expressed in the form of a movable text object rather than a web link, and to drop it on the box with the picture suggested in the page. This requires the user to recognize one among a number of different pictures representing real concepts. When the user drops a resource on the indicated image, that content is sent to the web browser. Otherwise, a new test is generated with a doubled number of images, different from the previous ones. The maximum number of pictures used in the test is eight and is reached when the user fails the test two subsequent times. This value is motivated by the requirement to keep the CAPTCHA easy enough and appropriate for insertion within a common web page. The probability of guessing the correct image with an automated tool is initially (with two images) equal to 50% and is reduced to 12.5% when eight images are displayed. Each time a user answers the challenge correctly, the number of images is linearly decremented by one, until the minimum threshold of two pictures is reached.
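The adaptive rule described above (double the number of pictures on a failure, up to eight; decrement by one on a success, down to two) can be sketched as follows; the class and method names are our own:

```python
class EasyPicCounter:
    """Sketch of EasyPIC's adaptive difficulty: the number of displayed
    pictures doubles after each failure (capped at 8) and decreases by
    one after each success (floored at 2), so a bot's random-guess odds
    fall from 1/2 to 1/8 as it keeps failing."""
    def __init__(self):
        self.images = 2

    def record(self, passed):
        if passed:
            self.images = max(2, self.images - 1)
        else:
            self.images = min(8, self.images * 2)
        return self.images

c = EasyPicCounter()
assert c.record(False) == 4          # first failure: 2 -> 4
assert c.record(False) == 8          # second failure: 4 -> 8 (maximum)
assert c.record(True) == 7           # a success decrements linearly
print(f"guess odds at 8 images: {1/8:.1%}")   # 12.5%
```

The asymmetry (multiplicative growth, linear decay) is what makes a guessing bot converge quickly to the hardest setting while a human who occasionally errs drifts back towards the easy one.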

Figure 15.10 shows an example of the described method. A user has to select a resource on the left and drag-and-drop it on the box showing the chair image. If she drops it on the image of the chair, the resource is added to the “Downloadable resources” list on the right and eventually downloaded.

On the contrary, if the resource is dropped on theother image, a new test with four different pictures is


Fig. 15.10a,b Example of the EasyPic scheme: (a) test with two pictures, (b) drag-and-drop on the correct image

displayed and, if the user keeps failing the test, a newCAPTCHA with eight images is shown (Fig. 15.11).When the number of consecutive failed attemptsfrom a user exceeds a maximum threshold, specificactions are taken to slow down or stop the activityof a possible attacker. In particular, both server-side(IP-locking) and client-side (hashcash computation)techniques may be employed [15.3].

To improve resistance to content extraction and image matching algorithms, the system applies a number of transformations to every selected image, such as resizing, rotation, flipping, controlled distortion, and dynamic shade modification of image pixels [15.3]. Besides generic web resources, the EasyPIC scheme has been successfully applied to the specific context of Internet polls, as reported in [15.46].

MosaHIP, an acronym for Mosaic-based Human Interactive Proof, is an image-based CAPTCHA which improves the EasyPIC scheme from the point of view of both security and usability. The main problem of EasyPic lies in the possibility of executing attacks based on image comparison, which exploit the public nature of the database of images. MosaHIP can use virtually any large collection of images to create a database of pictures, without the



Fig. 15.11a,b Details of the EasyPic scheme: (a) test with four images, (b) test with eight images

need of executing a time-consuming and often imprecise categorization process.

MosaHIP exploits the current difficulty of computers in performing three specific vision-related activities: image segmentation in the presence of a complex background; recognition of specific concepts within background clutter; and shape matching when specific transformations are applied to pictures. This CAPTCHA challenges the user with a single large image (the mosaic image) composed of smaller and partially overlapping pictures. These are taken from two different categories, i.e., images representing real, existing concepts and fake images, showing artificially created, non-existent concepts [15.1].

The user is required to identify a specific picture among the real images, which are pseudo-randomly positioned within the mosaic image, partially overlaying each other. Fake pictures are created by filling the background with randomly chosen colors from the color histogram of the real pictures and adding

a mix of randomly created shapes, lines and various effects, such as Floyd–Steinberg dithering. They are used only to generate background clutter, whose purpose is to make the recognition of images expressing real concepts difficult for non-human users [15.1]. A controlled distortion is also applied to the mosaic image, to increase the robustness of the scheme against attacks based on content extraction and image comparison. The interaction with the user is based on a drag-and-drop approach, as in the EasyPic scheme.

There exist two versions of the test. One version explicitly challenges the user to identify a specific picture among the set of real ones and is called “concept-based” (Fig. 15.12a). The other version is named “topmost” and asks the user to identify the image representing something existing and lying upon the other pictures, not overlapped by anything else (Fig. 15.12b). Despite the visual similarity, the two versions of MosaHIP rely on different ideas.



Fig. 15.12a,b The MosaHIP scheme: (a) concept-based: the resource has to be dropped on the “pineapple”, (b) topmost: the resource has to be dropped on the “scissors”


The concept-based one challenges the user to identify the picture representing the concept indicated in the test page. The topmost version of MosaHIP, instead, asks the user to identify an image located in a specific position in relation to the other pictures [15.1]. A remarkable contribution of the topmost approach is the possibility of avoiding the classification process of the database of images, allowing the use of virtually any sufficiently large collection of pictures. This is clearly not possible with EasyPic and several other image-based CAPTCHAs.

The main issue of MosaHIP is related to its ease of use. Indeed, the usability tests presented in [15.1] show that 98% of participants could correctly solve the concept-based version, whilst only 80% of participants were able to solve the topmost version.

Both EasyPic and MosaHIP are mostly a first line of defense against bots, which discriminates between human users and computers. Indeed, when the number of failed attempts to solve the visual challenge from a single IP address exceeds a maximum threshold, specific countermeasures are taken to slow down or stop the activity of a possible attacker. In particular, the “hashcash” technique is used to impose a computationally intense process on a client which has been recognized as a possible attacker. By initially filtering potential attackers from normal users, EasyPic and MosaHIP avoid the main issues of using the hashcash method as anti-bot protection: useless resource consumption and interruptions of the web browsing activity when not effectively needed, and insufficient computation capacity afflicting users with older hardware and mobile devices [15.1].
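The hashcash idea is to hand a suspected bot a puzzle that is cheap to verify but costly to mint: find a nonce whose hash, combined with a server-supplied challenge, starts with a given number of zero bits. A minimal sketch using SHA-1 follows; the concrete challenge string and difficulty are illustrative, and [15.3] describes the actual protocol used by EasyPic and MosaHIP:

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        return bits + (8 - byte.bit_length())
    return bits

def mint(challenge, difficulty=12):
    """Find a nonce whose SHA-1, together with the challenge, has
    `difficulty` leading zero bits: one hash to verify, about
    2**difficulty hashes on average to produce -- the cost imposed
    on a client flagged as a possible bot."""
    for nonce in count():
        token = f"{challenge}:{nonce}".encode()
        if leading_zero_bits(hashlib.sha1(token).digest()) >= difficulty:
            return nonce

nonce = mint("download /report.pdf from 10.0.0.1")
digest = hashlib.sha1(f"download /report.pdf from 10.0.0.1:{nonce}".encode()).digest()
assert leading_zero_bits(digest) >= 12
```

Raising `difficulty` by one bit doubles the average minting cost while leaving verification at a single hash, which is why the technique scales well as a punitive measure but, as noted above, penalizes legitimate users on slow hardware if applied indiscriminately.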

The database of pictures used in both EasyPic and MosaHIP contains a very large number of different images and a considerably high number of categories. The main problem of image-based CAPTCHAs is the creation of a large, labeled and classification-resistant database, which should prevent automatic picture recognition by means of complete or partial classification of the images used in the challenge [15.14]. EasyPic solves such issues by extracting frames from different videos, taken from the most common video directory services on the web, such as “YouTube”. The extraction process automatically selects frames in the video stream and saves them as different images. The categorization process is done by assigning to all images labels referring to the content of the specific video. A proper algorithm for category selection is then used to associate the same image with different concepts, making the automatic classification of all pictures hard to achieve.

Another technique for database creation, proposed in [15.14], suggests downloading image thumbnails from one or more image directories or indexing services (e.g., “Google Image”), which usually perform a categorization of the indexed images. Such images can be retrieved dynamically, i.e., when a new CAPTCHA has to be generated, or prefetched, in order to increase the creation speed of the visual HIP. However, such an approach is prone to classification errors, known as mislabeling, since the indexing category is estimated by means of the resource name [15.14]. A further problem characterizing picture-based HIPs is polysemy, i.e., the ambiguity of an individual word that can be used in different contexts to express two or more different meanings [15.14]. Both EasyPIC and MosaHIP (concept-based) can minimize the probability of incurring mislabeling and polysemy and, in general, deal with those situations where the user may be unsure how to answer. They allow a user to obtain a new set of images in case the pictures are not clear enough, or a match between the requested category and one of the displayed pictures cannot be found. Moreover, the user is also allowed to fail one or more tests before being considered a possible attacker. Note that the topmost version of MosaHIP is immune to both mislabeling and polysemy.

15.5.8 Issues of Image-Based HIPs

Although the security level of image-based tests may be higher than that of text-based HIPs, they are not as widespread as textual CAPTCHAs. This is due to the large image size characterizing image-based tests, which implies an increased occupation in terms of screen area and network bandwidth. In Table 15.1, a comparison among different CAPTCHAs with respect to the screen area required to display them is presented [15.1].

In general, text-based HIPs need a relatively small display area, which can be quantified as nearly a quarter of the size necessary to visualize an image-based CAPTCHA. Note that, even if the size of the image containing the text is rather small, the overall space needed to display the test is considerably bigger, due to the presence of captions, buttons


Table 15.1 Comparison among different CAPTCHAs, considering both the size of the sole image and the overall screen area occupied (in pixels) [15.1]

(a) Text-based HIPs

Name          Image area   Total area
Google        200 × 70     400 × 200
Yahoo         290 × 80     450 × 120
Microsoft     218 × 48     350 × 200
reCAPTCHA     300 × 57     350 × 200

(b) Image-based HIPs

Name          Image area   Total area
ESP-Pix       640 × 370    430 × 370
KittenAuth    495 × 375    545 × 530
Asirra        356 × 164    450 × 200
IMAGINATION   800 × 600    800 × 700
ARTiFACIAL    512 × 512    512 × 512
EasyPIC       640 × 75     650 × 125
MosaHIP       400 × 400    480 × 485

and input boxes. Image-based HIPs may require an even wider area of the screen, as happens with the IMAGINATION test [15.1].

A further issue characterizing image-based HIPs is their possible inconsistency with the specific topic of a web site. For example, a test displaying images of animals might be inappropriate on the web sites of government or political institutions, while it would be acceptable on leisure sites.

15.6 Usability and Accessibility

The main issue with both visual and audio HIPs is their tendency to become ever more difficult for human beings to pass, as computers improve their skill at solving them. For example, BaffleText and ScatterType possess some of the strongest security properties among text-based CAPTCHAs. However, they are not always quickly and easily solvable; sometimes they are extremely hard to pass (e.g., BaffleText). A HIP can become almost unsolvable by humans as a consequence of the attempt to improve its security level. The new version of Google’s CAPTCHA, e.g., requires considerably more effort from a human user than before it was proven to be breakable. This is due to the increased amount of distortion applied to the text string and the reduced distance between characters, which should improve the resistance to segmentation algorithms. Unfortunately, the difficulty of passing it increases for humans as well.

It is a well-known fact that security is often in contrast with usability; this property holds for CAPTCHAs as well. Indeed, the very same protection techniques which make the automatic identification of text strings, audio samples or images difficult for computers also augment the complexity of the recognition task for humans [15.1]. This fact characterizes all categories of current HIPs and is the main reason behind most of the criticisms of the CAPTCHA idea, as has been pointed out by Matt May, from the World Wide Web Consortium (W3C). In [15.47], the author expresses a negative opinion about the effective usability of many existing CAPTCHAs and, in general, about the concept of CAPTCHA itself. In addition, he stresses that the Automatic Turing Test does not allow visually impaired or dyslexic users to correctly perform several actions, such as creating accounts, writing comments in web blogs or forums, or making purchases on the web. This is motivated by the CAPTCHA’s tendency to discriminate against users with disabilities, since it does not correctly recognize them as humans. He also maintains that a “number of techniques are as effective as CAPTCHA, without causing the human interaction step that causes usability and accessibility issues”. In particular, he refers to several HTTP-based techniques, which are effective in countering general and rather simple bots, such as those of the 1st and 2nd generation (see Sect. 15.1). However, they are not sufficient to stop site-specific automated tools, or those belonging to the 3rd generation.

A possible solution to the usability and accessibility issues of CAPTCHAs is to give the user the ability to switch to a different challenge when she does not feel comfortable with the test currently displayed. In particular, to be considered non-discriminatory, an ATT should involve different sensory abilities of the user [15.1]. For example, a test based on a visual recognition problem should give the user the possibility to switch to an audio challenge and vice versa. This way, a visually impaired user is granted the possibility of taking the audio test in place of the visual one, whereas people with auditory problems can take the visual HIP. Thus, neither category of users is excluded from accessing the web application.
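The switchable-challenge idea can be sketched as a tiny dispatcher that serves a visual test by default and replaces it with an audio one on request; all names and the mock challenges below are invented for illustration:

```python
# Hypothetical challenge generators: each returns (prompt, expected answer).
CHALLENGES = {
    "visual": lambda: ("select the picture of a cat", "cat"),
    "audio":  lambda: ("type the digits you hear", "4711"),
}

class Session:
    """Serve one HIP per session; the user may switch modality at any
    time, so a visually impaired user can request the audio test and
    a hearing-impaired user the visual one."""
    def __init__(self, modality="visual"):
        self.switch(modality)

    def switch(self, modality):
        self.modality = modality
        self.prompt, self._answer = CHALLENGES[modality]()

    def check(self, answer):
        return answer == self._answer

s = Session()
s.switch("audio")            # user opts out of the visual test
assert s.prompt.startswith("type")
assert s.check("4711")
```

The important design point is that switching generates a fresh challenge rather than re-encoding the current one: otherwise an attacker could solve whichever modality is weaker and replay the answer.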


Such an approach has been adopted by many web sites which currently rely on CAPTCHAs to protect their services from massive automated access, and it has proven to be an effective method to limit the discriminatory tendency of the Automatic Turing Test.

15.7 Conclusion

In this chapter we presented an overview of currently used anti-bot strategies based on HIPs. Although several practical methods, mostly based on the HTTP protocol, exist, none of them possesses the same efficacy in countering automated tools as HIPs. We described the main properties of several text-based and audio-based tests, discussing their advantages, limits and efficacy. Currently, the most effective solution to stop automated tools is reCAPTCHA, which is actively used on a large number of Internet sites and applications. A good test should involve different sensory abilities (e.g., vision and hearing), to avoid discriminating against people with disabilities. Indeed, accessibility and usability are the most evident limits of anti-bot strategies which rely on CAPTCHAs, and a successful solution must be able to cope with such issues. The most common approach is to use audio-based CAPTCHAs as an alternative to text-based HIPs, allowing the user to choose which one she wants to take.

Since the efficacy of text-based HIPs in fighting web bots is starting to decrease, we also presented several image-based HIPs. Relying on the still large gap between humans and computers in performing vision-related operations, image-based tests offer a possible alternative to text-based CAPTCHAs. The security of image-based HIPs is undoubtedly higher than that of common CAPTCHAs. However, there are still some minor issues related to usability and screen occupation which are yet to be solved. It is the authors’ opinion that, as computers get faster and improve their skills at solving text-based HIPs, image-based CAPTCHAs may become a viable solution to stop the improper use of web bots. Further research is still needed to improve this new generation of HIPs, as well as to devise more secure and usable audio-based HIPs. This is indeed a fundamental requirement for defining a valid and complete anti-bot strategy.

References

15.1. A. Basso, S. Sicco: Preventing massive automated access to web resources, Comput. Secur. 28(3/4), 174–188 (2009), doi:10.1016/j.cose.2008.11.002

15.2. M. Koster: Robots in the web: threat or treat?, ConneXions 9(4), 2–12 (1995)

15.3. A. Basso: Protecting web resources from massive automated access, Technical report RT-114/08, Computer Science Department, University of Torino (2008), http://www.di.unito.it/basso/papers/captcha-RT-.pdf

15.4. A. Basso, F. Bergadano, I. Coradazzi, P.D. Checco: Lightweight security for internet polls, EGCDMAS (INSTICC Press, 2004) pp. 46–55

15.5. M. Blum, H.S. Baird: First workshop on human interactive proofs, Xerox Palo Alto Research Center, CA (2002), http://www.parc.com/istl/groups/did/HIP/

15.6. N.J. Hopper, M. Blum: Secure human identification protocols. In: ASIACRYPT, Lecture Notes in Computer Science, Vol. 224, ed. by C. Boyd (Springer, London 2001) pp. 56–66

15.7. H.S. Baird, K. Popat: Human interactive proofs and document image analysis, IAPR 2002: Workshop on Document Analysis Systems, Princeton (2002)

15.8. M. Chew, H.S. Baird: BaffleText: a human interactive proof, Proc. SPIE/IS&T Document Recognition and Retrieval X Conference, Vol. 4670, SPIE, Santa Clara (2003)

15.9. L. von Ahn, M. Blum, N.J. Hopper, J. Langford: CAPTCHA: Using hard AI problems for security. In: EUROCRYPT, Lecture Notes in Computer Science, Vol. 2656, ed. by E. Biham (Springer, Berlin 2003) pp. 294–311

15.10. L. von Ahn, M. Blum, J. Langford: Telling humans and computers apart automatically, Commun. ACM 47(2), 56–60 (2004)

15.11. A.M. Turing: Computing machinery and intelligence, Mind 59(236), 433–460 (1950)

15.12. M. Naor: Verification of a human in the loop or identification via the Turing test, unpublished notes (September 13, 1996), http://www.wisdom.weizmann.ac.il/naor/PAPERS/human.pdf

15.13. R.V. Hall: CAPTCHA as a web security control, available at http://www.richhall.com/captcha/captcha_.htm (last accessed 14 October 2009)

15.14. M. Chew, J.D. Tygar: Image recognition CAPTCHAs, Proc. 7th Int. Information Security Conference (ISC 2004) (Springer, 2004) pp. 268–279

15.15. A. Coates, H. Baird, R. Fateman: Pessimal print: A reverse Turing test, Proc. 6th Int. Conf. on Document Analysis and Recognition (Seattle, 2001) pp. 1154–1158

15.16. G. Mori, J. Malik: Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA, Proc.


Conf. Computer Vision and Pattern Recognition, Madison (2003)

15.17. reCAPTCHA: Stop Spam, Read Books, Dept. of Computer Science, Carnegie Mellon University, http://www.recaptcha.net/ (last accessed 14 October 2009)

15.18. L. von Ahn, B. Maurer, C. McMillen, D. Abraham, M. Blum: reCAPTCHA: Human-based character recognition via web security measures, Science 321(5895), 1465–1468 (2008), doi:10.1126/science.1160379

15.19. H.S. Baird, M.A. Moll, S.Y. Wang: ScatterType: A legible but hard-to-segment CAPTCHA, ICDAR ’05: Proc. 8th Int. Conference on Document Analysis and Recognition (IEEE Computer Society, Washington 2005) pp. 935–939, doi:10.1109/ICDAR.2005.205

15.20. H.S. Baird: Complex image recognition and web security. In: Data Complexity in Pattern Recognition, ed. by M. Basu, T. Kam (Springer, London 2006)

15.21. J. Yan, A.S.E. Ahmad: Breaking visual CAPTCHAs with naive pattern recognition algorithms, Annual Computer Security Applications Conference, Vol. 10(14) (2007) pp. 279–291, doi:10.1109/ACSAC.2007.47

15.22. PWNtcha CAPTCHA Decoder: http://sam.zoy.org/pwntcha/ (last accessed 14 October 2009)

15.23. aiCaptcha: Using AI to beat CAPTCHA and post comment spam, http://www.brains-n-brawn.com/aiCaptcha (last accessed 14 October 2009)

15.24. J. Yan, A.S.E. Ahmad: A low-cost attack on a Microsoft CAPTCHA, CCS ’08: Proc. 15th ACM Conference on Computer and Communications Security, New York (2008) pp. 543–554, doi:10.1145/1455770.1455839

15.25. Microsoft Live Hotmail under attack by streamlined anti-CAPTCHA and mass-mailing operations, Websense Security Labs, http://securitylabs.websense.com/content/Blogs/.aspx (last accessed 14 October 2009)

15.26. Google’s CAPTCHA busted in recent spammer tactics, Websense Security Labs, http://securitylabs.websense.com/content/Blogs/.aspx (last accessed 14 October 2009)

15.27. Yahoo! CAPTCHA is broken, Network Security Research and AI, http://network-security-research.blogspot.com///yahoo-captcha-is-broken.html (last accessed 14 October 2009)

15.28. K. Chellapilla, P. Simard, M. Czerwinski: Computers beat humans at single character recognition in reading-based human interaction proofs (HIPs), Proc. 2nd Conference on Email and Anti-Spam (CEAS), Palo Alto (2005)

15.29. K. Chellapilla, P.Y. Simard: Using Machine Learning to Break Visual Human Interaction Proofs (HIPs) (MIT Press, Cambridge 2005) pp. 265–272

15.30. G. Moy, N. Jones, C. Harkless, R. Potter: Distortion estimation techniques in solving visual CAPTCHAs, Proc. CVPR, Vol. 2 (2004) pp. 23–28

15.31. K. Chellapilla, K. Larson, P.Y. Simard, M. Czerwinski: Building segmentation based human-friendly human interaction proofs (HIPs). In: HIP, Lecture Notes in Computer Science, Vol. 3517, ed. by H.S. Baird, D.P. Lopresti (Springer, Berlin 2005) pp. 1–26, doi:10.1007/11427896_1

15.32. S. Bohr, A. Shome, J.Z. Simon: Improving auditory CAPTCHA security, TR 2008-32, ISR – Institute for Systems Research (2008), http://hdl.handle.net//

15.33. G. Kochanski, D. Lopresti, C. Shih: A reverse Turing test using speech, Proc. Int. Conferences on Spoken Language Processing (Denver, 2002) pp. 1357–1360

15.34. J. Tam, J. Simsa, D. Huggins-Daines, L. von Ahn, M. Blum: Improving audio CAPTCHAs, Proc. 4th Symposium on Usability, Privacy and Security (SOUPS ’08), Pittsburgh (2008)

15.35. N. Chan: Sound oriented CAPTCHA, Proc. 1st Workshop on Human Interactive Proofs, Xerox Palo Alto Research Center (2002)

15.36. T.Y. Chan: Using a text-to-speech synthesizer to generate a reverse Turing test, ICTAI ’03: Proc. 15th IEEE Int. Conference on Tools with Artificial Intelligence (IEEE Computer Society, Washington 2003) p. 226

15.37. J. Holman, J. Lazar, J.H. Feng, J. D’Arcy: Developing usable CAPTCHAs for blind users, Assets ’07: Proc. 9th Int. ACM SIGACCESS Conference on Computers and Accessibility, ACM, New York (2007) pp. 245–246, doi:10.1145/1296843.1296894

15.38. A. Schlaikjer: A dual-use speech CAPTCHA: Aiding visually impaired web users while providing transcriptions of audio streams, Technical report CMU-LTI-07-014, Carnegie Mellon University (2007), http://www.cs.cmu.edu/hazen/publications/CMU-LTI--.pdf

15.39. Breaking Gmail’s Audio CAPTCHA, Wintercore Labs, http://blog.wintercore.com/?p= (last accessed 14 October 2009)

15.40. J. Tam, J. Simsa, S. Hyde, L. von Ahn: Breaking audio CAPTCHAs with machine learning techniques, Neural Information Processing Systems, NIPS 2008, Vancouver (2008)

15.41. M. Minsky: Mind as society, Thinking Allowed: Conversations on the Leading Edge of Knowledge and Discovery with Dr. Jeffrey Mishlove (1998), http://www.intuition.org/txt/minsky.htm (last accessed 14 October 2009)

15.42. J. Elson, J.R. Douceur, J. Howell, J. Saul: Asirra: a CAPTCHA that exploits interest-aligned manual image categorization, CCS ’07: Proc. 14th ACM Conference on Computer and Communications Security, ACM, New York (2007) pp. 366–374, doi:10.1145/1315245.1315291


15.43. P. Golle: Machine learning attacks against the Asirra CAPTCHA, CCS ’08: Proc. 15th ACM Conference on Computer and Communications Security, ACM, New York (2008) pp. 535–542, doi:10.1145/1455770.1455838

15.44. R. Datta, J. Li, J.Z. Wang: IMAGINATION: a robust image-based CAPTCHA generation system, Proc. 13th ACM International Conference on Multimedia (MULTIMEDIA ’05) (ACM Press, New York 2005) pp. 331–334

15.45. Y. Rui, Z. Liu: ARTiFACIAL: automated reverse Turing test using facial features, MULTIMEDIA ’03: Proc. 11th ACM Int. Conference on Multimedia, ACM, New York (2003) pp. 295–298, doi:10.1145/957013.957075

15.46. A. Basso, M. Miraglia: Avoiding massive automated voting in internet polls. In: STM 2007, Electronic Notes in Theoretical Computer Science, Vol. 197(2) (Elsevier, Amsterdam 2008) pp. 149–157, doi:10.1016/j.entcs.2007.12.024

15.47. M. May: Inaccessibility of CAPTCHA: Alternatives to visual Turing tests on the web, W3C Working Group Note, http://www.w.org/TR/turingtest/ (2005)

The Authors

Alessandro Basso is a postdoctoral researcher at the Computer Science Department, University of Torino, where he obtained his PhD. His research interests include web application security, digital watermarking of multimedia content and mobile device security. Recently, he turned his attention towards analytical and modeling tools for complex systems to deal with information processing and filtering, social networks and recommender systems.

Alessandro Basso
Computer Science Department
University of Torino
c.so Svizzera 185, 10149 Torino, Italy
alessandro.basso@di.unito.it

Francesco Bergadano graduated in Computer Science at the University of Torino, and obtained a PhD in Computer Science from the Universities of Torino and Milan. He has been an Associate Professor at the University of Catania and is now Full Professor at the Department of Computer Science of the University of Torino. His research activities include Computer Security and Data Analysis.

Francesco Bergadano
Computer Science Department
University of Torino
c.so Svizzera 185, 10149 Torino, Italy
[email protected]