
Received December 31, 2020, accepted January 16, 2021, date of publication January 20, 2021, date of current version January 28, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3053130

GDPR Compliance Assessment for Cross-Border Personal Data Transfers in Android Apps

DANNY S. GUAMÁN 1,2, JOSE M. DEL ALAMO 1, AND JULIO C. CAIZA 1,2

1 Departamento de Ingeniería de Sistemas Telemáticos, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain
2 Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Quito 170517, Ecuador

Corresponding author: Jose M. Del Alamo ([email protected])

This work was supported in part by the Comunidad de Madrid through the CLIIP Project under Grant APOYO-JOVENES-QINIM8-72-PKGQ0J, and in part by the Universidad Politécnica de Madrid through the V-PRICIT Research Programme, Apoyo a la Realización de PROYECTOS de I+D Para Jóvenes Investigadores UPM-CAM. The work of Danny S. Guamán and Julio C. Caiza was supported by the Escuela Politécnica Nacional.

ABSTRACT The pervasiveness of Android mobile applications and the services they support allows the personal data of individuals to be collected and shared worldwide. However, data protection legislation usually requires all participants in a personal data flow to ensure an equivalent level of personal data protection, regardless of location. In particular, the European General Data Protection Regulation constrains cross-border transfers of personal data to non-EU countries and establishes specific requirements to carry them out. This article presents a method to systematically assess the compliance of Android mobile apps with the requirements for cross-border transfers established by the European data protection regulation. We have validated the method with one hundred Android apps, finding a striking 66% of ambiguous, inconsistent, and omitted cross-border transfer disclosures.

INDEX TERMS Assurance, Android, application, assessment, compliance, data protection, dynamic analysis, evaluation, GDPR, mobile, privacy, software quality, testing.

I. INTRODUCTION

The pervasiveness of today’s software systems and services allows the personal data of individuals to be easily collected and shared worldwide, across different countries and jurisdictions [1]. However, cross-border transfers of personal data pose risks to the privacy of individuals, as the organizations receiving personal data may not offer a level of protection equivalent to that in the individuals’ country of residence. For example, in many parts of the world, particularly in Europe, privacy is strenuously protected [2] and recognized as a human right [3]. In other regions, such as China, privacy values are often a lesser priority when compared to order and governance [4]. These non-equivalent levels of protection are clearly evidenced by a recent court resolution in the European Union (EU) [5], which held that the level of data protection in the U.S. is not essentially equivalent to that required under EU data protection law. Thus, transferring personal data across jurisdictions raises data protection compliance concerns. For example, the General Data Protection Regulation (GDPR) [2]

The associate editor coordinating the review of this manuscript and approving it for publication was Porfirio Tramontana.

constrains personal data transfers outside the European Economic Area (EEA)1 (a.k.a. cross-border transfers or international transfers), and establishes a set of assurance mechanisms to carry them out.

Mobile applications (‘‘apps’’) are a particular kind of consumer software that exacerbates these data protection compliance issues for organizations. The particularities of the app development and distribution ecosystems are major factors underlying these risks [6], [7]. First, apps are distributed worldwide through global app stores, being easily accessible to everyone everywhere. Thus, an app provider can easily reach markets and users beyond its country of residence. Second, once any piece of personal data has been collected by an app, it can be transmitted from the device for processing across the world [1] or shared between chains of third-party service providers [8], even without the app developer’s knowledge [9]. Thus, app developers and other stakeholders are required to be constantly vigilant to ensure that

1 The EEA includes all the EU Member States plus Norway, Iceland and Liechtenstein. For the sake of clarity, we will use the term EU from now on to refer to all of these countries.

VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 15961

D. S. Guamán et al.: GDPR Compliance Assessment for Cross-Border Personal Data Transfers in Android Apps

appropriate measures have been taken, thereby preventing potential data protection compliance breaches.

However, testing or auditing mobile applications against legal data protection requirements is a non-trivial task. The personal data flows of apps are generally opaque, i.e., the types of personal data transferred, the recipients, and the recipient countries are not easily visible, even to developers when the transfers are carried out by third-party libraries. According to the EU Cybersecurity Agency, there is still a serious gap between GDPR legal requirements and the translation of these requirements into practical solutions, and there is also a need for tools to test, verify and audit existing apps, libraries, and services [7]. Following this direction, significant research efforts on data protection compliance assessment in the mobile ecosystem have been undertaken by academics [10]–[14] and regulators [6], [15]. Despite the enormous progress achieved, certain gaps still need to be addressed. First, the GDPR defines certain requirements, such as those regarding cross-border transfers, for which assessment approaches have not yet been proposed. Second, most current approaches focus on assessing the apps’ compliance with their own privacy policies but not with the underlying regulations, thus assuming that the policy already complies with these regulations. Unfortunately, this is often not the case, particularly for mobile applications [16].

In this context, we aim to advance the fundamental understanding of data protection practices and potential GDPR compliance issues in the Android ecosystem. In particular, we rely on a multidisciplinary approach to better understand the flows of personal data originating from apps in the EU to worldwide recipients and the implications for GDPR compliance. Specifically, we target the compliance assessment of both Android apps and their privacy policies with the GDPR. To this end, we check compliance in tandem, i.e., the behavior of apps against their privacy policy commitments, and these, in turn, against the transparency requirements established in the GDPR. Thus, we address a common drawback highlighted in previous work, namely that most compliance assessment approaches focus on assessing whether an app’s personal data flows are consistent with its privacy policy, and assume that this policy complies with the relevant regulation [17].

Our two key multidisciplinary contributions are (1) a compilation of compliance criteria for assessing cross-border transfers in mobile apps, and (2) a method for detecting cross-border transfers in Android apps at runtime and checking their compliance with their privacy policy. We consider our proposal valuable for app providers, app distribution platforms such as the Google Play Store, and supervisory authorities to detect potential non-compliance issues with GDPR cross-border transfers. We have validated the method on a sample of one hundred popular Android apps and their privacy policies. As a result, our third contribution is (3) a corpus of privacy policies annotated with cross-border transfer practices. This corpus, which has been released to the research community,2 complements other prior efforts in annotating privacy practices in natural language privacy policies [11], [18]. Finally, we report the striking evidence found of potential GDPR compliance issues in Android apps.

The rest of the paper is organized as follows: Section II presents the background and analyses the existing related work. The results of a multidisciplinary analysis of cross-border transfers in mobile apps under the GDPR, and the compliance criteria for assessing them, are presented in Section III. Section IV presents our method for compliance assessment. Section V presents the results and findings of the method validation with one hundred Android apps. Finally, Section VI concludes the paper.

II. BACKGROUND AND RELATED WORK

This section reviews the background and work related to privacy analysis in Android apps and highlights the impact of the GDPR and the gaps that motivate our proposal.

A. PRIVACY ANALYSIS IN ANDROID APPS

Significant research efforts on techniques and methods focusing on analyzing Android apps’ behavior and detecting privacy-related anomalies have been made across several research communities. The plurality [19] and contestability [20] of the privacy concept, which is generally approached from a multidisciplinary perspective [21], [22], make it difficult to define a closed set of criteria or conditions that circumscribe when privacy is preserved or violated in mobile apps. Privacy as confidentiality [23] and privacy as contextual integrity [24] are the two paradigms mostly used by prior work to determine when a privacy violation occurs. Prior work under the privacy-as-confidentiality paradigm relies on the binary criterion that any exposure of personal data outside the app (not necessarily external to the mobile device) may lead to a privacy violation. To check this criterion, these works rely on static analysis techniques carried out on the apps’ representations or models (e.g., [25]–[28]), on dynamic analysis techniques carried out on the actual implementation of the apps (e.g., [29]–[31]), or on hybrid analysis techniques that combine static and dynamic analysis (e.g., [32]–[34]). A smaller number of works go a step further by collecting and analyzing more contextual information to determine a potential privacy violation. For example, some of them focus on detecting who is accessing the personal data (by distinguishing third-party libraries from the app itself [35]–[37]), while others focus on discriminating whether access to certain personal data is required for the app’s core functionality or for another secondary (third-party) task [38]. The Google Play Protect (GPP) approach [39] is also aligned with this paradigm, aiming to detect potentially harmful behavior in the Android ecosystem, including the disclosure of personal data off the device via spyware. All in all, these works are useful for flagging personal data leaks but fall short when certain flows of personal data are expected or authorized in a particular context (e.g., when a user consents to them).

2 The corpus is available at https://github.com/PrivApp/IT100-Corpus

Going a step further, privacy as contextual integrity claims that the criteria or conditions that circumscribe when privacy is preserved or violated are not absolute; rather, more contextual information needs to be considered [24]. This paradigm considers five key elements to define personal data flows: (i) the type of personal data; (ii) the data subject to whom the personal data refer; (iii) the sender and (iv) the recipients of personal data, acting in a particular capacity or role; and (v) the transmission principles that constrain a flow (e.g., user consent). Some recent works rely on the apps’ privacy policies [10], [40] or prevailing standards or regulations [12], [13], [41]–[44] as the source for extracting the appropriate or inappropriate personal data flows in terms of the aforementioned elements. These are used as ground truth to analyze their alignment with the personal data flows carried out by apps, and then to detect whether a potential privacy violation occurs. Zimmeck et al. [10] propose the Mobile App Privacy System (MAPS), which relies on static code analysis and privacy policy analysis to check the alignment between the personal data accessed by mobile apps and the collection practices disclosed in their privacy policies. A combination of dynamic code analysis with privacy policy analysis is proposed by Andow et al. [40] to determine flow-to-policy consistency, but they focus on distinguishing the entity (i.e., first-party vs. third-party) disclosing personal data. Reyes et al. [41] present a method to evaluate whether Android mobile apps comply with the Children’s Online Privacy Protection Act (COPPA), a US law that regulates how mobile apps can collect children’s personal information. Much closer to our work, some research efforts also aim at approaches to assess compliance with some GDPR requirements, as further elaborated in the following section.

B. GDPR ANALYSIS IN ANDROID APPS

A recent systematic literature mapping [17] states that there is growing interest in assessing the GDPR compliance of software systems, evidenced by the 37% of surveyed techniques that deal with it. Compliance with the GDPR is crucial for organizations because a privacy breach can entail significant financial penalties [45]. This legal framework aims to ensure that all data processing operations, such as collection or cross-border transfers, are lawful, fair and transparent [2], and ultimately preserve the privacy of individuals. It therefore establishes a set of requirements that organizations, even those based outside the EU, should comply with when offering services within the EU.

In the mobile ecosystem, several works also aim to assess compliance with some GDPR requirements. Ferrara and Spoto [12] focus on conveniently aggregating the results of detecting disclosures of personal data to represent them at different levels of abstraction, i.e., from the developer level to the data protection officer level, thus allowing the latter to detect potential GDPR infringements. They relied on static code analysis to detect any disclosure of personal information, although the specific GDPR requirements covered remain unclear. Jia et al. [13] propose a method to detect personal data disclosures in network traffic based on association mining. Leveraging the fact that an application’s network packet payload is represented as a set of key-value pairs, they used a catalogue of key-value pairs related to personal data to search for any of them and thereby detect personal data disclosures made without the user’s consent. While our work focuses on different GDPR requirements, Jia et al.’s work could be seen as complementary to ours, as their detection method could minimize the false-negative rate. Fan et al. [43] relied on static analysis and privacy policy analysis to carry out an empirical assessment of GDPR compliance in Android mHealth apps, focusing on transparency, data minimization and confidentiality requirements. For transparency, they checked whether six different practices are disclosed in privacy policies, but cross-border transfer practices are not covered. For data minimization, their efforts focused on collection practices, checking whether the personal data types collected by apps are disclosed in their privacy policies. Mangset [44] relied on static and dynamic analysis to identify personal data disclosures at rest and in transit, and then checked for GDPR compliance. The underlying method is largely manual and in-depth, assessing five pharmaceutical and dating apps and focusing on requirements related to transparency, data minimization (collection practices), confidentiality (data at rest and in transit), and some user rights (particularly, consent and objection to automated individual decision-making). Finally, we consider Eskandari et al. [42] the closest related work. They propose PDTLoc, an analysis tool that employs static analysis to detect violations of Article 25.1 of the EU Data Protection Directive (the European data protection law replaced by the GDPR). The Directive set requirements for international transfers similar to those laid down in the GDPR. However, this prior work considered neither the recipient type of personal data flows nor privacy policies as a means of disclosing the safeguards that enable cross-border transfers. Therefore, this approach would have incorrectly identified potential compliance issues (false positives), as any transfer outside the EU would be presumed to be a regulatory infringement.
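The catalogue-based key-value matching idea attributed to Jia et al. [13] above can be illustrated with a minimal sketch. The catalogue entries and the payload below are invented for this example and are not taken from that work.

```python
# Minimal illustration of catalogue-based key-value matching in an
# app's network payload, in the spirit of Jia et al. [13]. The
# catalogue entries and the payload are invented for this example.
CATALOGUE = {"imei", "aaid", "lat", "lon", "email"}

def detect_personal_data(payload: dict) -> list:
    """Return the payload keys that match the personal-data catalogue."""
    return sorted(k for k in payload if k.lower() in CATALOGUE)

payload = {"aaid": "38400000-8cf0", "app_version": "2.1", "lat": "40.41"}
print(detect_personal_data(payload))  # ['aaid', 'lat']
```

A real detector would additionally normalize obfuscated keys and match values (e.g., hashed identifiers), which this sketch omits.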

To fill this gap, we rely on a multidisciplinary approach to better understand the flows of personal data originating from apps in the EU to worldwide recipients and the implications for GDPR compliance. Moving requirements from the legal to the technical domain is challenging, mainly because natural language is used to define them. Such textual descriptions are far from the technical realm, and translating them requires mitigating ambiguity and subjectivity and resolving the potential conflicts that may arise. In this regard, a comprehensive analysis was first undertaken by privacy and data protection experts, who collectively identified a set of criteria for assessing the compliance of mobile apps with GDPR cross-border transfer requirements (Section III). Then, based on these criteria, a method for detecting cross-border transfers in Android apps at runtime and checking their compliance with their privacy policy is proposed (Section IV).

FIGURE 1. The GDPR lays down (a) the criteria for distinguishing cross-border transfers (light-grey boxes), and (b) the information to be disclosed to data subjects, generally through privacy policies, in order to ensure transparency.

III. PERSONAL DATA CROSS-BORDER TRANSFERS

As illustrated in Figure 1(a), there are specific criteria for determining what must be considered a cross-border transfer, for which the GDPR lays down further constraints or requirements. These requirements include, in each case, the disclosure of specific and meaningful information to data subjects, as shown in Figure 1(b).

Criterion C1.1: Does the app send personal data to remote recipients? GDPR defines personal data as ‘‘any information relating to an identified or identifiable natural person’’ [2]. Explicitly, the location, contacts, unique device and custom identifiers, including the International Mobile Equipment Identity (IMEI), International Mobile Subscriber Identity (IMSI), Unique Device Identifier (UDID), and mobile phone number, biometrics, the identity of the phone, phone call logs, SMS, browsing history, email, and pictures and videos have been determined to be personal data under the GDPR [6].

This work, far from seeking comprehensiveness, focuses on personal data in four specific categories: contact, demographics, identifiers, and location. The first two are characterized by being collected directly from the data subject, e.g., through app forms when registering an account, while the last two can be collected directly from the mobile device. Based on Zimmeck et al.’s classification [11], we broke down the aforementioned categories into the fine-grained data types listed in Table 1.

A mobile device has several identifiers, but in Table 1 we have included only those posing a high risk to privacy, i.e., those that could enable tracking of data subjects over time and across different apps. Scope and persistence are two dimensions of identifiers, each with a number of possible discrete levels, that suggest the risk involved [46]. The scope defines which apps can access an identifier: (1) one app only, (2) a group of apps, or (3) all the apps on the device. The narrower the scope, the lower the risk of tracking a data subject through transactions from different apps. The persistence defines the lifespan of an identifier and depends mainly on the mechanism available to reset it: (1) app installation reset, (2) in-app reset, (3) system-setting reset, (4) device factory reset, and (5) no reset. Likewise, the longer the persistence, the longer the tracking time. For example, the IMEI (International Mobile Equipment Identity) and the AAID (Android Advertising Identifier) are device-level and OS-level identifiers, respectively. They are therefore consistent across all the device’s apps and can be used to track a data subject across them. The IMEI may also allow tracking a data subject over time, as it is a non-resettable identifier. On the other hand, although the AAID is supposed to be a non-persistent identifier, since it can be reset through the system settings, there are two main reasons for also considering it, in practice, as potentially persistent. First, data subjects are prone to preserving default privacy settings [47], especially when such settings are hard to discover and understand (the option to reset the AAID is six steps deep) [48]. Second, a recent related work [40] found that 74.7% of recipients who receive AAIDs also receive other persistent identifiers, nullifying the non-persistence property and allowing a data subject to be tracked over time and across resets. In this sense, as shown in Table 1, we considered the identifiers with the greatest scope, i.e., those that are shared by all the apps on a device, and whose removal necessarily requires the data subject to perform a system-setting or device reset.
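The scope/persistence risk model described above can be sketched as a small lookup. The numeric scales follow the text; the identifier entries are illustrative (the app-scoped identifier in particular is a hypothetical example, not from Table 1), and the selection rule mirrors the criterion stated above: device-wide scope plus a reset no easier than a system-setting change.

```python
# Sketch of the scope/persistence risk model described above.
# Scope levels: 1 = one app only, 2 = a group of apps, 3 = all apps
# on the device. Persistence levels (by reset mechanism): 1 = app
# reinstall, 2 = in-app, 3 = system settings, 4 = factory reset,
# 5 = non-resettable. Entries are illustrative.
IDENTIFIERS = {
    "IMEI": {"scope": 3, "persistence": 5},  # device-level, non-resettable
    "AAID": {"scope": 3, "persistence": 3},  # OS-level, system-setting reset
    "APP_INSTANCE_ID": {"scope": 1, "persistence": 1},  # hypothetical app-scoped id
}

def high_tracking_risk(name: str) -> bool:
    """Selection rule used above: the identifier is readable by all
    apps on the device and resetting it takes at least a
    system-setting change."""
    ident = IDENTIFIERS[name]
    return ident["scope"] == 3 and ident["persistence"] >= 3

print([n for n in IDENTIFIERS if high_tracking_risk(n)])  # ['IMEI', 'AAID']
```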

For the purposes of this study, any app that processes at least one of the data types listed in Table 1, except for the data types in the grey cells, satisfies criterion C1.1.

Criterion C1.2: Are EU data subjects being targeted by app providers? This criterion seeks to determine whether an app provider is offering a service within the EU. Generally, providers upload their apps to global distribution platforms, such as the Google Play Store, APK Pure, or F-Droid, to make them available to users. Therefore, these

TABLE 1. Personal data types in the meaning of GDPR that could be transferred by mobile apps. Personal data types in the grey cells were not considered in deciding whether criterion C1.1 is met, although the experiments log them when transferred along with other personal data types.

distribution platforms play a key role in providing the necessary capabilities to set up, inter alia, the countries or regions of targeted users. In the case of the Google Play Store, to which this study is circumscribed, the publication of an app involves two main stages: preparation and release. In the preparation stage, the app provider must digitally sign the app with a (self-signed) certificate before uploading it, including details on the organization such as its name and locality [49]. In the release stage, the app provider should perform a pre-configuration before uploading the app and making it available, including the selection of targeted countries [50]. This process leads us to fairly assume that mobile apps available in the Google Play Store and reachable from an EU country are actually targeting EU data subjects (irrespective of their nationality), thus satisfying criterion C1.2.

To reinforce the assessment of this criterion, we can rely on the app’s privacy policy and look for statements suggesting that an app or service targets EU-resident users [51], for example, when a privacy policy includes references to (1) the EU as a whole or one EU member state in relation to the service offered, or (2) dedicated addresses or phone numbers to be reached from an EU country.
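The policy-based reinforcement of criterion C1.2 can be sketched as a simple text heuristic. The term list and the phone-prefix pattern below are simplified for brevity and are our own illustrative assumptions, not the paper’s actual implementation.

```python
import re

# Illustrative heuristic for criterion C1.2: flag privacy-policy text
# that mentions the EU or a member state, or an EU-style phone prefix.
# The term list and the prefix pattern are simplified assumptions.
EU_TERMS = re.compile(
    r"\b(european union|eea|germany|france|spain|ireland|italy)\b")
EU_PHONE = re.compile(r"\+3[0-9][\s.]?\d")  # rough: +30..+39 country codes

def targets_eu(policy_text: str) -> bool:
    text = policy_text.lower()
    return bool(EU_TERMS.search(text) or EU_PHONE.search(text))

print(targets_eu("Our service is available to users in Spain."))  # True
print(targets_eu("Contact our US office at +1 555 0100."))        # False
```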

Criterion C1.3: To which countries does the app transfer personal data? This criterion seeks to determine the country to which personal data are sent. Personal data may be disclosed through multiple sinks or channels to different recipients. The sinks relevant to this work are those that allow personal data to be disclosed outside the device, for example, through SMS (Short Message Service), Bluetooth, NFC (Near Field Communication), or network interfaces. Particular focus is given to network interfaces, as previous studies [52] have shown that they are far more prevalent for external communication than the other interfaces.

In this context, certain metadata of the connections between the app and the receiving servers, particularly the server’s IP address and domain name, can be used to geolocate the country to which personal data are transferred. Based on this information, only the first-hop recipient server can be geolocated, i.e., the destination server directly reachable via the app’s connection. Personal data, however, may be transferred from the first-hop server to other servers, although tracing this necessarily requires the collaboration of the remote servers (organizations) involved. The geolocation of recipient servers bifurcates app transfers into (C1.3.1) EU cross-border transfers and (C1.3.2) non-EU cross-border transfers.
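The first-hop bifurcation just described can be sketched as follows. The geolocation step is stubbed with a hand-written mapping standing in for a real IP-geolocation database (an assumption on our part; the IP addresses are reserved TEST-NET examples), and the EEA country list is truncated for brevity.

```python
# Sketch of criterion C1.3: geolocate the first-hop server and
# classify the transfer. GEO_DB stands in for a real IP-geolocation
# lookup; its entries and the EEA list are illustrative.
EEA = {"DE", "FR", "ES", "NO", "IS", "LI"}  # truncated for brevity

GEO_DB = {
    "203.0.113.7": "US",   # TEST-NET address, mapping invented
    "198.51.100.9": "DE",  # TEST-NET address, mapping invented
}

def classify_transfer(origin: str, server_ip: str) -> str:
    dest = GEO_DB.get(server_ip)
    if dest is None:
        return "unresolved"
    if dest == origin:
        return "domestic"            # same country: not cross-border
    if dest in EEA:
        return "EU cross-border"     # C1.3.1
    return "non-EU cross-border"     # C1.3.2

print(classify_transfer("ES", "203.0.113.7"))  # non-EU cross-border
```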

An EU cross-border transfer (C1.3.1) implies that an app, used by an EU data subject, transfers personal data to recipients geolocated in another European country. That is, there is a transfer between different countries, but both are located within the EU. This type of transfer does not add further constraints or requirements, since the GDPR itself is an effort to unify the different EU countries’ legislations and thus facilitate the free movement of personal data within the EU.3

On the other hand, a non-EU cross-border transfer (C1.3.2) implies that an app, being used by an EU data subject, transfers personal data to recipients located outside the EU. However, this criterion on its own is insufficient to determine whether an app is carrying out an international transfer in the meaning of GDPR. Rather, based on the non-EU cross-border transfers, a further criterion needs to be checked (C1.4).

Criterion C1.4: To which recipients does the app transfer personal data? This criterion seeks to distinguish whether the servers located outside the EU belong to (C1.4.1) the app provider (i.e., a first-party recipient, or data controller in GDPR terms), or (C1.4.2) another organization (i.e., a third-party recipient). First-party recipients are the app providers themselves, who may be using cloud servers for remote data storage or other kinds of cloud processing. Third-party recipients, on the other hand, include other organizations delivering a variety of services to the apps, such as support for the app operation (e.g., crash management), analytics (e.g., for design optimization), social network integration, or app monetization through advertising. These first- and third-party recipients are roughly aligned with two actors defined in the GDPR, the Data Controller and the Data Processor, respectively.
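The first-party/third-party distinction of criterion C1.4 can be sketched by comparing the recipient host’s registrable domain with the app provider’s domain. The extraction below is deliberately naive (last two labels); a real implementation should use the Public Suffix List, and the hostnames are invented for illustration.

```python
def registrable_domain(host: str) -> str:
    """Naive eTLD+1 (last two labels). A real implementation should
    use the Public Suffix List to handle suffixes such as 'co.uk'."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])

def recipient_type(recipient_host: str, provider_domain: str) -> str:
    """Criterion C1.4: first-party if the recipient host belongs to
    the app provider's domain, third-party otherwise."""
    if registrable_domain(recipient_host) == registrable_domain(provider_domain):
        return "first-party"   # C1.4.1: the data controller's own servers
    return "third-party"       # C1.4.2: e.g., an analytics or ad service

print(recipient_type("api.example-app.com", "example-app.com"))     # first-party
print(recipient_type("ads.tracker.example.net", "example-app.com")) # third-party
```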

3 GDPR Art. 1(3)

Criterion C1.4.1: Is personal data transferred to a non-EU data controller? This criterion seeks to determine whether an app provider (data controller) is established outside the EU. Transfers of personal data from an app to first-party recipients located outside the EU can occur in two possible settings. First, the app provider is indeed based in a non-EU country, and so are its servers. Second, the app provider is based in an EU country, but its servers are outside the EU. In both cases, the app provider is entirely subject to the GDPR requirements. However, in the first case, the app provider must also designate in writing a representative in the EU and disclose the representative’s contact details to data subjects,4 generally through its privacy policy (S1 in Fig. 1b). A representative is not necessarily a data forwarding entity, but rather an entity that acts on behalf of the app provider, addressing matters of the data subjects and supervisory authorities. The app’s privacy policy is the main source for extracting the contact details of the data controller, including its locality.

Criterion C1.4.2: Is personal data transferred to non-EU third-party recipients? This criterion seeks to determine whether a third-party recipient is not established within the EU. If this criterion is met, then we are dealing with an international transfer. An international transfer can only take place if the non-EU third-party recipients can ensure a level of protection equivalent to that mandated by the GDPR. In this line, the European Commission (EC) distinguishes the non-EU countries with an adequacy decision from those lacking one. Twelve non-EU countries have so far been recognized by the Commission as providing protection equivalent to the GDPR and therefore hold an adequacy decision [53]. It is worth mentioning that until July 16, 2020, US companies that complied with the Privacy Shield legal framework [54] were also covered by an adequacy decision. However, this was invalidated by the Court of Justice of the EU, which found that it no longer provides adequate protection [5].

Criterion C1.5.1: Is an international transfer covered by an

adequacy decision? This criterion seeks to determinewhether an international transfer is targeted to a third-partyrecipient located in a country covered by an adequacy deci-sion. If so, it can take place without any further safeguards,although to ensure transparency,5 the app provider shouldinform the data subjects about (1) the intention to transferpersonal data to a non-EU country, including (2) the namesof targeted countries, and (3) the existence of an adequacydecision by the Commission [55] (S2 in Fig. 1b).C1.5.2. Is an international transfer not covered by an

adequacy decision? In the absence of an adequacy deci-sion, the app provider, acting as data controller, requiresthe adoption of ‘‘appropriate safeguards’’ before carryingout an international transfer. GDPR defines a set of assur-ance mechanism that enable international transfers in suchcases,6 including Standard Data Protection Clauses, Binding

4 GDPR Art. 27(1)
5 GDPR Art. 13(1)(f) and Art. 14(1)(f)
6 GDPR Chapter V

Corporate Rules, Approved Codes of Conduct and Approved Certification Schemes. These assurance mechanisms should be approved by the European Commission and, in general, allow the app provider to ensure that third-party recipients have implemented appropriate safeguards guaranteeing a protection level equivalent to the GDPR. So far, the Commission has adopted four Standard Data Protection Clauses [56], which should be incorporated into contracts between the parties concerned. Binding Corporate Rules should be established when international transfers take place between companies belonging to a corporate group and be signed with the Commission's approval. However, the Commission has not yet adopted any Code of Conduct or Certification Scheme for the GDPR.

Finally, in order to ensure transparency,7 the app provider should inform the data subjects about (1) the intention to transfer personal data to a non-EU country, including (2) the names of the targeted countries, (3) a reference to the appropriate safeguard(s) according to the aforementioned options, and (4) the means to obtain a copy of the safeguard(s) [55] (S3 in Fig. 1b).

Other Exceptions: In the absence of an adequacy decision or any appropriate safeguards, there are some exceptions8 that allow international transfers in specific situations. We highlight explicit consent, which requires the consent of the data subjects, through an affirmative action such as ticking a box, to be obtained after providing precise details of the international transfers. In this case, the data subjects should also be able to withdraw consent easily at any time.

IV. ASSESSING APPS COMPLIANCE WITH GDPR CROSS-BORDER TRANSFERS
On the basis of the criteria defined in Section III, assessing compliance of an app with cross-border transfers fundamentally requires two inputs: the app's personal data flows and the practices committed in its privacy policy. On the one hand, the former provides information on the type of personal data, the type of recipient who receives the personal data (i.e., first-party or third-party recipient), and the country in which the recipient servers are located. Based on this information, the key idea is to determine which type of cross-border transfer is performed by a target app, if any, according to the criteria summarized in Figure 1a. On the other hand, it is necessary to interpret all pertinent privacy policy practices to determine whether detected cross-border transfers have been properly disclosed, according to the transparency elements summarized in Figure 1b. It is worth pointing out that we refer to a privacy policy in the sense of the GDPR, i.e., a document that contains the committed privacy practices of an app provider, including collection, sharing and international transfer practices.
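The decision logic of the criteria summarized in Figure 1a can be sketched as a small classifier over a flow's recipient country and recipient type. This is an illustrative sketch, not the paper's implementation: the EU/EEA set and the list of twelve adequacy countries (as of the study period) are assumptions, encoded here as ISO 3166-1 alpha-2 codes.

```python
# Illustrative country sets (assumptions): EU/EEA members and the twelve
# countries holding an EC adequacy decision at the time of the study [53].
EU_EEA = {"AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
          "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
          "PL", "PT", "RO", "SK", "SI", "ES", "SE", "IS", "LI", "NO"}
ADEQUACY = {"AD", "AR", "CA", "CH", "FO", "GG", "IL", "IM", "JE", "JP",
            "NZ", "UY"}

def transfer_type(recipient_country: str, first_party: bool) -> str:
    """Classify a personal data flow according to criteria C1.4/C1.5."""
    if recipient_country in EU_EEA:
        return "no cross-border transfer"
    if first_party:
        return "C1.4.1"          # non-EU data controller (EU representative required)
    if recipient_country in ADEQUACY:
        return "C1.4.2/C1.5.1"   # third party covered by an adequacy decision
    return "C1.4.2/C1.5.2"       # third party: appropriate safeguards required
```

For instance, under these assumptions a flow to a US-based third party falls under C1.4.2/C1.5.2 and thus requires appropriate safeguards.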

Figure 2 illustrates the three underlying processes that support the overall method to analyze apps and their privacy

7 GDPR Art. 13(1)(f) and Art. 14(1)(f)
8 GDPR Art. 49


FIGURE 2. Overall method to assess app compliance with GDPR cross-border transfers.

FIGURE 3. The overall process to extract cross-border personal data flows from apps.

policies and then detect potential non-compliance issues with GDPR cross-border transfers. Each of these processes is explained in detail in the following subsections: the extraction of the app's personal data flows that occur throughout the network in Section A, the systematic extraction of cross-border transfer statements disclosed through the app's privacy policy in Section B, and, finally, the compliance checking between observed personal data flows and policy statements in Section C.

A. EXTRACTION OF PERSONAL DATA FLOWS FROM THE APP
We define an app's personal data flow in terms of (i) the type of personal data, as per Table 1; (ii) the type of recipient who receives the personal data (i.e., first-party or third-party recipient); and (iii) the country in which the recipient servers are located. We rely on dynamic analysis to observe the app behavior and extract its personal data flows. Personal data flows can also be inferred from the app's representations or models by using static analysis [25]–[28]. However, we favor dynamic analysis to prioritize soundness over completeness, as our goal is to extract actual evidence of cross-border transfers carried out by an app. Furthermore, we rely on app network interfaces as sources of behavior. Previous studies [52] have shown the prevalent usage of network interfaces by mobile apps to communicate externally, over SMS or short-range interfaces such as Bluetooth or NFC. Thus, it is fair to assume that most cross-border transfers naturally occur through the network. Figure 3 sets out the overall data flow extraction process and its phases, which are detailed below.


1) STIMULATION
The method supports both automated and manual stimulation. Automated stimulation is based on the random strategy provided by the UI/Application Exerciser Monkey [57], which provides better performance in terms of code coverage compared to other approaches [58]. As the Monkey can miss events that only humans can trigger, e.g., in apps that require a login, manual stimulation by analysts or testers is also supported. While this approach allows for uncovering app functionality in a way similar to what a real user can do, it is not scalable and is feasible only with a small number of apps.
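As a rough sketch, automated stimulation can be driven from the host by invoking the Monkey over ADB. The flags below are standard Monkey options; the package name, event count, throttle and seed are illustrative values, not the paper's configuration.

```python
import subprocess

def monkey_cmd(package: str, events: int = 500,
               throttle_ms: int = 100, seed: int = 42) -> list:
    """Build the adb command that stimulates an app with pseudo-random
    UI events via the UI/Application Exerciser Monkey."""
    return ["adb", "shell", "monkey",
            "-p", package,                    # restrict events to the target app
            "-s", str(seed),                  # fixed seed -> reproducible event stream
            "--throttle", str(throttle_ms),   # delay between events (ms)
            "-v",                             # verbose output
            str(events)]                      # number of events to inject

def stimulate(package: str) -> None:
    """Run the Monkey against a connected device (requires adb)."""
    subprocess.run(monkey_cmd(package), check=True)
```

Fixing the seed makes an automated run repeatable, which helps when comparing idle- and active-stage traffic captures for the same app.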

2) CONFIGURATION
This component allows the configuration and automated deployment of the entire experimental environment. Based on the Google Play API [59], it automatically crawls and downloads the target mobile app (APK) from the Google Play Store. Once downloaded, it retrieves the metadata from the APK digital certificate and stores it in a centralized log repository. It also repackages the APK before installing it on the mobile device in order to bypass basic TLS protection in the network traffic, as explained below. Finally, it deploys the necessary components according to the analyst's configuration, for example, capturing traffic in different stages, such as an idle stage (i.e., without any user interaction) and an active stage (i.e., with user or Monkey interaction).

3) INTERCEPTION
This component is responsible for capturing the app network traffic and storing it for further analysis. Three key features of the state of the practice of mobile apps were considered to build this component. First, a vast majority of Android apps have adopted TLS (Transport Layer Security) as the de facto protocol for internet communication [60]. Second, some apps, and in particular integrated third-party libraries [60], implement further protections such as certificate pinning, thus restricting trusted Certificate Authorities (CAs) to a small known set [61]. Third, several apps and OS services can run concurrently and generate network traffic, making it necessary to distinguish the traffic belonging to the target app from the rest.

Traffic capture from the device's network interface is built around a man-in-the-middle (MITM) proxy [62]. It requires the MITM proxy's self-signed CA certificate to be installed in the root certificate store of the mobile device in order to become a trusted CA. Thus, instead of establishing a direct TLS connection between apps and remote servers, the MITM proxy sets up two different TLS connections: app-proxy and proxy-remote server. After configuring the mobile device to connect to the Internet through the MITM proxy, it is possible to capture HTTP and HTTPS traffic from the apps. Note that this approach requires configuring the mobile device once rather than configuring individual applications. Therefore, once the terminal is configured, the traffic of any application can be captured.

We also implemented a MITM proxy plugin consisting of two main classes: a TLS failure detector and a flow-to-app mapping. The TLS failure detector captures the exceptions produced by the TLS layer and stores appropriate logs when a connection cannot be established. For example, an app detecting an untrusted CA due to a certificate pinning implementation will abort the proxy connection and generate a TLS exception, which will be logged together with the targeted IP address and domain. The flow-to-app mapping allows us to filter in the traffic belonging to the target app, since we analyze each app independently. It takes advantage of ADB (Android Debug Bridge) [63] and netstat [64] to retrieve the ports opened by the apps running on the mobile device. Based on this information, only the traffic and connection metadata belonging to the target app is logged.
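A minimal sketch of the flow-to-app mapping follows. The netstat line format below is an assumption (real output varies across devices and toolchains), and per-process information typically requires a rooted device; the sketch only illustrates the idea of mapping local ports to the target app's process.

```python
import re
import subprocess

def ports_for_pid(netstat_output: str, pid: int) -> set:
    """Extract local TCP ports owned by `pid` from `netstat -tpn`-style
    output. The line format is an assumption for illustration, e.g.:
    'tcp 0 0 10.0.2.16:44321 93.184.216.34:443 ESTABLISHED 12345/com.example'"""
    ports = set()
    for line in netstat_output.splitlines():
        # local addr:port, remote addr:port, state, then 'pid/process'
        m = re.search(r"\S+:(\d+)\s+\S+:\d+\s+\S+\s+(\d+)/", line)
        if m and int(m.group(2)) == pid:
            ports.add(int(m.group(1)))
    return ports

def device_netstat() -> str:
    """Fetch netstat output from a connected device via adb."""
    return subprocess.run(["adb", "shell", "netstat", "-tpn"],
                          capture_output=True, text=True).stdout
```

Once the target app's ports are known, the proxy plugin can tag each intercepted flow whose source port belongs to that set and discard the rest.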

As already mentioned, several apps and their integrated third-party libraries implement certificate pinning as a further protection layer against TLS interception. These apps trust specific CA certificates instead of the CA certificates available on the device. Consequently, they will abort connections using the MITM proxy's self-signed CA certificate and prevent certain data flows from being intercepted. CA certificates can be hard-coded either (i) in the network_security_config file of an app distribution package (APK), or (ii) in the app code itself. In the first approach, an app can declare the CA certificates it trusts in the network_security_config file, which is located in the APK xml folder. For example, the entries <domain>example.com</domain> and <certificates src="@raw/pinned_ca"/> pin the CA certificate pinned_ca for connections to the example.com domain. In the second approach, the CA certificates are hard-coded in the app code itself using the standard Android API, such as TrustManager, or specialized third-party libraries, such as OkHttp3 and TrustKit. There are actually a significant number of third-party libraries for this purpose, and therefore many different ways of implementing it (we have identified 23 so far).

We adopted two approaches to bypass these protections. First, to bypass Manifest-based certificate pinning, we modify the app's network_security_config file to trust the system's CA certificates, including the MITM proxy certificate we installed. More specifically, the network_security_config file was extracted from the APK using apktool [65]. By crawling this file, the entries that pin specific CA certificates are replaced by the entry <certificates src="system"/>, so that the app will trust the system CA certificates. The APK is finally repackaged and installed on the mobile device. Second, to bypass library-based certificate pinning, we rely on Frida [66] to change the behavior of the target app at runtime when it verifies the CA certificates it trusts. That is, the API library methods that verify the trusted CA certificates for a connection were hooked, and then convenient code was injected to bypass the verification. For example, an app using the OkHttp3 library


calls the CertificatePinner.check API method to verify trusted CA certificates. If the connection's CA certificate does not match any of the trusted certificates, it throws an exception and the connection is aborted. To prevent this, we command Frida to capture the CertificatePinner.check method and then inject our code. In fact, our code overloads the check method to print a black-hole message instead of throwing an exception. In this way, the connection between the app and the MITM proxy is successfully established. So far, we have implemented the check bypass for 23 different certificate pinning libraries.
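The Manifest-based part of this bypass can be sketched as a rewrite of the network_security_config XML: every trust anchor is pointed at the system CA store (which, after installation, includes the MITM proxy CA) and pin-sets are dropped. The element names follow Android's network security configuration schema; the transformation itself is an illustrative sketch, not the authors' apktool-based pipeline.

```python
import xml.etree.ElementTree as ET

def trust_system_cas(config_xml: str) -> str:
    """Rewrite a network_security_config so that every trust anchor points
    to the system CA store and all certificate pins are removed."""
    root = ET.fromstring(config_xml)
    # Redirect every <certificates src="@raw/..."/> to the system store.
    for anchors in root.iter("trust-anchors"):
        for cert in anchors.findall("certificates"):
            cert.set("src", "system")
    # Drop <pin-set> elements entirely (snapshot the tree before mutating).
    for parent in list(root.iter()):
        for pin_set in parent.findall("pin-set"):
            parent.remove(pin_set)
    return ET.tostring(root, encoding="unicode")
```

The rewritten file would then be placed back into the APK, which is re-signed and installed, as described above.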

4) ANALYSIS
Based on the stored app traffic packets and metadata, the main purpose of this component is to analyze them and determine (i) the type of personal data transferred by an app, (ii) the country where the personal data recipient is located, and (iii) the type of recipient who receives the personal data, i.e., first-party or third-party recipient.

a: PERSONAL DATA TYPE
We use string searching in packet payloads to detect personal data types, as detailed in Table 1. A key assumption for adopting string searching is that these personal data types are well-known in advance and will remain unchanged in our experimental environment. That is, contact details, identifiers and location information are retrieved in advance from the mobile device and fed into the algorithm that searches for these data in the traffic packets of each app. As app developers may deliberately obfuscate this personal data using different encoding and hashing mechanisms to evade detection [41], our algorithm also searches for data encoded in Base64, MD5, SHA1 and SHA256. To validate this approach, we wrote a ground-truth test Android app9 that discloses the personal data listed in Table 1 to a remote server. We ran this app through the pipeline shown in Figure 3 and were able to recover all the types of personal data.
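The search over encoded variants can be sketched as follows. The function names and the dictionary of known device values are illustrative, but the encodings match those listed above (Base64 and hex digests of MD5, SHA1 and SHA256).

```python
import base64
import hashlib

def encodings(value: str) -> set:
    """Variants under which a known personal data value may appear in
    traffic: plain text, Base64, and common hash digests (hex)."""
    raw = value.encode()
    return {value,
            base64.b64encode(raw).decode(),
            hashlib.md5(raw).hexdigest(),
            hashlib.sha1(raw).hexdigest(),
            hashlib.sha256(raw).hexdigest()}

def detect(payload: str, known_values: dict) -> set:
    """Return the personal data types whose device value, in any of the
    encodings above, occurs in a packet payload.
    `known_values` maps a data type (e.g. 'AAID') to the device's value."""
    return {dtype for dtype, value in known_values.items()
            if any(variant in payload for variant in encodings(value))}
```

Because the values are harvested from the device beforehand, the search is exact-match over a small candidate set rather than a pattern-based scan.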

b: RECIPIENT LOCATION
We relied on the ipstack API [67] to determine the location of the servers receiving the app's connections. In this way, the first-hop server recipient, i.e., the destination server directly reachable via the app connection, can be geolocated. Admittedly, further connections can be established from the first-hop server to other servers. However, tracing them would necessarily require the collaboration of the remote servers (organizations) involved, which is beyond the scope of this work. This component ultimately tags each personal data flow whose recipient server is outside the EU.
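A sketch of this step follows, assuming ipstack's documented JSON response field country_code and an illustrative EU/EEA code set; the geolocate helper is not exercised here since it requires network access and an API key.

```python
import json
import urllib.request

# Illustrative EU/EEA country codes (assumption, not the paper's exact data).
EU_EEA = {"AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
          "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
          "PL", "PT", "RO", "SK", "SI", "ES", "SE", "IS", "LI", "NO"}

def geolocate(ip: str, api_key: str) -> str:
    """Resolve a server IP to an ISO country code via the ipstack API."""
    url = f"http://api.ipstack.com/{ip}?access_key={api_key}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["country_code"]

def is_cross_border(country_code: str) -> bool:
    """Tag a flow whose recipient server sits outside the EU/EEA."""
    return country_code not in EU_EEA
```

Only the tagging predicate matters for the compliance check; the lookup itself could be swapped for any other geolocation backend.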

c: RECIPIENT TYPE
The objective is twofold. First, to distinguish whether the target domain of a connection is a first-party recipient (i.e., the

9 It can be found at https://github.com/PrivApp/DeviceData

app provider itself) or a third-party recipient (i.e., a third-party service provider or another company). Second, if it is a third-party recipient, to determine its owner organization, service category and headquarters country. To this end, a two-iteration process was carried out.

In the first iteration, we try to match the owner of the application with the owner of the target domain of the network connections. To this end, we generate two bags of tokens: one with tokens representing the app and another with tokens representing the target domain. A token matching is then made between the two bags, classifying the domain as a first-party recipient if there is at least one token match. More specifically, the app's token bag is built upon the reverse domain name used to name an APK, which generally contains the name of the app provider. For example, com.netflix.Speedtest consists of the app's name (Speedtest), the second-level domain (SLD) that commonly refers to the organization (netflix), and the top-level domain (TLD) that is global and may be shared between multiple APKs (com). The SLD and subdomains were included in the app's token bag, but TLDs were excluded, as they are not suitable for distinguishing between different domains. We also noted that a small number of APK names do not include the organization name, e.g., com.mt.mtxx.mtxx is owned by the Chinese organization Meitu. Therefore, we reinforced our approach by also considering the name of the app's controlling organization extracted from the digital certificate used to sign the app. In the case of com.mt.mtxx.mtxx, the Meitu token is indeed contained in its digital certificate. Lastly, the domain's token bag consists of the SLD and individual subdomains of the target domain. Each flow of personal data is then tagged as a first-party recipient if there is at least one token in both the app and domain bags. In order to validate this approach, we manually checked the domains targeted by 100 apps that were tagged as first-party recipients, and 98.6% were correctly classified. A small number of domains were misclassified because certain APK names and target domains use the same generic token, e.g., ''app.''
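The token-bag matching can be sketched as set intersection. The TLD stop-list below is a small illustrative sample rather than the full list such a classifier would need, and the function names are ours.

```python
def token_bag(dotted_name: str, extra_tokens=()) -> set:
    """Tokens from a dotted name (APK package or domain), excluding TLDs
    and other non-discriminative parts. The TLD list is a small sample."""
    tlds = {"com", "net", "org", "io", "co"}
    tokens = {t.lower() for t in dotted_name.split(".")} - tlds
    return tokens | {t.lower() for t in extra_tokens}

def is_first_party(apk_name: str, domain: str, cert_org: str = "") -> bool:
    """First-party if the APK name (or the signing-certificate organization,
    when available) shares at least one token with the target domain."""
    extra = (cert_org,) if cert_org else ()
    return bool(token_bag(apk_name, extra) & token_bag(domain))
```

The certificate-organization fallback handles cases like com.mt.mtxx.mtxx, whose package name carries no organization token but whose signing certificate does.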

In the second iteration, domains not classified as first-party recipients are initially assumed to be third-party recipients. These domains, particularly their SLDs, are used as proxies to identify their owner organizations, based on Libert's dataset [68]. This allows individual target domains to be mapped to the owner company and even to parent companies, including the country in which the headquarters are located and the service category. We have used Libert's dataset as it was created in the specific context of disclosing personal data to third parties, albeit focusing on the web [69]. Later, it was also extended [70] and used [71] in the context of mobile apps. However, while this dataset greatly served as a base and starting point, its last update was in 2018 and therefore several domains of potential third-party recipients (targeted by more than one app) were not found in the dataset. We then manually populated it based on the Crunchbase [72] and OpenCorporates [73] databases, which provide, inter alia,


FIGURE 4. Overall annotation process to extract the statements on cross-border transfers from the apps' privacy policies.

information on the headquarters country, the category of service offered, and information on company acquisitions that gives an insight into the organizations' structures. Over 50 new entries were added and committed to a public repository maintained by a research community.10

Finally, after these two iterations, certain domains could be classified as neither first-party nor third-party recipients and are therefore tagged as unknown. In this case, manual inspection is required. For instance, in a traffic dataset generated by 100 apps, 366 unique domains were targeted, of which 8 SLDs are actually first-party recipients but could not be properly classified since they do not use their canonical organization names. For example, the domain nflxvideo.net uses nflxvideo instead of netflix. We manually curated these tags, as their number is quite manageable. However, in order to automate this and improve precision, the digital certificates of the targeted domains (if available) can be leveraged in a similar way as explained above. We leave this for future work. On the other hand, 23 domains appear to be anonymously registered, and their owners could not be found either by using automated tools such as whois or by searching in catalogues such as Crunchbase. For example, the domains startappservice.com and smartechmetrics.com are targeted by several apps, but we were unable to determine the owner and other related information. These domains were therefore labeled as ''unknown'' in our dataset and were not considered for further analysis.

B. EXTRACTION OF CROSS-BORDER TRANSFER STATEMENTS FROM PRIVACY POLICIES
Despite efforts to represent committed privacy practices by machine-readable means (e.g., P3P, the Platform for Privacy Preferences [74]), natural language privacy policies seem to be the de facto channel for communicating the privacy practices of a digital or physical service to different stakeholders [75]. Particularly in mobile apps, following requests from certain European [76] and US [77] working groups, major app distribution platforms provide functionality that allows app developers to link their privacy policies. The GDPR has also relied on privacy policies to disclose privacy practices to data subjects and other stakeholders [55].

10 The entire Domain Owner dataset is available at https://github.com/PrivApp/webXray_Domain_Owner_List.

Therefore, the main purpose of this task is to systematically extract the cross-border transfer statements disclosed in the apps' privacy policies. To this end, we followed the guidelines of the codification or annotation method [78], i.e., an interpretative and qualitative approach in which one or multiple domain analysts assign a label or code to data. While this process has been completely manual, the systematic process followed has allowed the generation of a corpus with structured annotations of cross-border transfers.

This corpus has been released in a public repository,11 thus contributing to prior efforts in annotating privacy practices in natural language privacy policies [11], [18] and also enabling classification models to be trained to automate the annotation process.

Figure 4 illustrates the overall process to determine the cross-border transfer statements declared in an app's privacy policies.

1) ANNOTATION SCHEME DEFINITION
Domain experts, including privacy experts and legal scholars, were involved in defining the cross-border transfer annotation scheme. Taking the GDPR [2] as a reference, together with the transparency guidelines issued by the EDPB [55], we defined an annotation scheme consisting of three levels: cross-border transfer type, transparency elements to be disclosed in each case, and potential values. While the details have been presented in Section III, Table 2 outlines the annotation scheme used to capture the transparency elements disclosed by the apps' privacy policies.

As shown in Table 2, a valid option for each individual transparency element is 'Not disclosed', allowing annotators to express its absence. This is because, after carrying out a pilot annotation on a subset of privacy policies, we found that some transparency elements are ambivalent or simply missing. For example, the privacy policy of the app org.mozilla.firefox states that ''This means that your information might end up on one of those computers in another country and that country may have a different level of data protection regulation than yours''. This somehow declares the transfer intention but lacks other transparency elements.

Another key task carried out by the domain experts was the definition of some simplifying assumptions about

11 The corpus is available at https://github.com/PrivApp/IT100-Corpus


TABLE 2. Cross-border transfer annotation scheme. (∗) Explicit consent is not a safeguard in itself but an exception that enables a cross-border transfer.

cross-border transfer statements, in order to resolve the ambiguities of natural language. Discrepancies in interpreting privacy policies are very common [79] since, inter alia, the detail, structure and language can vary from one app to another. Thus, through an iterative process, a subset of privacy policies was annotated in order to identify alternative statements referring to certain transparency elements or to give them a more precise meaning. For example, to refer to Standard Data Protection Clauses, privacy policies use alternative statements such as ''contracts approved by the European Commission'', ''data protection model contracts'' or ''data processing addendum.'' These simplifying assumptions were provided as tooltips during the annotation process.

2) TOOL-BASED ANNOTATION
A web-based tool12 was developed to support the whole annotation process, seeking to simplify the process and prevent mistakes. As detailed in Figure 5, it provides functionality to set up different privacy practices in a three-level structure (i.e., practice, transparency elements, and potential values), annotate the privacy practices at the segment and sentence level, calculate the inter-coder agreement, and display inconsistencies between coders.

Each privacy policy was broken down into segments (roughly paragraphs). These are presented sequentially to the coders, who annotate them by selecting a value for each transparency element. In addition, the coder can select a text

12 http://anotacion.herramientapoliticasprivacidad.com/home

span (e.g., a sentence) for finer-grained annotations. Once a policy has been coded, the tool outputs segment- and sentence-level annotations represented in YAML, a human- and machine-readable format.

3) INTER-ANNOTATOR RELIABILITY CHECKING
In order to control inter-annotator reliability during the annotation process, we randomly selected a subset of privacy policies to be annotated by a second coder. We then relied on Krippendorff's α coefficient to evaluate the reliability of the annotations, which indicates that the agreement between both coders is ''good'' (α > 0.8), ''tolerable'' (0.67–0.8), or ''suspected'' (α < 0.67) [80]. We analyzed the disagreements between the two coders in order to (1) amend the annotations once an agreement had been reached, and (2) identify potential systematic disagreements, such as misunderstandings of the annotation criteria, and then refine them. For example, in a set of 100 annotated privacy policies, we evaluated the inter-coder reliability over 15% of the policies, as in other related studies [10]. Transfer intention, appropriate safeguards (particularly, adequacy decision and standard data protection clauses), copy means, and target country reached a ''good'' agreement. The agreement on the Not disclosed appropriate safeguard is ''tolerable'' (0.67–0.8), while Explicit Consent and Binding Corporate Rules fall into ''suspected'' annotations. We analyzed the disagreements and identified non-systematic disagreements, since one of the coders overlooked the disclosure of certain cross-border


FIGURE 5. Web-based tool to annotate privacy practices in privacy policies.

transfer statements. More significantly, we identified systematic disagreements due to misunderstandings, e.g., in coding explicit consent. Tacit consent, i.e., where the mere use of the service is assumed to constitute the user's consent to an international transfer, was coded as explicit consent by one coder, which is not right. The coders mutually harmonized their criteria and curated the annotations accordingly. The same procedure was followed for all other disagreements.13
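For reference, Krippendorff's α for the simplest setting used here (two coders, nominal categories, no missing values) can be computed from the coincidence matrix as follows; this is a textbook sketch, not the annotation tool's implementation.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing
    values. Assumes at least two categories appear in the data."""
    pairs = Counter()
    for a, b in zip(coder_a, coder_b):
        # each unit contributes both ordered pairs to the coincidence matrix
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1
    n = sum(pairs.values())  # total pairable values (2 per unit)
    marg = Counter()
    for (c, _), count in pairs.items():
        marg[c] += count
    # observed vs. expected disagreement over off-diagonal coincidences
    d_obs = sum(v for (c, k), v in pairs.items() if c != k) / n
    d_exp = sum(marg[c] * marg[k]
                for c in marg for k in marg if c != k) / (n * (n - 1))
    return 1.0 - d_obs / d_exp
```

Perfect agreement yields α = 1, while values below 0.67 fall into the ''suspected'' band discussed above.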

C. COMPLIANCE CHECKING
The final process aims to check whether the apps performing cross-border transfers properly disclose, through their privacy policies, the transparency elements according to Table 2. To this end, we consider four consistency types between the app's personal data flows and its privacy policy statements, as illustrated below.

1) FULL CROSS-BORDER TRANSFER DISCLOSURE
This implies that a privacy policy discloses all the transparency elements according to the type of cross-border transfer actually carried out by an app. For illustrative purposes, consider the ''Battle Bay'' (com.rovio.battlebay) app, owned by Rovio Mobile Ltd. It has been installed +10,000,000 times from the Google Play Store. For analytics purposes, it transfers the AAID (Android Advertising Identifier) to, inter alia, outcome-ssp.supersonicads.com, whose servers are located in the United States (US). The domain supersonicads.com is owned by the third-party recipient IronSource Ltd, which is based in the US.

The privacy policy of the app discloses the statement shown in Figure 6, which fully discloses the cross-border transparency elements: the transfer intention (green); the target country (yellow); the appropriate safeguard - Standard

13 Details can be found in the replication package available at http://dx.doi.org/10.17632/ws6cx9p65d

FIGURE 6. International transfer statement of the com.rovio.battlebay app.

Data Protection Clause and Adequacy decision (blue); and the means to get a copy of the safeguards (grey). Therefore, in this specific case, we classify this app as a full cross-border transfer disclosure.

2) AMBIGUOUS CROSS-BORDER TRANSFER DISCLOSURE
This implies that a privacy policy discloses only a subset of the transparency elements required by the GDPR according to the type of cross-border transfer carried out by the app. The missing transparency elements are either disclosed in an ambivalent manner or not disclosed at all. For example, consider the ''ASKfm – Anonymous Questions'' (com.askfm) app, owned by Ask.fm, which has been installed +50,000,000 times from the Google Play Store. For analytics purposes, it transfers the AAID to, inter alia, startup.mobile.yandex.net, whose servers are located in Russia. This domain is owned by the third-party recipient Yandex LLC, which is based in Russia.

The privacy policy of the app discloses the statement shown in Figure 7, which states the intention to transfer personal data (green) and the target countries (yellow). However, the target countries are stated in an ambivalent manner (''various countries around the world''). Besides, the appropriate safeguards and the means to get a copy of such safeguards are


FIGURE 7. International transfer statement of the com.askfm app.

missing. It should be pointed out that the AAID is transferred to startup.mobile.yandex.net before the user interacts with the app, nullifying any attempt to underpin the transfer by explicit consent, as explained in Section III.

3) INCONSISTENT CROSS-BORDER TRANSFER DISCLOSURE
This implies that a privacy policy discloses statements that contradict the cross-border transfers actually carried out by an app. For illustrative purposes, consider the ''Mondly-Learn 33 Languages Free'' (com.atistudios.mondly.languages) app, owned by ATi Studios, which has been installed +10,000,000 times from the Google Play Store. For analytics and advertisement purposes, it transfers the AAID to app.adjust.com and onesignal.com, whose servers are located in the US. These domains are owned by the third-party recipients Adjust and OneSignal, Inc., respectively, both based in the US.

The privacy policy of the app discloses the statement

shown in Figure 8. It discloses the intention to transfer personal data (green) to countries covered by an adequacy decision (blue). Although the target countries have not been explicitly disclosed, a conservative assumption leads us to consider the twelve countries that hold an adequacy decision [53] as possible targets (yellow). However, the US does not fall into this group of countries. Therefore, in this specific case, we classify this app as an inconsistent cross-border transfer disclosure, since an international transfer is actually made to the US.

FIGURE 8. International transfer statement of the com.atistudios.mondly.languages app.

4) OMITTED CROSS-BORDER TRANSFER DISCLOSURE
It implies that a privacy policy does not disclose any transparency element when an app performs a cross-border transfer. For example, consider the ‘‘Camera360-Photo Editor’’ (vStudio.Android.Camera360) app, which has been installed +100,000,000 times from the Google Play Store. It transfers the MAC Address to t.growingio.com, whose servers are located in China. The domain growingio.com is owned by the third-party recipient Alibaba, which is based in China. Also, it transfers the AAID and the GPS Location to i.camera360.com, a first-party recipient located in China. After searching for a policy statement describing at least the intention to make a cross-border transfer, none was found at all. We therefore classify this app as an omitted cross-border transfer disclosure.

V. VALIDATION
The main motivation for the method was to analyze whether Android app providers properly disclose the transparency elements required by the GDPR when apps make cross-border transfers. In this section, we present the results of analyzing 100 apps from the Google Play Store and identify the extent to which these apps are potentially non-compliant, as a way of validating the applicability of the method.

A. EXPERIMENTAL ENVIRONMENT SET-UP
We developed a controlled experiment by using the assessment method presented in Section IV. Details on the availability of the underlying resources can be found in the replication package.14 Both the apps and their privacy policies were downloaded in March 2020 and tested between 7 and 14 April 2020. The apps were installed on a Xiaomi Redmi 7a mobile device running Android 9 (API 28). As illustrated in Figure 9, each app was run for 10 minutes, comprising two phases: an idle stage (i.e., without user interaction) and an active stage (i.e., with user interaction). Since apps may transfer personal data immediately after they are launched, traffic is captured for 2 minutes without any user interaction (idle stage). Then, a tester interacts with the application for 8 minutes (active stage).
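The two-phase capture described above can be sketched as a simple stage-tagging helper (a minimal illustration of the timeline, not the authors' actual tooling; the constants mirror the 2-minute idle and 8-minute active stages):

```python
IDLE_SECONDS = 120   # 2-minute idle stage: no user interaction
TOTAL_SECONDS = 600  # 10-minute run per app (2 min idle + 8 min active)

def stage_of(offset_s):
    """Tag a captured flow by its offset (seconds) from app launch."""
    if not 0 <= offset_s <= TOTAL_SECONDS:
        raise ValueError("offset outside the 10-minute capture window")
    return "idle" if offset_s < IDLE_SECONDS else "active"

def tag_flows(flows):
    """flows: iterable of (offset_seconds, target_domain) tuples."""
    return [(stage_of(t), domain) for t, domain in flows]
```

Tagging each flow with its stage is what later allows transfers observed before any interaction to invalidate explicit consent as a legal basis.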

FIGURE 9. Testing timeline.

During the active stage, one of the paper authors acted as a tester by interacting with each of the targeted apps. He followed the overall operational testing plan shown in Table 3, seeking to navigate through the main functionalities as an actual user would. Although a detailed UI navigation specification is not feasible since each app provides different functionality, the tester read the app description taken from the Google Play Store before starting to navigate through it, to get an understanding of the app's core functionality.

1) APPS SELECTION
The dataset consists of 100 popular Android mobile apps downloaded from the Google Play Store in March 2020 from Spain. Three main criteria guided the app selection:

14 The replication package is available at http://dx.doi.org/10.17632/ws6cx9p65d


TABLE 3. Overall operational testing plan.

(i) top-free apps in an EU country; (ii) apps included in the APP-350 corpus [10]; and (iii) apps offering diversity in the geographical location of app providers. The first criterion is transversal to the other two criteria and is intended to bring us closer to the most popular apps in an EU country. To this end, we rely on the categorization of top-free apps in Spain made by the Google Play Store in terms of the number of downloads. The second criterion sought to contribute to prior efforts in annotating privacy practices in privacy policies, particularly the APP-350 corpus [10], by extending it with cross-border transfer annotations. To this end, we selected a subset of 44 apps coming from that corpus that also fall into the top-free app category. The third criterion aims at ensuring that the top-free apps selected represent providers offering services within the EU but established across the world, thus enabling us to understand how cross-border transfers are addressed in the wild. To this end, as in other related works [1], [8], we relied on the Country field in the digital certificate used to sign the app to obtain the organization's country. As a result, a set of 56 additional apps from 25 different countries was included, most of them from the US, China, Spain, France and Australia.15
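For criterion (iii), the lookup can be illustrated with a hypothetical helper that pulls the C (country) attribute out of the distinguished name of an app's signing certificate, as printed by tools such as apksigner or keytool (a sketch only; the real pipeline parses the certificate extracted from the APK):

```python
from typing import Optional

def country_from_dn(dn: str) -> Optional[str]:
    """Extract the C attribute from an RFC 4514-style distinguished
    name, e.g. 'C=US, O=Example Inc, CN=Example App'.
    Naive split: escaped commas inside DN values are not handled."""
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        if key.strip().upper() == "C":
            return value.strip() or None
    return None
```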

2) PRIVACY POLICY CORPUS
We annotated the set of 100 privacy policies by following the process described in Section IV-B. Each privacy policy was read in its entirety, irrespective of whether cross-border transfers were detected during the app testing time. Each policy segment was assigned a label with the transparency elements disclosed. Within the segment, a text span (i.e., sentence) was also assigned an appropriate label, to make finer-grained annotations available. The corpus16 consists of a total of 3,715 segments. Of these, 315 contain

15 Details on the selected apps can be found in the metadata_ds.csv available at http://dx.doi.org/10.17632/ws6cx9p65d

16 The corpus is available at https://github.com/PrivApp/IT100-Corpus

transparency elements of cross-border transfers, distributed as shown in Table 4. Some transparency elements are redundant within a privacy policy. For example, the cross-border transfer intention is disclosed in 127 different segments from only 57 apps' privacy policies.

TABLE 4. Number of apps and statements declaring the transparency elements of cross-border transfer statements.

3) APP'S DATA FLOW DATASET
A total of 29,910 flows from the 100 apps have been logged. Each log entry includes the app name, app version, capture stage (idle or active), target port, target domain, target country, and personal data type disclosed (if any). The dataset contains 6,281 flows (21%) disclosing personal data to 195 unique fully-qualified domain names (FQDNs) with 117 unique second-level domains (SLDs). These SLDs are hosted on servers located in 13 different countries: 6 EU countries hosting 19 domains and 7 non-EU countries hosting 106 domains. A few domains (8) are hosted on servers located in both EU and non-EU countries.17
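The FQDN-to-SLD reduction behind these counts can be sketched as follows (a naive illustration: a production pipeline would consult the Public Suffix List so that multi-label suffixes such as .co.uk are handled correctly):

```python
def sld(fqdn):
    """Reduce a fully-qualified domain name to its last two labels,
    e.g. 't.growingio.com' -> 'growingio.com'."""
    labels = fqdn.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def domains_by_country(flows):
    """flows: iterable of (fqdn, server_country) pairs.
    Returns {country: set of unique SLDs hosted there}."""
    result = {}
    for fqdn, country in flows:
        result.setdefault(country, set()).add(sld(fqdn))
    return result
```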

B. COMPLIANCE RESULTS
Figure 10 details the number of apps that fulfil the criteria leading to cross-border transfers discussed in Section III. A subset of 85 apps transferred personal data during the testing time. Of them, 10 apps transferred personal data solely to other EU countries. While these are also a type of cross-border transfer, they do not imply further constraints within the meaning of the GDPR. The remaining 75 apps transfer personal data outside the EU. This subset of apps has been broken down into three groups that imply different transparency requirements: 27 apps transferring personal data to non-EU-based data controllers (Section B-1), 59 apps transferring personal data to third-party recipients covered by an adequacy decision (Section B-2), and 8 apps transferring personal data to third-party recipients not covered by an adequacy decision (Section B-3).

Note that some apps have performed more than one cross-border transfer type. For example, out of the 75 apps that transferred personal data to a non-EU country

17 Details can be found in the replication package available at http://dx.doi.org/10.17632/ws6cx9p65d


FIGURE 10. Number of apps performing cross-border transfers. Each app transfer has been classified as full cross-border transfer disclosure (FD), ambiguous cross-border transfer disclosure (AD), inconsistent cross-border transfer disclosure (ID) or omitted cross-border transfer disclosure (OD).

(C1.3.2 in Figure 10), a subset of 16 apps transferred personal data to both non-EU third-party recipients and non-EU first-party recipients. This is why the sum of both subsets of apps (64 and 27) exceeds the input by 16. Details of these apps can be found in Annex A.
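The four disclosure categories used throughout this section can be summarized as a decision rule comparing the observed transfers with the policy's disclosures (a simplified sketch of the criteria in Section III; the field names in the `disclosure` dict are illustrative, not the authors' schema):

```python
def classify(observed_countries, disclosure):
    """Classify one app's cross-border transfer disclosure.

    observed_countries: set of non-EU countries actually receiving data.
    disclosure: dict with illustrative keys:
      'intention'  - bool, transfer intention stated at all
      'countries'  - set of explicitly named target countries
      'vague'      - bool, countries stated only ambiguously
      'safeguards' - bool, appropriate safeguards disclosed
    """
    if not disclosure.get("intention"):
        return "omitted"
    stated = disclosure.get("countries", set())
    if stated and not observed_countries <= stated:
        return "inconsistent"   # observed targets contradict the policy
    if disclosure.get("vague") or not stated or not disclosure.get("safeguards"):
        return "ambiguous"
    return "full"
```

For instance, the Russia case below (policy names only the US, data also flows to Russia) falls out as inconsistent under this rule.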

1) NON-EU DATA CONTROLLER CROSS-BORDER TRANSFERS
We found 27 apps transferring personal data to first-party domains hosted on servers outside the EU; details are provided in Table 5 of the Appendix. We used the Issuer Country field of the apps' digital certificates to identify the organizations' locations [49], and 22 are in fact from outside the EU. We were unable to identify the location behind the digital certificates for the remaining 5 apps since they are missing. The privacy policy of one of these apps (Duolingo) does indicate that the organization is established in the USA, but the other 4 do not provide any information.

Only 11 of the 27 apps have been classified as full cross-border transfer disclosures, i.e., their privacy policies explicitly provide contact details of either the data controller or a representative within the EU. For example, Duolingo (com.duolingo) and Reddit (com.reddit.frontpage) explicitly disclose to data subjects that their organizations are based outside the EU, but they have designated a data controller and a representative within the EU, respectively.

The remaining 16 apps have been classified as omitted cross-border transfer disclosures, i.e., their privacy policies do not provide information on a data controller or a representative within the EU. Although admittedly 6 of them do mention the GDPR as the privacy legislation taken into account, and even state the intention to transfer personal data to the USA, they do not provide information on a representative or data controller established within the EU.

2) ADEQUACY DECISION-BASED CROSS-BORDER TRANSFERS
We found 64 apps that transferred personal data to third-party domains hosted on servers located in the US and Canada.

Although the organizations' locations (based on digital certificates) indicate 18 different countries, third-party services are predominantly located in the US (63) and only one in Canada. Canada maintains an adequacy decision, therefore any transfer can be made without any further safeguards [53]. On the other hand, apps that transferred personal data to third-party recipients located in the US could be covered by an adequacy decision as long as they complied with the Privacy Shield Framework.18 As already mentioned, Privacy Shield was invalidated by the Court of Justice of the EU on July 16, 2020, finding that it does not provide adequate protection [5]. However, given that this experiment was carried out in April 2020, Privacy Shield was still valid. In Section B-4 we analyze whether the apps involved have adopted changes.

As explained in Section III, three transparency elements should be disclosed to data subjects when such transfers take place: the transfer intention, the names of the targeted countries, and the existence of an EU adequacy decision.

A smaller subset of 9 apps has been classified as full cross-border transfer disclosures, i.e., their privacy policies disclose the transfer intention and the specific target country, and rely on third-party service providers listed in the Privacy Shield List, thus maintaining an adequacy decision. See Table 6 of the Appendix for details.

A subset of 23 apps has been classified as ambiguous cross-border transfer disclosures. All their privacy policies disclose the intention to transfer personal data. The specific target countries are disclosed by 20 apps' privacy policies, while the remaining 3 apps disclose ambiguous statements such as ‘‘any country in which we do business.’’ The existence of an adequacy decision (i.e., compliance with the Privacy Shield Framework) or any other appropriate safeguard is missing from 20 privacy policies. Three other apps do rely on Privacy Shield; however, the policies state that the apps themselves are Privacy Shield compliant, but say nothing about

18 Since Privacy Shield was invalidated by the Court of Justice of the EU on July 16, 2020, finding it does not provide adequate protection [5], we analyze in Section B-4 whether the apps involved adopted changes.


the third-party service providers to whom they transfer the personal data. These were therefore also tagged as ambiguous disclosures. Finally, it is worth pointing out that more than half of this subset (13) do not mention GDPR statements, suggesting that this privacy legislation is not considered by app developers. See Table 6 of the Appendix for details.

A considerable number of 27 apps have been classified as omitted cross-border transfer disclosures, i.e., their privacy policies do not disclose any of the aforementioned transparency elements. In fact, most of these privacy policies (21) do not even mention GDPR statements, suggesting that this privacy legislation is not taken into account by app providers when informing data subjects of their rights. See Table 6 of the Appendix for details.

The remaining 5 apps rely on other safeguards, which are discussed in the next section.

3) NON-ADEQUACY DECISION-BASED CROSS-BORDER TRANSFERS
We found three apps that transferred personal data to China (1), Malaysia (1), and Russia (1), as well as 5 apps that transferred personal data to the US by using appropriate safeguards other than adequacy decisions. The apps that made international transfers to China and Malaysia were classified as omitted cross-border transfer disclosures. Neither the transfer intention, the recipient countries, nor the appropriate safeguards were disclosed by their privacy policies. The app that made an international transfer to Russia was classified as an inconsistent cross-border transfer disclosure. Although it discloses the intention to make a cross-border transfer, it states only the US as a target country, but personal data are also transferred to Russia.

Three of the remaining 5 apps that performed international transfers to the US were classified as full cross-border transfer disclosures, i.e., their privacy policies disclose the transfer intention, the specific target country, the appropriate safeguards (in this case, standard data protection clauses) and the means of obtaining a copy of such safeguards. The other two apps rely on standard data protection clauses and binding corporate rules, although these are disclosed ambiguously and the policies do not provide the means to obtain a copy of them. They were therefore classified as ambiguous cross-border transfer disclosures. The privacy policies of these last two apps alternatively mention that they could rely on the user's explicit consent to make an international transfer. However, the method detected that both apps carried out the transfers during the idle stage. Therefore, this exception for making the transfer is nullified. See Table 6 of the Appendix for details.

4) ADEQUACY DECISION-BASED CROSS-BORDER TRANSFERS AFTER THE INVALIDATION OF PRIVACY SHIELD
Since the Privacy Shield framework was invalidated by the Court of Justice of the EU on July 16, 2020 [5], the cross-border transfers to the US that rely on this framework may face potential compliance issues. In October 2020, we downloaded the new versions of the 9 apps that were classified as full cross-border transfer disclosures and relied on Privacy Shield as an adequacy decision (Section B-1). We then re-analyzed these apps and their privacy policies using the method described in this article to observe the changes adopted, if any. These 9 updated apps still perform transfers to the same third-party recipients and countries, except for one recipient that was not detected during this new analysis. Their privacy policies still maintain Privacy Shield as the adequacy decision that enables international transfers. In this case, these applications have been classified as inconsistent cross-border transfer disclosures.

As a result, a total of 66% of apps have been found to present ambiguous, inconsistent or omitted cross-border transfer disclosures. As detailed in Table 7 of the Appendix, we found that some apps have performed more than one type of cross-border transfer. Therefore, an app has been classified as fully compliant only if all the individual practices performed have been classified as full cross-border transfer disclosures.
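This consolidation rule — an app counts as compliant only when every transfer it performed is a full disclosure — can be stated compactly (a sketch; the category labels are illustrative):

```python
def app_compliant(transfer_classifications):
    """True only if every individual cross-border transfer performed
    by the app was classified as a full disclosure."""
    classes = list(transfer_classifications)
    return bool(classes) and all(c == "full" for c in classes)
```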

C. VALIDITY OF RESULTS
This section discusses the potential threats to the validity of the potential non-compliance results presented in Section B, along with the actions we have taken to minimize them.

1) CONSTRUCT VALIDITY
Since the overall compliance assessment method deals with GDPR requirements, there is a risk that the study setting will not reflect the construct under study when moving those requirements from the legal to the technical domain. That is, since natural language is used to define legal requirements, translating them into technical ones requires mitigating ambiguity and subjectivity, and solving the potential conflicts that may arise. To address this validity threat, much effort has been made to characterize the GDPR cross-border transfer requirements on the basis of multidisciplinary work. A comprehensive analysis was undertaken by privacy and data protection experts, who collectively identified a set of criteria for assessing the compliance of mobile applications with GDPR cross-border transfers. These criteria were developed using the GDPR and the relevant opinions in the field of the mobile ecosystem issued by the European Data Protection Board, which are duly referred to in Section III. This multidisciplinary approach also comprehensively guided the building of the annotation process, the annotation scheme, and the simplifying assumptions used during the annotation of cross-border transfer practices in privacy policies.

2) INTERNAL VALIDITY
The internal validity of the non-compliance results depends primarily on (1) the reliability of the annotation process for the cross-border transfer practices in the privacy policies, and (2) the accuracy in detecting the cross-border transfers performed by apps. On the one hand, a threat to (1) is the annotator's bias in deciding whether or not to annotate a policy segment and labeling each cross-border transfer transparency


TABLE 5. Cross-border transfers to first-party recipients located outside the EU (Target countries). An app is tagged as full cross-border transfer disclosure (•) when the Data Controller (C) or Representative (R) contact information is explicitly disclosed; otherwise, it is tagged as omitted cross-border transfer disclosure (-). Further details can be found at the replication package http://dx.doi.org/10.17632/ws6cx9p65d

element according to the annotation scheme. The annotation process is based on a qualitative and interpretative method, which naturally implies an internal threat that could lead to biased or erroneous results. We took two main actions to minimize this threat. First, privacy experts and legal scholars were involved in normalizing the criteria to annotate each transparency element. Thus, a detailed annotation procedure and a set of simplifying assumptions were defined for each transparency element.19 Second, we validated the reliability of the annotations by ensuring that a subset of privacy policies was annotated by two privacy experts. As detailed in Section IV-B-3, most of the transparency elements annotated (5 of 8) ensured a good Krippendorff's alpha coefficient (>0.8), one had a tolerable agreement (0.74), and two had a suspicious agreement (<0.67). A discussion was held on the latter three in order to align the criteria and amend the annotations, if necessary.20
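The agreement check can be illustrated for the two-annotator, nominal-label case (a sketch of Krippendorff's alpha assuming no missing data; reliability studies typically use an established implementation rather than hand-rolled code):

```python
from collections import Counter

def krippendorff_alpha_nominal(a, b):
    """Krippendorff's alpha for two annotators and nominal labels,
    computed from the coincidence matrix (no missing data)."""
    assert len(a) == len(b) and a, "need paired annotations"
    coincidence = Counter()
    for u, v in zip(a, b):
        coincidence[(u, v)] += 1   # each unit contributes both
        coincidence[(v, u)] += 1   # ordered label pairs
    n = 2 * len(a)                 # total pairable values
    marginals = Counter()
    for (c, _), count in coincidence.items():
        marginals[c] += count
    disagree = sum(cnt for (c, k), cnt in coincidence.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:              # a single label used throughout
        return 1.0
    return 1.0 - (n - 1) * disagree / expected
```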

On the other hand, a threat to (2) is a high rate of false positives that could lead to the misdetection of potential non-compliance issues in the analyzed apps. We took one main action to minimize this threat: a conservative analysis is performed, prioritizing precision over recall. So, any

19 The procedure and simplifying assumptions can be found at https://github.com/PrivApp/IT100-Corpus

20 Details on the annotations and inter-annotator agreement can be found at https://data.mendeley.com/datasets/ws6cx9p65d/1

cross-border transfer detected during the experiments was an observation of the real app behavior and no false positives were generated. As detailed in Section IV-A, we relied on dynamic analysis techniques in order to capture network packets generated by the real execution of apps, rather than other techniques that analyze approximate models which, while ensuring high recall, could generate a high rate of false positives (e.g., static analysis [81]). In the following subsections, we further elaborate on the actions carried out in detecting the personal data types, the recipient location and the recipient type (see Section IV-A-4 for details).

The personal data type detection algorithm is based on searching for pre-defined strings. It was validated by analyzing the behavior of a ground-truth testing app that we developed, and it was able to detect all the predefined personal data types. Yet, app developers may deliberately obfuscate data flows by using different encoding and hashing algorithms. Although we also considered different encoding and hashing mechanisms (Base64, MD5, SHA1 and SHA256), customized encodings can result in false negatives, and an in-depth manual analysis could be necessary to detect them. The main threat in recipient location detection is that further connections can be established from the first-hop server to other servers geo-located in other countries. However, tracing them would necessarily require the collaboration of the remote servers (organizations) involved. We, therefore, simplify the


TABLE 6. Cross-border transfers to third-party recipients located outside the EU (Target countries) classified as Full (•), Ambiguous ( ), Inconsistent (X) and Omitted (-) cross-border transfer disclosures. Safeguard options are Adequacy Decision (AD), Standard Data Protection Clauses (SDPC), Binding Corporate Rules (BCR), and Explicit Consent (EC). Further details can be found at the replication package http://dx.doi.org/10.17632/ws6cx9p65d


TABLE 6. (Continued.) Cross-border transfers to third-party recipients located outside the EU (Target countries) classified as Full (•), Ambiguous ( ), Inconsistent (X) and Omitted (-) cross-border transfer disclosures. Safeguard options are Adequacy Decision (AD), Standard Data Protection Clauses (SDPC), Binding Corporate Rules (BCR), and Explicit Consent (EC). Further details can be found at the replication package http://dx.doi.org/10.17632/ws6cx9p65d

analysis to the entities receiving the personal data on the first-hop server, and alert for potentially non-compliant cross-border transfers. Afterwards, the organizations themselves or independent supervisory authorities can perform further analysis to re-check non-compliance issues by requesting information from the remote servers involved. Finally, as for recipient type detection, we manually checked the domains tagged as first-party recipients, and 98.6% were correctly classified. A small subset of 23 domains are anonymously registered and could not be classified as either a first- or third-party recipient. However, these domains were not considered for further analysis, thus preventing the detection of false non-compliance issues.
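The string-matching step with encoded variants can be sketched as follows (illustrative only: pre-compute the plain, Base64, and hashed forms of each known identifier and search captured payloads for any of them; as noted above, customized encodings would still evade this check):

```python
import base64
import hashlib

def encoded_variants(value):
    """Searchable forms of a personal data value: plain text,
    Base64, and common hex digests (MD5, SHA1, SHA256)."""
    raw = value.encode()
    yield value
    yield base64.b64encode(raw).decode()
    for algo in ("md5", "sha1", "sha256"):
        yield hashlib.new(algo, raw).hexdigest()

def discloses(payload, value):
    """True if any known encoding of `value` appears in the payload."""
    return any(variant in payload for variant in encoded_variants(value))
```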

3) EXTERNAL VALIDITY
Admittedly, the personal data flow extraction in our approach is subject to the same limitations faced by all dynamic analysis techniques for Android apps, so it cannot ensure completeness. The use of non-standard encoding mechanisms [41], unusual TLS certificate pinning implementations [60], and sub-optimal coverage of app execution paths [82] are some particular open orthogonal challenges to our proposal. Thus, we have to recognize that the results for fully compliant apps should not be misleadingly generalized, as false negatives can be generated. That is, the fact that we have not observed a cross-border transfer during our testing period does not mean that an app will never perform one if its developers, e.g., use customized encoding mechanisms.

All in all, potential false negatives will not change the validity of the results on non-compliant apps, which is remarkably high (66%). As mentioned above, the strength of our proposal is that any non-compliance issue with a cross-border transfer detected during the experiments was an observation

of the real app behavior and did not generate false positives. Therefore, we consider that our proposal, as well as its results, is valuable for app providers, app distribution platforms such as the Google Play Store, and supervisory authorities to detect the lower bound of GDPR cross-border transfer non-compliance issues.

VI. CONCLUSION AND FUTURE WORK
In this work, we presented a method for the compliance assessment of GDPR cross-border transfers in mobile applications. Based on a multidisciplinary approach, it moves towards true data protection compliance assessment by first checking the cross-border transfers made by apps against their privacy policy commitments, and these, in turn, against the transparency requirements established in the GDPR. We applied the method to systematically analyze the potential compliance issues of one hundred popular Android apps with respect to cross-border transfers. We found evidence showing that 66% of the apps analyzed present ambiguous or inconsistent cross-border transfer disclosures, or just omit them, leading to potential compliance issues. The invalidation of the Privacy Shield Framework in mid-2020 worsened the problem, leading to some cross-border transfers becoming potentially non-compliant.

Our future work points towards the full automation of the compliance assessment method to better understand the dimension of this issue. To that end, we are leveraging Machine Learning and Natural Language Processing techniques to automate the extraction of cross-border transparency elements from privacy policies.

APPENDIX
See Table 5, Table 6, and Table 7.


TABLE 7. Consolidated compliance results (C) for cross-border transfers considering transfers to non-EU first-party recipients (FP) and non-EU third-party recipients (TP). A transfer can be classified as Full (•), Ambiguous ( ), Inconsistent (X) or Omitted (-) cross-border transfer disclosure. Only if TP and FP are Full disclosures has the app been classified as compliant (Yes); otherwise, as non-compliant (No). Apps highlighted in grey were re-analyzed after the invalidation of Privacy Shield, becoming inconsistent disclosures. A blank field denotes that the transfer type was not detected.


ACKNOWLEDGMENT
Thanks are extended to Anna Haselbacher from the Karl-Franzens-Universität Graz, who provided the initial legal support for this research.

REFERENCES

[1] A. Razaghpanah, R. Nithyanand, N. Vallina-Rodriguez, S. Sundaresan, M. Allman, C. Kreibich, and P. Gill, ‘‘Apps, trackers, privacy, and regulators: A global study of the mobile tracking ecosystem,’’ in Proc. Netw. Distrib. Syst. Secur. Symp., San Diego, CA, USA, 2018, pp. 1–15.

[2] European Parliament and the Council of the European Union. (2016). General Data Protection Regulation. Accessed: Oct. 28, 2020. [Online]. Available: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN

[3] European Parliament and the Council of the European Union. (2002). Charter of Fundamental Rights of the European Union. Accessed: Oct. 28, 2020. [Online]. Available: https://www.europarl.europa.eu/charter/pdf/text_en.pdf

[4] T. Li and Z. Zhou. Do You Care About Chinese Privacy Law? Well, You Should. Accessed: Oct. 28, 2020. [Online]. Available: https://iapp.org/news/a/do-you-care-about-chinese-privacy-law-well-you-should/

[5] Court of Justice of the European Union. (2020). Decision 2016/1250 on the Adequacy of the Protection Provided by the EU-US Data Protection Shield. Accessed: Oct. 22, 2020. [Online]. Available: https://curia.europa.eu/jcms/upload/docs/application/pdf/2020-07/cp200091en.pdf


[6] Opinion 02/2013 on Apps on Smart Devices, European Commission, Brussels, Belgium, 2013.

[7] C. Castelluccia, S. Guerses, M. Hansen, J.-H. Hoepman, J. van Hoboken, and B. Vieira, ‘‘A study on the app development ecosystem and the technical implementation of GDPR,’’ Eur. Union Agency Netw. Inf. Secur. (ENISA), 2017, pp. 1–69.

[8] J. Gamba, M. Rashed, A. Razaghpanah, J. Tapiador, and N. Vallina-Rodriguez, ‘‘An analysis of pre-installed Android software,’’ in Proc. IEEE Symp. Secur. Privacy (SP), San Francisco, CA, USA, May 2020, pp. 1039–1055.

[9] R. Balebako, A. Marsh, J. Lin, J. Hong, and L. F. Cranor, ‘‘The privacy and security behaviors of smartphone app developers,’’ in Proc. Workshop Usable Secur., San Diego, CA, USA, 2014, pp. 1–10.

[10] S. Zimmeck, P. Story, D. Smullen, A. Ravichander, Z. Wang, J. Reidenberg, N. Cameron Russell, and N. Sadeh, ‘‘MAPS: Scaling privacy compliance analysis to a million apps,’’ Proc. Privacy Enhancing Technol., vol. 2019, no. 3, pp. 66–86, Jul. 2019.

[11] P. Story, S. Zimmeck, and A. Ravichander, ‘‘Natural language processing for mobile app privacy compliance,’’ in Proc. AAAI Spring Symp. Privacy-Enhancing Artif. Intell. Lang. Technol., Los Angeles, CA, USA, 2019, pp. 1–9.

[12] P. Ferrara and F. Spoto, ‘‘Static analysis for GDPR compliance,’’ in Proc. Italian Conf. Cybersecur., Milan, Italy, vol. 2058, 2018, pp. 1–10.

[13] Q. Jia, L. Zhou, H. Li, R. Yang, S. Du, and H. Zhu, ‘‘Who leaks my privacy: Towards automatic and association detection with GDPR compliance,’’ in Proc. 14th Int. Conf. Wireless Algorithms, Syst., Appl., Honolulu, HI, USA, 2019, pp. 137–148.

[14] J. C. Caiza, Y.-S. Martin, D. S. Guaman, J. M. Del Alamo, and J. C. Yelmo, ‘‘Reusable elements for the systematic design of privacy-friendly information systems: A mapping study,’’ IEEE Access, vol. 7, pp. 66512–66535, 2019.

[15] Opinion 13/2011 on Geolocation Services on Smart Mobile Devices, European Commission, Brussels, Belgium, 2011.

[16] S. Kununka, N. Mehandjiev, and P. Sampaio, ‘‘A comparative study of Android and iOS mobile applications' data handling practices versus compliance to privacy policy,’’ in Proc. IFIP Int. Summer School Privacy Identity Manage., Vienna, Austria, 2018, pp. 301–313.

[17] D. S. Guaman, J. M. Del Alamo, and J. C. Caiza, ‘‘A systematic mapping study on software quality control techniques for assessing privacy in information systems,’’ IEEE Access, vol. 8, pp. 74808–74833, 2020.

[18] W. B. Tesfay, P. Hofmann, T. Nakamura, S. Kiyomoto, and J. Serna, ‘‘PrivacyGuide: Towards an implementation of the EU GDPR on Internet privacy policy evaluation,’’ in Proc. 4th ACM Int. Workshop Secur. Privacy Anal., New York, NY, USA, Mar. 2018, pp. 15–21.

[19] D. J. Solove, ‘‘A taxonomy of privacy,’’ Univ. PA. Law Rev., vol. 154, no. 3, p. 477, Jan. 2006.

[20] C. Koopman and D. K. Mulligan, ‘‘Theorizing privacy's contestability: A multi-dimensional analytic of privacy,’’ in Proc. Special Workshop Inf. Privacy, Fort Worth, TX, USA, 2013, pp. 1026–1029.

[21] D. S. Guamán, J. M. del Alamo, H. Veljanova, S. Reichmann, and A. Haselbacher, ‘‘Value-based core areas of trustworthiness in online services,’’ in Proc. IFIP Adv. Inf. Commun. Technol., Copenhagen, Denmark, 2019, pp. 81–97.

[22] D. S. Guamán, J. M. del Alamo, H. Veljanova, A. Haselbacher, and J. C. Caiza, ‘‘Ranking online services by the core areas of trustworthiness,’’ RISTI—Rev. Iber. Sist. Tecnol. Inf., pp. 465–478, 2019.

[23] S. Gürses, ‘‘Can you engineer privacy?’’ Commun. ACM, vol. 57, no. 8, pp. 20–23, Aug. 2014.

[24] H. Nissenbaum, ‘‘Privacy as contextual integrity,’’ Wash. L. Rev., vol. 79,pp. 101–139, Jun. 2004.

[25] L. Batyuk, M. Herpich, S. A. Camtepe, K. Raddatz, A.-D. Schmidt, and S. Albayrak, ‘‘Using static analysis for automatic assessment and mitigation of unwanted and malicious activities within Android applications,’’ in Proc. 6th Int. Conf. Malicious Unwanted Softw., Fajardo, Puerto Rico, Oct. 2011, pp. 66–72.

[26] M. Junaid, D. Liu, and D. Kung, ‘‘Dexteroid: Detecting malicious behaviors in Android apps using reverse-engineered life cycle models,’’ Comput. Secur., vol. 59, pp. 92–117, Jun. 2016.

[27] G. Barbon, A. Cortesi, P. Ferrara, and E. Steffinlongo, ‘‘DAPA: Degradation-aware privacy analysis of Android Apps,’’ in Proc. 12th Int. Workshop Secur. Trust Manage., Crete, Greece, 2016, pp. 32–46.

[28] S. Kelkar, T. Kraus, D. Morgan, J. Zhang, and R. Dai, ‘‘Analyzing HTTP-based information exfiltration of malicious Android applications,’’ in Proc. 17th IEEE Int. Conf. Trust, Secur. Privacy Comput. Commun., New York, NY, USA, Aug. 2018, pp. 1642–1645.

[29] H. Xu, Y. Zhou, C. Gao, Y. Kang, and M. R. Lyu, ‘‘SpyAware: Investigating the privacy leakage signatures in app execution traces,’’ in Proc. IEEE 26th Int. Symp. Softw. Rel. Eng. (ISSRE), Washington, DC, USA, Nov. 2015, pp. 348–358.

[30] D. Sun, C. Guo, D. Zhu, and W. Feng, ‘‘Secure HybridApp: A detection method on the risk of privacy leakage in HTML5 hybrid applications based on dynamic taint tracking,’’ in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Chengdu, China, Oct. 2016, pp. 2771–2775.

[31] M. Pistoia, O. Tripp, P. Centonze, and J. W. Ligman, ‘‘Labyrinth: Visually configurable data-leakage detection in mobile applications,’’ in Proc. 16th IEEE Int. Conf. Mobile Data Manage., Pittsburgh, PA, USA, Jun. 2015, pp. 279–286.

[32] M. Spreitzenbarth, F. Freiling, F. Echtler, T. Schreck, and J. Hoffmann, ‘‘Mobile-sandbox: Having a deeper look into Android applications,’’ in Proc. 28th Annu. ACM Symp. Appl. Comput., Coimbra, Portugal, 2013, pp. 1808–1815.

[33] A. Ali-Gombe, I. Ahmed, G. G. Richard, and V. Roussev, ‘‘AspectDroid: Android app analysis system,’’ in Proc. 6th ACM Conf. Data Appl. Secur. Privacy, New York, NY, USA, Mar. 2016, pp. 145–147.

[34] J. C. J. Keng, L. Jiang, T. K. Wee, and R. K. Balan, ‘‘Graph-aided directed testing of Android applications for checking runtime privacy behaviours,’’ in Proc. 11th Int. Workshop Autom. Softw. Test, New York, NY, USA, 2016, pp. 57–63.

[35] M. Diamantaris, E. P. Papadopoulos, E. P. Markatos, S. Ioannidis, and J. Polakis, ‘‘REAPER: Real-time app analysis for augmenting the Android permission system,’’ in Proc. 9th ACM Conf. Data Appl. Secur. Privacy, Houston, TX, USA, Mar. 2019, pp. 37–48.

[36] Y. He, X. Yang, B. Hu, and W. Wang, ‘‘Dynamic privacy leakage analysis of Android third-party libraries,’’ J. Inf. Secur. Appl., vol. 46, pp. 259–270, Jun. 2019.

[37] X. Liu, S. Zhu, W. Wang, and J. Liu, ‘‘Alde: Privacy risk analysis of analytics libraries in the Android ecosystem,’’ in Proc. 12th EAI Int. Conf. Secur. Privacy Commun. Netw., Guangzhou, China, 2017, pp. 655–672.

[38] L. L. Zhang, C.-J.-M. Liang, Z. L. Li, Y. Liu, F. Zhao, and E.-H. Chen, ‘‘Characterizing privacy risks of mobile apps with sensitivity analysis,’’ IEEE Trans. Mobile Comput., vol. 17, no. 2, pp. 279–292, Feb. 2018.

[39] Android Developer’s Documentation. Google Play Protect. Accessed: Dec. 28, 2020. [Online]. Available: https://developers.google.com/android/play-protect/phacategories

[40] B. Andow, S. Mahmud, J. Whitaker, and W. Enck, ‘‘Actions speak louder than words: Entity-sensitive privacy policy and data flow analysis with PoliCheck,’’ in Proc. 29th USENIX Secur. Symp., Online Event, 2020, pp. 985–1002.

[41] I. Reyes, P. Wijesekera, J. Reardon, A. E. B. On, A. Razaghpanah, N. Vallina-Rodriguez, and S. Egelman, ‘‘Won’t somebody think of the children? Examining COPPA compliance at scale,’’ Proc. Privacy Enhancing Technol., vol. 2018, no. 3, pp. 63–83, Jun. 2018.

[42] M. Eskandari, B. Kessler, M. Ahmad, A. S. D. Oliveira, and B. Crispo, ‘‘Analyzing remote server locations for personal data transfers in mobile apps,’’ Proc. Privacy Enhancing Technol., vol. 2017, no. 1, pp. 118–131, Jan. 2017.

[43] M. Fan, L. Yu, S. Chen, H. Zhou, X. Luo, S. Li, Y. Liu, J. Liu, and T. Liu, ‘‘An empirical evaluation of GDPR compliance violations in Android mHealth apps,’’ 2020, arXiv:2008.05864. [Online]. Available: http://arxiv.org/abs/2008.05864

[44] P. L. Mangset, ‘‘Analysis of mobile application’s compliance with the general data protection regulation (GDPR),’’ Norwegian Univ. Sci. Technol., Trondheim, Norway, 2018, pp. 1–55.

[45] GDPR Fines & Data Breach Penalties. Accessed: Dec. 22, 2020. [Online]. Available: https://www.gdpreu.org/compliance/fines-and-penalties/

[46] Android Developer’s Documentation. (2020). Best Practices for Unique Identifiers. Accessed: Sep. 20, 2020. [Online]. Available: https://developer.android.com/training/articles/user-data-ids#identifier-characteristics

[47] A. Acquisti, M. Sleeper, Y. Wang, S. Wilson, I. Adjerid, R. Balebako, L. Brandimarte, L. F. Cranor, S. Komanduri, P. G. Leon, N. Sadeh, and F. Schaub, ‘‘Nudges for privacy and security,’’ ACM Comput. Surv., vol. 50, no. 3, pp. 1–41, 2017.

VOLUME 9, 2021 15981

[48] K. M. Ramokapane, A. C. Mazeli, and A. Rashid, ‘‘Skip, skip, skip, accept!!!: A study on the usability of smartphone manufacturer provided default features and user privacy,’’ Proc. Privacy Enhancing Technol., vol. 2019, no. 2, pp. 209–227, Apr. 2019.

[49] Android Developer’s Documentation. Sign Your App. Accessed: Aug. 31, 2020. [Online]. Available: https://developer.android.com/studio/publish/app-signing

[50] Android Developer’s Documentation. Publish Your App. Accessed: Aug. 31, 2020. [Online]. Available: https://developer.android.com/studio/publish

[51] European Data Protection Board. (2020). Guidelines on the Territorial Scope of the GDPR. Accessed: Oct. 28, 2020. [Online]. Available: https://edpb.europa.eu/our-work-tools/our-documents/riktlinjer/guidelines-32018-territorial-scope-gdpr-article-3-version_en

[52] M. Lindorfer, M. Neugschwandtner, L. Weichselbaum, Y. Fratantonio, V. V. D. Veen, and C. Platzer, ‘‘ANDRUBIS—1,000,000 apps later: A view on current Android malware behaviors,’’ in Proc. 3rd Int. Workshop Building Anal. Datasets Gathering Exper. Returns Secur. (BADGERS), Wroclaw, Poland, Sep. 2014, pp. 3–17.

[53] European Commission. Adequacy Decisions. Accessed: Oct. 22, 2020. [Online]. Available: https://ec.europa.eu/info/law/law-topic/data-protection/international-dimension-data-protection/adequacy-decisions_en

[54] International Trade Administration. Privacy Shield Framework. Accessed: Oct. 22, 2020. [Online]. Available: https://www.privacyshield.gov/welcome

[55] Article 29 Data Protection Working Party. (2018). Guidelines on Transparency Under Regulation 2016/679. Accessed: Jun. 3, 2020. [Online]. Available: https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=622227

[56] European Commission. Standard Contractual Clauses (SCC). Accessed: Oct. 22, 2020. [Online]. Available: https://ec.europa.eu/info/law/law-topic/data-protection/international-dimension-data-protection/standard-contractual-clauses-scc_en

[57] Android Developer’s Documentation. UI/Application Exerciser Monkey. Accessed: Oct. 28, 2020. [Online]. Available: https://developer.android.com/studio/test/monkey

[58] S. R. Choudhary, A. Gorla, and A. Orso, ‘‘Automated test input generation for Android: Are we there yet? (E),’’ in Proc. 30th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Lincoln, NE, USA, Nov. 2015, pp. 429–440.

[59] I. Reyes. Google Play Unofficial API. Accessed: Oct. 26, 2020. [Online]. Available: https://github.com/io-reyes/googleplay-api

[60] A. Razaghpanah, A. A. Niaki, N. Vallina-Rodriguez, S. Sundaresan, J. Amann, and P. Gill, ‘‘Studying TLS usage in Android apps,’’ in Proc. Appl. Netw. Res. Workshop, New York, NY, USA, 2018, p. 5.

[61] Android Developer’s Documentation. Security with HTTPS and SSL. Accessed: Oct. 26, 2020. [Online]. Available: https://developer.android.com/training/articles/security-ssl.html

[62] A. Cortesi and M. Hils. mitmproxy: An Interactive TLS-Capable Intercepting HTTP Proxy. Accessed: Oct. 26, 2020. [Online]. Available: https://github.com/mitmproxy/mitmproxy

[63] Android Developer’s Documentation. Android Debug Bridge. Accessed: Oct. 25, 2020. [Online]. Available: https://developer.android.com/studio/command-line/adb

[64] M. Kerrisk. Netstat-Network Stats. Accessed: Oct. 26, 2020. [Online].Available: https://man7.org/linux/man-pages/man8/netstat.8.html

[65] R. Wiśniewski and C. Tumbleson. Apktool. Accessed: Oct. 27, 2020. [Online]. Available: https://ibotpeaches.github.io/Apktool/

[66] A. Ravnås. Frida: Dynamic Instrumentation Framework. Accessed: Oct. 27, 2020. [Online]. Available: https://frida.re/docs/android/

[67] ApiLayer. IPStack. Accessed: Oct. 25, 2020. [Online]. Available:https://ipstack.com/documentation

[68] T. Libert. WebXray: A Tool for Analyzing Third-Party Content on Webpages and Identifying the Companies Which Collect User Data. Accessed: Oct. 28, 2020. [Online]. Available: https://github.com/natrix-fork/webXray

[69] T. Libert, ‘‘An automated approach to auditing disclosure of third-party data collection in Website privacy policies,’’ in Proc. World Wide Web Conf., New York, NY, USA, 2018, pp. 207–216.

[70] M. Van Kleek, I. Liccardi, R. Binns, J. Zhao, D. J. Weitzner, and N. Shadbolt, ‘‘Better the Devil You Know,’’ in Proc. CHI Conf. Hum. Factors Comput. Syst., Denver, CO, USA, 2017, pp. 5208–5220.

[71] R. Binns, U. Lyngs, M. Van Kleek, J. Zhao, T. Libert, and N. Shadbolt, ‘‘Third party tracking in the mobile ecosystem,’’ in Proc. 10th ACM Conf. Web Sci., New York, NY, USA, May 2018, pp. 23–31.

[72] Crunchbase Inc. Crunchbase Database. Accessed: Oct. 25, 2020. [Online]. Available: https://www.crunchbase.com/

[73] OpenCorporates. The Open Database of the Corporate World. Accessed: Oct. 25, 2020. [Online]. Available: https://opencorporates.com/

[74] L. F. Cranor, ‘‘Platform for Privacy Preferences (P3P),’’ in Encyclopedia of Cryptography and Security. Boston, MA, USA: Springer, 2011, pp. 940–941.

[75] M. Gallé, A. Christofi, and H. Elsahar, ‘‘The case for a GDPR-specific annotated dataset of privacy policies,’’ in Proc. AAAI Symp. Privacy-Enhancing AI HLT Technol., Los Angeles, CA, USA, vol. 2335, 2019, pp. 21–23.

[76] D. Watts. (2014). Joint Open Letter to App Marketplaces. Accessed: Sep. 29, 2020. [Online]. Available: https://www.priv.gc.ca/media/nr-c/2014/let_141210_e.asp

[77] California Attorney General. (2019). Agreement to Strengthen Privacy Protections for Users of Mobile Applications. Accessed: Sep. 29, 2020. [Online]. Available: https://www.oag.ca.gov/news/press-releases/attorney-general-kamala-d-harris-secures-global-agreement-strengthen-privacy

[78] J. Saldaña, The Coding Manual for Qualitative Researchers, 3rd ed. Newbury Park, CA, USA: Sage, 2015.

[79] J. R. Reidenberg, T. Breaux, L. F. Cranor, B. French, A. Grannis, J. Graves, F. Liu, A. McDonald, T. Norton, R. Ramanath, N. C. Russell, N. Sadeh, and F. Schaub, ‘‘Disagreeable privacy policies: Mismatches between meaning and users’ understanding,’’ SSRN Electron. J., vol. 7, pp. 1–68, Jan. 2014.

[80] K. Krippendorff, ‘‘Testing the reliability of content analysis data: What is involved and why,’’ in The Content Analysis Reader, 1st ed. Newbury Park, CA, USA: Sage, 2009, pp. 350–357.

[81] A. Sadeghi, H. Bagheri, J. Garcia, and S. Malek, ‘‘A taxonomy and qualitative comparison of program analysis techniques for security assessment of Android software,’’ IEEE Trans. Softw. Eng., vol. 43, no. 6, pp. 492–530, Jun. 2017.

[82] P. Patel, G. Srinivasan, S. Rahaman, and I. Neamtiu, ‘‘On the effectiveness of random testing for Android: Or how I learned to stop worrying and love the monkey,’’ in Proc. 13th Int. Workshop Autom. Softw. Test, May 2018, pp. 34–37.

DANNY S. GUAMÁN received the degree in electronics and network engineering from the Escuela Politécnica Nacional, Ecuador, in 2010, and the M.Sc. degree in networking and telematics engineering from the Universidad Politécnica de Madrid, Spain, in 2013, where he is currently pursuing the Ph.D. degree. He is currently an Assistant Professor with the Escuela Politécnica Nacional. His main research interests include the analysis of data disclosure and the assessment of privacy compliance in information systems.

JOSE M. DEL ALAMO is currently an Associate Professor (with tenure) with the Universidad Politécnica de Madrid (DIT-UPM). His research interests include issues related to personal data management, including personal data disclosure, identity, privacy, and trust management, and considering these aspects to advance software and systems engineering methodologies by applying approaches such as privacy-by-design and privacy-by-default.

JULIO C. CAIZA received the Engineering degree in electronics and networking from the Escuela Politécnica Nacional, Quito, Ecuador, in 2010, and the Ph.D. degree in telematics system engineering from the Universidad Politécnica de Madrid, Madrid, Spain, in 2020. He is currently an Assistant Professor with the Escuela Politécnica Nacional. His main research interest includes the design of privacy-friendly information systems.
