What are Weak Links in the npm Supply Chain? - arXiv

10
What are Weak Links in the npm Supply Chain? Nusrat Zahan Laurie Williams NC State University [email protected] [email protected] Thomas Zimmermann Patrice Godefroid Microsoft Research [email protected] [email protected] Brendan Murphy Chandra Maddila Microsoft Research [email protected] [email protected] ABSTRACT Modern software development frequently uses third-party pack- ages, raising the concern of supply chain security attacks. Many at- tackers target popular package managers, like npm, and their users with supply chain attacks. In 2021 there was a 650% year-on-year growth in security attacks by exploiting Open Source Software’s supply chain. Proactive approaches are needed to predict package vulnerability to high-risk supply chain attacks. The goal of this work is to help software developers and security spe- cialists identify weak links in a software supply chain by empirically studying npm package metadata. In this paper, we analyzed the metadata of 1.63 million JavaScript npm packages. We propose six signals of a security weakness in a software supply chain, such as the presence of install scripts, maintainer accounts associated with an expired email domain, and inactive packages with inactive maintainers. Our analysis identified 11 malicious packages from the install scripts signal. We also found 2,818 maintainer email addresses associated with expired domains, allowing an attacker to hijack 8,494 packages by taking over the npm accounts. We obtained feedback on our weak link signals through a survey responded to by 470 npm package developers. The majority of the developers supported three out of our six proposed weak link signals. The developers also indicated that they would want to be notified about weak links signals before using third-party packages. Additionally, we discussed eight new signals suggested by package developers. ACM Reference Format: Nusrat Zahan, Laurie Williams, Thomas Zimmermann, Patrice Godefroid, Brendan Murphy, and Chandra Maddila. 2021. What are Weak Links in the npm Supply Chain?. In Proceedings of The 44th International Conference on Software Engineering (Submitted to ICSE 2022). ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn 1 INTRODUCTION Modern software development frequently uses third-party pack- ages, raising the concern of supply chain security attacks. According to Snyk [1], 96% of applications use third-party packages, and 80% of the code in the software supply chain comes from third-party packages. The scope and scale of the expanding supply chain also come with high-security risks [2, 3]. Large package managers, like npm, maintain a centralized repository where developers can access and add third-party packages to their dependency tree easily [2, 4]. Unfortunately, attackers can also leverage the same features of npm and inject malicious package updates into a software supply chain. Submitted to ICSE 2022, May 21–29, 2022, Pittsburgh, PA, USA 2021. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00 https://doi.org/10.1145/nnnnnnn.nnnnnnn Each weak link in a supply chain is an opportunity for an at- tacker to abuse the targeted supply chain. To address this, software developers and security specialists must detect possible weak links in the software supply chain to protect it against attacks. A recent report from Sonatype shows supply chain attacks has increased 650% in 2021 on top of year-over-year growth of 430% in 2020 [3] where attackers injected malicious code into benign packages [510]. An example of a sophisticated supply chain attack is the SolarWinds attack [11]. SolarWinds is a proprietary cyber- security monitoring solution [1, 12] whose customers include 425 Fortune 500 companies and at least nine U.S. federal agencies [12]. More than 100 companies and federal agencies were exposed to the breach [13]. The attacker gained access to customer networks, systems, and data through a malicious package update [12]. An incident of this magnitude raises significant concerns about the consequences of supply chain attacks. In supply chain attacks, instead of exploiting latent vulnerabili- ties in source code, attackers inject malware directly into benign code that is likely to be deployed by users [3, 14]. A common strat- egy is to target the most commonly used packages in a dependency chain to infect a maximum number of users [3]. In that way, bad actors can execute an attack that will propagate throughout the supply chain. Thus, popular large registries like npm, which hosts 1.8 million JavaScript packages as of 2021, are a highly targeted malware distribution channel for attackers due to heavy growth and dependence on JavaScript packages [3, 14–17]. One way attackers can develop such supply chain attacks is by following a data-driven attack strategy. For example, an attacker can collect and analyze metadata (e.g., maintainer and contributor information, list of upstream or downstream dependencies) from the package registry to find the weakest links (e.g., less secure mod- ule, maintainers) in the targeted supply chain. Then, the attacker can exploit the weakest link and execute an attack by compromising accounts, credentials [6, 1820] or gain maintainer support through social engineering [21]. The dynamic nature of such attacks chal- lenges conventional detection methods [4, 14, 16, 17] because the attacks spread fast without the victim knowing where bad actors planted the malware in the supply chain. Hence, practitioners need proactive approaches based on empirical data to predict package vulnerability to supply chain attacks. The goal of this work is to help software developers and security spe- cialists identify weak links in a software supply chain by empirically studying npm package metadata. In this paper, we propose six weak link signals for package de- pendencies. We define a signal as a weak link if the signal exposes a package to a higher risk of a supply chain attack. We focus on the relationship between neutral and suspicious meta- data in a supply chain. To that end, we perform an empirical study arXiv:2112.10165v1 [cs.CR] 19 Dec 2021

Transcript of What are Weak Links in the npm Supply Chain? - arXiv

What are Weak Links in the npm Supply ChainNusrat ZahanLaurie WilliamsNC State Universitynzahanncsuedulawilli3ncsuedu

Thomas ZimmermannPatrice GodefroidMicrosoft Research

tzimmermicrosoftcompgmicrosoftcom

Brendan MurphyChandra MaddilaMicrosoft Research

bmurphymicrosoftcomchmaddilmicrosoftcom

ABSTRACTModern software development frequently uses third-party pack-ages raising the concern of supply chain security attacks Many at-tackers target popular package managers like npm and their userswith supply chain attacks In 2021 there was a 650 year-on-yeargrowth in security attacks by exploiting Open Source Softwarersquossupply chain Proactive approaches are needed to predict packagevulnerability to high-risk supply chain attacks

The goal of this work is to help software developers and security spe-cialists identify weak links in a software supply chain by empiricallystudying npm package metadata

In this paper we analyzed the metadata of 163 million JavaScriptnpm packages We propose six signals of a security weakness ina software supply chain such as the presence of install scriptsmaintainer accounts associated with an expired email domain andinactive packages with inactive maintainers Our analysis identified11 malicious packages from the install scripts signal We also found2818 maintainer email addresses associated with expired domainsallowing an attacker to hijack 8494 packages by taking over thenpm accounts We obtained feedback on our weak link signalsthrough a survey responded to by 470 npm package developers Themajority of the developers supported three out of our six proposedweak link signals The developers also indicated that they wouldwant to be notified about weak links signals before using third-partypackages Additionally we discussed eight new signals suggestedby package developers

ACM Reference FormatNusrat Zahan Laurie Williams Thomas Zimmermann Patrice GodefroidBrendan Murphy and Chandra Maddila 2021 What are Weak Links in thenpm Supply Chain In Proceedings of The 44th International Conference onSoftware Engineering (Submitted to ICSE 2022) ACM New York NY USA10 pages httpsdoiorg101145nnnnnnnnnnnnnn

1 INTRODUCTIONModern software development frequently uses third-party pack-ages raising the concern of supply chain security attacks Accordingto Snyk [1] 96 of applications use third-party packages and 80of the code in the software supply chain comes from third-partypackages The scope and scale of the expanding supply chain alsocome with high-security risks [2 3] Large package managers likenpm maintain a centralized repository where developers can accessand add third-party packages to their dependency tree easily [2 4]Unfortunately attackers can also leverage the same features of npmand inject malicious package updates into a software supply chain

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA2021 ACM ISBN 978-x-xxxx-xxxx-xYYMM $1500httpsdoiorg101145nnnnnnnnnnnnnn

Each weak link in a supply chain is an opportunity for an at-tacker to abuse the targeted supply chain To address this softwaredevelopers and security specialists must detect possible weak linksin the software supply chain to protect it against attacks

A recent report from Sonatype shows supply chain attacks hasincreased 650 in 2021 on top of year-over-year growth of 430in 2020 [3] where attackers injected malicious code into benignpackages [5ndash10] An example of a sophisticated supply chain attackis the SolarWinds attack [11] SolarWinds is a proprietary cyber-security monitoring solution [1 12] whose customers include 425Fortune 500 companies and at least nine US federal agencies [12]More than 100 companies and federal agencies were exposed tothe breach [13] The attacker gained access to customer networkssystems and data through a malicious package update [12] Anincident of this magnitude raises significant concerns about theconsequences of supply chain attacks

In supply chain attacks instead of exploiting latent vulnerabili-ties in source code attackers inject malware directly into benigncode that is likely to be deployed by users [3 14] A common strat-egy is to target the most commonly used packages in a dependencychain to infect a maximum number of users [3] In that way badactors can execute an attack that will propagate throughout thesupply chain Thus popular large registries like npm which hosts18 million JavaScript packages as of 2021 are a highly targetedmalware distribution channel for attackers due to heavy growthand dependence on JavaScript packages [3 14ndash17]

One way attackers can develop such supply chain attacks is byfollowing a data-driven attack strategy For example an attackercan collect and analyze metadata (eg maintainer and contributorinformation list of upstream or downstream dependencies) fromthe package registry to find the weakest links (eg less secure mod-ule maintainers) in the targeted supply chain Then the attackercan exploit the weakest link and execute an attack by compromisingaccounts credentials [6 18ndash20] or gain maintainer support throughsocial engineering [21] The dynamic nature of such attacks chal-lenges conventional detection methods [4 14 16 17] because theattacks spread fast without the victim knowing where bad actorsplanted the malware in the supply chain Hence practitioners needproactive approaches based on empirical data to predict packagevulnerability to supply chain attacks

The goal of this work is to help software developers and security spe-cialists identify weak links in a software supply chain by empiricallystudying npm package metadata

In this paper we propose six weak link signals for package de-pendencies We define a signal as a weak link if the signalexposes a package to a higher risk of a supply chain attackWe focus on the relationship between neutral and suspicious meta-data in a supply chain To that end we perform an empirical study

arX

iv2

112

1016

5v1

[cs

CR

] 1

9 D

ec 2

021

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

on 163 million npm packagesrsquo metadata to identify weak link sig-nals in the npm supply chain Our work addresses the followingresearch questions

bull RQ1 (Identification) What weak link signals can be iden-tified in the npm package metadata

bull RQ2 (Agreement) How do practitioners perceive the iden-tified weak link signals in the npm supply chain

This paper makes the following contributionsbull Six proposed weak link signals three of which are confirmedas strong signals from a survey completed by 470 npm pack-age maintainers

bull Eight new weak link signals suggested by survey partici-pants

bull A framework1 to collect categorize and analyze packagemetadata in the entire npm registry to identify weak linksignals

2 BACKGROUND AND RELATEDWORKA software supply chain attack is a cyber-attack that aims to infectorganizations and end-users by targeting less-secure componentsin the supply chain [15] The supply chain encompasses everythingthat goes into or affects code from development through CICDpipeline until production deployment Package registries like npmplay an essential role in automating the software supply chainUnfortunately increased automation comes with a higher securityrisk Supply chain attacks are considered critical because of the in-creasing reliance on third-party packages as a direct and transitivedependency A single package dramatically increases a systemrsquosattack surface due to the ldquonestedrdquo nature of dependencies [2] There-fore we need to predict and prevent such weak links before themalicious code is distributed in the supply chain Our paper aimsto raise awareness by quantifying weak link signals and discussingthe necessity of weak link identification in the supply chain Thissection discusses the terminology needed to understand our studyand then presents an overview of supply chain research

21 Key Terminology211 npm is a platform for publishing and hosting JavaScriptpackages which is the largest ecosystem to date npm has a CLItool for publishing and installing packages and separately it hasan online repository to host all the package and their metadataAll npm packages contain a file called packagejson - a centralplace to configure and describe how to interact with and run apackage and npm used this data to manage a package installationor handle the projectrsquos dependencies [22] This paper exclusivelystudies the packagejson file which provides visibility on the wholenpm supply chain

212 Primary Stakeholders Here we discussed the relevantstakeholder roles deeply connected with the software supply chainthat can benefit from our research Package Maintainers are re-sponsible for developing and maintaining packages They mayreceive and review pull requests from contributors and have writeaccess to make changes in different stages of package development

1httpsgithubcomnzahansnpm-supply-chain-weak-link

Package Contributors can view the source code and can sug-gest code changes and package maintainers approve their changesEcosystem Administrators manage the package registry frame-work and are responsible formaintaining thewhole software ecosys-tem

22 Attack VectorHere we overview the different attack vectors an attacker may useto introduce a supply chain attack 1)Malicious package releaseAn attacker may publish malicious packages and hence trick otherusers into installing or depending on such packages[4 14] 2) SocialEngineering An attacker may manipulate a maintainer to handover sensitive information [2] 3) Account Takeover An attackermay compromise the credentials of a maintainer to inject mali-cious code under the maintainerrsquos name 4) Ownership transferAn attacker can show enthusiasm to maintain popular abandonedpackages and transfer the ownership of a package [14] 5)Remoteexecution An attacker may target a package by compromisingthe third-party services or remote server used by that package

23 Research on Supply Chain WorkMany prior works have leveraged past supply chain records to showhow magnificent a supply chain attack can be Zimmermann et alstudy provide evidence that popular packages and highly activedevelopers in npm suffer from single points of failure and manyunmaintained packages in the npm repository which no longerreceive patches indirectly threaten the security of the npm supplychain Ohm et alrsquos study [14] investigated different supply chainattacks from 174 malicious packages and contributed an enricheddataset for future research on malicious package detection Duanet al [4] identified 339 malicious packages from their proposed un-supervised learning framework Gonzalez et al[17] used repositoryand commit metadata to detect malicious packages automatically

Although security researchers in academia and industry are ac-tively investigating attacks on registries and proposing solutionsthese approaches seem to be based on specific instances of mali-cious attacks They are especially effective to prevent maliciouscode distribution Recent attacks show evidence that out-of-the-box exploit strategies will appear again and again [10 23] Anyad-hoc solution is not enough to prevent an attack that we have notwitnessed yet A better approach is needed to embrace a proactivestrategy that predicts package susceptibility as a potential threatinstead of maliciousness and then adopt the best security practiceto stay ahead of the attackers

3 RQ1 WEAK LINK SIGNALIn this section we describe our process of identifying and quan-tifying weak link signals to answer RQ1 We define a signal asa weak link if the signal exposes a package to a higher riskof a supply chain attack If a bad actor targets the supply chainthey are going to follow the path of least resistance [24] for exam-ple finding weak links from publicly available package metadatawithout penetrating the whole source code In this study we fol-lowed an approach the attackers might take to find the weakest linkin the npm registry One of the critical challenges in identifyingthe weak link signals was selecting the suspicious candidate of

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

metadata for further investigations To overcome this we took ad-vantage of the authorrsquos domain expertise and prior reported supplychain attack patterns [4 14 25] Four authors are from industryand two are from academia Therefore we focused on metadataassociated with human involvement (eg maintainer contributorinformation) since many past attacks involved human actors as aweak link to the package [6 18ndash21] Then we selected functionalmetadata of a package (eg dependency dependents) to understandthe underlying reason that might lead to an attack attempt

31 Data Process311 Data Collection We captured a snapshot of 1630101 pack-agejson files on June 7 2021 through the npm public API2 EachJSON file contains metadata properties of an npm package Weused the package ID (package name and version together form aunique identifier called ldquoIDrdquo) as the primary reference while storingcorresponding metadata for each package As a root package IDmaps to all other metadata like dependency information packagelast modified time scripts versions license repository unpackedpackage size a total number of files maintainers and contribu-tors names and email addresses We used Python libraries (JSONNumPy pandas amp psycopg2) to extract all the relevant metadatafrom each packagejson file which we stored systematically in aPostgreSQL database Each packagejson file contained multipleversions history of package metadata against one unique ID In thisstudy we only used the most recent version of the metadata to an-alyze our weak link indicators Apart from metadata extraction weanalyzed the maintainer reach and package reach to understand theimpact of a package and its maintainers in the npm registry Belowwe explain how we queried and measured package and maintainerreachndashPackage Reach We measured the package reach to evaluate a pack-agersquos popularity in terms of dependants and downloads Hence weconsidered two metrics (1) the number of packages that depend ona package (dependants) which we computed from a set of all thepackages that have a direct dependency on a package and (2) thenumber of downloads of a package in the past 12 months which wecollected from public npm API3Maintainer Reach We measured the number of packages owned bya maintainer to evaluate the maintainer reach in npm We queriedagainst npm packages to compute maintainer reach a set of pack-ages where a maintainer is listed as a package maintainer

312 Exclusion Criteria We removed packages that might in-troduce noise (for example wrong ownership invalid metadataetc) In this study we removed a package if the package has nodependents and it meets with any one of the following IF condi-tions

First If a package is a security holding packageand is re-moved from the npm registry by the npm security team due tomalicious activities Hence the JSON file we received for such pack-ages consisted of dummy metadata filled by the npm team Weremoved 8344 packages as security holding packages The ldquodescrip-tionsrdquo and ldquodis-tagsrdquo property of the packagejson file define thesecurity holding status2httpsreplicatenpmjscom_all_docsinclude_docs=true3httpsapinpmjsorgdownloadspointperiod[package]

Second If a package is deprecatedmeaning the package is notactively maintained by the maintainers The packagejson file con-tains a separate property called ldquodeprecatedrdquo to indicate packagedeprecated status which is assigned by the npm administrators orthe maintainers themselves Though a deprecated package does notalways indicate an unusable package npm and maintainers recom-mend using alternative packages or versions While the deprecatedpackages are not actively maintained end-users might still usethem directly or transitively Hence we only removed deprecatedpackages in the recent version if no other packages in the npmregistry used them in their dependency tree We removed 37917deprecated packages that used by none

Third If a package hasno repository andno license A reposi-tory is essential to track organize and validate the source coderightsand a license allows the OSS community to reuse the code A pack-age without a license indicates that the authors retain all sourcecode [22] and no repository is attached to verify otherwise Wehave identified packages where both repository and license prop-erty was null or filled with invalid values (eg ldquoUNLICENSEDrdquoldquoXYZrdquo ldquopersonal userdquo etc) Hence we only removed the packagesfrom further considerations if they meets all three conditions 1)no dependent and 2) no license and 3) no repository We found89893 such packages and removed them from the database

In total we removed 9(135996) packages that fits into our pro-posed exclusion criteria and our final dataset contained 1494105packages which were used for further analysis

32 Weak Link SignalsWe briefly discuss our six weak link signals their attack modelsand specific data analysis in the following six subsections

321 W1 Expired Maintainer Domain An attacker can hi-jack a component if a maintainerrsquos domain is expired anddoes not have 2FA authentication set up on their account Ingeneral any domain name can be purchased from a domain regis-trar allowing the purchaser to connect to an email hosting serviceto get a personal email address An attacker can hijack a userrsquosdomain to take over an account associated with that email addressTypically a domain hijacking attack occurs by 1) gaining unau-thorized access to the registrar or 2) gaining access to the ownerrsquosemail address and then resetting the password Domain hijackingis not a new notion We have seen many attacks in the past suchas 119901119890119903119897 119888119900119898 hijacking [26] where the attacker changed the domainregistrar and renewed the domain expiration date until 2029 andthen changed the DNS address The time to discover the attack wasfour months even though 119901119890119903119897 119888119900119898 is a well-renowned site

In the npm registry an attacker can execute a more simplisticapproach to hijack an email address An attacker can track thedomain of a maintainer in the domain registrar site If the domain isexpired and available for sale the attacker can register and alter theDNS ldquomail exchangerdquo (MX) records to hijack the maintainerrsquos emailaddress In most cases maintainer accounts are associated withan email address in the package registry One could reset a npmaccount directly by email address unless the maintainer activated2FA authentication or used different email address in user account

Analysis of npm maintainers domain To collect the main-tainer email address we extracted and stored the maintainer name

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

and email address from each packagejson file We then queriedand split the domain name from each email address and countedthe number of times a domain is used across all npm packages Outof 93K domains in the 163M npm packages 86 were unique (ap-peared once) while the rest were from the public or organizationaldomain We hypothesize that all maintainer domains in the npmregistry are up-to-date and none of them are available for sale Tothat end we performed a bulk query of 93K domains in Godaddy adomain registrar site We found 5346 domains are available for saleWe picked a random sample of 50 domains to determine the truepositive rate and checked each domain independently in GodaddyWe found 33 of them are not for sale Due to the high false-positiverate we manually verified 5346 domains in Godaddy

We found 2818 maintainerrsquos domains are available for saleand can be purchased These maintainers own 8494 pack-ages in the npm registry with average direct dependents of243 packages and average downloads of 53K in the past 12months

We reported our findings to the npm security team to validatethe email address for npm user accounts We note that our bulkquery may have a high false-negative rate since we found 47 falsepositive from the 5346 domain Hencenpm may have more than2818 maintainers associated with expired domains

322 W2 Installation Script An attacker can use installa-tion scripts to run commands that perform malicious actsthrough the package installation step [27] Install scripts runautomatically either before during or after package installationwhen certain events are triggered These scripts are used to makethe installation process easy since they are automatically run bynpm However for an attacker such scripts create opportunitieswhere ldquosky is the limitrdquo The attacker could steal user-sensitive dataor execute a new child process to create backdoor access or gainaccess to execute a series of commands remotely [4 5 7ndash9 25]Alternatively the attacker can infiltrate the third-party dependencesince that installation script will run automatically by the targetedpackage and its users during installation The CCleaner [28] So-larwinds [12] NotPetya [29] and Adverline [30] data breachesare examples of such attacks where attackers targeted the remoteserver third-party vendors to execute a large supply chain attackThough the presence of installation scripts themselves are not a di-rect indication of maliciousness the privilege to run automaticallymakes installation scripts a weak link signal in the supply chainAs best practices even npm registry recommends avoiding installscript- ldquoDonrsquot use installYou should almost never have to explicitlyset a preinstall or install scriptrdquo4While evidence of such an attackthrough the installations script is not rare [4 5 7ndash9 23 25 28ndash30]similar or perhaps even worse attacks may happen in the future

Analysis of npm packages To collect script details we ex-tracted and stored the script key which is the scriptrsquos name (egpreinstall)and the script value which contains the script pathshell commandsfrom the packagejson file Then we query and separate all thepackages that have script keys like ldquoInstallrdquo

4httpsdocsnpmjscomcliv7using-npmscripts

we found 22 (33249) of packages use install scripts indicat-ing that 978 of packages may follow npm recommendationof not using the install script as best security practices

Additionally we collected 3635 malicious packagesjson filesfrom npm They are similar to the security holding packagejson file(see Section 312) with all dummy metadata The only differenceis that these JSON files have actual malicious script key and valuepair for example- the original directory of malicious code files ormalicious shell script embedded in JSON files with other dummymetadata To analyze the malicious JSON file two researchers sepa-rately reviewed these scripts and compared results for verificationSince the scope of the project was limited to packagejson files onlywe were only able to analyze 419 (out of 3635) packages wheremalicious shell command was embedded directly in the JSON fileFrom our analysis we found three types of attack patterns thatattackers use frequently

bull Transfer Users data to third party server (eg- hostnameetcshadow etcpasswdhomeltusergtssh ) We found 350packages that communicate and transfer data with thirdparty server

bull Download malicious tool and run it to user machine Forexample- download a crypto miner software and run it onuser machine We found 82 packages

bull Spawning new process to interact with operating-systemprocesses particularly the native module child_process (eg-use of reverse shell) We found 212 packages that start a newprocess and transfer data to third party server

We found 939 (3412) of unique malicious packages useinstall scripts indicating that malicious attackers use installscripts frequently

323 W3UnmaintainedPackage Attackers can target pack-ages that are more likely to take over and sneak in malwaredue to lack of maintenance Differentiating between unmain-tained and feature-complete packages that require no further re-leases is complex and ambiguous [31] Even if the package maynot require any maintenance by itself it may need maintenancedue to security issues in its dependencies or use new syntax toimprove performance bug fixing amp documentation improvementsIn 2020 the average time to remediate security issues was 68 daysin open source projects [32] and 66 of security vulnerabilities innpm packages remain unpatched [33] Hence the time requiredto remediate a security issue in unmaintained packages remainsunknown We considered unmaintained packages as a weak linksignal from two directions 1) Inactive packages and 2) Inactivemaintainers

Inactive PackageWe considered a package inactive if the pack-agersquos last modification time in the packagejson file is past two yearsOther prior work [31 33] has defined inactivity as one year gapHowever many npm packages have low complexity with a fewlines of code and may not require any recent update Thus weconsider two years as an inactivity period

An attacker can exploit vulnerabilities in all applications thatdirectly or transitively depend on vulnerable code as long as the

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

vulnerable dependency or the package itself remains unfixed [2]The response time to fix such issues is undetermined for inactivepackages and end-usersmay remain infected due to unawareness ofthe security threat A deprecated package (see Section 312) exposeseven higher security risk since the maintainers no longer maintainthe package to fix security issues An attacker can inject malwarein widely used but unmaintained packages A prominent exampleof such an attack is the ldquoMailparserrdquo attack [34] an old deprecatedpackage An attacker indirectly reaches the ldquoMailParserrdquo through arelatively new package named ldquogetcookiesrdquo which had indirectlybeen made into the nested dependency chain of MailParser

Inactive MaintainersWe defined an inactive maintainer if themaintainer had no active packages in the past two yearsAn attackercan target packages with inactive maintainer(s) because any attackwill remain undetected due to the inactivity of maintainers There-fore distinguishing between active and inactive maintainers is nec-essary to identify the packages that require maintenanceAlthoughinactive maintainerrsquos packages are a subset of inactive packageswe must identify them separately Because an inactive packagemay have maintainers who are active in other packages whereasinactive maintainers indicate the maintainer is inactive in the entirenpm registry

Analysis of npmpackagesWe extracted and stored the ldquotimerdquoproperty of the packagejson file to measure the number of pack-ages that have been inactive for the past two years We identifiedinactive maintainers by evaluating the last modified time proper-ties for all packages corresponding to an individual maintainer Apackage where none of the maintainers are active elsewhere inthe entire package registry is determined as inactive maintainersof unmaintained packages We also considered deprecated pack-ages as unmaintained since they are unmaintained officially by themaintainer We separated the deprecated package where the lastmodification time passed our threshold value because the depreca-tion was declared later

We found 587 of packages and 443 of maintainers areinactive in the npm registry There are 5532 additional dep-recated packages where the deprecation date passed ourthreshold value

The packagejson does not provide individual maintainer activi-ties history Hence distinguishing inactive maintainers from activepackages was not plausible

324 W4 Too many Maintainers An attacker can target apackage in a dependency chain to exploit the possible over-sight due to too many maintainers in chargeWhen a projecthas many people it lacks ownership there may not be proper vet-ting in code changes because all of them are owners and no one isresponsible Alternatively many people in a team may lack appro-priate communication which increases the chance of oversight inmaintainer activities Bad actors may hide their identity by com-promising one maintainer profile or performing social engineeringto access the package

Our hypothesis is supported by previous work of Meneely etal [35] where they empirically showed projects with more develop-ers has more vulnerabilities ndashno one was ldquoin chargerdquo of the security

of the project A recent attack on php-src [36] supports the bene-fits of better communication between maintainers The attack wasidentified within hours because attackers tried to impersonate twoprimary PHP maintainers One of them spotted the unusual commitimmediately and reverted the commit An opposite plot of suchan attack could be if the impersonated maintainers were not theprimary maintainers and did not have proper communication withother team members The unusual commit may have taken days tobe discovered

Analysis of npmpackages Though the exact number of main-tainers varies from project to project in case of npm the numberof maintainers is more likely to be less because npm is buildingupon reusing small JavaScript libraries 15 million npm packageshave an average cost of 17 maintainers which makes sense assmall packages may not require many maintainers We extractedand stored the list of maintainers corresponding to each packagein a SQL database and then ranked them by the total number ofmaintainers in the package As discussed the entire npm had anaverage of 17 maintainers analyzing the whole data set is not prac-tical to identify the unusual package with too many maintainersTherefore we picked the top 1 (14941) packages in npm rankedby the total number of maintainers

We found that our selected 1 packages had an average of324 maintainers per package which was 19 times more thanaverage package maintainers in the entire registry

Our concern was also addressed by Zimmermann et al [2] wherethey suggested that the value of over 20 maintainers in a packageis questionable

325 W5 Too many contributors An attacker can sneakin malicious code bypassing the maintainerrsquos radar when amaintainer is responsible for many contributorsMany priorresearch shows that when multiple contributors change a file thefile is more likely to have more failures [35 37ndash39] which mayinclude security issues These observations motivated our nextsignal A maintainer-to-contributors ratio or increased number ofcontributors increases the security risks

Contributors may vary in knowledge skill and experience Pack-age quality will inevitably suffer if the maintainers do not payenough attention to review the pull request from contributors Es-pecially where an average of 17 maintainers maintain JavaScriptpackages many contributors bring additional responsibility formaintainers in product functionality or security If the maintainersdo not pay enough attention contributors with malicious purposescan include potential backdoors into code and malicious code willmerge An attacker can target packages with many contributorswhere the attacker can do social engineering to become a trustedcontributor and make some minor contribution to gain trust andthen sneak in malicious code

Analysis of npm packagesWe extracted and stored the con-tributor(s) list from packagejson files We measured the ratio be-tween the total number of maintainers to the total number of con-tributors of corresponding packages where a higher ratio indicateseach maintainer is responsible for fewer contributors Out of 15million npm packages only 26 (38913) of the packages have listedcontributors and the average maintainer to contributors ratio is

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

on 163 million npm packagesrsquo metadata to identify weak link sig-nals in the npm supply chain Our work addresses the followingresearch questions

bull RQ1 (Identification) What weak link signals can be iden-tified in the npm package metadata

bull RQ2 (Agreement) How do practitioners perceive the iden-tified weak link signals in the npm supply chain

This paper makes the following contributionsbull Six proposed weak link signals three of which are confirmedas strong signals from a survey completed by 470 npm pack-age maintainers

bull Eight new weak link signals suggested by survey partici-pants

bull A framework1 to collect categorize and analyze packagemetadata in the entire npm registry to identify weak linksignals

2 BACKGROUND AND RELATEDWORKA software supply chain attack is a cyber-attack that aims to infectorganizations and end-users by targeting less-secure componentsin the supply chain [15] The supply chain encompasses everythingthat goes into or affects code from development through CICDpipeline until production deployment Package registries like npmplay an essential role in automating the software supply chainUnfortunately increased automation comes with a higher securityrisk Supply chain attacks are considered critical because of the in-creasing reliance on third-party packages as a direct and transitivedependency A single package dramatically increases a systemrsquosattack surface due to the ldquonestedrdquo nature of dependencies [2] There-fore we need to predict and prevent such weak links before themalicious code is distributed in the supply chain Our paper aimsto raise awareness by quantifying weak link signals and discussingthe necessity of weak link identification in the supply chain Thissection discusses the terminology needed to understand our studyand then presents an overview of supply chain research

21 Key Terminology211 npm is a platform for publishing and hosting JavaScriptpackages which is the largest ecosystem to date npm has a CLItool for publishing and installing packages and separately it hasan online repository to host all the package and their metadataAll npm packages contain a file called packagejson - a centralplace to configure and describe how to interact with and run apackage and npm used this data to manage a package installationor handle the projectrsquos dependencies [22] This paper exclusivelystudies the packagejson file which provides visibility on the wholenpm supply chain

212 Primary Stakeholders Here we discussed the relevantstakeholder roles deeply connected with the software supply chainthat can benefit from our research Package Maintainers are re-sponsible for developing and maintaining packages They mayreceive and review pull requests from contributors and have writeaccess to make changes in different stages of package development

1httpsgithubcomnzahansnpm-supply-chain-weak-link

Package Contributors can view the source code and can sug-gest code changes and package maintainers approve their changesEcosystem Administrators manage the package registry frame-work and are responsible formaintaining thewhole software ecosys-tem

22 Attack VectorHere we overview the different attack vectors an attacker may useto introduce a supply chain attack 1)Malicious package releaseAn attacker may publish malicious packages and hence trick otherusers into installing or depending on such packages[4 14] 2) SocialEngineering An attacker may manipulate a maintainer to handover sensitive information [2] 3) Account Takeover An attackermay compromise the credentials of a maintainer to inject mali-cious code under the maintainerrsquos name 4) Ownership transferAn attacker can show enthusiasm to maintain popular abandonedpackages and transfer the ownership of a package [14] 5)Remoteexecution An attacker may target a package by compromisingthe third-party services or remote server used by that package

23 Research on Supply Chain WorkMany prior works have leveraged past supply chain records to showhow magnificent a supply chain attack can be Zimmermann et alstudy provide evidence that popular packages and highly activedevelopers in npm suffer from single points of failure and manyunmaintained packages in the npm repository which no longerreceive patches indirectly threaten the security of the npm supplychain Ohm et alrsquos study [14] investigated different supply chainattacks from 174 malicious packages and contributed an enricheddataset for future research on malicious package detection Duanet al [4] identified 339 malicious packages from their proposed un-supervised learning framework Gonzalez et al[17] used repositoryand commit metadata to detect malicious packages automatically

Although security researchers in academia and industry are ac-tively investigating attacks on registries and proposing solutionsthese approaches seem to be based on specific instances of mali-cious attacks They are especially effective to prevent maliciouscode distribution Recent attacks show evidence that out-of-the-box exploit strategies will appear again and again [10 23] Anyad-hoc solution is not enough to prevent an attack that we have notwitnessed yet A better approach is needed to embrace a proactivestrategy that predicts package susceptibility as a potential threatinstead of maliciousness and then adopt the best security practiceto stay ahead of the attackers

3 RQ1 WEAK LINK SIGNALIn this section we describe our process of identifying and quan-tifying weak link signals to answer RQ1 We define a signal asa weak link if the signal exposes a package to a higher riskof a supply chain attack If a bad actor targets the supply chainthey are going to follow the path of least resistance [24] for exam-ple finding weak links from publicly available package metadatawithout penetrating the whole source code In this study we fol-lowed an approach the attackers might take to find the weakest linkin the npm registry One of the critical challenges in identifyingthe weak link signals was selecting the suspicious candidate of

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

metadata for further investigations To overcome this we took ad-vantage of the authorrsquos domain expertise and prior reported supplychain attack patterns [4 14 25] Four authors are from industryand two are from academia Therefore we focused on metadataassociated with human involvement (eg maintainer contributorinformation) since many past attacks involved human actors as aweak link to the package [6 18ndash21] Then we selected functionalmetadata of a package (eg dependency dependents) to understandthe underlying reason that might lead to an attack attempt

31 Data Process311 Data Collection We captured a snapshot of 1630101 pack-agejson files on June 7 2021 through the npm public API2 EachJSON file contains metadata properties of an npm package Weused the package ID (package name and version together form aunique identifier called ldquoIDrdquo) as the primary reference while storingcorresponding metadata for each package As a root package IDmaps to all other metadata like dependency information packagelast modified time scripts versions license repository unpackedpackage size a total number of files maintainers and contribu-tors names and email addresses We used Python libraries (JSONNumPy pandas amp psycopg2) to extract all the relevant metadatafrom each packagejson file which we stored systematically in aPostgreSQL database Each packagejson file contained multipleversions history of package metadata against one unique ID In thisstudy we only used the most recent version of the metadata to an-alyze our weak link indicators Apart from metadata extraction weanalyzed the maintainer reach and package reach to understand theimpact of a package and its maintainers in the npm registry Belowwe explain how we queried and measured package and maintainerreachndashPackage Reach We measured the package reach to evaluate a pack-agersquos popularity in terms of dependants and downloads Hence weconsidered two metrics (1) the number of packages that depend ona package (dependants) which we computed from a set of all thepackages that have a direct dependency on a package and (2) thenumber of downloads of a package in the past 12 months which wecollected from public npm API3Maintainer Reach We measured the number of packages owned bya maintainer to evaluate the maintainer reach in npm We queriedagainst npm packages to compute maintainer reach a set of pack-ages where a maintainer is listed as a package maintainer

312 Exclusion Criteria We removed packages that might in-troduce noise (for example wrong ownership invalid metadataetc) In this study we removed a package if the package has nodependents and it meets with any one of the following IF condi-tions

First If a package is a security holding packageand is re-moved from the npm registry by the npm security team due tomalicious activities Hence the JSON file we received for such pack-ages consisted of dummy metadata filled by the npm team Weremoved 8344 packages as security holding packages The ldquodescrip-tionsrdquo and ldquodis-tagsrdquo property of the packagejson file define thesecurity holding status2httpsreplicatenpmjscom_all_docsinclude_docs=true3httpsapinpmjsorgdownloadspointperiod[package]

Second If a package is deprecatedmeaning the package is notactively maintained by the maintainers The packagejson file con-tains a separate property called ldquodeprecatedrdquo to indicate packagedeprecated status which is assigned by the npm administrators orthe maintainers themselves Though a deprecated package does notalways indicate an unusable package npm and maintainers recom-mend using alternative packages or versions While the deprecatedpackages are not actively maintained end-users might still usethem directly or transitively Hence we only removed deprecatedpackages in the recent version if no other packages in the npmregistry used them in their dependency tree We removed 37917deprecated packages that used by none

Third If a package hasno repository andno license A reposi-tory is essential to track organize and validate the source coderightsand a license allows the OSS community to reuse the code A pack-age without a license indicates that the authors retain all sourcecode [22] and no repository is attached to verify otherwise Wehave identified packages where both repository and license prop-erty was null or filled with invalid values (eg ldquoUNLICENSEDrdquoldquoXYZrdquo ldquopersonal userdquo etc) Hence we only removed the packagesfrom further considerations if they meets all three conditions 1)no dependent and 2) no license and 3) no repository We found89893 such packages and removed them from the database

In total we removed 9(135996) packages that fits into our pro-posed exclusion criteria and our final dataset contained 1494105packages which were used for further analysis

32 Weak Link SignalsWe briefly discuss our six weak link signals their attack modelsand specific data analysis in the following six subsections

321 W1 Expired Maintainer Domain An attacker can hi-jack a component if a maintainerrsquos domain is expired anddoes not have 2FA authentication set up on their account Ingeneral any domain name can be purchased from a domain regis-trar allowing the purchaser to connect to an email hosting serviceto get a personal email address An attacker can hijack a userrsquosdomain to take over an account associated with that email addressTypically a domain hijacking attack occurs by 1) gaining unau-thorized access to the registrar or 2) gaining access to the ownerrsquosemail address and then resetting the password Domain hijackingis not a new notion We have seen many attacks in the past suchas 119901119890119903119897 119888119900119898 hijacking [26] where the attacker changed the domainregistrar and renewed the domain expiration date until 2029 andthen changed the DNS address The time to discover the attack wasfour months even though 119901119890119903119897 119888119900119898 is a well-renowned site

In the npm registry an attacker can execute a more simplisticapproach to hijack an email address An attacker can track thedomain of a maintainer in the domain registrar site If the domain isexpired and available for sale the attacker can register and alter theDNS ldquomail exchangerdquo (MX) records to hijack the maintainerrsquos emailaddress In most cases maintainer accounts are associated withan email address in the package registry One could reset a npmaccount directly by email address unless the maintainer activated2FA authentication or used different email address in user account

Analysis of npm maintainers domain To collect the main-tainer email address we extracted and stored the maintainer name

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

and email address from each packagejson file We then queriedand split the domain name from each email address and countedthe number of times a domain is used across all npm packages Outof 93K domains in the 163M npm packages 86 were unique (ap-peared once) while the rest were from the public or organizationaldomain We hypothesize that all maintainer domains in the npmregistry are up-to-date and none of them are available for sale Tothat end we performed a bulk query of 93K domains in Godaddy adomain registrar site We found 5346 domains are available for saleWe picked a random sample of 50 domains to determine the truepositive rate and checked each domain independently in GodaddyWe found 33 of them are not for sale Due to the high false-positiverate we manually verified 5346 domains in Godaddy

We found 2818 maintainerrsquos domains are available for saleand can be purchased These maintainers own 8494 pack-ages in the npm registry with average direct dependents of243 packages and average downloads of 53K in the past 12months

We reported our findings to the npm security team to validatethe email address for npm user accounts We note that our bulkquery may have a high false-negative rate since we found 47 falsepositive from the 5346 domain Hencenpm may have more than2818 maintainers associated with expired domains

322 W2 Installation Script An attacker can use installa-tion scripts to run commands that perform malicious actsthrough the package installation step [27] Install scripts runautomatically either before during or after package installationwhen certain events are triggered These scripts are used to makethe installation process easy since they are automatically run bynpm However for an attacker such scripts create opportunitieswhere ldquosky is the limitrdquo The attacker could steal user-sensitive dataor execute a new child process to create backdoor access or gainaccess to execute a series of commands remotely [4 5 7ndash9 25]Alternatively the attacker can infiltrate the third-party dependencesince that installation script will run automatically by the targetedpackage and its users during installation The CCleaner [28] So-larwinds [12] NotPetya [29] and Adverline [30] data breachesare examples of such attacks where attackers targeted the remoteserver third-party vendors to execute a large supply chain attackThough the presence of installation scripts themselves are not a di-rect indication of maliciousness the privilege to run automaticallymakes installation scripts a weak link signal in the supply chainAs best practices even npm registry recommends avoiding installscript- ldquoDonrsquot use installYou should almost never have to explicitlyset a preinstall or install scriptrdquo4While evidence of such an attackthrough the installations script is not rare [4 5 7ndash9 23 25 28ndash30]similar or perhaps even worse attacks may happen in the future

Analysis of npm packages To collect script details we ex-tracted and stored the script key which is the scriptrsquos name (egpreinstall)and the script value which contains the script pathshell commandsfrom the packagejson file Then we query and separate all thepackages that have script keys like ldquoInstallrdquo

4httpsdocsnpmjscomcliv7using-npmscripts

we found 22 (33249) of packages use install scripts indicat-ing that 978 of packages may follow npm recommendationof not using the install script as best security practices

Additionally we collected 3635 malicious packagesjson filesfrom npm They are similar to the security holding packagejson file(see Section 312) with all dummy metadata The only differenceis that these JSON files have actual malicious script key and valuepair for example- the original directory of malicious code files ormalicious shell script embedded in JSON files with other dummymetadata To analyze the malicious JSON file two researchers sepa-rately reviewed these scripts and compared results for verificationSince the scope of the project was limited to packagejson files onlywe were only able to analyze 419 (out of 3635) packages wheremalicious shell command was embedded directly in the JSON fileFrom our analysis we found three types of attack patterns thatattackers use frequently

bull Transfer Users data to third party server (eg- hostnameetcshadow etcpasswdhomeltusergtssh ) We found 350packages that communicate and transfer data with thirdparty server

bull Download malicious tool and run it to user machine Forexample- download a crypto miner software and run it onuser machine We found 82 packages

bull Spawning new process to interact with operating-systemprocesses particularly the native module child_process (eg-use of reverse shell) We found 212 packages that start a newprocess and transfer data to third party server

We found 939 (3412) of unique malicious packages useinstall scripts indicating that malicious attackers use installscripts frequently

323 W3UnmaintainedPackage Attackers can target pack-ages that are more likely to take over and sneak in malwaredue to lack of maintenance Differentiating between unmain-tained and feature-complete packages that require no further re-leases is complex and ambiguous [31] Even if the package maynot require any maintenance by itself it may need maintenancedue to security issues in its dependencies or use new syntax toimprove performance bug fixing amp documentation improvementsIn 2020 the average time to remediate security issues was 68 daysin open source projects [32] and 66 of security vulnerabilities innpm packages remain unpatched [33] Hence the time requiredto remediate a security issue in unmaintained packages remainsunknown We considered unmaintained packages as a weak linksignal from two directions 1) Inactive packages and 2) Inactivemaintainers

Inactive PackageWe considered a package inactive if the pack-agersquos last modification time in the packagejson file is past two yearsOther prior work [31 33] has defined inactivity as one year gapHowever many npm packages have low complexity with a fewlines of code and may not require any recent update Thus weconsider two years as an inactivity period

An attacker can exploit vulnerabilities in all applications thatdirectly or transitively depend on vulnerable code as long as the

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

vulnerable dependency or the package itself remains unfixed [2]The response time to fix such issues is undetermined for inactivepackages and end-usersmay remain infected due to unawareness ofthe security threat A deprecated package (see Section 312) exposeseven higher security risk since the maintainers no longer maintainthe package to fix security issues An attacker can inject malwarein widely used but unmaintained packages A prominent exampleof such an attack is the ldquoMailparserrdquo attack [34] an old deprecatedpackage An attacker indirectly reaches the ldquoMailParserrdquo through arelatively new package named ldquogetcookiesrdquo which had indirectlybeen made into the nested dependency chain of MailParser

Inactive MaintainersWe defined an inactive maintainer if themaintainer had no active packages in the past two yearsAn attackercan target packages with inactive maintainer(s) because any attackwill remain undetected due to the inactivity of maintainers There-fore distinguishing between active and inactive maintainers is nec-essary to identify the packages that require maintenanceAlthoughinactive maintainerrsquos packages are a subset of inactive packageswe must identify them separately Because an inactive packagemay have maintainers who are active in other packages whereasinactive maintainers indicate the maintainer is inactive in the entirenpm registry

Analysis of npmpackagesWe extracted and stored the ldquotimerdquoproperty of the packagejson file to measure the number of pack-ages that have been inactive for the past two years We identifiedinactive maintainers by evaluating the last modified time proper-ties for all packages corresponding to an individual maintainer Apackage where none of the maintainers are active elsewhere inthe entire package registry is determined as inactive maintainersof unmaintained packages We also considered deprecated pack-ages as unmaintained since they are unmaintained officially by themaintainer We separated the deprecated package where the lastmodification time passed our threshold value because the depreca-tion was declared later

We found 587 of packages and 443 of maintainers areinactive in the npm registry There are 5532 additional dep-recated packages where the deprecation date passed ourthreshold value

The packagejson does not provide individual maintainer activi-ties history Hence distinguishing inactive maintainers from activepackages was not plausible

324 W4 Too many Maintainers An attacker can target apackage in a dependency chain to exploit the possible over-sight due to too many maintainers in chargeWhen a projecthas many people it lacks ownership there may not be proper vet-ting in code changes because all of them are owners and no one isresponsible Alternatively many people in a team may lack appro-priate communication which increases the chance of oversight inmaintainer activities Bad actors may hide their identity by com-promising one maintainer profile or performing social engineeringto access the package

Our hypothesis is supported by previous work of Meneely etal [35] where they empirically showed projects with more develop-ers has more vulnerabilities ndashno one was ldquoin chargerdquo of the security

of the project A recent attack on php-src [36] supports the bene-fits of better communication between maintainers The attack wasidentified within hours because attackers tried to impersonate twoprimary PHP maintainers One of them spotted the unusual commitimmediately and reverted the commit An opposite plot of suchan attack could be if the impersonated maintainers were not theprimary maintainers and did not have proper communication withother team members The unusual commit may have taken days tobe discovered

Analysis of npmpackages Though the exact number of main-tainers varies from project to project in case of npm the numberof maintainers is more likely to be less because npm is buildingupon reusing small JavaScript libraries 15 million npm packageshave an average cost of 17 maintainers which makes sense assmall packages may not require many maintainers We extractedand stored the list of maintainers corresponding to each packagein a SQL database and then ranked them by the total number ofmaintainers in the package As discussed the entire npm had anaverage of 17 maintainers analyzing the whole data set is not prac-tical to identify the unusual package with too many maintainersTherefore we picked the top 1 (14941) packages in npm rankedby the total number of maintainers

We found that our selected 1 packages had an average of324 maintainers per package which was 19 times more thanaverage package maintainers in the entire registry

Our concern was also addressed by Zimmermann et al [2] wherethey suggested that the value of over 20 maintainers in a packageis questionable

325 W5 Too many contributors An attacker can sneakin malicious code bypassing the maintainerrsquos radar when amaintainer is responsible for many contributorsMany priorresearch shows that when multiple contributors change a file thefile is more likely to have more failures [35 37ndash39] which mayinclude security issues These observations motivated our nextsignal A maintainer-to-contributors ratio or increased number ofcontributors increases the security risks

Contributors may vary in knowledge skill and experience Pack-age quality will inevitably suffer if the maintainers do not payenough attention to review the pull request from contributors Es-pecially where an average of 17 maintainers maintain JavaScriptpackages many contributors bring additional responsibility formaintainers in product functionality or security If the maintainersdo not pay enough attention contributors with malicious purposescan include potential backdoors into code and malicious code willmerge An attacker can target packages with many contributorswhere the attacker can do social engineering to become a trustedcontributor and make some minor contribution to gain trust andthen sneak in malicious code

Analysis of npm packagesWe extracted and stored the con-tributor(s) list from packagejson files We measured the ratio be-tween the total number of maintainers to the total number of con-tributors of corresponding packages where a higher ratio indicateseach maintainer is responsible for fewer contributors Out of 15million npm packages only 26 (38913) of the packages have listedcontributors and the average maintainer to contributors ratio is

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

metadata for further investigations To overcome this we took ad-vantage of the authorrsquos domain expertise and prior reported supplychain attack patterns [4 14 25] Four authors are from industryand two are from academia Therefore we focused on metadataassociated with human involvement (eg maintainer contributorinformation) since many past attacks involved human actors as aweak link to the package [6 18ndash21] Then we selected functionalmetadata of a package (eg dependency dependents) to understandthe underlying reason that might lead to an attack attempt

31 Data Process311 Data Collection We captured a snapshot of 1630101 pack-agejson files on June 7 2021 through the npm public API2 EachJSON file contains metadata properties of an npm package Weused the package ID (package name and version together form aunique identifier called ldquoIDrdquo) as the primary reference while storingcorresponding metadata for each package As a root package IDmaps to all other metadata like dependency information packagelast modified time scripts versions license repository unpackedpackage size a total number of files maintainers and contribu-tors names and email addresses We used Python libraries (JSONNumPy pandas amp psycopg2) to extract all the relevant metadatafrom each packagejson file which we stored systematically in aPostgreSQL database Each packagejson file contained multipleversions history of package metadata against one unique ID In thisstudy we only used the most recent version of the metadata to an-alyze our weak link indicators Apart from metadata extraction weanalyzed the maintainer reach and package reach to understand theimpact of a package and its maintainers in the npm registry Belowwe explain how we queried and measured package and maintainerreachndashPackage Reach We measured the package reach to evaluate a pack-agersquos popularity in terms of dependants and downloads Hence weconsidered two metrics (1) the number of packages that depend ona package (dependants) which we computed from a set of all thepackages that have a direct dependency on a package and (2) thenumber of downloads of a package in the past 12 months which wecollected from public npm API3Maintainer Reach We measured the number of packages owned bya maintainer to evaluate the maintainer reach in npm We queriedagainst npm packages to compute maintainer reach a set of pack-ages where a maintainer is listed as a package maintainer

312 Exclusion Criteria We removed packages that might in-troduce noise (for example wrong ownership invalid metadataetc) In this study we removed a package if the package has nodependents and it meets with any one of the following IF condi-tions

First If a package is a security holding packageand is re-moved from the npm registry by the npm security team due tomalicious activities Hence the JSON file we received for such pack-ages consisted of dummy metadata filled by the npm team Weremoved 8344 packages as security holding packages The ldquodescrip-tionsrdquo and ldquodis-tagsrdquo property of the packagejson file define thesecurity holding status2httpsreplicatenpmjscom_all_docsinclude_docs=true3httpsapinpmjsorgdownloadspointperiod[package]

Second If a package is deprecatedmeaning the package is notactively maintained by the maintainers The packagejson file con-tains a separate property called ldquodeprecatedrdquo to indicate packagedeprecated status which is assigned by the npm administrators orthe maintainers themselves Though a deprecated package does notalways indicate an unusable package npm and maintainers recom-mend using alternative packages or versions While the deprecatedpackages are not actively maintained end-users might still usethem directly or transitively Hence we only removed deprecatedpackages in the recent version if no other packages in the npmregistry used them in their dependency tree We removed 37917deprecated packages that used by none

Third If a package hasno repository andno license A reposi-tory is essential to track organize and validate the source coderightsand a license allows the OSS community to reuse the code A pack-age without a license indicates that the authors retain all sourcecode [22] and no repository is attached to verify otherwise Wehave identified packages where both repository and license prop-erty was null or filled with invalid values (eg ldquoUNLICENSEDrdquoldquoXYZrdquo ldquopersonal userdquo etc) Hence we only removed the packagesfrom further considerations if they meets all three conditions 1)no dependent and 2) no license and 3) no repository We found89893 such packages and removed them from the database

In total we removed 9(135996) packages that fits into our pro-posed exclusion criteria and our final dataset contained 1494105packages which were used for further analysis

32 Weak Link SignalsWe briefly discuss our six weak link signals their attack modelsand specific data analysis in the following six subsections

321 W1 Expired Maintainer Domain An attacker can hi-jack a component if a maintainerrsquos domain is expired anddoes not have 2FA authentication set up on their account Ingeneral any domain name can be purchased from a domain regis-trar allowing the purchaser to connect to an email hosting serviceto get a personal email address An attacker can hijack a userrsquosdomain to take over an account associated with that email addressTypically a domain hijacking attack occurs by 1) gaining unau-thorized access to the registrar or 2) gaining access to the ownerrsquosemail address and then resetting the password Domain hijackingis not a new notion We have seen many attacks in the past suchas 119901119890119903119897 119888119900119898 hijacking [26] where the attacker changed the domainregistrar and renewed the domain expiration date until 2029 andthen changed the DNS address The time to discover the attack wasfour months even though 119901119890119903119897 119888119900119898 is a well-renowned site

In the npm registry an attacker can execute a more simplisticapproach to hijack an email address An attacker can track thedomain of a maintainer in the domain registrar site If the domain isexpired and available for sale the attacker can register and alter theDNS ldquomail exchangerdquo (MX) records to hijack the maintainerrsquos emailaddress In most cases maintainer accounts are associated withan email address in the package registry One could reset a npmaccount directly by email address unless the maintainer activated2FA authentication or used different email address in user account

Analysis of npm maintainers domain To collect the main-tainer email address we extracted and stored the maintainer name

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

and email address from each packagejson file We then queriedand split the domain name from each email address and countedthe number of times a domain is used across all npm packages Outof 93K domains in the 163M npm packages 86 were unique (ap-peared once) while the rest were from the public or organizationaldomain We hypothesize that all maintainer domains in the npmregistry are up-to-date and none of them are available for sale Tothat end we performed a bulk query of 93K domains in Godaddy adomain registrar site We found 5346 domains are available for saleWe picked a random sample of 50 domains to determine the truepositive rate and checked each domain independently in GodaddyWe found 33 of them are not for sale Due to the high false-positiverate we manually verified 5346 domains in Godaddy

We found 2818 maintainerrsquos domains are available for saleand can be purchased These maintainers own 8494 pack-ages in the npm registry with average direct dependents of243 packages and average downloads of 53K in the past 12months

We reported our findings to the npm security team to validatethe email address for npm user accounts We note that our bulkquery may have a high false-negative rate since we found 47 falsepositive from the 5346 domain Hencenpm may have more than2818 maintainers associated with expired domains

322 W2 Installation Script An attacker can use installa-tion scripts to run commands that perform malicious actsthrough the package installation step [27] Install scripts runautomatically either before during or after package installationwhen certain events are triggered These scripts are used to makethe installation process easy since they are automatically run bynpm However for an attacker such scripts create opportunitieswhere ldquosky is the limitrdquo The attacker could steal user-sensitive dataor execute a new child process to create backdoor access or gainaccess to execute a series of commands remotely [4 5 7ndash9 25]Alternatively the attacker can infiltrate the third-party dependencesince that installation script will run automatically by the targetedpackage and its users during installation The CCleaner [28] So-larwinds [12] NotPetya [29] and Adverline [30] data breachesare examples of such attacks where attackers targeted the remoteserver third-party vendors to execute a large supply chain attackThough the presence of installation scripts themselves are not a di-rect indication of maliciousness the privilege to run automaticallymakes installation scripts a weak link signal in the supply chainAs best practices even npm registry recommends avoiding installscript- ldquoDonrsquot use installYou should almost never have to explicitlyset a preinstall or install scriptrdquo4While evidence of such an attackthrough the installations script is not rare [4 5 7ndash9 23 25 28ndash30]similar or perhaps even worse attacks may happen in the future

Analysis of npm packages To collect script details we ex-tracted and stored the script key which is the scriptrsquos name (egpreinstall)and the script value which contains the script pathshell commandsfrom the packagejson file Then we query and separate all thepackages that have script keys like ldquoInstallrdquo

4httpsdocsnpmjscomcliv7using-npmscripts

we found 22 (33249) of packages use install scripts indicat-ing that 978 of packages may follow npm recommendationof not using the install script as best security practices

Additionally we collected 3635 malicious packagesjson filesfrom npm They are similar to the security holding packagejson file(see Section 312) with all dummy metadata The only differenceis that these JSON files have actual malicious script key and valuepair for example- the original directory of malicious code files ormalicious shell script embedded in JSON files with other dummymetadata To analyze the malicious JSON file two researchers sepa-rately reviewed these scripts and compared results for verificationSince the scope of the project was limited to packagejson files onlywe were only able to analyze 419 (out of 3635) packages wheremalicious shell command was embedded directly in the JSON fileFrom our analysis we found three types of attack patterns thatattackers use frequently

bull Transfer Users data to third party server (eg- hostnameetcshadow etcpasswdhomeltusergtssh ) We found 350packages that communicate and transfer data with thirdparty server

bull Download malicious tool and run it to user machine Forexample- download a crypto miner software and run it onuser machine We found 82 packages

bull Spawning new process to interact with operating-systemprocesses particularly the native module child_process (eg-use of reverse shell) We found 212 packages that start a newprocess and transfer data to third party server

We found 939 (3412) of unique malicious packages useinstall scripts indicating that malicious attackers use installscripts frequently

323 W3UnmaintainedPackage Attackers can target pack-ages that are more likely to take over and sneak in malwaredue to lack of maintenance Differentiating between unmain-tained and feature-complete packages that require no further re-leases is complex and ambiguous [31] Even if the package maynot require any maintenance by itself it may need maintenancedue to security issues in its dependencies or use new syntax toimprove performance bug fixing amp documentation improvementsIn 2020 the average time to remediate security issues was 68 daysin open source projects [32] and 66 of security vulnerabilities innpm packages remain unpatched [33] Hence the time requiredto remediate a security issue in unmaintained packages remainsunknown We considered unmaintained packages as a weak linksignal from two directions 1) Inactive packages and 2) Inactivemaintainers

Inactive PackageWe considered a package inactive if the pack-agersquos last modification time in the packagejson file is past two yearsOther prior work [31 33] has defined inactivity as one year gapHowever many npm packages have low complexity with a fewlines of code and may not require any recent update Thus weconsider two years as an inactivity period

An attacker can exploit vulnerabilities in all applications thatdirectly or transitively depend on vulnerable code as long as the

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

vulnerable dependency or the package itself remains unfixed [2]The response time to fix such issues is undetermined for inactivepackages and end-usersmay remain infected due to unawareness ofthe security threat A deprecated package (see Section 312) exposeseven higher security risk since the maintainers no longer maintainthe package to fix security issues An attacker can inject malwarein widely used but unmaintained packages A prominent exampleof such an attack is the ldquoMailparserrdquo attack [34] an old deprecatedpackage An attacker indirectly reaches the ldquoMailParserrdquo through arelatively new package named ldquogetcookiesrdquo which had indirectlybeen made into the nested dependency chain of MailParser

Inactive MaintainersWe defined an inactive maintainer if themaintainer had no active packages in the past two yearsAn attackercan target packages with inactive maintainer(s) because any attackwill remain undetected due to the inactivity of maintainers There-fore distinguishing between active and inactive maintainers is nec-essary to identify the packages that require maintenanceAlthoughinactive maintainerrsquos packages are a subset of inactive packageswe must identify them separately Because an inactive packagemay have maintainers who are active in other packages whereasinactive maintainers indicate the maintainer is inactive in the entirenpm registry

Analysis of npmpackagesWe extracted and stored the ldquotimerdquoproperty of the packagejson file to measure the number of pack-ages that have been inactive for the past two years We identifiedinactive maintainers by evaluating the last modified time proper-ties for all packages corresponding to an individual maintainer Apackage where none of the maintainers are active elsewhere inthe entire package registry is determined as inactive maintainersof unmaintained packages We also considered deprecated pack-ages as unmaintained since they are unmaintained officially by themaintainer We separated the deprecated package where the lastmodification time passed our threshold value because the depreca-tion was declared later

We found 587 of packages and 443 of maintainers areinactive in the npm registry There are 5532 additional dep-recated packages where the deprecation date passed ourthreshold value

The packagejson does not provide individual maintainer activi-ties history Hence distinguishing inactive maintainers from activepackages was not plausible

324 W4 Too many Maintainers An attacker can target apackage in a dependency chain to exploit the possible over-sight due to too many maintainers in chargeWhen a projecthas many people it lacks ownership there may not be proper vet-ting in code changes because all of them are owners and no one isresponsible Alternatively many people in a team may lack appro-priate communication which increases the chance of oversight inmaintainer activities Bad actors may hide their identity by com-promising one maintainer profile or performing social engineeringto access the package

Our hypothesis is supported by previous work of Meneely etal [35] where they empirically showed projects with more develop-ers has more vulnerabilities ndashno one was ldquoin chargerdquo of the security

of the project A recent attack on php-src [36] supports the bene-fits of better communication between maintainers The attack wasidentified within hours because attackers tried to impersonate twoprimary PHP maintainers One of them spotted the unusual commitimmediately and reverted the commit An opposite plot of suchan attack could be if the impersonated maintainers were not theprimary maintainers and did not have proper communication withother team members The unusual commit may have taken days tobe discovered

Analysis of npmpackages Though the exact number of main-tainers varies from project to project in case of npm the numberof maintainers is more likely to be less because npm is buildingupon reusing small JavaScript libraries 15 million npm packageshave an average cost of 17 maintainers which makes sense assmall packages may not require many maintainers We extractedand stored the list of maintainers corresponding to each packagein a SQL database and then ranked them by the total number ofmaintainers in the package As discussed the entire npm had anaverage of 17 maintainers analyzing the whole data set is not prac-tical to identify the unusual package with too many maintainersTherefore we picked the top 1 (14941) packages in npm rankedby the total number of maintainers

We found that our selected 1 packages had an average of324 maintainers per package which was 19 times more thanaverage package maintainers in the entire registry

Our concern was also addressed by Zimmermann et al [2] wherethey suggested that the value of over 20 maintainers in a packageis questionable

325 W5 Too many contributors An attacker can sneakin malicious code bypassing the maintainerrsquos radar when amaintainer is responsible for many contributorsMany priorresearch shows that when multiple contributors change a file thefile is more likely to have more failures [35 37ndash39] which mayinclude security issues These observations motivated our nextsignal A maintainer-to-contributors ratio or increased number ofcontributors increases the security risks

Contributors may vary in knowledge skill and experience Pack-age quality will inevitably suffer if the maintainers do not payenough attention to review the pull request from contributors Es-pecially where an average of 17 maintainers maintain JavaScriptpackages many contributors bring additional responsibility formaintainers in product functionality or security If the maintainersdo not pay enough attention contributors with malicious purposescan include potential backdoors into code and malicious code willmerge An attacker can target packages with many contributorswhere the attacker can do social engineering to become a trustedcontributor and make some minor contribution to gain trust andthen sneak in malicious code

Analysis of npm packagesWe extracted and stored the con-tributor(s) list from packagejson files We measured the ratio be-tween the total number of maintainers to the total number of con-tributors of corresponding packages where a higher ratio indicateseach maintainer is responsible for fewer contributors Out of 15million npm packages only 26 (38913) of the packages have listedcontributors and the average maintainer to contributors ratio is

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

and email address from each packagejson file We then queriedand split the domain name from each email address and countedthe number of times a domain is used across all npm packages Outof 93K domains in the 163M npm packages 86 were unique (ap-peared once) while the rest were from the public or organizationaldomain We hypothesize that all maintainer domains in the npmregistry are up-to-date and none of them are available for sale Tothat end we performed a bulk query of 93K domains in Godaddy adomain registrar site We found 5346 domains are available for saleWe picked a random sample of 50 domains to determine the truepositive rate and checked each domain independently in GodaddyWe found 33 of them are not for sale Due to the high false-positiverate we manually verified 5346 domains in Godaddy

We found 2818 maintainerrsquos domains are available for saleand can be purchased These maintainers own 8494 pack-ages in the npm registry with average direct dependents of243 packages and average downloads of 53K in the past 12months

We reported our findings to the npm security team to validatethe email address for npm user accounts We note that our bulkquery may have a high false-negative rate since we found 47 falsepositive from the 5346 domain Hencenpm may have more than2818 maintainers associated with expired domains

322 W2 Installation Script An attacker can use installa-tion scripts to run commands that perform malicious actsthrough the package installation step [27] Install scripts runautomatically either before during or after package installationwhen certain events are triggered These scripts are used to makethe installation process easy since they are automatically run bynpm However for an attacker such scripts create opportunitieswhere ldquosky is the limitrdquo The attacker could steal user-sensitive dataor execute a new child process to create backdoor access or gainaccess to execute a series of commands remotely [4 5 7ndash9 25]Alternatively the attacker can infiltrate the third-party dependencesince that installation script will run automatically by the targetedpackage and its users during installation The CCleaner [28] So-larwinds [12] NotPetya [29] and Adverline [30] data breachesare examples of such attacks where attackers targeted the remoteserver third-party vendors to execute a large supply chain attackThough the presence of installation scripts themselves are not a di-rect indication of maliciousness the privilege to run automaticallymakes installation scripts a weak link signal in the supply chainAs best practices even npm registry recommends avoiding installscript- ldquoDonrsquot use installYou should almost never have to explicitlyset a preinstall or install scriptrdquo4While evidence of such an attackthrough the installations script is not rare [4 5 7ndash9 23 25 28ndash30]similar or perhaps even worse attacks may happen in the future

Analysis of npm packages To collect script details we ex-tracted and stored the script key which is the scriptrsquos name (egpreinstall)and the script value which contains the script pathshell commandsfrom the packagejson file Then we query and separate all thepackages that have script keys like ldquoInstallrdquo

4httpsdocsnpmjscomcliv7using-npmscripts

we found 22 (33249) of packages use install scripts indicat-ing that 978 of packages may follow npm recommendationof not using the install script as best security practices

Additionally we collected 3635 malicious packagesjson filesfrom npm They are similar to the security holding packagejson file(see Section 312) with all dummy metadata The only differenceis that these JSON files have actual malicious script key and valuepair for example- the original directory of malicious code files ormalicious shell script embedded in JSON files with other dummymetadata To analyze the malicious JSON file two researchers sepa-rately reviewed these scripts and compared results for verificationSince the scope of the project was limited to packagejson files onlywe were only able to analyze 419 (out of 3635) packages wheremalicious shell command was embedded directly in the JSON fileFrom our analysis we found three types of attack patterns thatattackers use frequently

bull Transfer Users data to third party server (eg- hostnameetcshadow etcpasswdhomeltusergtssh ) We found 350packages that communicate and transfer data with thirdparty server

bull Download malicious tool and run it to user machine Forexample- download a crypto miner software and run it onuser machine We found 82 packages

bull Spawning new process to interact with operating-systemprocesses particularly the native module child_process (eg-use of reverse shell) We found 212 packages that start a newprocess and transfer data to third party server

We found 939 (3412) of unique malicious packages useinstall scripts indicating that malicious attackers use installscripts frequently

323 W3UnmaintainedPackage Attackers can target pack-ages that are more likely to take over and sneak in malwaredue to lack of maintenance Differentiating between unmain-tained and feature-complete packages that require no further re-leases is complex and ambiguous [31] Even if the package maynot require any maintenance by itself it may need maintenancedue to security issues in its dependencies or use new syntax toimprove performance bug fixing amp documentation improvementsIn 2020 the average time to remediate security issues was 68 daysin open source projects [32] and 66 of security vulnerabilities innpm packages remain unpatched [33] Hence the time requiredto remediate a security issue in unmaintained packages remainsunknown We considered unmaintained packages as a weak linksignal from two directions 1) Inactive packages and 2) Inactivemaintainers

Inactive PackageWe considered a package inactive if the pack-agersquos last modification time in the packagejson file is past two yearsOther prior work [31 33] has defined inactivity as one year gapHowever many npm packages have low complexity with a fewlines of code and may not require any recent update Thus weconsider two years as an inactivity period

An attacker can exploit vulnerabilities in all applications thatdirectly or transitively depend on vulnerable code as long as the

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

vulnerable dependency or the package itself remains unfixed [2]The response time to fix such issues is undetermined for inactivepackages and end-usersmay remain infected due to unawareness ofthe security threat A deprecated package (see Section 312) exposeseven higher security risk since the maintainers no longer maintainthe package to fix security issues An attacker can inject malwarein widely used but unmaintained packages A prominent exampleof such an attack is the ldquoMailparserrdquo attack [34] an old deprecatedpackage An attacker indirectly reaches the ldquoMailParserrdquo through arelatively new package named ldquogetcookiesrdquo which had indirectlybeen made into the nested dependency chain of MailParser

Inactive MaintainersWe defined an inactive maintainer if themaintainer had no active packages in the past two yearsAn attackercan target packages with inactive maintainer(s) because any attackwill remain undetected due to the inactivity of maintainers There-fore distinguishing between active and inactive maintainers is nec-essary to identify the packages that require maintenanceAlthoughinactive maintainerrsquos packages are a subset of inactive packageswe must identify them separately Because an inactive packagemay have maintainers who are active in other packages whereasinactive maintainers indicate the maintainer is inactive in the entirenpm registry

Analysis of npmpackagesWe extracted and stored the ldquotimerdquoproperty of the packagejson file to measure the number of pack-ages that have been inactive for the past two years We identifiedinactive maintainers by evaluating the last modified time proper-ties for all packages corresponding to an individual maintainer Apackage where none of the maintainers are active elsewhere inthe entire package registry is determined as inactive maintainersof unmaintained packages We also considered deprecated pack-ages as unmaintained since they are unmaintained officially by themaintainer We separated the deprecated package where the lastmodification time passed our threshold value because the depreca-tion was declared later

We found 587 of packages and 443 of maintainers areinactive in the npm registry There are 5532 additional dep-recated packages where the deprecation date passed ourthreshold value

The packagejson does not provide individual maintainer activi-ties history Hence distinguishing inactive maintainers from activepackages was not plausible

324 W4 Too many Maintainers An attacker can target apackage in a dependency chain to exploit the possible over-sight due to too many maintainers in chargeWhen a projecthas many people it lacks ownership there may not be proper vet-ting in code changes because all of them are owners and no one isresponsible Alternatively many people in a team may lack appro-priate communication which increases the chance of oversight inmaintainer activities Bad actors may hide their identity by com-promising one maintainer profile or performing social engineeringto access the package

Our hypothesis is supported by previous work of Meneely etal [35] where they empirically showed projects with more develop-ers has more vulnerabilities ndashno one was ldquoin chargerdquo of the security

of the project A recent attack on php-src [36] supports the bene-fits of better communication between maintainers The attack wasidentified within hours because attackers tried to impersonate twoprimary PHP maintainers One of them spotted the unusual commitimmediately and reverted the commit An opposite plot of suchan attack could be if the impersonated maintainers were not theprimary maintainers and did not have proper communication withother team members The unusual commit may have taken days tobe discovered

Analysis of npmpackages Though the exact number of main-tainers varies from project to project in case of npm the numberof maintainers is more likely to be less because npm is buildingupon reusing small JavaScript libraries 15 million npm packageshave an average cost of 17 maintainers which makes sense assmall packages may not require many maintainers We extractedand stored the list of maintainers corresponding to each packagein a SQL database and then ranked them by the total number ofmaintainers in the package As discussed the entire npm had anaverage of 17 maintainers analyzing the whole data set is not prac-tical to identify the unusual package with too many maintainersTherefore we picked the top 1 (14941) packages in npm rankedby the total number of maintainers

We found that our selected 1 packages had an average of324 maintainers per package which was 19 times more thanaverage package maintainers in the entire registry

Our concern was also addressed by Zimmermann et al [2] wherethey suggested that the value of over 20 maintainers in a packageis questionable

325 W5 Too many contributors An attacker can sneakin malicious code bypassing the maintainerrsquos radar when amaintainer is responsible for many contributorsMany priorresearch shows that when multiple contributors change a file thefile is more likely to have more failures [35 37ndash39] which mayinclude security issues These observations motivated our nextsignal A maintainer-to-contributors ratio or increased number ofcontributors increases the security risks

Contributors may vary in knowledge skill and experience Pack-age quality will inevitably suffer if the maintainers do not payenough attention to review the pull request from contributors Es-pecially where an average of 17 maintainers maintain JavaScriptpackages many contributors bring additional responsibility formaintainers in product functionality or security If the maintainersdo not pay enough attention contributors with malicious purposescan include potential backdoors into code and malicious code willmerge An attacker can target packages with many contributorswhere the attacker can do social engineering to become a trustedcontributor and make some minor contribution to gain trust andthen sneak in malicious code

Analysis of npm packagesWe extracted and stored the con-tributor(s) list from packagejson files We measured the ratio be-tween the total number of maintainers to the total number of con-tributors of corresponding packages where a higher ratio indicateseach maintainer is responsible for fewer contributors Out of 15million npm packages only 26 (38913) of the packages have listedcontributors and the average maintainer to contributors ratio is

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

vulnerable dependency or the package itself remains unfixed [2]The response time to fix such issues is undetermined for inactivepackages and end-usersmay remain infected due to unawareness ofthe security threat A deprecated package (see Section 312) exposeseven higher security risk since the maintainers no longer maintainthe package to fix security issues An attacker can inject malwarein widely used but unmaintained packages A prominent exampleof such an attack is the ldquoMailparserrdquo attack [34] an old deprecatedpackage An attacker indirectly reaches the ldquoMailParserrdquo through arelatively new package named ldquogetcookiesrdquo which had indirectlybeen made into the nested dependency chain of MailParser

Inactive MaintainersWe defined an inactive maintainer if themaintainer had no active packages in the past two yearsAn attackercan target packages with inactive maintainer(s) because any attackwill remain undetected due to the inactivity of maintainers There-fore distinguishing between active and inactive maintainers is nec-essary to identify the packages that require maintenanceAlthoughinactive maintainerrsquos packages are a subset of inactive packageswe must identify them separately Because an inactive packagemay have maintainers who are active in other packages whereasinactive maintainers indicate the maintainer is inactive in the entirenpm registry

Analysis of npmpackagesWe extracted and stored the ldquotimerdquoproperty of the packagejson file to measure the number of pack-ages that have been inactive for the past two years We identifiedinactive maintainers by evaluating the last modified time proper-ties for all packages corresponding to an individual maintainer Apackage where none of the maintainers are active elsewhere inthe entire package registry is determined as inactive maintainersof unmaintained packages We also considered deprecated pack-ages as unmaintained since they are unmaintained officially by themaintainer We separated the deprecated package where the lastmodification time passed our threshold value because the depreca-tion was declared later

We found 587 of packages and 443 of maintainers areinactive in the npm registry There are 5532 additional dep-recated packages where the deprecation date passed ourthreshold value

The packagejson does not provide individual maintainer activi-ties history Hence distinguishing inactive maintainers from activepackages was not plausible

324 W4 Too many Maintainers An attacker can target apackage in a dependency chain to exploit the possible over-sight due to too many maintainers in chargeWhen a projecthas many people it lacks ownership there may not be proper vet-ting in code changes because all of them are owners and no one isresponsible Alternatively many people in a team may lack appro-priate communication which increases the chance of oversight inmaintainer activities Bad actors may hide their identity by com-promising one maintainer profile or performing social engineeringto access the package

Our hypothesis is supported by previous work of Meneely etal [35] where they empirically showed projects with more develop-ers has more vulnerabilities ndashno one was ldquoin chargerdquo of the security

of the project A recent attack on php-src [36] supports the bene-fits of better communication between maintainers The attack wasidentified within hours because attackers tried to impersonate twoprimary PHP maintainers One of them spotted the unusual commitimmediately and reverted the commit An opposite plot of suchan attack could be if the impersonated maintainers were not theprimary maintainers and did not have proper communication withother team members The unusual commit may have taken days tobe discovered

Analysis of npmpackages Though the exact number of main-tainers varies from project to project in case of npm the numberof maintainers is more likely to be less because npm is buildingupon reusing small JavaScript libraries 15 million npm packageshave an average cost of 17 maintainers which makes sense assmall packages may not require many maintainers We extractedand stored the list of maintainers corresponding to each packagein a SQL database and then ranked them by the total number ofmaintainers in the package As discussed the entire npm had anaverage of 17 maintainers analyzing the whole data set is not prac-tical to identify the unusual package with too many maintainersTherefore we picked the top 1 (14941) packages in npm rankedby the total number of maintainers

We found that our selected 1 packages had an average of324 maintainers per package which was 19 times more thanaverage package maintainers in the entire registry

Our concern was also addressed by Zimmermann et al [2] wherethey suggested that the value of over 20 maintainers in a packageis questionable

325 W5 Too many contributors An attacker can sneakin malicious code bypassing the maintainerrsquos radar when amaintainer is responsible for many contributorsMany priorresearch shows that when multiple contributors change a file thefile is more likely to have more failures [35 37ndash39] which mayinclude security issues These observations motivated our nextsignal A maintainer-to-contributors ratio or increased number ofcontributors increases the security risks

Contributors may vary in knowledge skill and experience Pack-age quality will inevitably suffer if the maintainers do not payenough attention to review the pull request from contributors Es-pecially where an average of 17 maintainers maintain JavaScriptpackages many contributors bring additional responsibility formaintainers in product functionality or security If the maintainersdo not pay enough attention contributors with malicious purposescan include potential backdoors into code and malicious code willmerge An attacker can target packages with many contributorswhere the attacker can do social engineering to become a trustedcontributor and make some minor contribution to gain trust andthen sneak in malicious code

Analysis of npm packagesWe extracted and stored the con-tributor(s) list from packagejson files We measured the ratio be-tween the total number of maintainers to the total number of con-tributors of corresponding packages where a higher ratio indicateseach maintainer is responsible for fewer contributors Out of 15million npm packages only 26 (38913) of the packages have listedcontributors and the average maintainer to contributors ratio is

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

32 Since JavaScript packages are small it makes sense that theymay not need many contributors in one package To understandthe extreme cases where package maintainers added many contrib-utors we picked the top 1 packages in npm where the maintainerto contributor ratio was minimum

We found that the selected 1 (389) packages had an averagemaintainer to contributorrsquos ratio of 140

326 W6 Overloaded Maintainer An attacker may targeta maintainer who owns many packages because the main-tainer may not have enough time to maintain security of allthe packages Overloaded maintainers are weak links if the main-tainers 1) have a large number of upstream packages 2) use a largedependency chain in packages attackers may inject malware intheir dependency or 3) if they have many inactive packages anattacker may try to take over those packages A well-known attackon an overloading maintainer is event-stream attack [6] where anoriginal maintainer handed over a popular npm package ownershipto a malicious maintainer simply because the package was inactiveand the original maintainer does not need that package anymoreWe queried the same maintainer in our database We found that heis an overloaded maintainer and he owns 62 inactive packageswith more than 30k direct dependents Although one may arguethat a maintainer who owns many packages may consider as a signof stability over any new maintainer while we agree with the state-ment we want to include such a maintainer to the supply chainadministratorrsquos security radar Attackers are more likely to targetsuch maintainers if the maintainer is overloaded and does not haveenough time to maintain all of his packages equally We propose thesupply chain administrator should follow up the security measureof overloading maintainers such as minimum package dependencyownership transfer of inactive packages two-factor authorizationset up on their account active domain registration

Analysis of overloaded maintainersWe analyzed the over-loaded maintainer using our maintainer reach metric (section31)We found that 482 maintainers own more than one packagewhereas only 248 have downstream users We again picked thetop 1(4743) maintainers ranked by higher maintainer reach tounderstand the extreme cases

We found that the top 1 maintainers own an average num-ber of 1803 packages with direct dependents of 4010 averagepackages

We analyzed further to understand risk factors associated withinactive packages and downstream dependencies We found that30 (1442) of maintainers do not hold any inactive packages how-ever 70 shows otherwise In terms of package dependency wefound that 80 of packages have dependencies which indicatesthat overloading maintainers are responsible for their dependencychain security to protect the downstream users

4 CASE STUDYThis section provides three case studies of our proposed weaklink signals First we present our analysis in popular packages Wehypothesize that the popular package maintainers are aware of such

weak links and will avoid them to protect the downstream userrsquossecurity Second we present a case study on malicious packagesidentified by our signals and at the end we present an example ofa data-driven attack

41 Case study 1 Popular packagesConsidering 650 growth in supply chain attacks as an indica-tion [3] attackers will likely continue to target popular packagesand maintainers as a preferred path to exploiting downstream vic-tims at scale Hence we analyzed whether weak link signals existin popular packages

To measure package popularity we used the package reach met-rics from Section 31 We picked the top 10000 packages from the(1) package dependents and (2) package downloads analysis andcombined the two lists removing duplicates into a combined pop-ular package sample The sample included 14892 packages (1 oftotal packages) as popular with an average of 9374 dependents and885 million downloads in 12 months We note that the popularpackage analysis does not consider transitive dependents Hencethe impact of a weak link in a popular package may present a highersupply chain risk than presented

Expired Domain (W1) Among the popular packages 33 pack-ages (average direct dependents 3829 and average downloads 111million) have at least one maintainer with an expired domain Theseaccounts can be used to compromise the package unless two-factorauthentication is enabled

W1 12637 packages that depend on popular packages wereexposed to higher supply chain risk due to expired domains

Install Scripts (W2) Among the popular packages 362 pack-ages (average direct dependents 14163 and downloads average346 million) have install scripts This is an encouraging resultshowing that 975 of popular packages are aware of the risks andavoid using install scripts

W2 362 popular packages had install scripts and exposed1416 packages on average to attacks through install scripts

Unmaintained Package (W3) Of the popular packages 38(5645) packages (average direct dependents 4224 and downloadsaverage 761 million) had an inactive status Interestingly 560 ofthem (average direct downstream dependents 13693 and down-loads average 328 million) were deprecated Deprecated packages(Section 312) are a classic example of unmaintained packageswhere maintainers officially declare the package unmaintainedWe also found 619 inactive maintainers who still own 645 popularpackages

W3 38 of popular packages were inactive and 560 of themwere deprecated 645 popular packages did not have any ac-tive maintainers An inactive package exposed 422 packageson average to higher supply chain risk

Too many Maintainers (W4) Among the popular packages421 packages (average direct dependents 2692 and downloadsaverage 416 million) had an average of 346 maintainers

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Large packages may need more maintainers Hence we hypoth-esize that 421 popular packages are large compared to the otherpopular packages To test the hypothesis we extracted and storedthe ldquounpacked sizerdquo and ldquofile countrdquo property to measure the pack-age size We created two samples (1) 14471 popular packageswithout 421 packages (average of 54 files 6968 MB unpacked sizeand 24 maintainers) and 2) 14892 popular packages (average of55 files 6978 MB unpacked size and 34 maintainers) We foundthe 421 packages added an average cost of one maintainer withoutany significant difference between package size and file countsHence having too many maintainers in a JavaScript package doesnot necessarily indicate packages are large

W4 421 popular packages had an average of 346 maintainersand potentially exposed their dependents to higher supplychain risk

Too many contributors (W5) Among the popular packages23 packages (average direct dependents 645804 and downloadsaverage 642 million) had an average maintainer to contributorrsquosratio of 137 Therefore on average one maintainer has to manage37 contributorsrsquo to secure packages against malicious code andmalicious contributors

W5 23 popular packages are exposed to higher supply chainrisk due to maintainers having to manage an average of 37contributors

Overloaded Maintainers (W6) Among the popular packages9871 (663) are owned by popularmaintainers whomanagemanypackages (average direct dependents 103902 and downloads aver-age 1103 million) Attackers can try to compromise these main-tainers to exploit downstream victims at scale

W6 2491 popular maintainers own 9871 popular packages

42 Case study 2 Installation scriptsThe scope of this work is to analyze package metadata to identifyweak links that lead to possible supply chain attacks

Even though the focus was not on detecting malicious packagesour analysis of installation scripts revealedmalicious activity withinpackages Some package have shell commands embedded directlyin the packagejson file We investigated the shell to verify whetherthey could help detect malicious packages We queried keywordsrelated to read or write user-sensitive information from the filesystem (eg etcpasswd etcshadow) or HTTP requests to transferuser data to a remote server (eg curl wget) Our keyword searchis motivated by Duan et al [4] who proposed a framework andterminologies derived from existing supply chain attacks

We identified 74 packages where the installation scripts includedkeywords like curl etcshadow etcpasswd hostname Of those11 were found to be malicious packages and rest were benign Allthe malicious packages performed DNS lookups and send user-sensitive data to a specific URL Three of the malicious packageincluded shell commands for both windows and UNIX OS

Table 1 A subset of combination signals quantified fromourpopular package analysis

Signal combination packagecount

Signal combination packagecount

W6 cap W3 capW4 32 W2 cap W3 86W6 cap W3 capW1 5 W6 cap W1 17W6 cap W3 capW2 38 W6 cap W2 187W3 cap W4 32 W6 cap W3 3356

After three months of our initial data collection we collecteda new security holding package list from npm to validate our re-sult Our findings aligned with the npm security specialists theyidentified 10 out of the 11 packages as malicious We reported theremaining malicious package to the npm security team

This analysis demonstrates the risk of having a dependency onpackages that include installation scripts

43 Case study 3 Data-Driven AttackerTo illustratewe used the sample of popular packages (14892) sampleFigure 1 shows an example of a data-driven attack that combinestwo of the signals Expired Maintainer Domain (W1) and Unmain-tained packages (W3) An attacker can scan the npm registry forpopular packages and download relevant metadata from the pack-agejson file Then an attacker can easily extract inactive packages(5645) and maintainer email addresses (1108) from package meta-data The next step would be looking for domain availability in thedomain registrar site (eg GoDaddy)

Figure 1 Data-driven Attack by combining W1 and W3

In this stage we have identified 15 domains available for sale inGodaddy An attacker can purchase these domains and alter theMX record to hijack the maintainerrsquos email addresses (section 321)In general npm requires an email address to set up an user accountattackers can reset npm accounts by using those 15 email addressesto access 899 npm packages

Another example of a data-driven attack would be looking foroverloaded (W6) inactivemaintainers (W3) in popular packages Anattacker can perform social engineering to take over these packagesWe have found 25 popular packages associated with 16 overloadedinactive maintainers Out of 25 for three packages (average packagedependents 117 and downloads 6906) the last update year was 2013

Table 1 shows a subset of other possible combinations and howthey would reduce a large number of 1M+ npm packages to a smallnumber of candidate packages for potential supply chain attacks

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

5 RQ2 SURVEYIn this section we discuss our RQ2How do practitioners perceivethe identified weak links in the npm supply chain To that end weconducted a survey on npm package maintainers We selected thetop 10 (47433) of the maintainers ranked by the number of ownedpackages as survey candidates We chose this selection criterionfor the following two reasons The maintainers are 1) experiencedwith JavaScript packages since they own many packages in npmand 2) part of a large supply chain as many packages use thesemaintainerrsquos packages as dependencies

Our survey is designed to capture practitionersrsquo views on ourproposed weak links and whether they want to be notified if any ofthe proposed weak links exist in their package dependency graph

Table 2 is a complete summary of our survey We attached theagreement and disagreement of practitioners regarding each weaklink signal in the second and third columns The last column pro-vides the percentage of the practitioner who wants to be notifiedabout such signals We observed that package maintainers sup-ported three of our proposed signals as a weak link and they wouldwant to be notified about these signals existence in the packagedependency graph

Agreement Out of six signals more than 50 of practitionerssupported W1 (Expired Maintainer Domain)W2 (Install Scrips) andW3 (Unmaintained Packages) as a weak link signal We observedthat practitioners indicated a desire to be notified of weak linksignals despite not supporting the weak link signal directly Asshown in Table 2 the percentage in the want to be notified columnexceeds the percentage of rdquoAgreed as a weak link for W1 W2 andW3

Disagreement Our survey findings show that practitioners didnot support W4 (Total Number of Maintainers) W5 (Maintainerto contributors ratio) and W6 (Packages per maintainer) as weaklink signals More than 40 of people disagreed with these signalswhereas less than 20 supported this as a weak link To understandthe context behind why practitioners think otherwise we analyzedpractitionerrsquos comments in the open-ended questions For examplendashldquoHistorically I would have agreed with ldquoToo many maintainersrdquo beinga risk but as long as they are known people to youI believe it tobe okrdquo or Many contributors is not a signal itrsquos a desired state ofopen source and if all are reviewed by a reliable maintainer they areno risk Both of the statements indirectly support our weak linksassumptions under certain condition like ldquoreliable maintainerrdquo orldquoas long as maintainer are known peoplerdquo Also practitioners raiseconcerns about these three signals being subjective and may varyfrom project to project Moreover we do not claim that having anyof these signals means a bad package instead our proposed metricsare guidelines for practitioners to make informed decisions aboutthe use of the package

New signals proposed by Maintainers We asked an open-ended question for the respondents to recommend additional signalsthat we should consider in future work Out of 470 practitioners wereceived 213 responses for new signals To label the new signals tworesearchers separately reviewed the 213 responses and comparedresults We included the new signal if the signal was raised by ldquoatleastrdquo two respondents In some cases the practitionerrsquos intent was

unclear and the comment was discarded We summarize the twomost frequently mentioned concepts

Maintainers 41 practitioners in some waymentioned maintain-ers being a risk We have identified the most frequent discussionon maintainers and propose the following four signals

bull Ownership transfer or adding new maintainers Anysudden change in a package maintainers list is proposed tobe a weak link from practitioners Practitioners would wantto know about such changes in the package dependencygraph if a package transfers the ownership or adds any newmaintainers

bull Maintainer Identity Practitioners commented on the roleof maintainer expertise and identity verification in the supplychain A maintainer with a real picture organizational back-ground and email address linked social media or repositoryhistory of co-authoring with other maintainers will makea maintainer reliable over any new maintainers Althoughnpm provides the list of packages owned by maintainersenforcing maintainers to add real identity or experience maybe a big security improvement in the community

bull MaintainerTwo-FactorAuthenticationAmaintainermiss-ing two factored authentication (2FA) for package hostingor releasing a new version or login to the npm account isa weak link 2FA authentication should be enforced for allmaintainers to publish a package

Integration of version control software 52 (244) of the re-sponses were related to version control software(VCS) packagerepository and npm integration

bull No source code repository When a package has no orwrong public source code repositoryhomepageVCS or thelinked repository is archived the access to review sourcecode is restricted forcing users to trust a package blindly

bull npm package vs source code repository The practition-ers raise the concern to validate the published npm packageagainst the code on the source code repository Hence allfiles inside a given package must match the exact contentsin the repository

bull CICD pipeline Missing CICD infrastructure to test codeand build of npm packages The practitioners also mentionedthat the type of CICD services matters Whether CICD ser-vice providers or self-hosted infrastructure the practitionersprefer details on testing code coverage or alerts on the useof compromised CICD systems from past security incidents

bull Open pull request A package with many open issues andpull requests (PR) indicates a poorly maintained packageOne can view if a package has open issues in npm onlinerepository However practitioners commented on addingsuch information in the package dependency graph

Since our analysis is limited to package metadata from npm we didnot consider any repository-related weak link signal in this studybut can be considered as a future research direction

6 LIMITATIONSWe proposed and studied several weak link signals inferred fromnpm metadata and which can be used to evaluate security risksassociated with npm packages However we do not claim that these

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

What are Weak Links in the npm Supply Chain Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USA

Table 2 Survey responses from 470 npm package maintainers

Weak link Signal Agreed as aweak link

Disagreed asa weak link

Want to be notified

W1 Amaintainerrsquos email address is associatedwith an expired domain 585 (275) 1319(62) 55 (258)W2 A package has pre and post install scripts 448 (211) 2234 (105) 574 (270)W2 A package script has shell commands- CURLWGET NCDIG 6745 (317) 809 (38) 726 (341)W3 A package has no update for X years 587 (276) 166(78) 636 (299)W3 A maintainer is inactive for X monthsyears 577 (271) 166(78) 636 (299)W4 A package has many maintainers 153 (72) 547 (257) 117 (55)W5 A maintainer reviews a large number of pull requests from manycontributors

1787 (84) 466 (219) 8 (38)

W6 A maintainer owns too many packages 187 (88) 494 (232) 96 (45)

weak link signals are the only ones that should be considered Addi-tional other signals suggested by practitioners indicate that furtherresearch is needed on weak link identifications Another limita-tion of this work is that three of our six proposed signals W4 (Toomany Maintainers)W5 (Too many Contributors)w6 (OverloadedMaintainer) were hard to empirically evaluate because we did nothave enough metadata on maintainers activities to validate thesein contrast to the clearer ground truth we have for W1 W2 W3To address this limitation future work could try to collect andleverage additional maintainer metadata including commit historyvulnerability fixes and maintainers turnover (how maintainers ofa package are added or removed over time) Unfortunately suchmetadata is not currently available from npm Another limitationof our study is that we analyzed npm ecosystem only which is thelargest package manager ecosystem today but we did not evalu-ate other package manager ecosystems Despite these limitationswe believe our proposed weak link signal detection approach isapplicable to other such ecosystems

7 DISCUSSIONThrough this study we increase awareness and visibility in detect-ing weak link signals to enhance supply chain security Althoughthis study does not provide a complete solution to mitigate the pro-posedweak link signals we expect the findingswill aid practitionersin predicting package vulnerability and reducing supply chain risksThe following subsections discuss our recommendations on howto strengthen supply chain security proactively instead of reactingto attacks

71 Risk ModelCurrently JavaScript npm package users have access to publicdata on package dependencies as well as information about thepackage maintainers However npm packages are not currentlyassociated with an overall health or security score Thereforeestimating a security risk score associated with installing and usinga new package is currently hard Our work identifies several weaklink signals which could be used for this purpose We envision acommunity effort that could address this problem in the near future

For package managers npm could compute and display a risk modelbased on weak link signals Package managers would then knowwhere their packages stand and improve their security scores byaddressing the identified weaknesses Such a risk model wouldallow package users to make more educated data-driven decisionsand comparisons before including new packages into their supplychains To this end we suggest adding automated indicators forW1 W2 W3 in the OpenSSF Metrics [40] and OpenSSF Scorecard[41] projects

72 Control in Package ReleaseNew packages are being released in npm by different maintainersevery day Within a timeline of five months the npm has hostedmore than 200K new packages increasing the size and complexityof the npm supply chain As a package managers npm could vali-date any new release against the risk model that could be developedfollowing the recommendations of this study After a particularpackage is validated using the risk model npm could publish thepackage and make it available to users If the validation is unsatis-factory npm could ask the maintainers of that package to improveits security by reducing weak link signals for instance by confirm-ing specific requirements impacting secure CICD pipelines fordifferent OS environments or limiting the use of install scripts

73 Trusted Package SystemIn our survey (Section 5) practitioners mentioned grading pack-ages based on security risk aligning with our proposed risk modelabove which may be in conjunction with the OpenSSF Metrics [40]Scorecard [41] and Best Practices Badge [42] projects Respondentsindicated a recommendation system in terms of different securitygrades We acknowledge that any kind of grading and measuring ofsuch a large ecosystem is difficult and expensive In that case npmcould prioritize or separate packages based on package reach interms of dependents and downloads the above security risk modelcould be implemented only for the most popular npm packageswhich would then form a new trusted package system npm couldexclude new packages or packages without any dependents or with

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References

Submitted to ICSE 2022 May 21ndash29 2022 Pittsburgh PA USANusrat Zahan Laurie Williams Thomas Zimmermann Patrice Godefroid Brendan Murphy and Chandra Maddila

few downloads from this trusted package system to avoid frictionwith the publication of new packages

8 SUMMARYIn this work we presented a framework that reveals the identifi-cation and quantification of a number of weak link signals in thenpm supply chain which may expose vulnerabilities for supplychain security attacks We hope that identifying these weak sig-nals will help practitioners structure discussions and analyses ofsuch issues As part of an ongoing investigation we submitted alist of suspicious packages to the npm security team to take neces-sary actionsuch as taking over packages from inactive maintainersfreezing the maintainer account if the maintainer domains are avail-able for sale or measuring security status of deprecated packagesOur second contribution consists of a list of new weak link signalsproposed by a survey of npm practitioners We hope our work willpromote further research on weak link signal identifications More-over implementing a risk model around these signals would allowdevelopers to make more educated data-driven decisions beforeincluding packages into their software

AcknowledgmentWe thank Bas Alberts andMax Schafer fromGitHub for encouraging us to pursue this research and for theirvaluable feedback We acknowledge the npm package maintainerscontributions to our study We also thank the NCSU Realsearchgroup for valuable feedback In particular we thank Aishwarya Sethfor her assistance with the qualitative analysis of respondent open-ended responses This work was funded by a Microsoft Researchinternship and by the NCSU Secure Computing Institute

REFERENCES[1] ldquoSolarwinds orion security breach A shift in the software supply chain par-

adigmrdquo httpssnykioblogsolarwinds-orion-security-breach-a-shift-in-the-software-supply-chain-paradigm accessed 2021-09-29

[2] M Zimmermann C-A Staicu C Tenny and M Pradel ldquoSmall world with highrisks A study of security threats in the npm ecosystemrdquo in 28th USENIXSecurity Symposium (USENIX Security 19) 2019 pp 995ndash1010

[3] ldquo2021 state of the software supply chain reportrdquo httpswwwsonatypecomresourcesstate-of-the-software-supply-chain-2021 accessed 2021-09-29

[4] R Duan O Alrawi R P Kasturi R Elder B Saltaformaggio andW Lee ldquoTowardsmeasuring supply chain attacks on package managers for interpreted languagesrdquoarXiv preprint arXiv200201139 2020

[5] ldquoeslint-scope attackrdquo httpsgistgithubcomhzoo51cb84afdc50b14bffa6c6dc49826b3e accessed 2021-09-25

[6] ldquoCompromised npm package event-streamrdquo httpsmediumcomintrinsic-blogcompromised-npm-package-event-stream-d47d08605502 accessed 2021-09-29

[7] ldquoNexus intelligence insights Sonatype-2020-0003 - npm malicious pack-age 1337qq-jsrdquo httpsblogsonatypecomsonatype-2020-0003-npm-malicious-package-1337qq-js accessed 2021-09-29

[8] ldquoDependency-confusionrdquo httpsblogsonatypecommalicious-dependency-confusion-copycats-exfiltrate-bash-history-and-etc-shadow-files accessed2021-09-25

[9] ldquoSnyk uncovers malicious code activities in open source supply chain securityon the npm registryrdquo httpssnykioblognpm-security-malicious-code-in-oss-npm-packages accessed 2021-09-29

[10] ldquoBash uploader security updaterdquo httpsaboutcodecoviosecurity-update ac-cessed 2021-09-29

[11] ldquoLessons learned from the solarwinds supply chain hackrdquo httpslinuxinsidercomstorylessons-learned-from-the-solarwinds-supply-chain-hack-87029html accessed 2021-09-29

[12] ldquoSolarwinds attack explained And why it was so hard to detectrdquohttpswwwcsoonlinecomarticle3601508solarwinds-supply-chain-attack-explained-why-organizations-were-not-preparedhtml accessed 2021-09-29

[13] ldquoSolarwinds attack cost impacted companies an average of $12 mil-lionrdquo httpsheimdalsecuritycomblogsolarwinds-attack-cost-impacted-companies-an-average-of-12-million accessed 2021-09-29

[14] M Ohm H Plate A Sykosch and M Meier ldquoBackstabberrsquos knife collection Areview of open source software supply chain attacksrdquo in International Conferenceon Detection of Intrusions and Malware and Vulnerability Assessment Springer2020 pp 23ndash43

[15] ldquoSecure at every step What is software supply chain security and why doesit matterrdquo httpsgithubblog2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog accessed 2021-09-29

[16] G Ferreira L Jia J Sunshine and C Kaumlstner ldquoContaining malicious packageupdates in npm with a lightweight permission systemrdquo in 2021 IEEEACM 43rdInternational Conference on Software Engineering (ICSE) IEEE 2021 pp 1334ndash1346

[17] D Gonzalez T Zimmermann P Godefroid and M Schaumlfer ldquoAnomaliciousAutomated detection of anomalous and potentially malicious commits on githubrdquoin 2021 IEEEACM 43rd International Conference on Software Engineering SoftwareEngineering in Practice (ICSE-SEIP) IEEE 2021 pp 258ndash267

[18] ldquoGathering weak npm credentialsrdquo httpsgithubcomChALkeRnotesblobmasterGathering-weak-npm-credentialsmd accessed 2021-10-11

[19] ldquoBackdoored python library caught stealing ssh credentialsrdquo httpswwwbleepingcomputercomnewssecuritybackdoored-python-library-caught-stealing-ssh-credentials accessed 2021-10-11

[20] ldquocore contributor to the conventional-changelog ecosystem had their npm creden-tials compromisedrdquo httpsgithubcomconventional-changelogconventional-changelogissues282issuecomment-365367804 accessed 2021-10-11

[21] ldquoPostmortem for malicious packages published on july 12th 2018rdquo httpseslintorgblog201807postmortem-for-malicious-package-publishes accessed 2021-10-11

[22] ldquoSpecifics of npmrsquos packagejson handlingrdquo httpsdocsnpmjscomcliv7configuring-npmpackage-json accessed 2021-09-29

[23] ldquoSolarwinds the worldrsquos biggest security failure and open sourcersquos better an-swerrdquo httpsthenewstackiosolarwinds-the-worlds-biggest-security-failure-and-open-sources-better-answer accessed 2021-09-29

[24] J Viega and G R McGraw Building secure software How to avoid security problemsthe right way portable documents Pearson Education 2001

[25] K Garrett G Ferreira L Jia J Sunshine and C Kaumlstner ldquoDetecting suspiciouspackage updatesrdquo in 2019 IEEEACM 41st International Conference on SoftwareEngineering New Ideas and Emerging Results (ICSE-NIER) IEEE 2019 pp 13ndash16

[26] ldquoThe hijacking of perlcomrdquo httpswwwperlcomarticlethe-hijacking-of-perl-com accessed 2021-09-29

[27] ldquo10 npm security best practicesrdquo httpssnykioblogten-npm-security-best-practices accessed 2021-10-13

[28] ldquoCcleaner attack timelinemdashherersquos how hackers infected 23 million pcsrdquo httpsthehackernewscom201804ccleaner-malware-attackhtml accessed 2021-10-13

[29] ldquoNotpetya ndash a threat to supply chainsrdquo httpswwwidagentcomblog2017-08-03-notpetya-threat-supply-chains-across-ukraine accessed 2021-10-13

[30] ldquoThe adverline breach and the emerging risk of using third-party vendorsrdquo httpsinsightsintegrity360comthe-adverline-breach accessed 2021-10-12

[31] R K Vaidya L De Carli D Davidson and V Rastogi ldquoSecurity issues in language-based sofware ecosystemsrdquo arXiv preprint arXiv190302613 2019

[32] ldquoThe state of open source security 2020rdquo httpssnykioopen-source-securityaccessed 2021-10-12

[33] ldquo2020 state of the software supply chain reportrdquo httpswwwsonatypecomresourceswhite-paper-state-of-the-software-supply-chain-2020 accessed 2021-09-29

[34] ldquoNpm attackers sneak a backdoor into nodejs deployments through depen-denciesrdquo httpsthenewstackionpm-attackers-sneak-a-backdoor-into-node-js-deployments-through-dependencies accessed 2021-10-11

[35] A Meneely and L Williams ldquoSecure open source collaboration an empiricalstudy of linusrsquo lawrdquo in Proceedings of the 16th ACM conference on Computer andcommunications security 2009 pp 453ndash462

[36] ldquogitphpnet server compromised move to github and delayed updatesrdquo httpsphpwatchnews202103git-php-net-hack accessed 2021-10-11

[37] C Bird N Nagappan B Murphy H Gall and P Devanbu ldquoDonrsquot touch my codeexamining the effects of ownership on software qualityrdquo in Proceedings of the19th ACM SIGSOFT symposium and the 13th European conference on Foundationsof software engineering 2011 pp 4ndash14

[38] N Nagappan B Murphy and V Basili ldquoThe influence of organizational structureon software qualityrdquo in 2008 ACMIEEE 30th International Conference on SoftwareEngineering IEEE 2008 pp 521ndash530

[39] C Bird N Nagappan P Devanbu H Gall and B Murphy ldquoDoes distributeddevelopment affect software quality an empirical case study of windows vistardquoin 2009 IEEE 31st International Conference on Software Engineering IEEE 2009pp 518ndash528

[40] OpenSSF ldquoOpen source security metricsrdquo httpsmetricsopenssforg 2021[41] mdashmdash ldquoSecurity scorecards for open source projectsrdquo

httpsgithubcomossfscorecard 2021[42] mdashmdash ldquoCii best practices badge programrdquo httpsbestpracticescoreinfrastructureorgen

2021

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
    • 21 Key Terminology
    • 22 Attack Vector
    • 23 Research on Supply Chain Work
      • 3 RQ1 Weak link signal
        • 31 Data Process
        • 32 Weak Link Signals
          • 4 Case Study
            • 41 Case study 1 Popular packages
            • 42 Case study 2 Installation scripts
            • 43 Case study 3 Data-Driven Attacker
              • 5 RQ2 Survey
              • 6 Limitations
              • 7 Discussion
                • 71 Risk Model
                • 72 Control in Package Release
                • 73 Trusted Package System
                  • 8 Summary
                  • References