Falling Stars

November 18, 2024

Intro

The number of open-source packages is constantly rising, which makes it harder for developers to choose a package that fits their needs and is secure. Package repositories offer various metrics to help with that choice, such as download counts, GitHub statistics, and user ratings. Nevertheless, popularity continues to be one of the most influential factors in package selection. When we see a popular package, we assume it’s well-maintained and reliable. This common assumption led to the emergence of starjacking two years ago.

Starjacking is a technique that artificially inflates a package’s apparent popularity by exploiting how package repositories display information about associated GitHub repositories. After the technique became public, several major repositories, including npm and Yarn, were found to allow package publications with links to GitHub repositories not owned by the package publisher. We recently conducted comprehensive research across more than 20 package repositories to evaluate the current state of starjacking, and the findings show promising developments in security measures.
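
To make the mechanism concrete, here is a minimal sketch of a registry page that trusts the repository link declared in package metadata. Everything in it is hypothetical: the names `GITHUB_STARS`, `Package`, and `render_popularity`, the repository names, and the star counts are made up for illustration, and the lookup table stands in for a call to GitHub. It does not reproduce any real repository's code.

```python
from dataclasses import dataclass

# Hypothetical stand-in for GitHub's star counts (no network access needed).
GITHUB_STARS = {
    "popular-org/popular-lib": 48_000,
    "attacker/fresh-repo": 0,
}


@dataclass
class Package:
    name: str
    publisher: str   # account that uploaded the package
    repository: str  # "owner/repo" string copied from the package metadata


def render_popularity(pkg: Package) -> str:
    """Naive display logic: trusts whatever repository the metadata names."""
    stars = GITHUB_STARS.get(pkg.repository, 0)
    return f"{pkg.name}: {stars} stars ({pkg.repository})"


# The attacker publishes a new package but points its metadata at a popular
# repository they do not own. Nothing in render_popularity checks ownership.
malicious = Package(
    name="popular-lib-utils",
    publisher="attacker",
    repository="popular-org/popular-lib",
)

print(render_popularity(malicious))
```

In this sketch the brand-new package is rendered with the 48,000 stars of a repository its publisher has no control over, which is the effect starjacking relies on.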

Researched package repositories

Our research encompassed 21 package repositories, ranging from large ones like npm, Maven Central, and PyPI to smaller ones like CPAN, LuaRocks, and Hackage. The table below lists each repository included in the research and its primary programming language.

Repository             Language
npm                    JavaScript
Maven Central          Java
PyPI                   Python
NuGet                  C#
pkg.go.dev             Go
Packagist              PHP
RubyGems               Ruby
crates.io              Rust
CocoaPods              Objective-C/Swift
pub.dev                Dart
CPAN                   Perl
CRAN                   R
Clojars                Clojure
Yarn                   JavaScript
Anaconda               Python, R
LuaRocks               Lua
Hackage                Haskell
Opam                   OCaml
Hex                    Erlang/Elixir
Meteor                 JavaScript
Swift Package Index    Swift

These repositories fall into two primary categories based on their artifact management approaches:
  • Some store the artifacts created during building, compiling, or packaging the code.
  • Others simply provide references to GitHub repositories containing the necessary files for package installation.

Package managers that exclusively reference GitHub repositories, such as pkg.go.dev and RubyGems, are inherently protected against starjacking since they display data directly from GitHub repositories. This direct integration eliminates the possibility of linking to one repository while serving code from another.

GitHub repositories pkg.go.dev web image

While such package repositories are not susceptible to starjacking, the GitHub statistics they display can still be misleading, because the statistics themselves can be manipulated using more sophisticated techniques. For example, Swift Package Index and Packagist display comprehensive GitHub repository details, which can mislead users if the stats are spoofed.

Packagist index web screenshot
Swift index web screenshot

Results

Most repositories do not display statistics for the GitHub repository that a package refers to. While PyPI and Yarn previously showed these stats, they’ve since changed their approaches: Yarn has removed the statistics completely, while PyPI has implemented a more sophisticated metadata display system. Yet some package repositories still display GitHub statistics; for example, npm continues to show the number of issues and pull requests from the GitHub repository specified in the package metadata.

npm index web screenshot

CPAN, the Perl package repository, also displays GitHub statistics.

CPAN Perl package repository screenshot

PyPI’s Transformation of GitHub Statistics Display

PyPI slowly but steadily added verification of the package metadata.

Initially, PyPI displayed GitHub repository statistics without any verification mechanism. This approach made the platform vulnerable to starjacking attempts, as any package could claim association with any GitHub repository. PyPI’s first security improvement divided package information into two distinct sections: unverified and verified details.

This division was a good step towards showing users which data they can trust, but statistics of arbitrary GitHub repositories were still shown in the unverified details section. That was not enough, since most people don’t carefully distinguish between verified and unverified information.

PyPI made a crucial advancement by implementing a comprehensive verification system through the Trusted Publisher Management feature. Since August 2024, GitHub statistics appear exclusively in the verified details section, and only for packages uploaded through that feature. The system uses OpenID Connect to enable secure publishing through trusted services like GitHub Actions.

The new publishing process works as follows: A PyPI project maintainer specifies a workflow in their GitHub repository for automatic package publishing. When triggered, the workflow authenticates with PyPI, proving that the code comes from the intended source. Only after verification can the package be published. Under this new system, PyPI displays GitHub repository statistics only when the links point to verified code repositories that have been authenticated through the trusted publishing workflow.
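
The core of that check can be sketched as a comparison between the publisher a maintainer registered and the identity claims carried by the workflow's OpenID Connect token. This is a simplified illustration under stated assumptions, not PyPI's implementation: `TrustedPublisher` and `claims_match` are hypothetical names, the repository and workflow values are made up, and the token is assumed to have already passed signature, issuer, and audience validation. The `repository` and `job_workflow_ref` claim names are the ones GitHub Actions includes in its OIDC tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedPublisher:
    """Publisher a maintainer registered for a project (hypothetical shape)."""
    repository: str  # "owner/repo"
    workflow: str    # workflow file name, e.g. "release.yml"


def claims_match(publisher: TrustedPublisher, claims: dict) -> bool:
    """Compare claims from an already signature-verified OIDC token."""
    # job_workflow_ref looks like
    # "owner/repo/.github/workflows/release.yml@refs/tags/v1.0";
    # drop the "@ref" suffix and compare the workflow path only.
    workflow_path = claims.get("job_workflow_ref", "").split("@", 1)[0]
    expected_path = f"{publisher.repository}/.github/workflows/{publisher.workflow}"
    return (
        claims.get("repository") == publisher.repository
        and workflow_path == expected_path
    )


publisher = TrustedPublisher("example-org/example-lib", "release.yml")

# Claims from the maintainer's own release workflow.
genuine = {
    "repository": "example-org/example-lib",
    "job_workflow_ref": "example-org/example-lib/.github/workflows/release.yml@refs/tags/v1.0",
}

# Claims from a workflow running in a different repository.
foreign = {
    "repository": "attacker/fresh-repo",
    "job_workflow_ref": "attacker/fresh-repo/.github/workflows/release.yml@refs/heads/main",
}

print(claims_match(publisher, genuine))
print(claims_match(publisher, foreign))
```

Because the claims are issued by GitHub rather than written by the uploader, a publisher cannot simply declare a link to someone else's repository, as they could in free-form package metadata.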

The evolution of PyPI’s security measures against starjacking can be seen in three distinct phases (left to right):

  1. Initial phase: GitHub statistics were displayed without any verification or indication of their authenticity.

  2. Second phase: Separation of verified and unverified details, with GitHub statistics specifically placed in the unverified details section.

  3. Current phase: GitHub statistics are now only displayed in the verified details section and appear exclusively for packages uploaded through the Trusted Publisher Management feature.

This progression demonstrates PyPI’s commitment to maintaining security while providing valuable repository information to users.
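
The three phases can also be summarized as a small display rule. The sketch below is only an illustration of the behavior described above: the `Phase` enum, the `stats_section` function, and the "project details" label for the initial phase are assumptions, not PyPI's code or wording.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    INITIAL = 1         # stats shown with no verification
    SPLIT_SECTIONS = 2  # stats shown, but under "unverified details"
    VERIFIED_ONLY = 3   # stats shown only for trusted-publisher uploads


def stats_section(phase: Phase, uploaded_via_trusted_publisher: bool) -> Optional[str]:
    """Return the page section where GitHub stats appear, or None if hidden."""
    if phase is Phase.INITIAL:
        return "project details"  # hypothetical label for the single section
    if phase is Phase.SPLIT_SECTIONS:
        return "unverified details"
    return "verified details" if uploaded_via_trusted_publisher else None


# A starjacked package is not uploaded from the repository it links to,
# so it cannot have gone through the trusted publishing workflow.
for phase in Phase:
    print(phase.name, stats_section(phase, uploaded_via_trusted_publisher=False))
```

For a package that was not uploaded through the trusted publishing workflow, the first two phases still return a section in which the borrowed statistics would appear, while the current phase returns `None` and the statistics are hidden.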

GitHub project description page screenshot

Conclusion

While npm and CPAN continue to display unverified GitHub statistics, the risk of starjacking has significantly decreased over the past two years. This improvement stems from most repositories either removing GitHub statistics entirely or implementing more robust verification systems, as exemplified by PyPI. It’s worth noting that most repositories, with PyPI being the exception, still display package metadata links without verification. While malicious actors could exploit this weakness, it poses a substantially lower risk of misleading users than the original starjacking technique.
