Leaks of secret information found in 100,000 repositories on GitHub

The methodology for collecting secrets proceeds in several phases, which together make it possible to identify secret information with a high degree of confidence. Illustration from the research paper

GitHub and similar platforms for publishing open source code have become standard tools for developers. A problem arises, however, when that code works with authentication tokens, private API keys, or private cryptographic keys. For security, this data must be kept secret. Unfortunately, many developers add sensitive information directly to their code, which often leads to accidental leaks.

A group of researchers from North Carolina State University conducted a large-scale study of secret data leaks on GitHub. They scanned billions of files collected by two complementary methods:

Nearly six months of real-time scanning of public GitHub commits.
A snapshot of public repositories covering 13% of all repositories on GitHub, about 4 million repositories in total.

The conclusions are disappointing. The researchers found not only that leaks are widespread, affecting more than 100,000 repositories, but also that thousands of new, unique “secrets” land on GitHub every day.

The table lists the APIs of popular services and the risks associated with leaking this information.

Overall statistics on the secrets found show that Google API keys are exposed most often. RSA private keys and Google OAuth IDs are also common. Notably, the vast majority of leaks occur through repositories with a single owner.

Secret	Total	Unique	% single owner
Google API key	212,892	85,311	95.10%
RSA private key	158,011	37,781	90.42%
Google OAuth ID	106,909	47,814	96.67%
Regular private key	30,286	12,576	88.99%
Amazon AWS Access Key ID	26,395	4,648	91.57%
Twitter access token	20,760	7,953	94.83%
EC private key	7,838	1,584	74.67%
Facebook access token	6,367	1,715	97.35%
PGP private key	2,091	684	82.58%
MailGun API key	1,868	742	94.25%
MailChimp API key	871	484	92.51%
Stripe Standard API key	542	213	91.87%
Twilio API key	320	50	90.00%
Square access token	121	61	96.67%
Square OAuth secret	28	19	94.74%
Amazon MWS auth token	28	13	100.00%
Braintree access token	24	8	87.50%
Picatic API key	5	4	100.00%
Total	575,456	201,642	93.58%

Monitoring commits in real time made it possible to determine how much sensitive information is removed from repositories shortly after it is committed. On the first day, slightly more than 10% of secrets are deleted, and a few percent more over the following days. Yet more than 80% of the private information is still in the repositories two weeks after it was added, and this proportion barely decreases afterward.

Among the most notable leaks are the AWS account of a government agency in an Eastern European country and 7,280 RSA private keys that give access to thousands of private VPNs.

The study demonstrates that an attacker, even with minimal resources, can compromise many GitHub users and find a large number of secret keys. The authors note that many existing protections are ineffective against the harvesting of secret information. For example, tools like TruffleHog are only 25% effective. The built-in GitHub limit on the number of API requests is also easy to get around.

However, many of the discovered secrets follow clear patterns that make them easy to search for. It is logical to assume that the same patterns can be used to monitor for leaks of secret information and warn developers. Such mechanisms should probably be implemented on the server side, that is, on GitHub itself. The service could issue a warning directly at commit time.
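To illustrate how such patterns might be used to flag secrets, here is a minimal sketch in Python. The regular expressions follow commonly documented key formats (the AKIA prefix of AWS access key IDs, the AIza prefix of Google API keys, and PEM private key headers). They are assumptions made for this example, not the exact expressions used in the study, and the names SECRET_PATTERNS and find_secrets are hypothetical.

```python
import re

# Illustrative patterns only. They follow commonly documented key formats
# and are assumptions, not the expressions used by the researchers.
SECRET_PATTERNS = {
    # "AKIA" followed by 16 uppercase letters or digits.
    "Amazon AWS Access Key ID": re.compile(
        r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])"
    ),
    # "AIza" followed by 35 URL-safe characters.
    "Google API key": re.compile(
        r"(?<![0-9A-Za-z_\-])AIza[0-9A-Za-z_\-]{35}(?![0-9A-Za-z_\-])"
    ),
    # PEM header of an RSA, EC, PGP, or generic private key.
    "Private key block": re.compile(
        r"-----BEGIN (?:RSA |EC |PGP )?PRIVATE KEY(?: BLOCK)?-----"
    ),
}


def find_secrets(text):
    """Return (line_number, secret_type) for every line matching a pattern."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for secret_type, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, secret_type))
    return hits
```

A check like this could, for instance, be run over the lines changed in a commit, with a warning shown to the developer whenever find_secrets returns a non-empty list. A match on format alone does not prove that a key is real or still valid, so a real tool would need further filtering to reach high confidence.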

GitHub recently introduced a Token Scanning feature, which scans repositories for tokens and notifies the affected service providers of leaks. The provider can then revoke the exposed key. The authors believe that their research can help GitHub improve this feature and extend it to more providers.

Source: https://habr.com/ru/post/445038/
