A group of researchers from North Carolina State University (NCSU) has studied GitHub, the service for hosting IT projects and developing them collaboratively. They found that more than 100,000 GitHub repositories contain API keys, tokens, and cryptographic keys.
Unintentional leakage of critical information (encryption keys, tokens, API keys for various online services, and so on) has long been a hot topic.
Such leaks have already caused several major incidents involving personal data (Uber, DJI, DXC Technologies, and others).
From October 31, 2017 to April 20, 2018, the NCSU researchers scanned 4,394,476 files in 681,784 repositories through GitHub's own search API, as well as 2,312,763,353 files in 3,374,973 repositories previously collected in the Google BigQuery database.
During the scan, the researchers looked for strings matching the patterns of API keys (Stripe, MailChimp, YouTube, etc.), tokens (Amazon MWS, PayPal Braintree, Amazon AWS, etc.), or cryptographic keys (RSA, PGP, etc.).
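To illustrate what pattern-based scanning of this kind can look like, here is a minimal sketch in Python. It is not the researchers' actual tooling: the three regular expressions are simplified assumptions based on commonly seen key formats, and the names PATTERNS and find_candidate_secrets are hypothetical. A real scanner would need many more patterns plus validation to filter out false positives.

```python
import re

# Simplified, assumed patterns for illustration only; real key formats
# may differ and a production scanner would use a much larger set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_secret_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24}\b"),
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def find_candidate_secrets(text):
    """Return (kind, matched_string) pairs for every pattern hit in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    for kind, value in find_candidate_secrets(sample):
        print(kind, value)
```

A match only means the string looks like a key. Whether it is a live credential has to be checked separately.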
In total, the researchers found 575,476 tokens, API keys, and cryptographic keys, of which 201,642 were unique. 93.58% of the finds belonged to accounts with a single owner.
A manual check of some of the results turned up AWS credentials for the website of a large government department in a Western European country and for a server holding millions of applications for admission to an American college.
The study revealed an interesting trend: when data owners did notice a leak, they reacted quickly. 19% of the data monitored by the researchers was removed within 16 days, including 12% during the first day, while 81% was not removed at all during the observation period. ("Removed" is in quotation marks for a reason; see below.)
Most interesting of all, none of the "deleted" data that the researchers observed was physically deleted: the owners simply made a new commit, so the secrets remained available in the repository history.
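The point about new commits can be shown with a small Python sketch. It models a repository as a list of snapshots rather than calling Git itself, so everything here is an assumption made for illustration: the function name secrets_still_in_history, the snapshot format, and the single AWS-style pattern are all hypothetical.

```python
import re

# Assumed, simplified pattern for an AWS access key ID.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def secrets_still_in_history(snapshots):
    """Find keys that are gone from the newest snapshot but not from history.

    snapshots: list of (commit_id, {path: content}) tuples, oldest first.
    Returns {key: [commit_ids of older snapshots that still contain it]}.
    """
    if not snapshots:
        return {}

    def keys_in(files):
        return {
            match.group(0)
            for content in files.values()
            for match in AWS_KEY_ID.finditer(content)
        }

    current = keys_in(snapshots[-1][1])
    leaked = {}
    for commit_id, files in snapshots[:-1]:
        # Keys present in an old commit but absent from the latest one.
        for key in keys_in(files) - current:
            leaked.setdefault(key, []).append(commit_id)
    return leaked


if __name__ == "__main__":
    history = [
        ("c1", {"config.py": 'KEY = "AKIAIOSFODNN7EXAMPLE"'}),
        ("c2", {"config.py": 'KEY = os.environ["AWS_KEY"]'}),
    ]
    print(secrets_still_in_history(history))
    # prints {'AKIAIOSFODNN7EXAMPLE': ['c1']}
```

In a real repository, the same check could be run over the output of git log -p --all, which prints the diffs of every commit reachable from any ref. A leaked key should in any case be revoked, not just removed from the files.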
At the end of last year we published a short note on Habr explaining how to prevent unintended leaks with the DeviceLock DLP solution by monitoring data uploaded to GitHub.
Regular news about individual data leaks is promptly published on the information leakage channel.
Source: https://habr.com/ru/post/444930/