Recently, there has been a lot of news about accidental leaks of various confidential data from GitHub, the web service for hosting IT projects and their collaborative development.
I emphasize that we'll talk about accidental leaks, i.e. those caused by negligence, without malicious intent on the part of those responsible for the incidents. Such leaks cannot be written off to employees' inexperience in IT matters: GitHub users are overwhelmingly developers, i.e. well-qualified and competent personnel. Unfortunately, even very good specialists sometimes make banal mistakes, especially when it comes to security. Let's chalk it up to negligence.
Here are some very well known examples related to GitHub:
Obviously, all these instances of unintentional leaks could easily have been prevented by monitoring the data uploaded to GitHub. No one is talking about a total ban on access to GitHub; that is a meaningless and even harmful idea (if the service is banned but still needed, developers will simply bypass the ban). What is needed is a leak-prevention solution with a real-time content analyzer that blocks uploading to GitHub only the data that should not be there for security reasons (for example, Amazon cloud access keys).
I will show you how to solve this particular problem, using the example of DeviceLock DLP. Our baseline data are:
To begin with, we define that the AWS key is the protected data and its getting to GitHub must be prevented.
Since the key is a set of bytes without any pronounced signatures (yes, I know about the "BEGIN / END PRIVATE KEY" text at the beginning and end, but this is a very weak signature and it is better not to rely on it), we will use digital fingerprint identification.
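To see why signature matching alone is unreliable, here is a small sketch (the regexes and sample strings are my own illustrations, not part of the product). An AWS access key ID does have a recognizable `AKIA` prefix, but the paired secret key is just 40 base64-like characters with no distinctive marker, so a pattern broad enough to catch it also catches plenty of innocent text:

```python
import re

# Access key IDs carry a usable signature (the AKIA prefix)...
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# ...but secret keys are just 40 base64-ish characters: far too generic.
SECRET_KEY_RE = re.compile(r"\b[A-Za-z0-9/+=]{40}\b")

sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"  # AWS's documented example ID
print(bool(ACCESS_KEY_RE.search(sample)))  # True: the prefix is distinctive

# A 40-character git commit hash matches the "secret key" pattern too,
# which is exactly the kind of false positive that makes this signature weak.
noise = "commit 3f786850e387550fdab836ed7e6dc881de23001b"
print(bool(SECRET_KEY_RE.search(noise)))  # True: false positive
```

This is why fingerprinting the exact protected file is the more robust approach here.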
We add the key file to the DeviceLock DLP digital fingerprint database, so that the product "knows" our key "by sight" and can later identify it unambiguously (and not confuse it with, say, test keys that may legitimately end up on GitHub).
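Conceptually, fingerprint registration works like the sketch below. This is only a minimal illustration of the general technique: DeviceLock DLP's actual fingerprinting is proprietary and more sophisticated (for instance, it can match partial copies, which a plain hash cannot), and all names and the key contents here are invented:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Derive a fingerprint from file contents (here, simply SHA-256)."""
    return hashlib.sha256(data).hexdigest()

# "Register" the protected key file in the fingerprint database.
protected_key = b"-----BEGIN PRIVATE KEY-----\nMIIEvQ(illustrative)\n-----END PRIVATE KEY-----\n"
fingerprint_db = {fingerprint(protected_key)}

def is_protected(data: bytes) -> bool:
    """Check outgoing content against the registered fingerprints."""
    return fingerprint(data) in fingerprint_db

print(is_protected(protected_key))                 # True: the real key is recognized
print(is_protected(b"dummy key for CI pipelines")) # False: a harmless test key passes
```

The point is that identification is tied to the exact protected content rather than to fragile textual markers.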
Now let's create a content filtering rule for file storage services in DeviceLock DLP (GitHub falls under our "file storage" classification, which, in addition to GitHub, covers more than 15 different file exchange and synchronization services).
According to this rule, all users are prohibited from uploading data whose digital fingerprints match those registered above. When prohibited data is detected, the corresponding events (incident records) and shadow copies are written to the central archive logs, in addition to actually blocking the upload to GitHub.
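The logic of such a blocking rule can be sketched as follows. Everything here is a hypothetical illustration of the described behavior (deny the transfer, record an incident, keep a shadow copy), not DeviceLock DLP's actual API:

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)

# Fingerprints of protected content (illustrative key material).
fingerprint_db = {hashlib.sha256(b"SECRET-AWS-KEY-MATERIAL").hexdigest()}
shadow_archive = []  # stands in for the central shadow-copy log

def check_upload(filename: str, data: bytes) -> bool:
    """Return True if the upload may proceed, False if the rule blocks it."""
    if hashlib.sha256(data).hexdigest() in fingerprint_db:
        # Incident record goes to the event log...
        logging.warning("Incident: blocked upload of %s to file storage", filename)
        # ...and a shadow copy is retained so auditors can see what leaked.
        shadow_archive.append((filename, data))
        return False
    return True

print(check_upload("aws.key", b"SECRET-AWS-KEY-MATERIAL"))  # False: blocked
print(check_upload("README.md", b"just documentation"))     # True: allowed
print(len(shadow_archive))                                  # 1
```

Note that blocking, event logging, and shadow copying happen together in one decision point, which is what makes the incident reviewable afterwards.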
Now let's try to upload the AWS key to a GitHub repository.
As you can see, the upload "for some reason" failed, and DeviceLock DLP warned us that it had blocked the operation (the message, of course, can be customized or disabled).
At the same time, if you look at the DeviceLock DLP shadow copy log, you will find that key there.
Thus, this example showed how DeviceLock DLP can be used to solve the particular problem of preventing leaks of confidential data (digital fingerprints can be taken from almost any file) to cloud storage.
Of course, in addition to preventing data leakage to GitHub, you can also periodically inventory your repositories and identify information in them that should not be there. Gitrob, git-secrets, Git Hound, truffleHog, and many other free utilities have been created for scanning GitHub repositories.
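A toy scanner in the spirit of those utilities can be written in a few lines. This sketch of mine only walks a checked-out working tree and flags lines that look like AWS access key IDs; the real tools go much further (scanning the full git history, entropy checks, many more secret patterns):

```python
import os
import re

# AWS access key IDs have the distinctive AKIA prefix.
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every suspicious line found."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if AWS_KEY_RE.search(line):
                            findings.append((path, lineno, line.strip()))
            except OSError:
                continue  # unreadable file: skip it
    return findings
```

Run it as `scan_repo("/path/to/checkout")` and review each finding by hand: such pattern scans inevitably produce false positives, which is precisely why real-time fingerprint-based prevention is the stronger first line of defense.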
Source: https://habr.com/ru/post/429796/