The bulk of the data stored by modern companies is unstructured: files created by employees rather than, say, a database or an export from an automated service. And even with a perfectly configured system of access rights, you cannot guarantee that a folder actually contains only what you expect to find there. Passport and credit card numbers in the contractors' contracts folder? Trivial. Photos from someone's (no doubt exciting) vacation in Goa inside the reporting folder? Easy! A new movie release in the employee training directory? Yes, that too! Still surprised?

Most of our clients are sure that "this could never happen to us." Those who do doubt it often have no idea of the true scale of the disaster. When, after a classifier scan, you show them a pile of confidential documents sitting in a folder tersely named "!!! for Vasya" on the main file share, the IT security staff start shifting uncomfortably in their chairs. And when a document listing top-management bonuses turns up in a public folder... Yes, that has happened too.
To identify and prevent such situations, data classification is essential. It can be configured to work both with metadata (name, type, size, file creation date, and so on) and with file contents. First you create a set of rules, each consisting of filters, logical operations, and regular expressions, and specify a schedule, since we do not want the analysis to run during hours of peak server load. To make this easier, most full-text analysis products ship with predefined pattern sets, for example for PCI DSS compliance, but you will still have to sit down and think through the filters that best fit your specific business problems.
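To make the "filters plus regular expressions" idea concrete, here is a minimal sketch of such a rule in Python. The file extensions, the 10 MB size limit, and the card-number pattern are illustrative assumptions, not any real product's configuration, and the sketch assumes text has already been extracted from the file:

```python
import re

# Assumed content filter: something that looks like a 13-16 digit
# card number, with optional spaces or dashes between digits.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12}\d{1,4}\b")

def matches_rule(name: str, size_bytes: int, text: str) -> bool:
    """One classification rule = metadata filters AND a content regex."""
    # Metadata filters: only look inside document-like files...
    if not name.lower().endswith((".txt", ".docx")):
        return False
    # ...and skip anything over 10 MB to keep the scan cheap.
    if size_bytes > 10 * 1024 * 1024:
        return False
    # Content filter: the regular expression over extracted text.
    return bool(CARD_RE.search(text))
```

In a real deployment a scheduler (cron or the product's own) would run rules like this outside business hours, exactly as described above.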
Among the standard rules we usually set up for our clients are searches for passport and credit card data, detection of confidential and internal-use-only documents, and identification of audio and video files as well as executables. Many clients go further and add their own rules on top: searches for SNILS and TIN numbers, financial statements with complex matching conditions, and much more.
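A regex alone flags plenty of random digit strings as "card numbers." One common way to cut false positives, shown here as a sketch rather than any particular product's logic, is to also validate the Luhn checksum that real payment card numbers satisfy:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Used as a second-stage filter after a card-number regex match,
    so random digit runs do not end up in the report.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:   # typical card-number lengths
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:                # double every second digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Similar checksum validation exists for SNILS and TIN numbers, which is why those rules are usually worth configuring with more than just a digit-count pattern.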
Okay, suppose we have classified the data. What next? Naturally, you bring everything into line with the security policies: move passport and credit card data away from prying eyes, delete the personal photos, purge the downloaded movies, and have a word with Vasya about his folder. The relevant reports make this convenient: they show clearly what exactly is found in your files, how often, and where those files are located.
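The "what, how often, and where" report boils down to simple aggregation over the scan results. A sketch, with hypothetical paths and rule names standing in for real scanner output:

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical scan output: (file path, rule that matched).
hits = [
    ("/share/contracts/acme.docx", "credit_card"),
    ("/share/contracts/scan.pdf", "passport"),
    ("/share/reports/goa.jpg", "personal_photo"),
    ("/share/contracts/beta.docx", "credit_card"),
]

# What is found, and how often.
by_rule = Counter(rule for _path, rule in hits)

# Where the offending files live (their parent folders).
by_folder = Counter(str(PurePosixPath(path).parent) for path, _rule in hits)
```

Sorting either counter with `.most_common()` immediately gives the "worst offenders first" view that these reports usually present.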
That sounds good, but it still does not solve the problem of relapses and new incidents. For that, you need to configure alerts on the detection of new files matching your classification rules, so that you learn about policy violations quickly, without periodic "clean-up" campaigns. Why do everything manually when it can be automated? Unfortunately, administrators do not always react quickly enough to notifications, so to minimize the risk you can first move newly discovered files into quarantine automatically, and only then conduct the post-mortem. Fast, convenient, and safe.
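The quarantine step itself is just a move that preserves enough context to investigate and, if needed, restore the file. A minimal sketch, assuming the quarantine tree mirrors the original directory layout (the paths and function name are illustrative):

```python
import shutil
from pathlib import Path

def quarantine(path: Path, quarantine_root: Path) -> Path:
    """Move a flagged file under quarantine_root, mirroring its
    original directory layout so the incident can be reviewed
    and the file restored to its original location later."""
    # Recreate the source path (minus the drive/root anchor)
    # inside the quarantine tree.
    target = quarantine_root / path.relative_to(path.anchor)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(target))
    return target
```

A real product would also record who owned the file and which rule fired, so the subsequent debriefing has something to work with.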
As a result, you gain an understanding of the structure of your data and full control over its spread within the organization, can identify those responsible for violations of security policies, and can automatically take measures to minimize risk when new incidents occur. We believe data classification is too important an element of controlling unstructured information to simply ignore: without it, there is no way to be sure that your data is exactly where it should be.