
How we manage data quality

Data quality management is a new discipline. It is gradually gaining momentum in the oil industry, banking, and retail. Everyone goes their own way, almost by feel.
I work as a data quality analyst. In this article I will describe how we manage data quality, what difficulties we faced, and how we overcame them.

image
Visualization of data quality on the screen in the office. The block level is proportional to the number of errors.

Our IT landscape includes more than 30 large software systems built on various technologies: large corporate systems, specialized oil production software, and in-house developments. These systems interact and exchange information with each other.

We produce hydrocarbons. Mistakes in the oil industry are very expensive. Incorrect data on a well trajectory can lead to stuck equipment or the destruction of neighboring wells. The cost of repairs and fines can reach several million dollars. Serious consequences call for a serious approach. It is easier and cheaper to correct errors in the data than to deal with their consequences.

What to check?


Which data errors matter, and which can be skipped? We decided to start from reality and focus on what causes problems for users.

It is hard to get employees to react to the phrase "Your data is incomplete." But if you tell them that they forgot to enter the contract date and the contractor is therefore not allowed onto the field, they act quickly.

Here is another example. The plan of a well cluster site was not updated in time. The builders took a plan on which the power cable was not marked and dug right through it. As a result, the well cluster lost power. This means not only recovery costs but also a threat to the lives and health of employees. A problem in the data led to problems in the real world. Situations like this are what a data quality analyst takes on.

image
Slide from my speech at the “Smart Oil & Gas: Digital Transformation of the Oil and Gas Industry” conference.

When errors are found in the data, the analyst tries to prevent them from recurring. Sometimes the problem can be avoided by changing the business process. In other cases a software change helps, for example adding an input mask.
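To illustrate the input-mask idea, here is a minimal sketch in Python. The contract date field comes from the example above; the `DD.MM.YYYY` format and the name `parse_contract_date` are assumptions made for illustration, not a description of any real system.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical mask: the DD.MM.YYYY format is an assumption for illustration.
CONTRACT_DATE_MASK = re.compile(r"\d{2}\.\d{2}\.\d{4}")


def parse_contract_date(raw: str) -> Optional[date]:
    """Return the date if the input fits the mask and is a real calendar date, else None."""
    text = raw.strip()
    if not CONTRACT_DATE_MASK.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, "%d.%m.%Y").date()
    except ValueError:
        # Fits the mask but is not a real date, e.g. 31.02.2018.
        return None
```

Rejecting bad input at entry time means the error never reaches downstream systems, so no verification rule is needed for it later.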

If an error cannot be eliminated completely, we formalize verification rules, and the data quality management system (developed in-house) runs them on a schedule. Detected errors are linked to specific people, so each user receives only the errors that concern them.
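The idea of formalized rules whose findings are routed to responsible people can be sketched roughly as follows. This is not the code of the in-house system: the `Rule` and `Finding` classes, the `owner` field, and the sample records are all assumptions for illustration. Scheduling is left out; a scheduler such as cron would simply call `run_rules` periodically.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Rule:
    name: str
    # Returns an error message for a failing record, or None if it passes.
    check: Callable[[dict], Optional[str]]


@dataclass
class Finding:
    rule: str
    object_id: str
    message: str


def run_rules(records: Iterable[dict], rules: List[Rule]) -> Dict[str, List[Finding]]:
    """Apply every rule to every record and group the findings by owner."""
    by_owner: Dict[str, List[Finding]] = defaultdict(list)
    for record in records:
        for rule in rules:
            message = rule.check(record)
            if message is not None:
                by_owner[record["owner"]].append(
                    Finding(rule.name, record["id"], message)
                )
    return dict(by_owner)


def contract_date_present(record: dict) -> Optional[str]:
    """Hypothetical rule: a contract must have a date."""
    if record.get("contract_date"):
        return None
    return "Contract date is missing"


RULES = [Rule("contract_date_present", contract_date_present)]

# Made-up records for illustration only.
RECORDS = [
    {"id": "C-1", "owner": "user_a", "contract_date": "15.03.2018"},
    {"id": "C-2", "owner": "user_b", "contract_date": ""},
    {"id": "C-3", "owner": "user_b", "contract_date": None},
]

findings = run_rules(RECORDS, RULES)
for owner, items in findings.items():
    print(owner, [item.object_id for item in items])
```

Grouping by owner before sending anything is what lets each person see only their own errors; `user_a` has no findings here and would receive nothing.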

How to measure?


To control something, you first need to measure it. The first thing that comes to mind is a data quality percentage: take the number of errors, divide it by the number of objects, and get a number. But such a number does not convey the real situation. One critical error per 1,000 objects will not even be noticed. And in general, is 99% good or bad?

We decided to use an absolute indicator: the total number of errors in all monitored systems.
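The difference between the two indicators can be shown with a small Python sketch. The function names, system names, and per-system counts are made up for illustration, and the percentage assumes at most one error per object.

```python
from typing import Dict


def quality_percent(total_objects: int, error_count: int) -> float:
    """Share of objects without errors, in percent."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    return 100.0 * (total_objects - error_count) / total_objects


def total_errors(errors_by_system: Dict[str, int]) -> int:
    """Absolute indicator: errors summed over all monitored systems."""
    return sum(errors_by_system.values())


# One critical error per 1,000 objects barely moves the percentage...
print(round(quality_percent(1000, 1), 1))  # 99.9

# ...while an absolute count shows exactly how much work is left.
# System names and counts are made up for illustration.
errors_by_system = {"system_a": 250, "system_b": 120, "system_c": 30}
print(total_errors(errors_by_system))  # 400
```

An absolute count also has a natural target of zero, which a percentage close to 100 tends to hide.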

image
Dynamics of the number of errors in corporate systems.

In a year and a half, we came a long way and reduced the number of errors from 18,000 to 400.

How to motivate?


Once the pain points are found, we need to build a way of working with the users who must correct the errors. Getting through to a particular person is not easy, especially if the error does not affect them and it is people in another department who suffer. For example, today the drilling department does not record the depth of a well, and tomorrow the production department does not know how deep to lower the pump.

It is difficult to find a universal recipe that suits everyone. In our work we use a mix of techniques.


image
One of our visualizations for attracting attention. The depth of the submarine equals the number of errors.

A strong impetus for reducing the number of errors was the revised error letter template. Initially, the system sent error statistics broken down by check. Such a letter looked more like a report, and few recipients acted on it. We decided to switch to a format that encourages users to correct the data. The new template states what specifically went wrong, why it is important to correct the data, and how to do it. It also has buttons for contacting the data quality team, through which we get feedback from our users.

image
Sample Error Letter
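The structure of such a letter (what went wrong, why it matters, how to fix it, and a way to reach the data quality team) can be sketched with Python's standard `string.Template`. The wording, the field names, and the contact address are assumptions for illustration, not the actual template; a plain-text reply line stands in for the buttons.

```python
from string import Template

# Hypothetical plain-text template; the real letter layout is not reproduced here.
LETTER_TEMPLATE = Template(
    "Subject: $count data error(s) need your attention\n"
    "\n"
    "What went wrong: $what\n"
    "Why it matters: $why\n"
    "How to fix it: $how\n"
    "\n"
    "Questions, or is this a false alarm? Reply to $contact\n"
)


def render_letter(count: int, what: str, why: str, how: str, contact: str) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return LETTER_TEMPLATE.substitute(
        count=count, what=what, why=why, how=how, contact=contact
    )


letter = render_letter(
    count=1,
    what="Contract C-2 has no contract date.",
    why="Without a date the contractor is not allowed onto the field.",
    how="Open the contract card and fill in the date.",
    contact="dq-team@example.com",  # placeholder address
)
print(letter)
```

Keeping the three questions as separate fields forces every rule to come with an explanation and a fix, not just a count.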

What's next?


We are constantly looking for new areas where data quality management is needed, and we involve new people and new departments. This year we added 100 data verification rules, bringing the total to more than 500.

We want to grow not only in breadth but also in depth. We would like to find like-minded people and organize a small data quality management forum where we could exchange experience.

And how do you work with the quality of your data?

Source: https://habr.com/ru/post/347838/

