I have already written about what an MDM system is and why it is needed. Now I would like to touch on a question that sooner or later faces everyone thinking about managing master data: buy a ready-made MDM system, or build one in-house?
There is, as usual, no universal recipe, and everyone must decide for themselves which path to take. To make the right decision, you need to define a set of requirements for MDM and then honestly assess both your own capabilities and your functional needs.
Therefore, I will begin by describing the typical functionality that a modern MDM system should have.
Master Data Lifecycle Management:
The key functionality of an MDM system is the ability to manage master data throughout its life cycle: from the moment a record is created to the moment it is retired.
To do this, the MDM system must support the following features:
- Creating a data model. When a data model is created, master data objects, their model structures, and their attributes are defined. It is fundamentally important that the data model can be flexibly created and modified throughout the life cycle of a master data object. In day-to-day work there are frequent situations where you need to quickly add a missing attribute or change the existing schema of a master data object's model. This must be possible promptly, in user mode, without reprogramming or stopping the system.
- Use of universal data storage. Unlike, for example, ERP systems, MDM data is stored in special formats that allow the same data to be kept in different DBMSs simultaneously. This provides fast access to the data in various scenarios and allows horizontal scaling and clustering of the data stores. A typical approach is to spread information domains across different data stores.
- Storing caches of "hot" data in memory with active eviction. To ensure fast data access, the most frequently requested data is loaded into various in-memory caches. Special mechanisms track changes in request activity and use forecasting tools to keep the cached data up to date.
- Managing groupings and hierarchies of master data objects. Combining master data objects into groups or hierarchies is used to solve a variety of applied tasks, for example building a hierarchy of organizations within a holding, or grouping products by some attribute.
- Creating and managing relationships. Relationships exist between master data objects both within a single domain and across domains. For example, several types of relationships can be established between individuals and organizations: an individual can work for an organization, be its client, be its supplier, and so on.
- Versioning and storage of change history. It is very important to store historical information not only about the master data objects themselves and the attributes of their models, but also about the structure of the models, relationships with other objects, hierarchies, groupings, and so on. For example, it may matter for a decision that a given individual was formerly an employee of a given organization. Ideally, the history should allow rolling the data back to any selected recovery point.
- Maintaining taxonomies. Different taxonomies can be defined for master data objects. For example, one or several classifiers can be set for material and technical resources: the first grouping items by product category from the buyer's point of view, and the second grouping them by supplier. The set of model attributes of a particular master data element may depend on the taxonomy assigned to it.
- Ensuring data security. An MDM system must have tools to configure and enforce access control to the data, both at the record level and at the attribute level.
- Conducting an audit. It should be possible to track the full history of changes to data and models: what was changed, by whom, and when.
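To illustrate the first requirement above, the flexible data model, here is a minimal Python sketch of a schema that lives as data rather than code, so an administrator can add a missing attribute at runtime without reprogramming or stopping the system. The `EntityType` class and the `Customer` example are illustrative, not part of any specific MDM product.

```python
from typing import Any

class EntityType:
    """A master data object type whose attribute schema is plain data."""

    def __init__(self, name: str):
        self.name = name
        self.attributes: dict[str, type] = {}

    def add_attribute(self, attr_name: str, attr_type: type) -> None:
        # Extending the model is a dictionary update, not a code release.
        self.attributes[attr_name] = attr_type

    def create_record(self, **values: Any) -> dict[str, Any]:
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"Unknown attributes: {unknown}")
        for name, value in values.items():
            if not isinstance(value, self.attributes[name]):
                raise TypeError(f"{name} must be {self.attributes[name].__name__}")
        return values

customer = EntityType("Customer")
customer.add_attribute("name", str)
customer.add_attribute("inn", str)
record = customer.create_record(name="Acme LLC", inn="7701234567")

# Later, a missing attribute is added on the fly, in "user mode":
customer.add_attribute("phone", str)
record2 = customer.create_record(name="Beta LLC", inn="7709876543",
                                 phone="+7 495 000 00 00")
```

A production system would of course persist such schemas and validate far more, but the design point is the same: the model is configuration, not compiled structure.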
Quality control
Poor-quality data negates the entire benefit of data centralization and centralized management.
To control data quality, the following mechanisms and tools should be present in the system:
- Data analysis and profiling. Before any data manipulation, the data itself must be examined. Automatic analysis and profiling mechanisms make it possible to roughly evaluate data quality, identify errors, and build a processing strategy. For analysis without reference to a particular subject area, statistical methods are most often used. Such analysis reveals the presence and depth of completeness problems (missing values), "suspicious" records (extreme values and outliers in one of the attributes, records that do not fall into any cluster), and attributes that cannot be used in machine-learning methods without prior preparation (gaps, outliers, extreme values, rarely occurring unique values, etc.). If the analysis is carried out with immersion in the subject area and the analyzed domains, then for each attribute the data subtypes (ordinal and categorical) and data types (continuous and discrete) should also be considered. Having determined these, you can meaningfully interpret the computed statistical indicators, profile the available data, determine how to correct its values, and prepare it for modern modeling methods.
- Validation, standardization, cleaning and enrichment of data. Simple mechanisms can be used here, such as casting values to a single format (for example, phone numbers), deleting or replacing stray characters from the "wrong" alphabet, removing extra spaces, expanding abbreviations from a dictionary, and correcting obvious typos. More sophisticated mechanisms can also be applied, based on business rules or on external databases used for cleaning and enrichment (for example, address or legal-entity registries).
- Detection of duplicate master data entities. This is one of the key features of the system. There should be both deduplication mechanisms based on clear business rules for structured data (often used in the Customers domain) and various complex semantic, self-learning mechanisms for weakly structured and unstructured data (often used in the Nomenclature domain).
- Workplaces for data stewards (experts) engaged in semi-automatic or manual data processing. There must be workplaces where it is convenient to perform the manipulations that could not be done automatically at earlier stages, or stages for which a decision maker is responsible. The work of data stewards may include editing attributes that do not lend themselves to automatic processing, confirming duplicates and choosing the "surviving" record or attribute, and so on.
- Evaluation of data quality changes over time. This is the ability to define specialized data-quality KPIs and track them over time. These indicators can form the basis of a motivation policy for the company's master data management unit.
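As a toy illustration of the deduplication point above, here is a Python sketch combining the two kinds of mechanisms mentioned: a clear business rule (identical tax IDs) and a fuzzy similarity rule on names that would feed candidates to a data steward. The field names, the `0.85` threshold, and the use of `difflib` are illustrative assumptions; real MDM matchers are far more elaborate.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    # Lowercase, drop quotes, collapse whitespace before comparing.
    return " ".join(name.lower().replace('"', "").split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_duplicates(records: list[dict], threshold: float = 0.85) -> list[tuple[int, int]]:
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        # Clear business rule: identical tax IDs mean a certain duplicate.
        if r1.get("inn") and r1["inn"] == r2.get("inn"):
            pairs.append((i, j))
        # Fuzzy rule: highly similar names are a duplicate candidate
        # for a data steward to confirm.
        elif similarity(r1["name"], r2["name"]) >= threshold:
            pairs.append((i, j))
    return pairs

records = [
    {"name": 'OOO "Romashka"', "inn": "7701234567"},
    {"name": "ooo Romashka", "inn": ""},
    {"name": "Vector Ltd", "inn": "7809876543"},
]
print(find_duplicates(records))  # → [(0, 1)]
```

Note that the fuzzy branch only proposes candidates; as described above, confirming the match and choosing the "surviving" record remains a data steward's decision.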
Integration and synchronization of information
The task of integrating and synchronizing information between the MDM system and the application systems that consume master data is one of the main ones. Data must stay synchronized between all participants in the interaction. Often this function is provided not by the MDM system itself but by a specialized ESB or MQ system. Ideally, the ESB should be built on the same technology platform as the MDM system, since this ensures the tightest integration between them.
To build interaction mechanisms, the following features should be present:
- Receiving data, or changes to data, from application systems in synchronous and asynchronous modes.
- Distribution of master data from an MDM system across application systems in synchronous and asynchronous modes.
- Transfer of various kinds of events from the MDM system to application systems: for example, that certain data is outdated, that we no longer work with some client, and that the data about it should be deleted or archived.
- Correction of synchronization errors: tracking data that was sent but not received, resending it, resolving conflicts over which transmitted data is current, and so on.
- It is important to provide real-time interaction so that end-to-end business processes function effectively. This is especially important in the operational style of use (Operational MDM), where application systems can call MDM services as part of a single business transaction.
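The synchronization-error bullet above can be sketched with a classic outbox-and-acknowledge pattern: the MDM side keeps every message until the subscriber acknowledges it, resends anything unacknowledged, and the subscriber deduplicates by message id so resends are harmless. This is a minimal, assumed design, not the mechanism of any particular MDM or ESB product.

```python
import itertools

class Outbox:
    """MDM-side store of sent-but-not-yet-acknowledged messages."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending: dict[int, dict] = {}   # msg_id -> payload awaiting ack

    def send(self, payload: dict) -> int:
        msg_id = next(self._ids)
        self.pending[msg_id] = payload       # kept until the subscriber acks
        return msg_id

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id, None)

    def unacknowledged(self) -> list[int]:
        # "Sent but not received" messages, candidates for resending.
        return list(self.pending)

class Subscriber:
    """Application-system side: idempotent receive, then acknowledge."""

    def __init__(self):
        self.seen: set[int] = set()
        self.data: list[dict] = []

    def receive(self, msg_id: int, payload: dict) -> int:
        if msg_id not in self.seen:          # duplicates are ignored
            self.seen.add(msg_id)
            self.data.append(payload)
        return msg_id                        # ack back to the outbox

outbox, erp = Outbox(), Subscriber()
m1 = outbox.send({"client": "Acme LLC", "op": "update"})
m2 = outbox.send({"client": "Beta LLC", "op": "update"})
outbox.ack(erp.receive(m1, {"client": "Acme LLC", "op": "update"}))
# m2 was "lost" in transit: it stays pending and will be resent later.
print(outbox.unacknowledged())  # → [2]
```

In practice this bookkeeping usually lives in the ESB or MQ layer mentioned above rather than in the MDM system itself.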
This list does not claim to be complete; MDM systems have many other functions. I have only listed the most important ones, without which most full-fledged MDM implementations cannot manage.
So, build or buy?
If you are inclined to implement an MDM system on your own, evaluate which of the functions above you will need, not only now but also in the future. Companies that take this path often start by implementing something resembling a central database into which master data objects are placed in a near-finished state (manually or via batch loading) and from which the master data is then downloaded to subscriber application systems. This approach is often called "centralized directory entry": it provides a single point for entering reference information, that is, it essentially simplifies its input. In most cases an in-house development project ends there, and the functionality develops no further, because the other features of MDM systems are much harder to implement. Strictly speaking, such a result cannot be considered full-fledged master data management. However, for companies that do not have that many master data entities and have low data-quality requirements, even this may be enough.
If you do not want to limit yourself to simple "centralized directory entry", then with high probability you will want to implement a ready-made MDM system. Then you face the difficult choice of which MDM system suits you best.
As for the system's functionality, you should analyze it only after you have more or less understood which tasks you want to solve with the MDM system: which data domains you have, and which method of use and which implementation style you will choose (I have written about this in more detail earlier). Only after defining the tasks can the functionality of an MDM system be evaluated, since not all MDM systems are equally good across the whole variety of domains, methods of use, and implementation styles.
In addition to the above functions, pay special attention to the following aspects of MDM systems:
- Domain support. Historically, many MDM systems developed architectures around a single domain, for example Customers. Such systems often support other domains poorly and do not specialize in them. For example, the principles of working with Customer-domain data and with Product-domain data differ greatly. It is therefore categorically insufficient to analyze a system's functionality using a single domain as an example; you need to look at all of them.
- If you plan a collaborative method of use (Collaborative MDM), pay attention to how easily business processes and user roles can be configured. If possible, this should be done without programming, in parametric mode, since processes and regulations change often.
- If you plan the operational method of use (Operational MDM), with maximum automation of data processing and minimum involvement of data stewards, then pay attention to the presence of automatic processing mechanisms, the ability to adjust the sequence in which they are applied, and the availability of fast methods for transferring data between source systems and MDM.
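The "adjustable sequence of automatic processing mechanisms" mentioned in the last point can be sketched as a pipeline whose steps and their order are configuration rather than code, so an administrator can reorder or extend the chain without a software release. The step names and functions here are made-up examples.

```python
# Each cleaning step is a plain function registered under a name.
STEPS = {
    "trim": lambda v: " ".join(v.split()),            # collapse whitespace
    "strip_quotes": lambda v: v.replace('"', "").replace("'", ""),
    "upper": str.upper,                               # canonical case
}

def run_pipeline(value: str, order: list[str]) -> str:
    """Apply the configured steps to a value in the configured order."""
    for step in order:
        value = STEPS[step](value)
    return value

# The sequence is data: it could come from a config file or admin UI
# and be changed without reprogramming the system.
config = ["trim", "strip_quotes", "upper"]
print(run_pipeline('  ooo  "Romashka" ', config))  # → OOO ROMASHKA
```

Evaluating a candidate MDM system, you would look for exactly this property: that such chains are edited in user mode, not in source code.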
It is also crucial to pay attention to the performance and fault tolerance of the system offered to you. Without these two properties, any MDM functionality will be useless.
Here are some points that you should definitely check:
- Ask a potential MDM vendor to model your largest master data object and load this data into the MDM. Estimate the loading speed.
- Perform various kinds of searches on the loaded data: search by main attributes, search by additional attributes, fuzzy search with different algorithms, full-text search. Evaluate the search speed. This is a very important basic parameter: many other functions of the system, and their speed, depend on the speed and quality of search. If the system is slow at this stage, it will only get worse.
- Modify the model of some master data object or one of its attributes. Estimate the speed of restructuring the information, and the rollback speed in case of an unforeseen situation.
- Analyze the system's response time to standard requests in the mode of use planned for your company. For example, many MDM systems work satisfactorily in Transactional Hub mode, where all data is entered directly into the MDM and then distributed to subscriber systems, but lack the performance for Coexistence Hub mode, where systems must interact very quickly in two-way real time.
- Analyze which integration mechanisms the MDM system supports and how well they match the systems it is supposed to interact with. Check how easily and quickly new subscriber systems can be connected. Also important is the ability to change the logic and routes for receiving and distributing data without deeply reworking all the systems and with minimal downtime.
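The load-and-search checks above can be rehearsed with a rough benchmark harness of the following shape, here run against an in-memory stand-in for the store. With a real candidate system you would drive the same measurements through its API; the data sizes and record layout are arbitrary.

```python
import random
import string
import time

random.seed(0)

# Generate a synthetic "largest master data object" load.
records = [
    {"id": i, "name": "".join(random.choices(string.ascii_lowercase, k=12))}
    for i in range(100_000)
]

# Phase 1: load/index the data and time it.
t0 = time.perf_counter()
index = {r["name"]: r for r in records}
load_s = time.perf_counter() - t0

# Phase 2: time a batch of exact-match searches over the loaded data.
t0 = time.perf_counter()
for _ in range(1_000):
    needle = records[random.randrange(len(records))]["name"]
    assert index[needle]["name"] == needle
search_s = time.perf_counter() - t0

print(f"index build: {load_s:.3f}s, 1000 lookups: {search_s:.4f}s")
```

Against a real system you would extend the search phase with the fuzzy and full-text queries mentioned above, since those are usually where performance differences show up.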
In any case, choosing a path is a creative process, and it is impossible to foresee every question that will arise, but I have tried to describe the main ones that seem important to me.
Maxim Vlasov, Development Director