
Test data for HL7 messages

In this short article I would like to touch on some aspects of testing medical system interfaces that may not be obvious to everyone. Such systems most often communicate using one of the Health Level 7 (HL7) protocols, but that is not a requirement: other formats can be used, for example the ASTM Continuity of Care Record (CCR), which HL7 later adapted under the name CCD.

Once the interface between the medical systems has been implemented, the next step is a series of tests that verify that the business requirements have been met (acceptance testing; incidentally, the Russian-language Wikipedia has no article on this type of testing) and that the technical requirements have been met (integration testing and unit testing). Neither kind of testing is possible without test data. This article covers just one aspect of creating test data for medical systems.

To make the problem easier to understand, let's look at how this usually happens. The test team receives a database dump from the production system. The data falls into three fairly loosely defined groups: the test data proper, test reference data, and application reference data. To check that the program works correctly (that is, that the technical requirements are implemented), a tester can create fake data covering all three groups.
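As a rough illustration of tester-made fake data, here is a minimal Python sketch that generates obviously synthetic patients and renders each one as an HL7 v2-style PID segment. The name pools, the `TESTHOSP` assigning authority, and the helper names are all hypothetical, and only a handful of PID fields are filled in; a real interface would need the full message structure its specification requires.

```python
import random
from datetime import date, timedelta

# Hypothetical, deliberately artificial value pools (not from any real system).
FIRST_NAMES = ["Testa", "Mockley", "Samplina", "Fakeson"]
LAST_NAMES = ["Zztest", "Zzdummy", "Zzsample"]


def make_fake_patient(rng):
    """Return a dict describing one synthetic patient."""
    dob = date(1950, 1, 1) + timedelta(days=rng.randrange(20000))
    return {
        "mrn": f"T{rng.randrange(10**6):06d}",  # test-only record number
        "first": rng.choice(FIRST_NAMES),
        "last": rng.choice(LAST_NAMES),
        "dob": dob,
        "sex": rng.choice(["F", "M"]),
    }


def to_pid_segment(patient, set_id=1):
    """Render a simplified PID segment: ID in PID-3, name in PID-5,
    date of birth in PID-7, sex in PID-8. Other fields are left empty."""
    fields = [
        "PID",
        str(set_id),
        "",
        f"{patient['mrn']}^^^TESTHOSP^MR",
        "",
        f"{patient['last']}^{patient['first']}",
        "",
        patient["dob"].strftime("%Y%m%d"),
        patient["sex"],
    ]
    return "|".join(fields)


rng = random.Random(42)  # fixed seed so the test set is reproducible
patients = [make_fake_patient(rng) for _ in range(3)]
segments = [to_pid_segment(p, i + 1) for i, p in enumerate(patients)]
for segment in segments:
    print(segment)
```

Seeding the generator keeps the data set stable between test runs, and the artificial names make it hard to mistake these records for real patients.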
Other types of testing, such as acceptance testing, integration testing, and testing of non-functional requirements (again, there is no article on the Russian-language Wikipedia), require more complex test data, and performance testing additionally requires a sufficiently large volume of it.

For medical systems (when things are done properly rather than cobbled together), the task is to manage test data throughout development and testing while complying with the laws and regulations that protect patients' demographic and medical information from accidental or deliberate disclosure. Data de-identification is used for this purpose, and it is the subject of this article.

De-identification of patient information matters not only in the development and testing of medical systems, as it might seem at first glance, but also in the analytical systems I wrote about last time (Public Health Information Networks and Clinical Research Support).

Here are just a few of the reasons why data de-identification is needed: developers' laptops, hard drives, or flash drives can be lost or stolen; poorly protected development machines can be attacked, including from inside the company; and there are many more.

Real examples can be found at databreachtoday.com. The front page will probably look different from what I see at the time of writing, but more than likely at least one article will concern medical data, HIPAA, or a similar act.

In all of these cases, and others besides, regulators require immediate notification when personal data has been leaked or lost. Moreover, in pursuit of a sensational story, the media will try to inflate the loss of even a small amount of personal data into a global problem (again, see the website above). I once followed newspaper coverage of a group of medical analysts who were all dismissed and taken to court over access to medical data, and a year later were reinstated with apologies.

Before continuing, a small exercise. One company claims that the data below is test data:

Rhonda James; DOB: 08-Sep-1988; Address: 71 Ansubet Dr, Charleston, WV
Denise Lewis; DOB: 03-Mar-1976; Address: 23 Adams Chapel Rd, Mankato, MN
Rosemarie Hardy; DOB: 14-Nov-1985; Address: 310 Camp Creek Rd, Weston, MA

Does this look like test data? Yes, it does. However, the company also claims that it is not just test data but de-identified data. I went to Whitepages and searched for one of the names together with the state of residence. It turned out that several people match. The home address does not match, but one can always say that the person has moved, and so on. So as long as no Rhonda or Denise, and no journalist, takes notice, the company can sleep soundly; the problems may start later.

Suppose you decide to do everything properly and prepare a sufficient amount of de-identified test data. (Again, I am talking about medical systems and their interaction, and am not trying to venture into related areas.) Before the project manager, architect, or lead tester makes that decision and rushes to write a parser for the database dump, it is worth remembering that others have faced this problem before, have thought about it at length, and have come up with various approaches to creating de-identified data, such as data reduction, data modification, and data suppression.
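To make the three approaches a little more concrete, here is a minimal Python sketch applied to one made-up record. The field names, the `SECRET_KEY`, and the helper functions are assumptions for illustration only; which fields must be suppressed or generalized depends on the regulations that apply to you, and this is not a complete or compliant procedure.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: a secret kept outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(value, key=SECRET_KEY):
    """Data modification: replace an identifier with a keyed hash."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def date_offset_days(patient_id, key=SECRET_KEY, max_days=180):
    """Data modification: a per-patient shift in [-max_days, +max_days],
    so intervals between one patient's events stay intact."""
    digest = hmac.new(key, b"shift:" + patient_id.encode("utf-8"),
                      hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify(record):
    offset = timedelta(days=date_offset_days(record["patient_id"]))
    return {
        # Suppression: name and street are simply not copied over.
        "patient_key": pseudonym(record["patient_id"]),
        # Reduction: keep only the year of birth and the state.
        "birth_year": record["dob"].year,
        "state": record["state"],
        "visit_date": record["visit_date"] + offset,
        "diagnosis": record["diagnosis"],
    }


# A made-up source record; every value here is hypothetical.
record = {
    "patient_id": "TEST-0001",
    "name": "Zztest, Samplina",
    "dob": date(1980, 5, 17),
    "street": "1 Example St",
    "state": "XX",
    "visit_date": date(2015, 3, 2),
    "diagnosis": "DX-TEST",
}
clean = deidentify(record)
print(clean)
```

Deriving both the pseudonym and the date shift from a keyed hash keeps the output repeatable across dump refreshes, at the cost of having to protect the key as carefully as the original data.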

As an overview, I suggest reading Tools for De-Identification of Personal Health Information by Ross Fraser and Don Willison. Even if you do not follow all of it (which is quite likely; much of it is impenetrable to me as well), it should at least become clear that creating a de-identified dump is not just a matter of replacing Sergey Sergeyevich with Ivan Ivanovich (or Denise Lewis with John Smith); a more serious approach is required.
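One way to see why renaming alone falls short is to count how many records share the same combination of remaining attributes, an idea usually discussed under the name k-anonymity. The sketch below is an assumption-laden illustration with made-up rows and a hypothetical `smallest_group` helper; it is not presented as the method the report recommends.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    values for the given quasi-identifier fields (the 'k' in k-anonymity)."""
    counts = Counter(
        tuple(r[field] for field in quasi_identifiers) for r in records
    )
    return min(counts.values())


# Made-up rows: names are already replaced, other attributes are untouched.
rows = [
    {"name": "Patient A", "birth_year": 1981, "sex": "F", "state": "XX"},
    {"name": "Patient B", "birth_year": 1984, "sex": "F", "state": "XX"},
    {"name": "Patient C", "birth_year": 1972, "sex": "M", "state": "YY"},
    {"name": "Patient D", "birth_year": 1979, "sex": "M", "state": "YY"},
]

# Every (birth_year, sex, state) combination is unique, so each row
# could still be singled out despite the renamed patients.
k_before = smallest_group(rows, ["birth_year", "sex", "state"])

# Generalize the birth year to a decade and measure again.
generalized = [
    {**r, "birth_decade": r["birth_year"] // 10 * 10} for r in rows
]
k_after = smallest_group(generalized, ["birth_decade", "sex", "state"])

print(k_before, k_after)
```

With these rows the smallest group grows from one record to two after generalization; real data sets need far larger groups and a proper risk assessment.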

A few other sources on the same subject:
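• Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.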
• Canadian “Best Practice” guidelines for Disclosure of De-Identified Health Information.
• Guide from the UK Information Commissioner's Office: Anonymisation: managing data protection risk code of practice.

In conclusion: if you see that the manager of a medical systems integration project has set aside a couple of days for creating test data, or in rare cases has not made it a separate task at all, the risk of blowing every project deadline is very high, because creating test data in a volume sufficient to test the performance of a medical system is complex. Ideally, this sub-project should start together with the main project.

Source: https://habr.com/ru/post/257613/

