The unplowed field of big data in medicine and pharmaceuticals

Grigory Bakunov, director of technology dissemination at Yandex, a popularizer of programming, and one of the creators and regular hosts of the Radio-T podcast, told last year's DUMP conference about the fundamental changes taking place in medicine and pharmacology right now, the practical problems science is facing, and what the medicine of the future looks like.

Below the cut: the video of the talk and its text version.

Hello! First, briefly about sponsors. Not long ago I was invited to speak at a conference on medicine and the technologies used in it. They said: "We need a short lecture, 15 minutes." Just before my talk I was stopped for a second for, as they put it, a short announcement. A man came out and said: "An excellent medical conference, very cool! By the way, my sea water is on sale at the stand nearby. It cures all diseases and is 150% more effective than conventional medicine, so be sure to come by." I watched and thought: God, if someone like that came out at a developer conference, people would throw stones at him. But the doctors just sat there. Either they consider it normal or they are used to it, I do not know.

Why am I telling you all this? When I was preparing this talk, I honestly thought I would be speaking about something else. But about a week and a half ago it dawned on me that I did not want to deliver the typical bullshit about machine learning in science. Not because it is nonsense, but because, on average, if you are interested in it, you already know about it.

If you do not know about it, it means that you are simply not interested.

What do I want to tell you? I want to talk about the practical problems science now faces, in a way that you, as programmers, will understand and, moreover, even enjoy.

Forgive me for the illustrations. My illustrations are always the kind you do not really have to look at, but I will be pleased if you smile at least occasionally.

The main message, which is probably where I should begin, is this. Three years ago, when I started studying medicine, health, pharmaceuticals and the use of algorithmic methods in them, I went to several large institutes, met many smart people there, and asked each of them the same question. How do I draw people who work in medicine and pharmacy, medicine first of all, into a dialogue? How do I get them talking? They said: well, you have to provoke them.

And I started coming to conferences with this slogan.

It always looked like this. You walk into the hall and say: "Hello, dear doctors and dear scientists, I have to tell you this: medicine is not a science."

Of course, this is not entirely true. I have a simple explanation of why medicine is not a science. Modern medicine works like this: until you have graduated from medical school, you do not practice medicine and you know nothing about medicine. And graduating is not enough. There is an internship and a complicated procedure; you study for almost nine years, and only after that are you considered a novice doctor. There is a special esoteric language that only doctors speak. And sometimes I have the feeling that they have their own script as well.

So at first you just study and gain knowledge, then you are assigned a teacher whom you follow around, repeating what he does. Only then do they give you a white coat, a cap and a stethoscope (which, as you know, doctors no longer use; it is purely an attribute) and say: that's it, now you are a doctor.

Think for a second: does this remind you of anything? For many years you are taught, then dragged with difficulty through an exam, then you follow a teacher and repeat everything required. And after a while you become a teacher yourself.

This structure repeats, one to one, the structure of the secret orders of the 12th to 14th centuries. Those who have played Assassin's Creed will probably remember this story. One to one: a secret order.

And here is what you need to know. The task of a secret order is not to create new knowledge or to multiply the old, but simply to preserve the knowledge of the ancients. Because of this, medicine stalled for many years. Thank God that is over. In my opinion it ended only recently, and it ended not thanks to medicine and doctors but because humanity began to accumulate data.

The data we accumulated often began to contradict medicine. And to contradict it strongly, bluntly and harshly.

Most of the major changes in medicine over the past 20-30 years are associated exclusively with data.

At the same time, although medicine, in my opinion, started to become scientific in the 21st century, it has one big problem.

There is no hard definition of what science is. But there are a number of important scientific practices. The most important of them, it seems to me, is that if you do science, you constantly conduct experiments, you tell other people about them, and other people must be able to reproduce your experiment.

The key point of science in the modern world is the reproducibility of experiments, and reproducibility in several senses. You can repeat an experiment I did. Another person can repeat an experiment you did.

And here is what is important: someone is repeating your experiments all the time. Without this there is no science and no verification.

When we came to this area (there are several of us enthusiasts working on it), the first thing we found was that most people who work with data in science do not know how things are done in the normal world of programmers.

I think one of the most successful things we did, when we started working with pharma and cell biology, was to introduce a culture of experiment. We wrote up each experiment and its results as an actual test: a finished test written in Python. Every experiment was set up this way.

Each experimental action, for example applying a drug to a protein or applying a drug to a cell, became the execution of a test. And here is what is important: all these tests ran in parallel, all the time, non-stop. This is the classic pattern called Continuous Integration.
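As a rough illustration of what "an experiment written as a Python test" could look like, here is a minimal sketch. It is not the team's actual code: `Measurement`, `run_assay`, the compound and target names, and the 50% threshold are all invented for the example, and the lab step is replaced by a canned value.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One recorded result of applying a compound to a target (hypothetical schema)."""
    compound: str
    target: str
    inhibition_pct: float  # measured effect, in percent


def run_assay(compound: str, target: str) -> Measurement:
    """Stand-in for the real lab step; here it only returns a canned value."""
    canned = {("compound-A", "protein-X"): 62.5}
    return Measurement(compound, target, canned[(compound, target)])


def test_compound_a_inhibits_protein_x():
    """Hypothesis under test: compound-A inhibits protein-X by at least 50%."""
    result = run_assay("compound-A", "protein-X")
    assert result.inhibition_pct >= 50.0
```

A test runner such as pytest would collect every function named `test_...`, so a CI server can rerun the whole set of hypotheses on each change.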

When we started talking about this with scientists, they said: "Well, that is incredibly difficult. You would have to write some kind of software for it." It turned out that most of the software programmers have been using for this for years, such as Travis or Jenkins, works one to one for scientists too.

If you switch your head on and start thinking, an experiment is code. The same classic regression stories apply. For example, if at some point you decide that your scientific experiment needs changes, you run all the old tests against the new experiment and check that they still pass.

Classic regression testing has not gone anywhere. The scientists were shocked: they found that between experiments conducted the old way and the new way, the difference in experimental measurements was up to 20%.
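The regression idea can be sketched in a few lines: keep the measurements from the old protocol as a baseline, rerun with the new protocol, and report every experiment that drifted beyond a tolerance. The assay names, the values and the 5% tolerance below are assumptions made up for the example.

```python
def relative_difference(old: float, new: float) -> float:
    """Relative change of the new measurement with respect to the old one."""
    return abs(new - old) / abs(old)


def regression_report(baseline: dict, candidate: dict, tolerance: float = 0.05) -> dict:
    """Return the experiments whose result drifted by more than `tolerance`."""
    drifted = {}
    for name, old_value in baseline.items():
        diff = relative_difference(old_value, candidate[name])
        if diff > tolerance:
            drifted[name] = round(diff, 3)
    return drifted


# Hypothetical measurements: old protocol versus new protocol.
baseline = {"assay-1": 10.0, "assay-2": 40.0, "assay-3": 5.0}
candidate = {"assay-1": 10.2, "assay-2": 48.0, "assay-3": 5.1}

print(regression_report(baseline, candidate))  # {'assay-2': 0.2}
```

Here only `assay-2` is reported: it moved by 20%, while the other two stayed within 2%.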

What is 20% in pharma? It might seem that pharma has long been used to mistakes. So they released an unsuccessful drug, a year later they paid someone, someone never started working on that drug. In reality, errors discovered in pharma at a late stage often lead to company closures. If you uncover a complicated side effect 4-5 years after the drug's launch and, through your own stupidity, sold it in, say, the United States or any other civilized market, the lawsuits against your company will number in the tens and hundreds, each of them for tens of millions of dollars. You will simply end up spending more on lawyers.

Introducing regression tests in this environment has made it possible, in many situations, to reduce the cost of errors by 20-30%. What is 20-30% of the total turnover of the fairly large pharmaceutical company I worked with on this? Something like 4-5 billion dollars. By their standards that is small money. For my taste, for introducing one small tool, it is very good money.

The same story applies one to one to versioning and to the approach to the experiment as such. From the moment you start thinking about an experiment, about a scientific action, as code, you immediately realize that all of it has to be stored somewhere. It turned out that most of the scientists I now work with look at GitHub with delight and say: "What, you could do that?"

People who have worked with GitHub and git for a long time will understand: you commit a new test, Travis immediately kicks in, pulls everything and runs the new tests. By the way, it looks beautiful! Travis triggers, and a robotic arm starts moving and drawing old preparations into a pipette. An incredible picture!

In fact, the most important thing in the "let's treat tests as code" story is that versioning appeared. People began to work with hypotheses differently. Not "we seem to have gone wrong somewhere", but "let's take git, run a bisect, and find in which piece of code the mistake is, in which test we went wrong, at what point things broke".
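For readers who have not used it, the idea behind `git bisect` is a binary search over history: given a revision known to be good and a later one known to be bad, each check halves the range that can contain the first bad change. The sketch below shows only that search, not git itself; the revision names and the `is_bad` check are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Hypothetical history of an experiment protocol; it broke at "r6".
history = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
checked = []


def is_bad(rev: str) -> bool:
    """Stand-in for rerunning the failing test at a given revision."""
    checked.append(rev)
    return history.index(rev) >= history.index("r6")


print(first_bad(history, is_bad))  # r6
print(checked)                     # ['r4', 'r6', 'r5']
```

Three reruns are enough to locate the change among eight revisions, which matters when a single rerun is a paid lab experiment. In git itself the same search is driven by `git bisect start`, `git bisect good`, `git bisect bad`, or automated with `git bisect run`.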

I do not know about you, but these stories excite me greatly. When I think about it, I think: God, the stock of tools that programmers have created is incredibly large. It is simply gigantic.

And never mind plain versioning on GitHub. First of all, tests are code. If we describe experiments and hypotheses as code, we have excellent tools for static analysis and code analysis. Shall we look for logical errors before even starting the experiment? Shall we merge all the tests into one big algorithm and look for logical errors in it? No problem.

Here you need to understand that in pharma such Continuous Integration is a rather expensive process, because every test costs money. One CI cycle in my current project with a large pharma company costs about 80 thousand dollars. Put differently: if there is a logical error in an experiment and we can catch it before testing, that is an instant saving of 80 thousand dollars.

Programmers know this well: a linter and static analysis can be run before the commit. You simply do not let an obviously erroneous hypothesis reach testing. Or you report that the error is not in the hypothesis you are adding now. That happens too.
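What a "linter for hypotheses" might check can be sketched as follows. This is an invented toy format, not the tooling from the talk: each hypothesis claims that some quantity falls in a range, and the check flags empty ranges and pairs of claims about the same quantity that cannot both hold, all before any experiment is run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Claim that a measured quantity falls in [low, high] (hypothetical format)."""
    name: str
    quantity: str
    low: float
    high: float


def lint(hypotheses: list) -> list:
    """Return human-readable problems found without running any experiment."""
    problems = []
    seen = {}  # first valid hypothesis seen for each quantity
    for h in hypotheses:
        if h.low > h.high:
            problems.append(f"{h.name}: empty range [{h.low}, {h.high}]")
            continue
        prev = seen.get(h.quantity)
        if prev and (h.low > prev.high or h.high < prev.low):
            problems.append(f"{h.name}: contradicts {prev.name} on {h.quantity}")
        seen.setdefault(h.quantity, h)
    return problems


# Invented examples: H2 cannot hold together with H1, and H3 is malformed.
hypotheses = [
    Hypothesis("H1", "ic50_nM", 10, 50),
    Hypothesis("H2", "ic50_nM", 80, 120),
    Hypothesis("H3", "ph", 7.5, 7.0),
]

for problem in lint(hypotheses):
    print(problem)
```

Run as a pre-commit step, a check like this would reject the change before an expensive CI cycle is started.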

And at this point another very important thing appears.

When one person works on a chain of experiments, there is no problem. It is the same as when one programmer writes code: put it in a folder on Samba or in Dropbox, and everything is fine. As soon as there are two programmers, conflicts begin. When there are 50 programmers all working on roughly the same piece of code and the same set of tests, there are of course problems. Here there is incredible scope for applying the ready-made tools that programmers have developed over the past decades.

Here I vote with both hands for GitHub. I sincerely believe that using GitHub for more than just storing code is incredible. That said, of course, I do not represent GitHub in any way.

The emergence of tools for collective work on experiments, combined with versioning, made very interesting things possible. For example, the guys I work with started sending each other pull requests with proposals. Someone went to see how another team was doing, found an interesting hypothesis, and instead of just tossing it out in the smoking room, as is customary among people in biology and physics, he did the simple thing: he wrote up a pull request and submitted it. On the other side the guys said, "Oh, great idea," merged it, and after a while we saw a new test with a new experiment in the database.

Unfortunately, because most relationships between tech and pharmaceutical companies are not very public, we cannot tell everything. I can say that I know of at least one drug that began three years ago as a pull request and is now going through FDA certification.

FDA certification means that this drug may appear in pharmacies within a year. Not in ours yet.

This change in the minds of young scientists is very hard to overestimate. It is a transition from closed development, which for many years was the norm in small research teams, to open procedures. I am sure that in 3-4 years you will see small research laboratories that keep everything on GitHub and are ready to accept pull requests from outsiders. And that will be a bombshell. Simply a different world, where anyone can somehow take part in real scientific work.

Why is this important? For the same reason open source is important as such. No, I am not saying that open source is the coolest software in the world. That sounds like a catchphrase from fifteen years ago, captioned "The splendor and misery of open source". But without open source a huge number of things we use every day would not exist. Half of Android, for example. Without open source there would be no Android.

The same story is now happening with drugs, and it will be incredibly cool when we find ourselves in that world.

So far, of course, things are not moving that fast. But there is one area where our current approach is probably the easiest to apply.

There is an interesting approach: to begin with, so as not to change your whole structure or rewrite everything, you start digitizing the results of the experiments you are already conducting. You turn them into, for example, a set of plain text files. And then you use ready-made tools for working with logs.

To help you understand, I have an incredible story that I tell with delight every time. The results of scientific experiments are loaded into Kibana and ClickHouse, ready-made tools that usually hold large volumes of logs and are used here for all kinds of tests, measurements and experiments, and on top of them standard anomaly detection algorithms are run. What is it called in Russian? In Russian, "anomaly detection" is called "the search for discord". I am shocked by the phrase myself, but I like it so much.

The search for discord, as it turned out, works incredibly well in experimental science. The coolest place it is used now is Yandex's collaboration with CERN. CERN runs several large experiments at the Large Hadron Collider. The smallest of them is called LHCb, and billions of particle collisions occur in it. The results of each collision are recorded in a database.

After that, a ready-made set of algorithms is run to find anomalies there: objects and events that do not fit the idea of beauty. I cannot say that great discoveries have been made this way yet, but if a discovery is made in this experiment, it will be made solely thanks to this IT approach to what would seem to be such a classical field as the analysis of particle collisions.
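To make "digitize the results into text files, then run anomaly detection" concrete, here is a minimal sketch. It does not reproduce what Kibana, ClickHouse or the CERN pipeline do; it assumes an invented log format of one `run-id value` pair per line and uses a common robust rule, the modified z-score based on the median absolute deviation, with the conventional 3.5 cutoff.

```python
import statistics


def parse(log_text: str) -> dict:
    """Parse lines of the form '<run-id> <value>' into a dict (hypothetical format)."""
    readings = {}
    for line in log_text.strip().splitlines():
        run_id, value = line.split()
        readings[run_id] = float(value)
    return readings


def find_anomalies(readings: dict, threshold: float = 3.5) -> list:
    """Return run ids whose modified z-score exceeds `threshold`."""
    values = list(readings.values())
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [
        run_id
        for run_id, v in readings.items()
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Invented measurements from seven runs of the same experiment.
log = """
run-01 10.1
run-02 9.8
run-03 10.3
run-04 9.9
run-05 10.0
run-06 17.4
run-07 10.2
"""

print(find_anomalies(parse(log)))  # ['run-06']
```

The median-based score is used instead of the mean and standard deviation because a single extreme reading would otherwise inflate the spread and hide itself.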

These are, of course, fundamental changes in science, and in any science. Coming back to pharma, medicine and biology, I want to say that in fact the more scientific the science, the more difficult it is to use programmer approaches in it.

After all, physics, for example, has long had a different culture of experiment. People there are used to mathematical methods and approaches. In pharma, medicine and biology they are not. So you tell them that there is a tool for collective work; that one part of an experiment can be carried out on one side of the continent and another part on the other, and there is a system that lets you combine it all. More than that: even if one person writes one thing and another person writes another, the conflict can be merged. And there is a system that automatically runs the experiments you add and reports which of them failed and which succeeded. When doctors who work in experimental medicine hear this, their eyes light up.

When you do this, you get the feeling (I hope it is not a false one) that you are changing the world. It is quite possible that in 20-30 years, simply because you taught pharmacists to use Travis, fewer people will die.

This whole story has another, sad side. There are very few people who, like me, are trying to bring IT working methods and methodologies to fields outside IT. I came here to tell you this whole story largely because you may be able to convey to scientists, specialists, lawyers, anyone, the endless possibilities that our tools already offer.

Set aside for a moment the whole story about pharma, biology and physics. Imagine that you are working with a law firm. Do you realize that most modern contracts can be written in an algorithmic language? That modern legal codes are libraries for these contracts? That the constitution is the operating system for these contracts? That static analysis methods, once all this is converted into an algorithmic language, will find defects, errors and problems in legislation far more efficiently than any professional lawyer?

I have worked in IT for many years, and I think I am not bad at estimating how long any task will take. So, to digitize this whole story, to bring all legislation into digital form, you need a good programmer, a good lawyer, and probably a year and a half. There is a startup concept for you; take it if you want.

We are actually close to the finish. By and large, this approach of "let's take IT tools and bring them to the rest of the world" is a little messianic. It is as if we have a religion, and it is called... well, the word "Agile" is already tainted, so let's take some other word. Let's just say "tools for teamwork".

Bringing automation tools to any other profession is a mission that saves people hours of their lives, and sometimes simply saves human lives. That is why I am now doing this so actively.

This is all I wanted to talk about.

This is how you can find me; that's me.

I am ready to answer your questions. Before we move on, I want to say that I always get nervous in front of an audience like this. You are all very different. There are also many people here from Yekaterinburg. I am from here myself, and I know that smiling is not very common among us. Thank you to the one person who smiled. That was great, thanks.

Do not expect scientists to respond to your suggestions with enthusiasm. At first there will be some resistance. You come and say: it seems to me that in your particular method, in this particular place, it would be a good idea to do this. "This" is, for example, collective work on one article or one test. Do not expect delight. Fortunately, within two or three iterations they will realize what a joy it is, but before that there will be rejection.

I am very interested in what tests are run. Do I understand correctly that the company has a certain set of pharmaceutical tests for certain products? How are new tests introduced?

Right now, there is no way.

For example, tests for allergies, tests of that nature?

There is a man I idolize. Unfortunately he has died, but he had a brilliant phrase. Once, at a meeting I attended where the programmers were arguing fiercely, he stood up and wrote two lines. Point one: nothing will work. Point two: progress cannot be stopped. And I live with this thought: everything will certainly be bad, but progress cannot be stopped.

Yes, a large number of people will lose their jobs as a result of technical progress. But that is no reason to stop progress. Humanity will find some way out. Unconditional basic income, or compulsory treatment by programming for people who have lost their jobs.

I did not quite understand. What exactly in medicine is described by the tests in Python?

Source: https://habr.com/ru/post/445972/
