
When the program crashes only on Wednesdays

Sit back and relax: it's time to talk about one of my favorite bugs.

This was my very first job in IT: a summer internship at a company that develops very serious medical equipment, in particular anesthesia delivery systems and patient monitors for hospitals. A patient monitor is that beeping box next to a hospital bed that measures pulse, blood pressure, respiration rate and so on, and alerts the nurses when something goes wrong. The office was full of two-meter cylinders of laughing gas, luxuriantly bearded embedded-systems gurus strolled the corridors, and there were entire storage rooms dedicated to the documentation needed to certify the equipment. People still whispered about a bug the testers had missed a dozen years earlier, which made an anesthesia delivery system reboot in the middle of an operation. Needless to say, a green student intern like me was not allowed within a mile of the production systems.

Instead, I was assigned a prototype project to try out the hottest technologies of 1997: a C++ server that listened to the monitors over a serial port, stored the interesting data in a SQL Server database and sent it via CORBA to a Java applet, so that doctors and relatives could follow the patient's condition over the Internet. What a beauty! Especially considering that I had no practical experience with any of these systems and technologies!

After a few weeks of hellish grind, mostly spent poring over the Visibroker ORB manuals and tediously hunting down type-conversion errors, my Simpsons-themed system was more or less ready. The server, “Homer”, stored and served the data, and the client, “Bart”, displayed it to the user. Along the way I learned that CORBA is hopelessly over-engineered, AWT is awful (GridBagLayout, brrr!) and applets run at a snail's pace, but otherwise Java seemed to be a good language. Only one small bug bothered me: from time to time the C++ server crashed, and I decided to find out why.
Since the test bench with a real monitor was in the next room, during development and testing I used an extremely convenient “demo mode” that cheerfully replayed a simulated heart attack in a loop. In that mode my server never crashed. It did crash occasionally when the monitor was operated by hand, especially during demos, but try as I might, I could not reproduce the failure reliably. I added logging for every event and ran back and forth between the monitor and my desk, slowly and deliberately repeating the necessary steps ("set the filter to X, turn the control knob exactly three notches clockwise, click here..."), but the crash would not reproduce. Whatever this “evil event” was, as I called it, it escaped all logging! Maybe the problem was in the I/O or at the hardware level? Maybe cosmic rays were flipping bits in my computer's memory?

Several weeks of fruitless experiments drove me to complete despair. I went so far as to add printf output after every damn line of code between reading data from the serial port and writing it to the database... and in the process, after going over each of those lines many times, it suddenly hit me.

When I designed the database schema, in an inexplicable attempt to save space I foolishly and naively used the event timestamp as the primary key. If two events arrive in the same millisecond, the database throws a uniqueness-violation exception. I had discovered this quite early, but I thought it could only happen in some very strange cases when someone was digging around in the monitor's internal settings, and with a clear conscience I wrapped the code in try/catch with error logging.
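To make the collision concrete, here is a minimal Python sketch, with sqlite3 standing in for SQL Server. The table and column names (events, ts, value) and the two sample events are made up for illustration and are not taken from the original project.

```python
import sqlite3


def insert_events(conn, events):
    """Insert (timestamp, value) pairs and return how many were rejected."""
    rejected = 0
    for ts, value in events:
        try:
            conn.execute(
                "INSERT INTO events (ts, value) VALUES (?, ?)", (ts, value)
            )
        except sqlite3.IntegrityError:
            # Same role as the try/catch in the story: the duplicate key
            # is caught here instead of taking the program down.
            rejected += 1
    return rejected


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the timestamp itself is the primary key.
conn.execute("CREATE TABLE events (ts TEXT PRIMARY KEY, value TEXT)")

# Two invented events that land in the same millisecond.
events = [
    ("1997-07-16 10:38:47.123", "filter changed"),
    ("1997-07-16 10:38:47.123", "alarm limit changed"),
]

rejected = insert_events(conn, events)
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"stored={stored}, rejected={rejected}")
```

The second insert violates the uniqueness of the key, so one event is stored and one is lost to the error path.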

But! The logging code was written old-school style, and the error text was written into a string buffer 80 characters long. The text of the uniqueness-violation message was always the same, but it was prefixed with the date in long English format, like this:

Monday, July 17, 1997, 10:38:47.123


The names of the days in English have a funny property:

Name       Length in characters
Sunday     6
Monday     6
Friday     6
Tuesday    7
Thursday   8
Saturday   8
Wednesday  9

Guessed it yet?

On Wednesdays, and only on Wednesdays, if someone changed certain monitor settings in a certain way, two events could occur simultaneously and trigger the database error. The resulting error message was then exactly 81 bytes long (including the terminating zero), which overflowed the 80-character buffer and brought the whole system down!
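The arithmetic is easy to check with a small Python sketch. It assumes that the day name is the only part of the message whose length varies, and it works backwards from the 81 bytes above, since the exact error text is not known. The names BUFFER_SIZE, REST_OF_MESSAGE and bytes_needed are invented for this sketch.

```python
BUFFER_SIZE = 80  # size of the logging buffer in the story

# The Wednesday message needs 81 bytes including the terminating zero,
# so everything except the day name takes 81 - 1 - len("Wednesday") bytes.
# Assumption: that remainder has the same length on every day of the week.
REST_OF_MESSAGE = 81 - 1 - len("Wednesday")

DAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def bytes_needed(day):
    """Bytes a C string would need: day name + rest of message + zero byte."""
    return len(day) + REST_OF_MESSAGE + 1


overflowing_days = [day for day in DAYS if bytes_needed(day) > BUFFER_SIZE]

for day in DAYS:
    verdict = "OVERFLOW" if day in overflowing_days else "fits"
    print(f"{day:<10} {bytes_needed(day)} bytes  {verdict}")
```

Thursday and Saturday fill the buffer exactly, and Wednesday is the only day that needs one byte more.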

Since then, I have learned three things. First, always use an auto-increment primary key in every table. Second, log dates in ISO format, YYYY-MM-DD, without the day of the week. But most importantly, even the most random and unpredictable bug has a logical explanation, and you can find it if you dig deep enough.
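The first two lessons can be sketched in a few lines of Python, again with sqlite3 as a stand-in database. The schema, the sample events and the helper iso_stamp are hypothetical, and appending the time of day to the ISO date is my own choice for the sketch.

```python
import sqlite3
from datetime import datetime, timedelta


def iso_stamp(moment):
    """ISO-style timestamp with milliseconds and no day name."""
    return moment.isoformat(sep=" ", timespec="milliseconds")


conn = sqlite3.connect(":memory:")
# Lesson one: the key is an auto-increment id, the timestamp is just data.
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, ts TEXT, value TEXT)"
)

# Two invented events in the same millisecond no longer collide.
same_instant = iso_stamp(datetime(1997, 7, 16, 10, 38, 47, 123000))
for value in ("filter changed", "alarm limit changed"):
    conn.execute(
        "INSERT INTO events (ts, value) VALUES (?, ?)", (same_instant, value)
    )
ids = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY id")]

# Lesson two: the ISO stamp has the same length on every day of the week.
monday = datetime(1997, 7, 14, 10, 38, 47, 123000)
stamp_lengths = {
    len(iso_stamp(monday + timedelta(days=i))) for i in range(7)
}

print(ids, stamp_lengths)
```

Both events get their own key, and the timestamp is 23 characters long whatever the weekday, so a fixed-size buffer would behave the same all week.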

Source: https://habr.com/ru/post/263871/

