There are bugs, and then there are BUGS. Ordinary bugs get fixed and forgotten, but BUGS stay with you forever. I want to share three of those BUGS with you.
The first one happened in 2005, when I worked at FriendScout24. We had a monitoring tool that showed an HTML table with one row per server. If a server responded normally, its row was green; if not, it was red. Usually everything was a calm, uniform green. Then, one fine August day, the servers started going down one after another. Bam, bam, bam: four servers in three minutes. Five minutes later everything was green again, as if nothing had happened.
It happened again the next day, and the day after that, and so on for the whole week. Once the usual suspects (the load balancer, JavaScript) had been ruled out, Oliver (one of the frontend developers) suggested that a particular user was behind it. With about 2 million users and around 25,000 of them logged in at any one time, finding that user was hard. But FriendScout24 had already seen a case where a single user brought down the entire system, so we decided not to give up.
In the end, the root of all evil turned out to be a photo. But not an ordinary one. A girl decided to enrich her profile with a photo, which in itself is commendable and welcome. However, her photo existed only as a PDF. Like every normal portal of the time, we did not accept PDFs, only JPEGs and various kinds of GIFs. The girl was no fool: she renamed foto.pdf to photo.jpg. That got her past the MIME-type check, and her picture sailed off into the depths of the system. In those depths sat ImageMagick, then the state-of-the-art library for image processing. ImageMagick was no fool either: instead of saying "this is not a JPEG" and sending the photo back, it recognized the PDF content and called on its buddy Ghostscript to process it. And since nobody had ever intended to process PDFs on those machines, Ghostscript wasn't installed, which caused a plain segfault in the native library, which in turn quietly took the JVM down with it. Oops.
The girl did not lose heart and simply tried again, landing on the next server each time, and so she killed the servers one by one. Many thanks to her. And thanks for running out of patience before her 12th attempt, because that's how many web servers we had back then.
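Since renaming the file was enough, the upload check evidently trusted the file name or the declared MIME type rather than the bytes. A minimal sketch of a content-based check, assuming the upload is available as a byte array, looks at the leading "magic" bytes instead: JPEG files begin with FF D8 FF, GIF files with the ASCII text GIF87a or GIF89a, and PDF files with %PDF-. The class and method names (UploadSniffer, sniff, isAllowedImage) are hypothetical; this is an illustration, not the FriendScout24 code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: classifies an upload by its content, never by its name. */
public final class UploadSniffer {

    enum Kind { JPEG, GIF, PDF, UNKNOWN }

    private static final byte[] JPEG_MAGIC = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF87A = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF89A = "GIF89a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private UploadSniffer() {}

    /** True if data is at least as long as prefix and begins with it. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Decides the format from the leading bytes only. */
    static Kind sniff(byte[] data) {
        if (startsWith(data, JPEG_MAGIC)) return Kind.JPEG;
        if (startsWith(data, GIF87A) || startsWith(data, GIF89A)) return Kind.GIF;
        if (startsWith(data, PDF_MAGIC)) return Kind.PDF;
        return Kind.UNKNOWN;
    }

    /** Only genuine JPEG or GIF content may go on to image processing. */
    static boolean isAllowedImage(byte[] data) {
        Kind kind = sniff(data);
        return kind == Kind.JPEG || kind == Kind.GIF;
    }

    public static void main(String[] args) {
        // A "photo.jpg" whose bytes are really a PDF.
        byte[] renamedPdf = "%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII);
        // The first bytes of a typical JPEG (JFIF) file.
        byte[] realJpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};
        System.out.println(sniff(renamedPdf) + " allowed=" + isAllowedImage(renamedPdf)); // PDF allowed=false
        System.out.println(sniff(realJpeg) + " allowed=" + isAllowedImage(realJpeg));     // JPEG allowed=true
    }
}
```

Magic bytes only tell you what a file claims to be, not that it is well-formed, so a check like this is a first filter rather than a sandbox. ImageMagick can also be restricted on its own side through its policy.xml, for example by denying the PDF coder on machines that were never meant to handle PDFs.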
The second bug dates back to prehistoric times, when I was building one of the first versions of this site. The site contains information about all kinds of dry-cleaning and laundry machines, and all of these machines were entered into the content management system (CMS) from which the site was generated. At first everything was fine: a happy customer and all that. A week later the customer called and complained that adding new machines was taking suspiciously long. I checked: the logs were empty, the server was idle, I found nothing. The customer called again and said they had added 100 machines and now each new one took a minute to add. I looked, I checked, and he was right. The story is quickly told, but the work took a long time: I put timing measurements on almost every line and finally found the culprit. For a long time I couldn't believe my eyes:
log.debug(cache)
Debug logging itself was turned off, so nothing showed up in any of the logs, but the toString method of this cache dumped its entire contents in full detail. And it took longer and longer, eventually about three minutes per operation. Ever since then I have always used log.isDebugEnabled(). So at least the time was well spent.
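The expensive toString runs whenever the message string is built before the logger gets a chance to check its level, for example through string concatenation (that the original call built its message this way is an assumption here). log.isDebugEnabled() is the log4j / commons-logging style guard; the sketch below uses java.util.logging instead so it runs without dependencies, with Level.FINE standing in for DEBUG and isLoggable(Level.FINE) for isDebugEnabled(). ExpensiveCache and the method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class LazyLoggingDemo {

    /** Hypothetical stand-in for the CMS cache: toString is the expensive part. */
    static final class ExpensiveCache {
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            // Imagine this walking and formatting every cached entry.
            return "ExpensiveCache{...}";
        }
    }

    private static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());

    /** Eager: the concatenation, and thus toString, runs before fine() checks the level. */
    static void logEagerly(ExpensiveCache cache) {
        LOG.fine("cache: " + cache);
    }

    /** Guarded: the java.util.logging counterpart of log.isDebugEnabled(). */
    static void logGuarded(ExpensiveCache cache) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("cache: " + cache);
        }
    }

    /** Lazy: the Supplier is only invoked if FINE is enabled (Java 8+). */
    static void logLazily(ExpensiveCache cache) {
        LOG.fine(() -> "cache: " + cache);
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level (FINE) output is off, as in the story
        ExpensiveCache cache = new ExpensiveCache();
        logEagerly(cache);
        logGuarded(cache);
        logLazily(cache);
        System.out.println("toString calls: " + cache.toStringCalls); // toString calls: 1
    }
}
```

Only the eager call pays for toString. The guard and the Supplier both skip it; SLF4J's parameterized form, log.debug("cache: {}", cache), likewise defers formatting until the level check has passed.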
And finally, my favorite. The greatest bug of all time. It happened in 2003, also at FriendScout24, before they hired me (maybe that's why they hired me). Back then the platform was very unstable, crashed often, and was maintained by people who understood little of what they were doing. And when people don't want to, or can't, understand why a system misbehaves, they have exactly one repair method: Ctrl-Alt-Del. After all, what works on Windows should work everywhere, shouldn't it?
In our case, one of the administrators wrote a super-clever script that read the system logs and, if it found the keyword FATAL, restarted the entire application. All 25 servers, lock, stock and barrel. When the restarts became frequent, they had to rethink this policy. And here is how it happened:
A woman calls the support team and says:
Woman: “Why does your system shut down as soon as I log in?”
Support Agent: “What is your login?”
Woman: “femme-fatale”
Curtain.
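The watchdog's logic fits in a few lines. This is a hypothetical reconstruction, not the administrator's actual script: it assumes the keyword was matched anywhere in the line and case-insensitively (the login "femme-fatale" contains "fatal" only in lower case), and that log lines look like `<LEVEL> <message>`. The class name, method names and sample log lines are invented for illustration. Matching only the level field avoids the false positive.

```java
import java.util.List;
import java.util.Locale;

public final class FatalWatchdog {

    /** Naive check (assumed): "fatal" anywhere in the line, in any case, triggers a restart. */
    static boolean naiveShouldRestart(String line) {
        return line.toUpperCase(Locale.ROOT).contains("FATAL");
    }

    /** Stricter check: only the level field, assumed to be the first token, counts. */
    static boolean levelShouldRestart(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return parts[0].equals("FATAL");
    }

    public static void main(String[] args) {
        // Hypothetical log lines: a harmless login and a genuine fatal error.
        List<String> log = List.of(
                "INFO login ok user=femme-fatale",
                "FATAL OutOfMemoryError in session store");
        for (String line : log) {
            System.out.printf("%-45s naive=%b level=%b%n",
                    line, naiveShouldRestart(line), levelShouldRestart(line));
        }
    }
}
```

The stricter version only removes the false positive. Restarting all 25 servers whenever a FATAL line appears is still the Ctrl-Alt-Del approach described above.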