
The Internet already has plenty of collections of interesting bugs, both the funniest ones and those that caused the most damage (for example, here).
But other people's rakes, of course, never teach as reliably as your own: adjusted to your height, with a carved handle and sharp teeth, lovingly crafted by the clumsy hands of developers and carefully laid out by insidious users.
So, congratulations to everyone involved in producing, finding and fixing bugs. By the way, I am, of course, proud of our team of testers. These people can catch every kind of bug, from the primitive to the unpredictably magical, and they do it very successfully every day. So I will not talk about everyday victories. On the contrary, in honor of the holiday I will describe the most epic of our own mistakes and blunders.
I deliberately leave out cases where the problem was on the user's side or came from a salesperson's misunderstanding. Such situations certainly deserve a separate study, but today is only about our own mistakes and blunders.
To keep some suspense, each case has two parts: how it looked at first, and what we eventually found out.
So, here we go.
5. Deja vu with unpredictable effect.
What it looked like:
Once, on a gloomy October day, a local apocalypse struck. From early morning, clients in different countries began complaining about the system, and every complaint was different. For some, the program would not open at all. For some, it opened but showed yesterday's data. For some, it was shamelessly slow. For some, it gave recommendations for advertising campaigns that had already ended.
The situation would have been understandable if, for example, we had just rolled out a major release. Or if only one client had gone down. Or, at worst, if all the clients had been in the same data center. But we had not changed anything on the client servers for a long time, and the cases had nothing in common until ...
What really happened:
... until, as usual, we dug into the logs. Our application is very resource-intensive, so most calculations run at night, and by morning users already have fresh recommendations for improving the network. On top of that, the source data for the previous day is ready no earlier than 1-2 am.
So synchronization and calculations, as usual, waited for hour X and for the data to become available, and then started updating. And then, at exactly 3 am, servers all over Europe switched back an hour, to winter time.
Hour X arrived a second time, and our system scheduled a second copy of the synchronization and calculations. Two copies running in parallel and updating the same data at the same time are an inexhaustible source of unpredictable situations.
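To make the mechanism concrete, here is a minimal Python sketch. It is not our production code: the run time of 02:30, the example date (29 October 2023, a night when European clocks went back) and the class names are all assumptions for illustration. It shows a scheduler that fires on local wall-clock time running twice that night, and a guarded one that remembers which day's data it has already processed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for a real time zone database to keep the sketch self-contained.
CEST = timezone(timedelta(hours=2))  # summer time
CET = timezone(timedelta(hours=1))   # winter time

RUN_AT = "02:30"  # hypothetical "hour X" in local wall-clock time
# Example night: 03:00 CEST becomes 02:00 CET at 01:00 UTC.
SWITCH = datetime(2023, 10, 29, 1, 0, tzinfo=timezone.utc)


def local_label(utc_moment: datetime) -> str:
    """Wall-clock HH:MM as a European server would show it on that night."""
    zone = CEST if utc_moment < SWITCH else CET
    return utc_moment.astimezone(zone).strftime("%H:%M")


class NaiveScheduler:
    """Starts the nightly job whenever the local clock shows RUN_AT."""

    def __init__(self) -> None:
        self.runs = 0

    def tick(self, utc_now: datetime) -> None:
        if local_label(utc_now) == RUN_AT:
            self.runs += 1


class GuardedScheduler:
    """Same trigger, but each day's data is processed at most once."""

    def __init__(self) -> None:
        self.done = set()
        self.runs = 0

    def tick(self, utc_now: datetime, data_day: str) -> None:
        if local_label(utc_now) == RUN_AT and data_day not in self.done:
            self.done.add(data_day)
            self.runs += 1


def simulate_night() -> tuple:
    """Tick both schedulers every 30 minutes across the clock change."""
    naive, guarded = NaiveScheduler(), GuardedScheduler()
    moment = datetime(2023, 10, 28, 22, 0, tzinfo=timezone.utc)
    end = datetime(2023, 10, 29, 4, 0, tzinfo=timezone.utc)
    while moment < end:
        naive.tick(moment)
        guarded.tick(moment, data_day="2023-10-28")
        moment += timedelta(minutes=30)
    return naive.runs, guarded.runs
```

Calling `simulate_night()` returns the run counts for the naive and the guarded scheduler. Scheduling in UTC would avoid the repeated hour altogether; the guard is simply the smaller change to a scheduler that already thinks in local time.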
4. Vasya was here!
What it looked like:
One of the client's operators writes to us: "Oh, great, you decided to advertise yourselves on our websites all over the world! Well done, it's a cool program, people should know about it. But I don't understand why you show the ads to everyone?"
What really happened:
At first we were completely bewildered, because we had no plans to advertise at all. Then someone thought to look at the client's site and saw the Maxifire logo on the main page. Which, in principle, should never have been there.
The answer was banal and embarrassing. The test advertising campaign we needed for debugging the algorithms had mistakenly been launched not on our own test site but across the client's entire advertising network. To be absolutely accurate, it happened because of an undocumented feature of the ad server with a certain type of targeting, which we inadvertently uncovered during testing. As a result of this sweet mistake, our logo was shown about 5 million times across Europe over the course of the day.
We had to apologize, compensate for the losses, and so on. Once again you realize that the most serious mistakes are made not by the program but by people. Or, as the saying goes, you can build protection against a fool, but only against an uninventive one.
3. You can't trust anyone these days. Me, you can!
What it looked like:
We have a demo for a strategic client. The standard demo will not do: as usual, the salespeople want to show off the very latest, things that yesterday existed only in their heads and today exist, still shakily, in the code.
We are installing the version in a frantic sweat. Half an hour before the show, a letter arrives from one of our salespeople: "Listen, I know that our system is very, very intelligent and analyzes all the dependencies in the network in a way no person ever could. And that its predictions are more accurate, I understand that too. But right now I have a list of recommendations with a forecast on my screen, and the forecast is negative for every single recommendation. Of course, I understand that our system knows something important and that is why it suggests them. But could you tell me what to say to the customer if he asks?"
What really happened:
Well, what can you do in half an hour when you suddenly realize that a bug has crept into the forecasting system and it now produces incorrect values (read: complete nonsense)? And it is not just the values, because the forecast graphs are drawn from them too. It is unclear where the bug is, and nothing can be fixed in that time. And oh, how we want to improve our chances of selling the product to a new customer.
So we change not the algorithm but the output. A random small positive value, and now all our recommendations produce quite plausible numbers. Add a few static graphs drawn by hand in 5 minutes, and there is a visualization of campaign progress. Then a short briefing for the salesperson, and the demo is a great success.
I know it is wrong to deceive customers, I know. But it is almost like a car showroom: the sparkling, gleaming miracle being advertised to you may well not be able to start up and drive off the podium right away. And that does not matter to you. What matters is that once you have bought it, sat down inside and put the key in the ignition, there are no problems or last-minute disappointments.
By the way, that is why I love selling on the SaaS model. It does not let the developer sell a non-working system and then vanish to the Canaries, leaving the deceived customer with nothing. Customers pay every month for the result. And, of course, it is also very hard for the customer to use the results without paying. A win-win situation.
2. Me no understand you
What it looked like:
This time the complaints came from our own support service. Things like "The user is asking about a type of recommendation I have never heard of" or "The user sent a screenshot of a message that our system cannot possibly generate".
And the programmers do not recognize the system either and cannot help.
What really happened:
To be absolutely honest, this one was not a bug but a failure of communication and planning. The system was originally developed in Russia but for the Western market, and all the names and messages were written in "Runglish" of one kind or another. The level of literacy was adequate, but probably not perfect. Still, customers got used to it, did not complain, and worked with the system without trouble for several years.
And then, as part of improving the product, we hired a technical writer in New York. She set to work enthusiastically. While only the documentation was being translated, everything was fine. But then something in the interface turned out to be incomprehensible to one of the salespeople, and a directive came down to rework all the names and explanations so that they were absolutely correct from a native speaker's point of view.
So all the short names turned into long, elaborate ones. But there were two catches. First, the technical writer worked out their meaning from the existing documentation, which was not always complete or accurate (besides, understanding and explaining the difference between the several hundred types of improvements we offer clients to increase efficiency is a nontrivial task in itself. And explaining it clearly ...). Second, the full list of correspondences between the old names and the new ones was kept by the technical writer alone.
And she handed over the result at the very last second, when, as usual, the next major release was being put together in a rush. The list of correspondences between old and new reached only the one programmer responsible for localization, and then the one tester responsible for checking it.
As a result, customers got a system that was highly unfamiliar not only to them but to us. Any question stumped support (also in Russia), because they simply did not understand what was being asked. And the programmers could not help: they too were seeing the system for the first time.
The feeling was much as if the interface had suddenly switched to an unfamiliar language. In the end, of course, we tracked down the list of correspondences, completed it and printed it out for everyone, and after a while we got used to it, but the initial shock and the frayed nerves were remembered for a long time.
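A list of correspondences like this is also easy to keep next to the code rather than in one person's head. The Python sketch below is only an illustration: the labels are hypothetical (the real ones are not listed here), and both helper functions are assumptions. One translates a new label back to the name the team knows, and the other reports interface labels that the list does not cover, the kind of check that could run before a release.

```python
# Hypothetical old -> new labels; the real ones are not listed in this article.
RENAMES = {
    "Up bid for keyword": "Increase keyword bid",
    "Stop bad banner": "Pause underperforming creative",
}


def old_name_for(new_label, renames=RENAMES):
    """Translate a label from the new interface back to the old, familiar name."""
    for old, new in renames.items():
        if new == new_label:
            return old
    return None  # the label is not in the list of correspondences


def labels_missing_from_list(ui_labels, renames=RENAMES):
    """Return interface labels that the list of correspondences does not cover."""
    known = set(renames.values())
    return sorted(label for label in ui_labels if label not in known)
```

With something like this, support could look up `old_name_for("Increase keyword bid")` when a user quotes an unfamiliar label, and a non-empty result from `labels_missing_from_list` would be a reason to hold the release until the list is complete.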
1. How the Christmas tree almost ruined Christmas
What it looked like:
- Igor, you won't believe it, but our Christmas tree has brought down the entire system.
- What? Are you sure this message is meant for me?
- Let me remind you. In honor of Christmas, we decided to surprise our customers and changed our program's icon to a picture of a Christmas tree. Nothing else changed in the version. But now the system does not work for any of the clients.
- How??? How is that even possible?
What really happened:
As it turned out, easily, if you design the system "right". We have both a version with visualization for the user and "invisible" versions meant exclusively for computing in cluster mode: forecasting and generating recommendations. In other words, they run as a service that does not need a graphical interface.
And so, having replaced the plugin, we rolled it out to all versions. As a result, every version initializes at launch and tries to display that same Christmas tree: it looks for the window title, the title bar and so on. And the OS answers: sorry, no graphical display is assigned to you. And the plugin throws an unrecoverable exception, which we had not thought to catch, because this situation had simply never been foreseen.
The conclusions, again, are worthy of Captain Obvious: do not ship a version without testing, no matter how insignificant the changes seem. Catch exceptions and handle them, however unlikely they are. And, of course, do not leave everything to the last moment.
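The "catch exceptions" lesson can be sketched in a few lines of Python. This is an illustration under assumptions, not our plugin: `NoDisplayError`, `set_festive_icon` and `start_instance` are hypothetical names, and the `has_display` flag stands in for whatever the OS actually reports. The point is only that a cosmetic step should not be able to stop a headless compute instance from starting.

```python
import logging

log = logging.getLogger("startup")


class NoDisplayError(RuntimeError):
    """Hypothetical error: the process has no graphical display assigned."""


def set_festive_icon(has_display: bool) -> str:
    """Stand-in for the icon plugin, which needs a window to decorate."""
    if not has_display:
        raise NoDisplayError("no graphical display assigned to this process")
    return "christmas-tree.ico"


def start_instance(has_display: bool) -> dict:
    """Start an instance; the icon is optional, the computation is not."""
    icon = None
    try:
        icon = set_festive_icon(has_display)
    except NoDisplayError as exc:
        # Cosmetic failure: log it and carry on with the real work.
        log.warning("Skipping cosmetic icon: %s", exc)
    return {"started": True, "icon": icon}
```

Here `start_instance(False)` still reports a started instance, just without an icon. Catching only the specific error keeps genuinely unexpected failures visible instead of hiding them.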
I hope these stories lifted your spirits or gave you food for thought. If you have curious or instructive stories from your own experience, please share them in the comments. I would love to read them.