How Google Tests Software

After listening to the “How Google Tests Software” webinar, I was so inspired that I decided to write up some notes. This article is my summary. First, a word about its content: it is not a literal transcript. I describe only the things that seemed important to me, so not everything covered in the webinar appears here. It is also possible that I did not fully understand something, or even misunderstood it. I therefore strongly recommend listening to the webinar yourself.
The webinar is presented by James Whittaker, who currently holds the post of technical director of software testing at Google. James and his colleagues are preparing to release a book of the same name. It will give comprehensive information on how Google tests Google Maps, Google+, ChromeOS, Android and other products.

* From here on, for simplicity, the narration is in the first person, as if I were James.

Philosophy

For the most part, Google promotes an engineering culture; there are very few career prospects for managers. Every engineer, however, can devote 20% of their time to their own projects, and this time is the most fruitful source of interesting ideas. Gmail, for example, grew out of such an idea and became a popular web service. Chrome was a 20% project and became a popular application. With these examples I want to emphasize one important feature of Google: mobility. We create new applications quickly, and employees move easily from one development team to another.

The Google testing philosophy states that testing is not what creates quality. Testing is part of the engineering culture. It is part of development, and it is what we do before releases, but we do not test in order to raise quality. High quality is built in by developers, project managers and testers, not by tests. I think what sets us apart is that we have made testing an absolutely integral part of our work. No developer can commit code that is not covered by tests; this happens automatically as part of the workflow. We do not have tester positions, we have roles, which we will talk about later. Testing is everyone's job. We do not even see ourselves as testers; we call ourselves the Engineering Productivity department. After all, the idea is not to create tests for their own sake but to speed up development. As it turns out, errors and miscalculations are the main causes of delays, and testing helps us reduce their number massively, so we can develop much faster. That is how we were able to create Google+ in 100 days. Over the summer, that is, in the following 3 months, we added hundreds of new features. This is an amazing speed of development. And you know what? The product meets world-class quality standards. Of course, software is never perfect, and Google+ is no exception. Still, we built it quickly and efficiently.

Productivity

As you can see, our role is to speed up Google, not to test everything. We have special tools for increasing engineers' productivity. There are common tools (IDEs, compilers, build systems, tests) and there are product-specific ones. We work jointly on product releases, train engineers and, of course, test. Three roles are involved in testing: SWE (Software Engineer), SET (Software Engineer in Test) and TE (Test Engineer).

SWEs are our main developers, who work on functionality. They are responsible for code review, TDD, unit testing and acceptance testing. They are also responsible for the quality of every piece of code they write. This is important: testers are not responsible for quality. An SWE cannot write some code and then tell a tester, "try to improve this code." That is the SWE's own job.

SETs develop the testing environment. They do not write test cases directly; they build the infrastructure for creating them. They release frameworks, write utilities that automate routine checks, and create automated procedures. I describe this in detail in the book.

The first two roles are 100% about code. The third role is the opposite: it is abstracted away from the code. The TE role is about users. This separation makes sense for many reasons, the main one being a different way of thinking: some advocate for the code, others for the users. We want everyone to focus on their own job. Do not think that a TE is someone who runs tests manually. TEs write a lot of code. Their code feeds something to the input and then validates the result at the output. This is a different level of testing. They also write test scripts.

Strategy

One of the most important components of our strategy for producing high-quality software is called "crawl, walk, run." The idea comes down to building the product quickly, releasing quickly, getting feedback quickly and responding quickly. Google has succeeded at this. What we definitely do not want is a two-year development cycle, where someone runs in waving their arms and says they have a brilliant idea that will change the whole world, that it will be cool once we build it, and that everyone will certainly love it. That does not fly with us. If you have such an idea, write some code. Implement the most important functionality. Show the result to as wide a circle of people as possible. Find out whether you were wrong. If you were right, you will have users and can proceed to the next stage. We use this approach even on very large projects. If you look at Chrome, to which I dedicated 18 months, you will find that we build a new release every day. We call these releases canary. Anyone associated with the project can take such a release and play with the new features. At the end of the week, the best canary release is selected. Every member of the team should work with this weekly candidate, which we call the dogfood release. It is very important that everyone tests the same, latest release, so that nobody reports errors that have already been fixed. Every month the best dogfood build of Chrome is selected automatically; it is intended for our employees. Later I will talk more about the tools they use to interact with developers. We found an interesting pattern: users of dogfood releases find disproportionately more bugs than developers and testers do. This is very significant. Instead of hiring an army of testers to imitate user behavior, we give the product to Googlers, who really use it, and we get rich bug reports from them.
Finally, we ship a beta release that is tested by dedicated users.

Tests

We divide tests into 3 groups: small, medium and large. We do not divide them into unit / integration / system tests, for several reasons. First, we have a centralized test execution system. If you tell this system that you are about to run a small test, it knows the test will not take long and can schedule it accordingly. Every project at Google uses this system, so we want to allocate its time efficiently among them. The system slots small tests in between large ones and, as a result, stays busy continuously.
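To make the scheduling idea more concrete, here is a minimal Python sketch of a runner that uses the declared size of each test to plan work across a fixed number of workers. It is not a description of Google's actual system: the names `Job`, `SIZE_BUDGET_SECONDS` and `schedule`, the time budgets, and the greedy "longest first, least busy worker next" strategy are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass

# Hypothetical time budgets per declared size, in seconds (illustrative values only).
SIZE_BUDGET_SECONDS = {"small": 0.1, "medium": 10.0, "large": 3600.0}


@dataclass(frozen=True)
class Job:
    name: str
    size: str  # "small", "medium" or "large"


def schedule(jobs, workers):
    """Assign jobs to workers, longest expected jobs first.

    Each job goes to the worker expected to become free soonest, so short
    tests end up filling the time next to long ones.
    """
    # Heap of (expected busy time so far, worker index).
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan = {w: [] for w in range(workers)}
    ordered = sorted(jobs, key=lambda j: SIZE_BUDGET_SECONDS[j.size], reverse=True)
    for job in ordered:
        busy, worker = heapq.heappop(heap)
        plan[worker].append(job.name)
        heapq.heappush(heap, (busy + SIZE_BUDGET_SECONDS[job.size], worker))
    return plan


jobs = [
    Job("ui_flow", "large"),
    Job("parse", "small"),
    Job("db", "medium"),
    Job("fmt", "small"),
]
plan = schedule(jobs, workers=2)
```

With two workers, the large test occupies one worker while the medium and small tests are packed onto the other, which is one simple way a scheduler can keep all workers busy once it knows roughly how long each test will take.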

Small tests:

Almost always automated
Run in a simulated environment (mocked / faked)
Test one function or one module
Focused on data validation and exception checking
Run in milliseconds
Mostly written by SWEs

Medium tests:

Usually automated
Run in a simulated environment
Test interactions between functions
Focused on functional data interaction issues
Run in seconds
Written by SETs and SWEs

Large tests:

Can be either automated or manual
Run in a real environment
Check end-to-end functionality
Can run for hours
Written by TEs and SETs
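As an illustration of the smallest category, here is a sketch of what a small test might look like in Python: it exercises one function, replaces its dependency with a mock, and checks both the returned data and the exception path. The function `fetch_username` and the `get_user` client method are hypothetical and exist only for this example.

```python
import unittest
from unittest import mock


def fetch_username(client, user_id):
    """Return the user's name in upper case, or raise KeyError if unknown."""
    record = client.get_user(user_id)
    if record is None:
        raise KeyError(user_id)
    return record["name"].upper()


class FetchUsernameSmallTest(unittest.TestCase):
    def test_returns_upper_case_name(self):
        # The real client is replaced by a mock, so no network or database is touched.
        client = mock.Mock()
        client.get_user.return_value = {"name": "ada"}
        self.assertEqual(fetch_username(client, 7), "ADA")
        client.get_user.assert_called_once_with(7)

    def test_unknown_user_raises(self):
        client = mock.Mock()
        client.get_user.return_value = None
        with self.assertRaises(KeyError):
            fetch_username(client, 7)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchUsernameSmallTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because nothing real is contacted, a test like this can finish in milliseconds, which is what allows a scheduler to treat it as small.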

A few more words about the tests

We strive to automate every check. If the result can be verified by a machine, we automate it. If no human judgment is needed, we automate the test. If the check has to be run more than once, we automate it. If something cannot be left to chance, we automate it. Only when none of these conditions applies do we allow developers to test manually. Even then, we record the manual test using a JavaScript application, which lets us replay and reuse such tests. I will talk about this application later.
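The rule of thumb above can be written down as a small decision function. This is only a sketch of the stated conditions; the `Check` fields and the name `should_automate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    machine_verifiable: bool    # the result can be verified by a machine
    needs_human_judgment: bool  # a person has to evaluate the result
    repeated: bool              # the check must be run more than once
    critical: bool              # the outcome cannot be left to chance


def should_automate(check):
    """Return True if any of the conditions for automation applies."""
    return (
        check.machine_verifiable
        or not check.needs_human_judgment
        or check.repeated
        or check.critical
    )


# A one-off visual review that only a person can judge stays manual.
one_off_review = Check(False, True, False, False)
# The same review becomes a candidate for automation once it must be repeated.
recurring_review = Check(False, True, True, False)
```

Under this reading, manual testing is the exception: it is left only for one-off, non-critical checks that need a person's judgment.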

We even test the readability of code. Every line of C++, Python, Java and JavaScript is checked for readability as part of our development process.

Engineers and testers use the same environment. Everyone uses the same version of Linux regardless of the task, whether on servers in the data center or on a developer's desktop. A homogeneous environment is very important for reproducing errors. For us, local = test = staging = production. If that is not the case for you, I suggest you think about it seriously.

ACC - 10 Minute Test Plan

When I first came to Google, I spent my time on the tiresome reading of test plans. Every project had its own plan, unlike any other. These plans had only one thing in common: they were outdated and did not reflect reality. They were written, reviewed, and then ignored. Test planning is one of the TE's responsibilities, and I knew we needed to do it better. So here is what I did. I gathered several of my engineers in a room, gave them an application, said "you have 10 minutes, make a test plan," and left. Of course, when I came back 10 minutes later I heard: “Dude, we thought you were joking, because 10 minutes is not enough for a full plan.” I must say that, despite the lack of managers at Google, people do understand duties and responsibility, so on the whole they had to do what I asked. I told them they could manage it if they tried harder. Then I gave them another application and another 10 minutes. This time they made some progress. I gave them yet another application, again with only 10 minutes. This went on for an hour and a half, until one of the engineers produced something valuable. In the end, this exercise grew into a methodology. This is what we learned:

Give up prose in favor of lists
Do not bother selling
Do not pad the text; this is not an essay
If it is not reproducible, drop it
Create a flow, so that one part leads into the next
Guide the tester's thinking
The output should be tests

We called this way of planning ACC (Attributes, Components, Capabilities) and even released an open source application (there is a live version) that makes such a plan easier to create. Now we can create a test plan for any Google application in less than half an hour. Making the list of the system's important attributes is probably the longest part. Taking part in it is usually the last thing developers want to do. To get the basic list, you can look at the application's official website; most likely everything is already written there. Here is the trick we use at Google to get developers involved. When we made a test plan for ChromeOS, for example, in 20 minutes we described every feature we knew of and ended up with a list of 304 items. We went to the developers and said: "You know, ChromeOS is so simple that there are really only 304 attributes to test." They instantly blurted out that this could not be right, that the system is very complex. Developers love to complicate things. They like to believe that their code is the most complex in the world. So if you tell them their code is primitive, they will want to prove you wrong, and that is exactly what we needed. They called a 2-hour meeting, which by Google standards is like pulling people away from work for a week. They sat down and added a few more important attributes, and the list grew to 320 items. This trick works well. Draw up part of the attribute list and involve the developers; if that does not work, start testing without them. The important things will surface on their own over time. You can read more about how to make a 10-minute plan on our blog.
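One way to picture an ACC plan is as a grid: attributes on one axis, components on the other, and capabilities listed in the cells where they meet. The Python sketch below follows that reading, which is an interpretation and not the data model of the released tool; the class `AccPlan`, its methods, and the sample attributes and components are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccPlan:
    attributes: list   # qualities the product should have, e.g. "fast"
    components: list   # parts of the product, e.g. "tabs"
    # Maps (attribute, component) to the capabilities recorded for that cell.
    capabilities: dict = field(default_factory=dict)

    def add_capability(self, attribute, component, description):
        """Record something the component does in support of the attribute."""
        if attribute not in self.attributes:
            raise ValueError(f"unknown attribute: {attribute}")
        if component not in self.components:
            raise ValueError(f"unknown component: {component}")
        self.capabilities.setdefault((attribute, component), []).append(description)

    def uncovered(self):
        """Return the grid cells that have no capability yet."""
        return [
            (a, c)
            for a in self.attributes
            for c in self.components
            if (a, c) not in self.capabilities
        ]


acc = AccPlan(attributes=["fast", "secure"], components=["omnibox", "tabs"])
acc.add_capability("fast", "tabs", "Opens a new tab without noticeable delay")
gaps = acc.uncovered()
```

Empty cells are a cheap prompt for discussion: each one is either a gap in the plan or a combination that genuinely needs no tests.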

BITE Record & Playback

The second tool I would like to talk about, which I mentioned earlier, is called BITE. BITE has also been released as an open source application. It lets you record actions in the browser and then play the recording back. The tool is used by all Googlers who test our internal dogfood releases, and they help us create verification tests this way. You can read more about this on our blog.

BITE Bug Filing

For me, it does not matter how many testing books you have read; I am even the author of one of them. I am convinced that your tests will do less good than real users. Users work in a real environment, not one you have recreated. They have real data, not data you have invented. They follow their own workflows, not emulated ones. You are a tester who merely pretends to be a user. Users do not pretend; they are users. So we decided to turn users into testers. Think about Google Maps, another product I devoted a lot of time to, and you will understand that only users can reliably verify data about the reality around them. How could we even check such things? Seriously, it is a global task, literally. Only those who live on a street can recognize that the map of it contains a mistake. Even though we are Google, we cannot hire testers to double-check the entire Earth. We must trust our users and rely on their knowledge. It turns out to be a good idea, because they are better at this than we are. This utility is not publicly available at the moment. It is connected directly to our bug database. When you click on the map, the utility takes snapshots of the DOM structure and collects information about the browser and the operating system: in short, everything needed to reproduce the error.

Quality Bots

Finally, I will talk about another utility, which we use to test the Chrome browser. When we first started testing Chromium, we did a lot of manual checks. To me they were essentially meaningless. What mattered to us was not to "break the Internet": Chrome had to keep displaying popular sites correctly. We did not want to release a browser that could not render CNN, Facebook or other popular services. Of course, it is easy to write a test that loads the hundred most popular sites and finds out whether any of them crashes the browser. But that is not quite what we need. What we really want is to check the 10 thousand most popular sites and see whether they look the same as in the previous version of our browser, and how they look now in Firefox, Safari and IE. The application we built for this is called Quality Bots; we have also written about it on our blog. The application does two things: it compares the DOM and checks the position of every control on the page. Of course, sites whose ads change every day will look different. What matters to us is that the pages are approximately similar. If they are, we consider the test passed.
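The "approximately similar" check can be sketched as a score with a pass threshold. The Python example below is not the Quality Bots algorithm: representing each control as an `(x, y, width, height)` box keyed by an element id, the pixel tolerance, and the threshold values are all assumptions made for illustration.

```python
def layout_similarity(old, new, tolerance_px=2):
    """Return the fraction of elements whose boxes match in both renderings.

    `old` and `new` map an element id to an (x, y, width, height) tuple.
    An element counts as matching only if it exists in both layouts and
    every coordinate differs by at most `tolerance_px`.
    """
    ids = set(old) | set(new)
    if not ids:
        return 1.0
    matching = 0
    for element_id in ids:
        if element_id in old and element_id in new:
            if all(abs(a - b) <= tolerance_px
                   for a, b in zip(old[element_id], new[element_id])):
                matching += 1
    return matching / len(ids)


def pages_look_alike(old, new, threshold=0.9):
    """Pass the page if enough of its elements kept their position and size."""
    return layout_similarity(old, new) >= threshold


previous = {
    "logo": (0, 0, 100, 40),
    "nav": (0, 40, 800, 30),
    "story": (0, 100, 580, 400),
    "ad": (600, 100, 200, 200),
}
# Same page in the new build: only the ad block has moved.
current = dict(previous, ad=(600, 300, 200, 200))
score = layout_similarity(previous, current)
```

A threshold below 1.0 is what lets pages with rotating ads pass, while a page whose main controls have shifted would still fall under it.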

Epilogue

Those interested in the details can follow James on Twitter, add him on Google+, or read the blog, where articles on this topic appear regularly.

Source: https://habr.com/ru/post/135776/
