Translation of Jeff Sauro's article from MeasuringU.
Many people are familiar with the Hawthorne effect, in which people behave differently when they know they are being observed.
The effect got its name when researchers found that workers at the Hawthorne factory performed better not because of the improved lighting, as the researchers had assumed, but because they were being observed.
A similar effect applies not only to humans but also to particles. In physics, observing a phenomenon from outside (for example, the movement of subatomic particles) changes the result.
Although questions remain about the details of the Hawthorne experiment, and humans are not subatomic particles, there is strong evidence from other sources that people tend to behave differently when they are observed.
The "white coat" effect shows that a patient's blood pressure rises because of the psychological stress of visiting a doctor (who likes going to the doctor?). The effect also varies with the gender of the observer and of the test participant. One study showed, for example, that women respond more strongly, physiologically, to social rejection, while men are more sensitive to obstacles on the way to achieving goals.
Interestingly, the effect does not always point in the same direction. There is some evidence that when people perform simple tasks, their performance tends to improve under observation. Conversely, performance drops when the tasks are complex and less familiar.
For example, experienced billiards players perform better when they are observed, while inexperienced players perform worse. The effect even extends to some of our favorite insects. Cockroaches watched by other cockroaches get through a simple maze faster, but struggle more with difficult mazes.
Since most of us are not particularly interested in playing billiards or running cockroach races, we will focus on usability testing, which relies on observing the behavior of participants. We use both recording equipment (video of the screen and of the participant's face, audio recording) and direct observation by the moderator, and often by other observers located either remotely or close by. Does this observation affect the test results? If so, how? Several studies shed light on this question.
In 2005, Harris et al. gave 100 students simple and complex tasks in Microsoft Word. Participants were divided into 4 groups with different conditions:
The researchers found differences in error rates between these groups, but no clear pattern. Somewhat puzzlingly, the group without a moderator (2) and the group with a moderator and reminders (3) made significantly fewer errors than the other two groups (1 and 4).
Another study, conducted by Grubaugh et al. in 2005, examined the effects of observation equipment. 150 university students were given a task in Microsoft OneNote. The students were divided into 3 groups, each with a different set of observation equipment:
The study showed a higher error rate under strict observation (group 1) than in the simplified laboratory setup (group 3).
In 2009, Sonderegger & Sauer gave 60 participants tasks on mobile phone prototypes, dividing the participants into 3 groups:
Much as in an old-fashioned psychology experiment, the researchers wired participants up with sensors to track changes in their heart rate (a measure of stress). They also measured several usability metrics (perceived ease of use, task completion, completion time, interface attractiveness, and participants' emotions).
They determined that the presence of two additional observers (a man and a woman) had a negative effect on the participants:
No differences were found between the groups in perceived usability (measured with the PSSUQ) or in interface appeal.
In 2014, Uebelbacher asked 80 participants to complete simple and complex tasks in a prototype mobile application for tourists and in a real application on an iPhone 3. Participants were divided into 2 groups with different conditions:
No differences were found in task performance, completion time, perceived ease of use, or emotional state. However, a difference in heart rate was found. The researchers could not reproduce the results of Sonderegger & Sauer, but showed that heart rate variability is a sensitive measure of stress. They noted:
“When observers were present, there was a much stronger increase in the average heart rate from the rest stage to the task completion stage (+5.4 bpm) than when the participants worked on their own.”
Uebelbacher further noted that the observed participants reported being disturbed by the observation, but this was less true of older participants, who felt less observed than younger participants (a potential age effect).
Source: https://habr.com/ru/post/346162/