
Michael Cohen "Voice User Interface Design". 2004 [Translation. Fragment]

The boom in the Western voice-application industry has led to numerous studies of the usability of voice interfaces.

The classic work in this area is Mike (Michael) Cohen's book "Voice User Interface Design" (2004).

Below the cut is a translation of one chapter of this book, devoted to the factors that must be taken into account when designing a VUI. It will be of interest to developers of mobile applications and voice menu structures, interface optimizers, and anyone interested in voice technology and usability.
Mike Cohen is a recognized authority in the field of speech technology, the author of more than 70 papers and many patents, and a professor at Stanford University.
In 1994, he became one of the founders of Nuance, today the leading Western company offering speech solutions.
Since 2004, Cohen has been a staff scientist at Google, where he heads the Speech Technology Group.


So, here is the chapter "Reducing mental effort".
* Caution: long and serious text

P.S. If you don't have time to read, you can download and listen to this text as a synthesized podcast.


Cognition is the processing of information coming from the outside world. It includes perception, attention, pattern matching, memory, language processing, decision making, and action. Mental effort is the total of the mental resources needed to perform these tasks.

All user interfaces require mental effort from users. The user has to master the rules for using the system, become familiar with new terms, and hold this information in short-term memory. He has to understand how the system works and how he can use it. Interfaces designed only for hearing pose a particular challenge for human memory and attention, because the information in them is delivered sequentially and does not persist. A good user interface design should respect the limits of human mental capacity. If a design requires the user to memorize too many concepts or to learn a whole set of new commands, it can be considered a failure. This chapter describes a number of basic principles for reducing the mental load on users of voice interfaces (accessed from a mobile or landline phone).

Three problems associated with mental load must be kept in mind when designing an interface:

  1. Conceptual complexity. How much effort must callers make to grasp new concepts? How well do the new concepts correspond to concepts and procedures that users already know?
  2. Memory load. How much information must users keep in short-term memory? How much new material (for example, commands and procedures) must they memorize?
  3. Attention. How easy is it for the caller to pick out the most important information? Will the user's attention be divided? If the user is distracted for a moment (for example, while driving), can he return to the system and continue the interaction without hindrance?


In the following sections, we discuss each of these potential problems and present guidelines for solving them.

9.1. Conceptual complexity
Conceptual complexity covers, among other things, the number of new concepts that the user must learn and their inherent complexity. However, it goes beyond simply counting concepts and measuring how complex they are. It is also a question of understanding human abilities in general (what people find difficult and what they find easy), as well as the context in which users will act (for example, how the application fits with the users' existing knowledge, skills, expectations, and thought patterns).

In this book we do not consider the theoretical foundations that allow you to accurately predict the difficulties of individual design decisions. At the moment, there is not enough knowledge to create such a theory. Here we will present only a number of guidelines that will help you reduce the mental efforts of your users.

The following principles are considered in this section: constancy (9.1.1), consistency (9.1.2), and context (9.1.3).


9.1.1. Setting constants
Graphical user interfaces exploit the ability to display information (sometimes a lot of it at once) on a computer screen. For example, many GUIs include a toolbar (see Figure 9.1), usually placed at the top of the screen. The toolbar usually consists of icons that represent different actions and serve as a visual reminder of the available actions and of how to start them.

The toolbar is fixed: it remains on the screen, and the icons on it do not change. The persistence of the toolbar reduces the need for the user to remember a number of actions and commands.

Such constancy can also be achieved in voice user interfaces (VUIs) by creating a small set of voice commands that are always available, regardless of context (see section 5.2.2). Once users have learned these universal commands, they can use them at any time during subsequent calls. In essence, the commands become a mental toolbar of actions that are always available (see Figure 9.2).

Fig. 9.1. A graphical user interface displays a fixed set of icons

Fig. 9.2. The "mental toolbar" of speech commands

It is unreasonable to expect users to master a large number of universal commands, although the number may grow somewhat as universal commands become more widespread across the voice technology industry. Since the set has to be small, it is best to tie the commands to functions that help the user out of difficulties: for example, getting additional help or instructions, moving to another level of the application, or switching to a live operator. Successful use of such universal commands should improve task completion, automation rates, and user satisfaction.

For universal commands, choose words or phrases that are intuitive and easy to remember (for example, “Help”). A command should have the same meaning wherever it is spoken. For example, the Help command means that the user wants more detailed instructions on what can be done, no matter at what level of the menu the request is made. The answer the user receives will depend on the current context, but the commands themselves should always be available.

Two standards bodies have investigated universal commands: the Telephone Speech Standards Committee (TSSC 2000) and the European Telecommunications Standards Institute (ETSI 2002). Both examined existing interfaces and existing experience with universal commands, and ran experiments to find the universal terminology that best matches users' behavior. Both reached similar preliminary conclusions.

The following list presents a set of universal commands that we recommend for all applications. The word or phrase in brackets is the command that the caller will say. Our list is based on the findings of the two standards bodies and on our own experience in deploying voice applications. If a particular standard is adopted in the future, we will support it. The entire voice industry, and of course users, will benefit if certain universal commands are standardized.

Clarifying universal commands:
[ help ]: provide help or additional instructions for the current point in the dialogue;
[ repeat ]: repeat the last message.

Navigation universal commands:
[ main menu / return to the beginning ]: return the user to the start of the application (from any level);
[ back ]: go back one step.

Ending universal commands:
[ operator ]: transfer the user to an operator;
[ bye ]: let the user end the conversation gracefully.

The farewell command is included in this list because our data show that users say goodbye to the system even when they do not know that such a command exists. Usability studies have shown that many users prefer to end the dialogue by saying “goodbye” rather than just hanging up. Apparently, this gives them confidence that their session with the system has really ended.

Users must be told that universal commands exist; otherwise they will not use them. One approach is to mention a command the very first time the user accesses the system. Descriptions of the other universal commands can be included at the end of the “help” message, as well as in the prompts played after errors. For example, a banking application might open with the following greeting:

Welcome to Western Valley Bank. If you have any difficulty using this service, just say: “Help”. What would you like to do now: pay a bill, check your balance, or transfer money?


If the user says “Help” in the middle of a balance inquiry, the system can provide the following information:

Here is some help. You asked for balance information, but we don't know which account you mean. You can say "Account" or "Check Account". You can also say "Main Menu" or "Operator" at any time.
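
The idea of universal commands that keep the same meaning in every context, while the help text itself depends on the context, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names DialogState and handle_utterance, the state names, and the reply wordings are hypothetical and do not come from the book or from any speech platform, and the farewell command is spelled "goodbye" here.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """One point in the dialogue (hypothetical structure, for illustration)."""
    name: str            # e.g. "main_menu" or "balance"
    prompt: str          # what the system says on entering the state
    help_text: str       # context-specific answer to "help"
    local_commands: dict = field(default_factory=dict)  # phrase -> next state


def handle_utterance(state, history, utterance):
    """Return (next_state_name, system_reply) for one recognized utterance.

    Universal commands are checked first, so they mean the same thing in
    every state; only the wording of the help reply depends on context.
    """
    phrase = utterance.strip().lower()
    universal = {
        "help": (state.name, state.help_text),
        "repeat": (state.name, state.prompt),
        "main menu": ("main_menu", "Main menu."),
        "back": (history[-1] if history else "main_menu", "Going back."),
        "operator": ("operator", "Transferring you to an operator."),
        "goodbye": ("end", "Goodbye."),
    }
    if phrase in universal:
        return universal[phrase]
    if phrase in state.local_commands:
        return state.local_commands[phrase], ""
    return state.name, "Sorry, I didn't understand. " + state.prompt


# Example state modelled on the balance inquiry above.
balance = DialogState(
    name="balance",
    prompt="Which account?",
    help_text='You can say "Account" or "Check Account". '
              'You can also say "Main Menu" or "Operator" at any time.',
    local_commands={"account": "account_balance"},
)
print(handle_utterance(balance, ["main_menu"], "Help"))
```

Checking the universal commands before the state's own commands is what keeps them available everywhere; a real system would also need to avoid local commands that clash with them.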


9.1.2. Setting analogies
You can also reduce users' mental load by paying due attention to consistency. The idea is to let the user perform similar steps in the same way. For example, an application may present several lists to the user (stock lists, paid or unpaid bills, and so on). If the same navigation strategy is used everywhere, users transfer their skills from one list to another. The same consistency can be applied at different levels: both the dialogue strategy and the terminology should be consistent. Wording should be consistent too; if you state monetary amounts in full detail in one place, do not round them elsewhere.

Practice shows that users learn with every action they perform. Each interaction between the system and the user refines the user's “mental model”. This model includes an understanding of the system's capabilities, the commands for performing various tasks, the wording patterns, and so on. Consistency makes learning the model more efficient, while inconsistency between similar elements leads to ineffective learning and confusion, and results in a negative experience with the system.

Consider an example of moving a user through a list. In the dialogue below with a brokerage system, two lists are traversed: the watch list of securities and the list of buy and sell orders. Note that the user controls both lists with the same commands.

(1)
System : What would you like to do?
User : View a list of securities.
System : International Business Machines - eighty-two, up one point. Amazon, thirty-six, down at ...
User : Next.
System : Ford Motor Company, ten points ...
User : Next.
System : Charles Schwab Corporation, eleven ...
User : Finish.
System : What would you like to do next?
User : Tell me about my trades.
System : You have five open orders. The first is to sell one hundred shares of Microsoft ...
User : Next.
System : Buy two hundred shares of Cisco Systems at seventeen ...
User : Next.
System : Buy five hundred shares of International Business Machines at eighty-two ...
User : Finish.
System : What would you like to do next?
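
A minimal sketch of how one navigation routine might serve both lists, so that "Next" and "Finish" behave identically everywhere. The function name browse and the closing message are assumptions made for illustration; the item texts are shortened from the dialogue above.

```python
def browse(items, commands):
    """Read list items one at a time and return everything the system said.

    "next" moves to the following item and "finish" stops. The same routine
    is meant to be reused for every list in the application, so the user
    learns the commands only once.
    """
    if not items:
        return ["The list is empty."]
    spoken = [items[0]]
    index = 0
    for command in commands:
        word = command.strip().lower()
        if word == "finish":
            break
        if word == "next":
            index += 1
            if index >= len(items):
                spoken.append("That was the last item.")
                break
            spoken.append(items[index])
    return spoken


watch_list = [
    "International Business Machines, eighty-two, up one point",
    "Ford Motor Company, ten",
    "Charles Schwab Corporation, eleven",
]
open_orders = [
    "Sell one hundred shares of Microsoft",
    "Buy two hundred shares of Cisco Systems",
    "Buy five hundred shares of International Business Machines",
]

# The same commands drive both lists.
print(browse(watch_list, ["Next", "Next", "Finish"]))
print(browse(open_orders, ["Next", "Next", "Finish"]))
```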


9.1.3. Contextual setting
Context setting is another important concept that, when applied to a voice user interface, helps reduce mental load (Weinschenk and Barker 2000). Psychological studies have shown that people understand and remember information more easily when it is presented in an appropriate context. For example, consider the following verbal passage (Bransford and Johnson 1973):

“The procedure is really very simple. First you divide everything into different groups. Of course, one pile may be enough, depending on how much there is to do. If you lack the facilities and have to go somewhere else, that is the next step; if not, you are all set. It is important not to overdo it. That is, it is better to do too little at one time than too much. At first, this may seem unimportant, but difficulties can quickly arise. A mistake can be expensive too. At first, the whole procedure will seem complicated. However, very soon it will become just another part of life.”

After reading this paragraph, you probably found it difficult to understand what it was about. But given the context “washing clothes”, you can use what you know about laundry to decipher everything that was unclear. The “procedure” mentioned in the first sentence is washing clothes; “everything” is the clothes; the “different groups” are piles of clothes of different colors; and so on. If you re-read the paragraph now, you should understand it perfectly. When participants in the experiment were asked to recall as many ideas as possible from the text without being told its topic, they recalled about three key fragments. When the passage was preceded by a statement that it was about laundry, participants recalled twice as many. Context helps people connect new information with concepts they already know, which reduces the mental load.

One way to create context in a user interface is to use metaphors. As discussed in Chapter 4, a metaphor is a familiar object or schema that is used to make unfamiliar elements easier to understand. Examples are the desktop metaphor and the shopping cart metaphor.

To find out whether metaphors really help users of voice interfaces, British Telecom researchers compared three automated voice systems for shopping. One system used no metaphor and simply described the goods in a voice menu. Another used a store metaphor, in which users moved from floor to floor in a virtual elevator (with corresponding sound effects). The third used a magazine-catalog metaphor. Users rated the store-metaphor system higher than the system with no metaphor at all, and the magazine-catalog system fell in between. In addition, users found it easier to navigate the systems that used a metaphor. These findings suggest that establishing context through metaphors increases user satisfaction and system efficiency.

9.2. Memory load
Callers cannot absorb a large amount of new information at once, and they will not memorize information that they consider useless. There are a number of ways to design menus, prompt wording, and instructions that reduce the load on the user's memory.

9.2.1. Menu size
In the famous article “The Magical Number Seven, Plus or Minus Two,” Miller (1956) described the capacity of human short-term memory as seven plus or minus two items. Designers often use this article as a guide to how many items a menu should contain. However, extracting information from spoken sentences is a harder task than the ones in Miller's experiments. Experiments in which people listen to a series of sentences and must remember the last word of each are closer to our task (Daneman and Carpenter 1980). In experiments using this purely auditory approach (which also requires understanding the sentences), people remember about three items on average.

Other studies of human memory have shown that people most naturally group items in threes, and that recall is best when items are grouped in threes or fours (Broadbent 1975; Wickelgren 1964). Taken together, these studies suggest that the load on the user's memory should be kept quite small. A reasonable limit is three or four items per menu. Gardner-Bonneau (1992) and Schumacher, Hardzinski, and Schwarz (1995) also recommend using four or fewer menu items.
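
One possible way to keep every spoken menu within the recommended three or four items is to split a longer option list into pages. The sketch below is an assumption about how that could be done, not a technique described in the book: the function name split_menu and the "more options" item are hypothetical, and the extra item counts toward the per-menu limit.

```python
def split_menu(options, max_items=4):
    """Split options into pages of at most max_items spoken choices.

    Every page except the last ends with a "more options" choice, which is
    counted as one of the max_items so that no page exceeds the limit.
    """
    if max_items < 2:
        raise ValueError("max_items must be at least 2")
    pages = []
    remaining = list(options)
    while len(remaining) > max_items:
        pages.append(remaining[:max_items - 1] + ["more options"])
        remaining = remaining[max_items - 1:]
    pages.append(remaining)
    return pages


for page in split_menu(["pay a bill", "check balance", "transfer money",
                        "order checks", "report a lost card",
                        "change PIN", "find a branch"]):
    print("You can say: " + ", ".join(page) + ".")
```

Whether paging is better than regrouping the options into a shallower or deeper menu tree depends on the application; the code only enforces the size limit.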

9.2.2. Memorability
When you write prompts that contain expressions users can say, place those expressions at the end of the sentence so that they are heard last. For example, the phrase “To listen to the list again, say: ‘Repeat list’” is better than “Say: ‘Repeat list’ to listen to the list again.” The load on the user's memory is lower in the first case, because he only needs to remember the last phrase he heard. This effect is often referred to as the “memory effect”. This order (first the purpose, then the action) was inherited from the conventions of touch-tone telephone systems (Balentine 1999). But there are also linguistic reasons for placing such expressions at the end of a sentence. These reasons are discussed in Chapter 10.
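
If prompts are assembled by code, the purpose-first, command-last order can be fixed in one small helper so that every hint follows it. The helper name command_hint and the exact punctuation are assumptions made for this illustration.

```python
def command_hint(purpose, command):
    """Build a hint that states the purpose first and the spoken command last."""
    return f'To {purpose}, say: "{command}".'


hints = [
    command_hint("listen to the list again", "Repeat list"),
    command_hint("return to the beginning", "Main menu"),
]
print(" ".join(hints))
```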

9.2.3. Commands
Applications with rich functionality, especially those intended for repeated use, often include some way of teaching users how to use the system. A list of commands sent by e-mail or posted as a list of tips is not very effective: most users do not read instructions before using a system. The application must therefore be self-sufficient. It should give inexperienced users all the help they need from the very first use of the service. Below we consider two approaches.

Tutorial
Some systems offer tutorials, demonstrations, or a combination of both. An audio tutorial is usually offered the first time a voice system is used. This approach is typical of subscription services and services designed for repeated use (for example, personal organizers, banking services, or brokerage information services). Tutorials give step-by-step instructions on how to use the basic functions of the system. Demonstrations consist of recorded conversations between an imaginary user and the system, in which the system's voice prompts are the ones played during actual use of the service.

Both demonstrations of a user interacting with the system (CCIR-4 1999) and interactive tutorials (Kamm, Litman, and Walker 1998) have proved useful for new users. However, users find it very hard to absorb tutorials and demonstrations that present too much information (Balogh, LeDuc, and Cohen 2001). There are two key rules for tutorials:
  1. Explain only a small number of concepts.
  2. Make the guide interactive. Enable user interaction with the system.

Timely command
If a large number of functions has to be explained, you cannot rely entirely on a tutorial or demonstration. First, it is hard for the user to take in a long description of various functions. Second, if a function is not used right away, its purpose may soon be forgotten. As a rule, users are not patient enough to listen to lengthy instructions, especially if the instructions do not help with the task at hand.

The concept of a “timely command” removes these two limitations of tutorials (Cohen 2000). The idea is to give the user an instruction at the very moment when he faces the task in question. The amount of new information provided at any one time is small, and it is applied immediately.

Consider, for example, a personal organizer with a large set of functions. Instead of listening to a detailed tutorial when using the system for the first time, the user is told about each function when he first accesses it. For example, when a user first requests route and traffic information, the following message may be played:

You can get an up-to-date traffic report for the city's main highways by saying the name of a place. You can also save time by saying the name of the road or highway, or the name of your destination. For example, you could say, "Highway 101 in San Francisco."


A timely command can be offered at the moment when the user first encounters a function that is new to him. Commands can also be offered to users who run into problems with the system (frequent errors, recognition problems, timeouts, and so on). Timely commands are also appropriate when the user is not using the full capabilities of the system (for example, he does not use shortcuts or deliberately simplifies his speech when prompted).
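
The two triggers described above (first use of a function, and repeated problems with it) can be tracked per user with very little state. The following is a rough sketch under assumptions: the class name HintTracker, the feature key "traffic", the hint wording, and the threshold of two errors are all hypothetical.

```python
class HintTracker:
    """Decide when to play a short instruction for a feature (illustrative)."""

    def __init__(self, hints, error_threshold=2):
        self.hints = dict(hints)        # feature name -> hint text
        self.seen = set()               # features whose hint was already played
        self.errors = {}                # feature name -> errors since last hint
        self.error_threshold = error_threshold

    def on_first_use(self, feature):
        """Return the hint the first time a feature is used, otherwise None."""
        if feature in self.hints and feature not in self.seen:
            self.seen.add(feature)
            return self.hints[feature]
        return None

    def on_error(self, feature):
        """Count an error; replay the hint once the threshold is reached."""
        count = self.errors.get(feature, 0) + 1
        if count >= self.error_threshold and feature in self.hints:
            self.errors[feature] = 0
            return self.hints[feature]
        self.errors[feature] = count
        return None


tracker = HintTracker({
    "traffic": 'You can say the name of a road, for example "Highway 101".',
})
print(tracker.on_first_use("traffic"))   # hint is played on first use
print(tracker.on_first_use("traffic"))   # None: the user has already heard it
```

In a deployed service the tracker state would have to be stored per caller between calls; that part is left out here.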

In the following two examples, we compare the two approaches to providing instructions. In example (2), the user is given a tutorial on first access. The tutorial is lengthy and covers a large amount of information, and it seems doubtful that the user will remember much of what he heard. In example (3), timely commands are used. The same material is covered, although the example shows only the commands for quotes and the stock list. Here, however, the system offers short instructions at the moment when the information is relevant and useful to the user.

(2)
System : Princeton Brokerage. [The rest of this long introductory tutorial is missing from this copy of the text. The surviving fragments show only that its examples name the companies “Intel” and “Apple”.]


(3)
[Most of this dialogue with the Princeton Brokerage system is missing from this copy of the text. The surviving fragments are the company names IBM (International Business Machines), Cisco, Intel, and America Online, and the figure 17.25 given for Cisco.]

9.3. Attention

Attention is the process of choosing what to concentrate on from among many competing possibilities (Preece, Rogers, and Sharp 2002). The way information is presented can have a significant effect on how easily users can focus on the information that interests them.

Let's look at an example of reading out flight information in a travel-planning application. Imagine a user requesting information about flights from New York to Boston "this afternoon." Suppose the system has found four flights in its database that match the search criteria and must tell the user about the results. Consider the following list of four possible flights:

(4)
System : United Airlines flight 47 departs from New York Kennedy Airport at 13.00, gate 36, and arrives in Boston Logan at 13.45, gate 22. United Airlines flight 243 departs from New York Kennedy Airport at 14.15, gate 12, and arrives in Boston Logan at 15.00, gate 47. United Airlines flight 260 departs from New York Kennedy Airport at 15.45, gate 15, and arrives in Boston Logan at 16.30, gate 42. United Airlines flight 52 departs from New York Kennedy Airport at 17.00, gate 38, and arrives in Boston Logan at 17.45, gate 31. Which flight do you choose?


Although some of these flights may meet the user's requirements, the information is so cluttered that it scatters the user's attention and makes it hard to choose. Now imagine an alternative:

(5)
System : There are flights at 13.00, 14.15, 15.45 and 17.00. What would you like?
User : What about the flight at 14.15?
System : United Airlines flight 243 departs from New York Kennedy Airport at 14.15, gate 12, and arrives in Boston Logan at 15.00, gate 47. Would you like to book this flight?


In this case, the user is given only the information he is really interested in, which makes the decision much easier. If your application involves complex information, take this into account when defining the application requirements. Make sure that you understand the goals, priorities, and decision criteria of your potential users. You can then present exactly the information they need without overloading them with the entire volume of data.
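
The summary-first strategy of the second flight dialogue can be sketched as two small functions: one that reads out only the attribute users are assumed to choose by (here, the departure time) and one that gives the details of a single flight on request. The Flight class, the function names, and the shortened reply wording are assumptions for illustration; the flight data repeats the example above.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    number: int
    departs: str   # departure time as spoken, e.g. "14.15"
    arrives: str   # arrival time as spoken


def summarize(flights):
    """Read out only the departure times and let the user pick one."""
    times = [f.departs for f in flights]
    if not times:
        return "There are no matching flights."
    if len(times) == 1:
        return f"There is one flight, at {times[0]}. Would you like it?"
    listing = ", ".join(times[:-1]) + " and " + times[-1]
    return f"There are flights at {listing}. What would you like?"


def details(flights, departs):
    """Give the full details of the single flight the user asked about."""
    for f in flights:
        if f.departs == departs:
            return (f"United Airlines flight {f.number} departs at {f.departs} "
                    f"and arrives at {f.arrives}. Would you like to book this flight?")
    return None


flights = [
    Flight(47, "13.00", "13.45"),
    Flight(243, "14.15", "15.00"),
    Flight(260, "15.45", "16.30"),
    Flight(52, "17.00", "17.45"),
]
print(summarize(flights))
print(details(flights, "14.15"))
```

Which attribute belongs in the summary is exactly the kind of question that the requirements work described above has to answer; departure time is only the choice made in this example.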

In some cases, divided attention is unavoidable. For example, when the user is behind the wheel, a situation may arise that demands his full attention on the road. A system designed for use in a car must take this into account and be able to control the pace of the dialogue. Users may also be given control over the system, for example with “pause” and “resume” commands. Alternatively, the system itself can ask for confirmation before continuing, so that the user can pick up the interaction where he left off. (Note: VUI design for drivers is a separate area of research that requires careful study. The examples given here are only illustrations and are not the results of serious research.)

There are cases when information must be delivered without interrupting the user. Imagine that a personal organizer is reading your voice mail and e-mail to you. If a new message arrives during the reading, the application can announce it with a short audio message without interrupting the reading of the current messages. A brief announcement such as “New voice mail received” will not scatter the user's attention.

To summarize, the first step in managing the user's attention is to understand the user's goals and priorities. It then becomes possible to design interactions in which the user is given only the information that is relevant to him. At the same time, the system can adapt to the user's needs and can direct his attention to information and events outside the system itself.

9.4. Conclusion
In this chapter, we examined the main principles for reducing mental load: conceptual complexity, memory load, and attention. Like all guidelines, they must of course be applied with care and in a specific context. The next two chapters are devoted to expectations during a conversation. Chapter 10 discusses wording (“what” the system says), and Chapter 11 discusses sound (“how” the system says it).

Source: https://habr.com/ru/post/65537/

