
Speech recognition with hardware acceleration. Specialized ASIC consumes less than 8 mW


Specifications of the specialized speech recognition ASIC

Voice commands are the most natural and convenient interface for controlling electronics. One can imagine that in the future almost all electronic devices will understand their owner's commands: from light bulbs in an apartment to the refrigerator, microwave oven and kettle in the kitchen. Connected to the shared network of the Internet of Things, these devices will not only understand the owner but also coordinate their actions with each other.

In recent years, speech recognition technologies have matured enough for a range of commercial applications: controlling in-car computers, healthcare (keeping digital records by transcribing doctors' speech) and military use. For example, in the Italian M-346 training aircraft and the American F-35 fighter, the accuracy of the speech recognition systems reaches 98%. But to run speech recognition on household appliances and wearable electronics, the power consumption of this interface has to be reduced drastically.

Engineers from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT) are already preparing for this futuristic scenario, in which all the surrounding electronics understand the human voice. As part of Qmulus, a joint project with Quanta Computer, MIT researchers have developed a prototype of a specialized integrated circuit (ASIC) for speech recognition. Its distinguishing feature is ultra-low power consumption: from 0.2 mW to 10 mW, depending on the number of words that need to be recognized. This makes it possible to use such electronics in virtually any device, even one powered by the human body.
Normal metabolism in an adult male produces about 80 W of heat, and a trained cyclist produces up to 400 W of mechanical power. Of course, not all of this can be harvested to power electronics, but not much is needed. A few watts can easily be drawn from the human body passively. For example, a small 10 cm bracelet on the wrist continuously generates about 40 mW from the temperature difference between the human body (approximately 37 °C) and the ambient air (20 °C).



If you wear not a bracelet but a whole thermal garment or corset 50-100 cm wide, it will draw about 2 W from the body. On top of that, the kinetic energy of motion can be converted and sugar from the blood can be broken down. This is enough to power wearable electronics, smart clothing and the simplest gadgets.

Besides the human body, low-power electronic devices can harvest energy from other sources, such as background radio emissions (microwave, radio, WiFi and so on) or the vibrations of windows and floors.

An ordinary smartphone can hardly run on energy harvested from the human body or from the air. According to the developers, speech recognition software running on ordinary mobile hardware draws about 1 W. That is a lot. Using the specialized chip from MIT and Quanta Computer in real conditions means an energy saving of 90-99%. Most importantly, such a low-power device dramatically expands the scope of speech recognition. It can now be embedded not only in smartphones or expensive electronics but in the most mundane everyday objects, even a bathroom mirror.
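
The saving can be checked with simple arithmetic. The sketch below is only an illustration: it assumes a flat 1 W baseline (the developers' rough figure) and takes the per-task chip power from the results table in this article; the names `BASELINE_W`, `ASIC_POWER_W` and `saving_percent` are made up for the example.

```python
# Baseline quoted in the article: about 1 W on ordinary mobile hardware.
BASELINE_W = 1.0

# Chip power per task in watts, copied from the article's results table.
ASIC_POWER_W = {
    "Numbers": 172e-6,
    "Weather": 4.70e-3,
    "Food diary": 4.67e-3,
    "News (1)": 1.78e-3,
    "News (2)": 7.78e-3,
}


def saving_percent(asic_w, baseline_w=BASELINE_W):
    """Relative power saving versus the baseline, in percent."""
    return (1.0 - asic_w / baseline_w) * 100.0


# Saving for every task, assuming the baseline is the same for all of them.
savings = {task: saving_percent(power) for task, power in ASIC_POWER_W.items()}

for task, pct in savings.items():
    print(f"{task:<11} {pct:6.2f}% saving")
```

With a flat 1 W baseline, every task in the table comes out above 99%, at the upper end of the range quoted above.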

If the device harvests energy from the environment, its batteries will never need replacing. If it is still fitted with a battery for reliability, a single charge will last for months or years.
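
A rough runtime estimate shows where "months" can come from. Everything about the battery and usage pattern below is an assumption for illustration, not a figure from the article: a 3 V, 225 mAh coin cell, a hypothetical 20 µW idle draw while the recognizer is gated off, and a hypothetical 5% duty cycle. Only the two active power figures come from the results table.

```python
# Assumed battery (not from the article): 3 V coin cell, 225 mAh.
BATTERY_WH = 3.0 * 0.225  # capacity in watt-hours

# Active power figures taken from the article's results table.
NUMBERS_W = 172e-6   # 11-word "Numbers" task
NEWS2_W = 7.78e-3    # 145k-word "News (2)" task

# Hypothetical usage pattern (assumptions, not measured values).
IDLE_W = 20e-6       # draw while the recognizer is power-gated
DUTY_CYCLE = 0.05    # fraction of time speech is actually being recognized


def average_power(active_w, idle_w, duty_cycle):
    """Time-weighted average power for a device that is mostly idle."""
    return duty_cycle * active_w + (1.0 - duty_cycle) * idle_w


def runtime_days(avg_power_w, battery_wh=BATTERY_WH):
    """Days of operation from one charge, ignoring self-discharge and losses."""
    return battery_wh / avg_power_w / 24.0


days_numbers_continuous = runtime_days(NUMBERS_W)
days_news_gated = runtime_days(average_power(NEWS2_W, IDLE_W, DUTY_CYCLE))

print(f"Numbers, always on:      {days_numbers_continuous:.0f} days")
print(f"News (2), 5% duty cycle: {days_news_gated:.0f} days")
```

Under these assumptions the small-vocabulary task runs for several months even when always on, while the large-vocabulary task only reaches that range if the chip spends most of its time gated off.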

Qmulus, the joint project between MIT and Quanta Computer, started back in 2005, when it was called T-Party. The developers expect that as the Internet of Things spreads, computer chips will be built into all kinds of objects, even pets and cattle, to track livestock and monitor their condition. The microchips continuously collect information and send it to a central server in real time.

Perhaps speech recognition chips could be built into pet collars: for example, a voice command could send a weak electrical pulse to the collar, prompting the pet to perform a particular action. However, pets understand their owner's voice commands perfectly well without a microchip, so the invention will be more useful in other areas.

“Voice commands will be the natural interface for wearable and smart devices,” says Anantha Chandrakasan, a professor of electrical engineering at MIT, whose group developed the new chip. “The miniaturization of such devices will require a different interface than the keyboard. It is critical to embed speech recognition functionality locally, reducing the system’s power consumption compared to performing this operation in the cloud.”

The ASIC's recognition accuracy is about the same as that of the Kaldi speech recognition toolkit with a vocabulary of 145 thousand words, and at 80 MHz the chip's performance (the speed of searching the vocabulary lattice) roughly matches that of a computer with a 3.7 GHz Xeon processor.



The table shows the ASIC's recognition quality (word error rate, WER) and power consumption for several tasks.
Task         Vocabulary   Frequency   Memory bandwidth   WER     Power
Numbers      11           3 MHz       0.11 MB/s          1.65%   172 µW
Weather      2k           23 MHz      10.1 MB/s          4.38%   4.70 mW
Food diary   7k           46 MHz      9.02 MB/s          8.57%   4.67 mW
News (1)     5k           15 MHz      4.84 MB/s          3.12%   1.78 mW
News (2)     145k         40 MHz      15.0 MB/s          8.78%   7.78 mW

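
For readers unfamiliar with the WER column: word error rate is conventionally the word-level edit distance between the recognizer's output and a reference transcript, divided by the number of reference words. The sketch below shows that standard definition. It is not the authors' evaluation code, and the function name and example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word ("the") and one substituted word ("light" -> "lights").
wer = word_error_rate("turn on the kitchen light", "turn on kitchen lights")
print(f"WER: {wer:.0%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.
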
The paper "A Scalable Speech Recognizer with Deep-Neural-Network Acoustic Models and Voice-Activated Power Gating", which describes the chip, was presented last week at the International Solid-State Circuits Conference (presentation, pdf).

Source: https://habr.com/ru/post/401503/

