Millions of people around the world use music streaming services, listening to songs without downloading them to their devices. This market has great potential: in the first half of 2016, the number of audio streams in the United States doubled compared with 2015.
Moreover, by the end of 2015, music streaming services had about 68 million subscribers worldwide, and this figure continues to grow. Many well-known companies operate in this market, from Spotify, Pandora, and 8tracks to the Russian services Yandex.Music and Zvooq.
Why do users like streaming so much? Because it is convenient: there is no need to bother with physical media or download music to a device; every song is just a few taps away. But one of the main reasons for the popularity of streaming is music recommendations.
Photo: Patrik Nygren, CC
Sooner or later, each of us grows tired of the music we have played to death and wants something new, which is why services such as Tidal and Apple Music offer collections of songs tailored to our musical tastes.
To create such playlists, companies use a huge amount of data processed by computer algorithms. Brian Whitman, senior research scientist at Spotify and co-founder of The Echo Nest, has spent his entire professional career teaching computers to understand music.
He identifies four approaches to analyzing music for recommendations: using data on a composition's popularity (the number of plays and purchases of a song), drawing on the opinions of critics, text analysis, and acoustic analysis. The first two have a major drawback: they do nothing to promote music by little-known performers, so we will focus on the remaining two.
Acoustic and text analysis
One could say that the history of The Echo Nest began when Whitman, while still a student, created a program that analyzed music blogs using natural language processing. Today its algorithm has evolved considerably and constantly crawls the web, scanning about 10 million music-related pages.
Any music-related phrase that appears online passes through The Echo Nest's systems, which look for descriptors, keywords, and related terms. Each term also carries a weight that reflects its importance (in effect, the probability that someone would describe the song with that word). Recommendation lists are formed by comparing the identified descriptors with the descriptors of the user's favorite songs.
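The descriptor-matching idea can be sketched with a toy example. The descriptor names and weights below are invented for illustration (The Echo Nest's real vocabulary and weighting are far larger), but the comparison step, measuring how much two songs' weighted descriptor sets overlap, can be written as a cosine similarity:

```python
import math

# Hypothetical descriptor weights: the probability that a listener would
# use the term to describe the song. Values and terms are made up.
song_a = {"melancholic": 0.8, "acoustic": 0.6, "indie": 0.4}
song_b = {"melancholic": 0.7, "acoustic": 0.5, "lo-fi": 0.3}
song_c = {"aggressive": 0.9, "metal": 0.8}

def cosine(u, v):
    """Cosine similarity between two sparse descriptor-weight vectors."""
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in terms)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv)

# Songs that share heavily weighted descriptors score high;
# songs with no descriptors in common score zero.
```

A recommender built on this principle would rank candidate songs by their similarity to the descriptors of a user's favorites.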
As for the second method, acoustic analysis, the service does not apply it in its pure form: reliable recognition of, say, individual musical instruments is still out of reach. Nevertheless, signal analysis plays a very important role in recommender algorithms. For example, people want playlists to be "smooth": a loud track should not follow a quiet, calm one, and in playlists compiled for jogging the tempo should gradually increase.
The analysis of a song begins by breaking the sound into small segments, from 200 ms to 4 s long, depending on how quickly the "texture" of the song changes. For each segment, the system determines the volume and timbre, identifies the instruments used, and marks which part of the composition (chorus, verse, and so on) the segment belongs to.
The resulting information is then combined and analyzed using machine-learning tools, which makes it possible to understand the song at a "high level". After that, the composition receives descriptive labels (energy, liveness, and others).
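The segment-level step above can be illustrated with a minimal sketch: split a signal into fixed 200 ms chunks and compute a per-segment loudness value. This is a deliberate simplification; real systems use variable-length segments and extract many more features (timbre, instruments, song sections). The signal and sample rate here are synthetic:

```python
import numpy as np

sr = 1000                                    # sample rate in Hz, illustrative
t = np.arange(0, 1.0, 1 / sr)
quiet = 0.1 * np.sin(2 * np.pi * 220 * t)    # one second of a quiet tone
loud = 0.9 * np.sin(2 * np.pi * 220 * t)     # one second of a loud tone
signal = np.concatenate([quiet, loud])

seg_len = int(0.2 * sr)                      # 200 ms segments
n = len(signal) // seg_len * seg_len         # drop any trailing partial segment
segments = signal[:n].reshape(-1, seg_len)
rms = np.sqrt((segments ** 2).mean(axis=1))  # per-segment loudness (RMS)
# rms jumps at the quiet-to-loud boundary (segment index 5 of 10)
```

A playlist-smoothing rule like "no loud track after a quiet one" operates on exactly this kind of per-segment loudness profile.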
Thanks to these powerful technologies, The Echo Nest became the world leader in music-analysis algorithms, and in 2014 it was acquired by Spotify. Spotify itself is the world leader in music streaming, with 30 million paid subscribers, and its recommendation services draw thousands of enthusiastic reviews.
The company owes much of this success to collaborative filtering. This approach predicts a user's preferences from their content-consumption history (likes, play counts, and so on) by comparing it with that of other users. The algorithm thus identifies songs well suited to each listener without human intervention.
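A minimal sketch of user-based collaborative filtering, purely illustrative (Spotify's production system is vastly more elaborate): rows are users, columns are songs, and values are play counts. A user's interest in an unheard song is predicted as a similarity-weighted average of other users' play counts:

```python
import numpy as np

# Toy play-count matrix: 4 users x 4 songs; 0 = user has not played the song.
plays = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def predict(user, song, plays):
    """Predict interest via a cosine-similarity-weighted average of other users' plays."""
    norms = np.linalg.norm(plays, axis=1)
    sims = plays @ plays[user] / (norms * norms[user])
    sims[user] = 0.0                       # exclude the user themselves
    heard = plays[:, song] > 0             # only users who actually played the song
    if not heard.any():
        return 0.0
    return sims[heard] @ plays[heard, song] / sims[heard].sum()
```

Here `predict(1, 1, plays)` leans most on user 0, whose listening history is closest to user 1's, which is exactly the "comparing with other users" described above.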
The future of recommendation services
However, there are technologies that could take recommendation services to a completely different level. Sander Dieleman, a researcher at Google DeepMind, co-authored an article arguing that neural networks and deep learning can handle audio recommendations far more effectively than collaborative filtering.
Dieleman explored the possibilities of convolutional neural networks with 7-8 layers. In particular, he used the t-SNE algorithm, which visualizes multidimensional data. The networks Dieleman trained learned to identify musical instruments, chords, and even harmonies and progressions. The first layer of one network learned 256 distinct filters, responding, for example, to "vibrato singing" or a "bass drum". The network even found Chinese pop songs on its own and grouped them into playlists.
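The intuition behind those first-layer filters can be shown with a toy matched-filter example. This is not Dieleman's actual architecture; it only illustrates the principle that a convolutional filter "fires" wherever its pattern occurs in the input, the way a learned filter might fire on a vibrato shape:

```python
import numpy as np

# A synthetic "audio" signal containing one occurrence of a short pattern.
audio = np.zeros(100)
pattern = np.array([1.0, -1.0, 1.0, -1.0])   # stand-in for a learned filter shape
audio[50:54] = pattern                        # the pattern occurs at position 50

# Cross-correlation = what a 1-D convolutional layer computes (up to flipping).
response = np.correlate(audio, pattern, mode="valid")
# The filter's response peaks exactly where its pattern appears.
```

Stacking many such filters, and layers of them, is what lets a deep network progress from raw waveforms to concepts like chords and instruments.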
Dieleman's solution performed well, and if it proves itself in tests on a real system, it will be used alongside data from other algorithms. Streaming services, however, are not stopping at the analysis of songs and users' personal musical preferences.
About a year ago, Spotify announced its intention to collect data on users' location, contacts, and voice. Six months later, another innovation became known: Spotify joined forces with Runkeeper to use customers' physical-activity data to select tracks that ideally match their running pace. A few years ago this would have seemed like science fiction.
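The core of such pace-matched selection is simple to sketch: pick tracks whose tempo (BPM) is close to the runner's step cadence. The track names, tempos, and tolerance below are invented for illustration:

```python
# Hypothetical track catalog: (title, tempo in beats per minute).
tracks = [
    ("Track A", 95),
    ("Track B", 128),
    ("Track C", 160),
    ("Track D", 175),
]

def tracks_for_cadence(cadence_spm, tracks, tolerance=10):
    """Return titles whose BPM is within `tolerance` of the runner's steps per minute."""
    return [name for name, bpm in tracks if abs(bpm - cadence_spm) <= tolerance]
```

For a runner taking 130 steps per minute this selects only "Track B"; at 165 steps per minute it offers the faster tracks instead.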
Perhaps in the future, motion sensors in phones will make it possible to tell whether you are running, cycling, or driving. A heart-rate sensor will help gauge your degree of tension or excitement. And much can be learned about users' physical state during sleep.
Preferences, pulse, movement, sleep. What will the recommendations of the future take into account: the weather, the level of dopamine in the blood? Today's technologies already seem incredible, but all the prerequisites are in place for things to become far more incredible still.
P.S. Additional reading: our IaaS digest, 30 articles on the applicability of cloud technologies.