09.12.2021

MTS AI developer becomes prize winner in Yandex Cup championship

Andrey Parkov is a senior developer on the ASR team at MTS AI. He came in third in the Yandex Cup competition with his solution to a speech recognition problem. Read the article to learn more about Andrey and his ideas.

Yandex Cup: third prize for speech recognition

Andrey Parkov is a senior developer on the ASR team in MTS AI’s machine learning department. Ever since he was a kid, Andrey dreamed of building robots. In 1992, he graduated from university with a degree in Robotics and a major in artificial intelligence. He then took a job in IT and telecommunications and returned to AI only 10 years later. He began taking part in new machine learning workshops on his own, solved problems and built smart systems as a hobby, and posted his work on GitHub. That is where our team saw Andrey’s work, and that is how his hobby turned into a full-time job.

Yandex Cup and other competitions

Participation in competitions offers developers room for experiments, a chance to test the level of their knowledge and competencies and to try new methods and algorithms. Plus, more often than not, competitions also provide data for solving problems. Today, high-quality data is still in short supply, and getting a dataset for use is already a huge benefit of participating in competitions.

What is Yandex Cup? It is an open online championship for developers in six tracks: front-end, back-end, mobile development, analytics, algorithms, and machine learning. In the machine learning track, participants were tasked with solving four problems in different areas of ML: speech recognition, recommendation systems, computer vision and text analysis.

Voice activation problem

In the case study that Andrey was working on, the participants were expected to train a noise-resistant model to recognize a fixed set of key phrases. The organizers provided a set of “clean” key phrases: 38 words, each pronounced by around three thousand people, and a separate set with recordings of typical noises. In the test dataset, activation phrases were randomly mixed with noises, and the system had to determine what people were saying.
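The mixing of phrases with noise described above can be illustrated with a short sketch. It assumes that a noise excerpt is added to the clean recording at a chosen signal-to-noise ratio (SNR); the article does not say how the organizers mixed the test data, and the names `rms` and `mix_at_snr` are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, rng=random):
    """Add a random noise excerpt to speech at the requested SNR in dB.

    Illustrative only: the competition's actual mixing procedure is unknown.
    """
    if len(noise) < len(speech):
        # Loop the noise recording so it covers the whole phrase.
        noise = noise * (len(speech) // len(noise) + 1)
    start = rng.randrange(len(noise) - len(speech) + 1)
    excerpt = noise[start:start + len(speech)]
    noise_rms = rms(excerpt)
    if noise_rms == 0:
        return list(speech)  # silent noise: nothing to add
    # Choose the gain so that 20*log10(rms(speech) / rms(gain*excerpt)) == snr_db.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, excerpt)]
```

A lower `snr_db` gives a noisier mix, so training on a range of SNR values is one way to expose a model to conditions similar to the test set.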

Andrey spent 15 evenings solving this problem. To train the model, he decided to use a non-standard neural network modeled loosely on the human brain, in which one part is responsible for vision, another for hearing, and a third for conversation. The neural network was similarly divided into specialized parts. One part was in charge of removing noise from the audio: it received a noisy signal at the input and tried to produce a noise-free signal at the output. Since the denoising was not completely accurate, the denoised signal was mixed with the original noisy signal, and the mix was fed into the next network, which tried to recognize the word from spectrograms. This was one branch of the model. The second branch was a little smarter: it identified letters first, and then determined words based on the letters. In the end, the outputs of the two branches were combined to produce a single answer.
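Two steps of this pipeline can be sketched in a few lines: mixing the denoiser's output with the noisy input, and combining the answers of the two branches. The article does not describe how either step was implemented, so the weighted average used here, the parameters `alpha` and `weight`, and the function names are all assumptions made for illustration.

```python
def blend(denoised, noisy, alpha=0.5):
    """Mix the denoiser output with the original noisy signal.

    alpha is a hypothetical mixing weight; the real value is not given.
    """
    return [alpha * d + (1 - alpha) * n for d, n in zip(denoised, noisy)]


def fuse(word_probs, letter_probs, weight=0.5):
    """Combine per-word probabilities from two branches by weighted average.

    word_probs:   scores from the spectrogram-based word branch
    letter_probs: scores from the letter-based branch
    Returns the best word and the combined scores.
    """
    words = set(word_probs) | set(letter_probs)
    combined = {
        w: weight * word_probs.get(w, 0.0)
        + (1 - weight) * letter_probs.get(w, 0.0)
        for w in words
    }
    # Sort first so that ties are broken alphabetically, not by set order.
    best = max(sorted(combined), key=combined.get)
    return best, combined
```

For example, `fuse({"start": 0.6, "stop": 0.4}, {"start": 0.3, "stop": 0.7})` picks "stop", because the letter-based branch is more confident than the word branch.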

There was not enough data to train the neural network well, so Andrey used a method for training on unlabeled data. He iteratively retrained the system on the test data to improve its quality, and the approach worked.
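One common way to train iteratively on unlabeled data is self-training with pseudo-labels: the model labels the examples it is confident about, adds them to the training set, and is retrained. The article does not say which method Andrey used, so the sketch below is only an illustration of the general idea. It uses a toy nearest-centroid classifier on one-dimensional features in place of a neural network, and every name and threshold in it is hypothetical.

```python
def fit_centroids(points, labels):
    """Train a toy nearest-centroid classifier on 1-D features."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin); margin is the gap to the runner-up class."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin


def self_train(points, labels, unlabeled, min_margin=1.0, rounds=3):
    """Iteratively add confident predictions as pseudo-labels and retrain."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    centroids = fit_centroids(points, labels)
    for _ in range(rounds):
        remaining = []
        for x in pool:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                points.append(x)      # confident: keep as a pseudo-label
                labels.append(label)
            else:
                remaining.append(x)   # uncertain: try again next round
        if len(remaining) == len(pool):
            break                     # nothing new was labeled
        pool = remaining
        centroids = fit_centroids(points, labels)
    return centroids
```

The confidence threshold matters: accepting uncertain predictions as labels can reinforce the model's own mistakes.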

How did his rivals work?

Andrey took the third prize in the competition. The first and second prizes went to teams that used a pre-trained neural network that was proficient in image recognition. They retrained the network using audio spectrograms and won by showing a more accurate result. The systems developed by the gold and silver prize winners demonstrated 96% and 95% accuracy, respectively. Andrey’s algorithm showed 92% accuracy. The percentage reflects how accurately the system recognized words. The organizers did not use other assessment metrics, despite the difference in approaches to solving the problem.
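The accuracy figures can be read as the share of test recordings for which the system named the right word. The sketch below assumes that interpretation; the article does not give the organizers' exact formula, and the function name is hypothetical.

```python
def word_accuracy(predicted, expected):
    """Fraction of recordings whose predicted word matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one recording is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Under this reading, a score of 0.92 means that 92 out of every 100 test recordings were recognized correctly.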

Competitions as a way to run into unorthodox solutions

Companies often turn to such competitions to find an unorthodox solution that shows how to solve standard problems in a new way. Another goal is to build a pool of talented developers who think outside the box. In any case, such competitions help developers perfect their skills, study their rivals’ solutions, learn from new experiences and share them with the community. This is exactly what Andrey did: a demo version of the system he developed can be found on YouTube, and the technical framework of the project is available on GitHub.

Demo: recognize a fixed set of keywords in a noisy video stream

Practical use

The market for smart devices with voice assistants is growing every year. According to Just AI, it is now estimated at 14 billion RUB. Their forecast says that 2.9 million smart speakers, screens and TV boxes will be sold in Russia in 2021. Demand for technologies capable of working with noisy data will only grow, and these technologies will have to overcome existing limitations, both quantitative (speed, memory and data volume) and qualitative (maturity of machine learning systems). The use of quantum computers, which work with probabilities rather than numbers, is seen as the next qualitative leap in the industry’s development.
