MTS AI developer becomes prize winner in Yandex Cup championship


Andrey Parkov, senior developer on the ASR team at MTS AI, came in third in the Yandex Cup competition with his solution to a speech recognition problem. Read on to learn more about Andrey and his ideas.

Yandex Cup: third prize for speech recognition

Andrey Parkov is the senior developer on the ASR (automatic speech recognition) team in MTS AI’s machine learning department. Ever since he was a kid, Andrey had dreamed of building robots. In 1992, he graduated from university with a degree in Robotics and a major in artificial intelligence. He then took a job in IT and telecommunications and returned to AI only 10 years later. He began attending new machine learning workshops on his own, solved problems and built smart systems as a hobby, and posted his work on GitHub. That is where our team saw Andrey’s work, and that is how his hobby turned into a full-time job.

Yandex Cup and other competitions

Competitions give developers room to experiment, a chance to test their knowledge and skills, and an opportunity to try new methods and algorithms. More often than not, they also provide the data needed to solve the problems. High-quality data is still in short supply, so access to a dataset is in itself a major benefit of taking part.

What is Yandex Cup? It is an open online championship for developers with six tracks: front-end, back-end, mobile development, analytics, algorithms, and machine learning. In the machine learning track, participants were tasked with solving four problems in a variety of ML areas, including speech recognition, recommendation systems, computer vision and text analysis.

Voice activation problem

In the problem Andrey worked on, participants had to train a noise-resistant model that recognizes a fixed set of key phrases. The organizers provided a set of “clean” key phrases (38 words, each pronounced by around three thousand people) and a separate set of recordings of typical noises. In the test dataset, activation phrases were randomly mixed with noises, and the system had to determine what people were saying.
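To make the setup concrete, here is a minimal sketch of how a clean phrase can be mixed with a noise recording to produce training examples similar to the test data. The article does not say how the organizers mixed the signals; the signal-to-noise ratio parameter, the function names and the toy signals are all assumptions made for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db, rng=random):
    """Mix a clean phrase with a random noise segment at the given SNR in dB.

    Hypothetical helper: the actual mixing procedure used in the
    competition is not described in the article.
    """
    if len(noise) < len(clean):
        raise ValueError("noise recording must be at least as long as the phrase")
    # Pick a random window of the noise recording that matches the phrase length.
    start = rng.randrange(len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    noise_rms = rms(segment)
    if noise_rms == 0:
        return list(clean)
    # Scale the noise so that 20 * log10(rms(clean) / rms(scaled noise)) == snr_db.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, segment)]


# Toy data: a 440 Hz tone stands in for a spoken phrase,
# Gaussian noise stands in for a recording of typical noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4800)]
noisy = mix_at_snr(clean, noise, snr_db=5.0, rng=rng)
```

Drawing a different noise window and SNR for every example is one way to turn a limited set of clean recordings into a much larger noisy training set.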

Andrey spent 15 evenings solving this problem. He decided to use a non-standard neural network modeled loosely on the human brain, where one part is responsible for vision, another for hearing, and a third for conversation. His network was similarly divided into parts. One part was in charge of removing noise from the audio: it received a noisy signal as input and tried to produce a clean signal as output. Since the denoising was not perfectly accurate, the cleaned signal was mixed with the original noisy signal, and the mix was fed into the next network, which tried to recognize the word from spectrograms. That was one branch of the model. The second branch was a little smarter: it identified letters first, and then determined words from the letters. Finally, the outputs of the two branches were combined into a single answer.
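The article does not say how the two branches were combined. As a rough illustration only, the sketch below assumes a simple weighted average: the word branch supplies a score for each key phrase, the letter branch supplies a decoded string that is matched against the vocabulary, and the two score lists are averaged. The vocabulary, the scores and the function names are hypothetical.

```python
from difflib import SequenceMatcher

VOCAB = ["lights", "music", "stop"]  # hypothetical key phrases


def letter_branch_scores(decoded_letters, vocab):
    """Score each vocabulary word by string similarity to the decoded letters."""
    sims = [SequenceMatcher(None, decoded_letters, word).ratio() for word in vocab]
    total = sum(sims)
    if total == 0:
        # No overlap with any word: fall back to a uniform distribution.
        return [1.0 / len(vocab)] * len(vocab)
    return [s / total for s in sims]


def fuse(word_probs, letter_probs, word_weight=0.5):
    """Weighted average of the per-word scores from the two branches."""
    return [
        word_weight * w + (1.0 - word_weight) * l
        for w, l in zip(word_probs, letter_probs)
    ]


# Branch 1 (spectrogram -> word) is unsure between "lights" and "music".
word_probs = [0.40, 0.45, 0.15]
# Branch 2 (letters -> word) decoded a slightly misspelled string.
letter_probs = letter_branch_scores("lihts", VOCAB)

fused = fuse(word_probs, letter_probs)
prediction = VOCAB[max(range(len(VOCAB)), key=fused.__getitem__)]
```

In this toy case the letter branch tips the decision toward the word that the spectrogram branch ranked second, which is the kind of complementary behavior that makes combining branches worthwhile.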

There was not enough data to train the neural network well, so Andrey applied a training method that makes use of unlabeled data. He iteratively trained the system further on the test data to improve its quality, and the approach worked.
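The article does not name the exact method. One common way to train iteratively on unlabeled data is self-training with pseudo-labels: the model labels the unlabeled examples, keeps only the predictions it is confident about, and retrains on the enlarged set. The sketch below shows that loop with a nearest-centroid classifier standing in for the neural network; the classifier, the confidence threshold and the toy data are assumptions.

```python
import math


def centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    labels = sorted(cents)
    weights = [math.exp(-math.dist(cents[y], x)) for y in labels]
    best = max(range(len(labels)), key=weights.__getitem__)
    return labels[best], weights[best] / sum(weights)


def self_train(train_x, train_y, unlabeled_x, rounds=3, threshold=0.8):
    """Iteratively add confident predictions on unlabeled data to the training set."""
    xs, ys = list(train_x), list(train_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)  # "retrain" on everything labeled so far
        remaining = []
        for x in pool:
            label, conf = predict(cents, x)
            if conf >= threshold:
                xs.append(x)       # accept the pseudo-label
                ys.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break                  # nothing new was added, so stop early
        pool = remaining
    return centroids(xs, ys), pool


# Toy 2-D features: one labeled example per word plus unlabeled "test" points.
train_x = [[0.0, 0.0], [4.0, 4.0]]
train_y = ["yes", "no"]
unlabeled = [[0.5, 0.2], [3.6, 4.1], [1.0, 0.5], [3.0, 3.5], [2.0, 2.0]]
final_cents, still_unlabeled = self_train(train_x, train_y, unlabeled)
```

The threshold matters: set too low, wrong pseudo-labels reinforce the model's own mistakes; set too high, almost nothing is added and the extra data goes unused.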

How did his rivals work?

Andrey took third prize in the competition. The first and second prizes went to teams that used a pre-trained neural network that was proficient in image recognition. They retrained the network on audio spectrograms and won by achieving more accurate results. The systems developed by the gold and silver prize winners demonstrated 96% and 95% accuracy, respectively. Andrey’s algorithm showed 92% accuracy. The percentage is the share of words the system recognized correctly. The organizers did not use any other assessment metrics, despite the differences in the approaches to the problem.
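For reference, word-level accuracy of this kind can be computed as the fraction of test recordings whose predicted word matches the reference word. The sketch below is a generic illustration with made-up data, not the organizers’ scoring script.

```python
def accuracy(predicted, reference):
    """Fraction of items where the predicted word equals the reference word."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical test set of five recordings; one is misrecognized.
reference = ["stop", "music", "lights", "stop", "music"]
predicted = ["stop", "music", "lights", "music", "music"]
score = accuracy(predicted, reference)
```

A single number like this says nothing about model size, speed or robustness to unseen noise, which is why very different approaches could be ranked on the same scale.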

Competitions as a way to run into unorthodox solutions

It often happens that companies turn to such competitions to find an unorthodox solution that shows how to solve standard problems in a new way. Another goal is to build a pool of talented developers who think outside the box. In any case, such competitions help developers perfect their skills, look at the solutions of their rivals, learn from new experiences and share them with the community. This is exactly what Andrey did: a demo version of the system he developed can be found on YouTube, and the technical framework of the project is available on GitHub.

Demo: recognize a fixed set of keywords in a noisy video stream

Practical use

The market for smart devices with voice assistants is growing every year. According to Just AI, it is now estimated at 14 billion RUB. Their forecast says that 2.9 million smart speakers, screens and TV boxes will be sold in Russia in 2021. Demand for technologies capable of working with noisy data will only grow, and the technologies themselves will overcome existing limitations, both quantitative (speed, memory and data volume) and qualitative (maturity of machine learning systems). The use of quantum computers and working with probabilities rather than numbers are seen as the next qualitative leap in the industry’s development.
