MTS AI has developed a new version of its large language model, Cotype Lite, for working with texts in the Tatar language. The company showcased the model at the Kazan Digital Week forum, held in the capital of Tatarstan from September 9 to 11. The LLM can process documents of up to 8,000 tokens (approximately five A4 pages) and can extract and summarize data within seconds.
Cotype Lite can be used in archives, libraries, and both government and private organizations, wherever information in Tatar needs to be processed or documents analyzed. For example, the model can speed up the processing of applications in government agencies: Cotype will extract key information such as the subject of the request, the location, and the applicant's personal data, and transfer it into the relevant database. Like other models in the Cotype family, this version can be installed within an organization's own infrastructure, which keeps data inside the organization and guards against leaks.
"In creating a large language model in Tatar, the developers at MTS AI had several goals. First, we wanted to support the linguistic diversity of Russia, helping its languages develop and remain relevant in the digital age. Second, this project demonstrated that we can adapt our models to any scientific or business task, including such non-trivial ones as processing information in the languages of the peoples of Russia," said Dmitry Markov, Executive Director of MTS AI.
To enable Cotype Lite to understand an unfamiliar language, the developers compiled a dataset and translated it from Russian to Tatar. All the data and the model’s responses were then checked by experts in Turkic languages and native speakers.
According to the developers, Cotype Lite, which has 8 billion parameters, ranks among the best LLMs in its class. If needed, MTS AI can create a Tatar LLM with more parameters (up to 70 billion) and a larger context window (up to 32,000 tokens), allowing the model to perform tasks such as translation and long-text generation. Additionally, MTS AI is ready to adapt the Cotype family of models for other regional languages of Russia.