MWS AI Releases the Russian Language VLM Benchmark


MWS AI has released MWS Vision Bench, an open benchmark that evaluates vision language model capabilities in handling multimodal content in the Russian language.

MWS Vision Bench is the first benchmark focused on evaluating vision language models in real business scenarios that involve mixed-format content in the Russian language. The new tool makes it possible to test generative AI's ability to recognize and understand documents that combine text and visual data.

Modern models can analyze contracts, invoices, forms, schematics, and spreadsheets. However, existing international benchmarks, such as OCRBench, AI2D, and MMMU, focus only on English and Chinese and do not allow an objective assessment of how models handle practical business tasks in Russian.

MWS Vision Bench offers a comprehensive set of tasks to evaluate models’ document processing capabilities such as reading text from images, understanding document structure, extracting necessary information, recognizing element placement, and answering complex content-related questions.
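The article does not specify how the benchmark scores these tasks, but text-reading and question-answering tasks of this kind are typically evaluated with a normalized exact-match metric. The sketch below is an illustrative assumption, not the benchmark's actual scoring code; the function names and the normalization rules are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    cosmetic differences (case, spacing, symbols) do not affect matching.
    Python's \\w matches Cyrillic letters, so Russian text is handled."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings agree, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def score_run(results) -> float:
    """Average exact-match score over (prediction, reference) pairs."""
    return sum(exact_match(p, r) for p, r in results) / len(results)
```

For example, `exact_match("Договор  № 5", "договор №5")` yields 1.0, since both answers normalize to the same string. Real document benchmarks often add per-task metrics (e.g. edit distance for OCR, IoU for element localization) on top of such a baseline.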

The dataset includes 800 images and 2,580 tasks that mirror real-world document handling scenarios in Russian organizations. It covers office and personal documents, schematics, handwritten notes, spreadsheets, drawings, diagrams, and graphs. All images are fully depersonalized. For ease of use, the original dataset has been randomly divided into two parts: a validation subset with 400 images and 1,302 tasks, and a test subset with 400 images and 1,278 tasks. The validation subset is openly available.

“Right now, there are lots of AI models, but very few tools that can evaluate their suitability for practical business applications. This makes it harder to compare results and choose the right solutions for business tasks. So, it is very hard to say which model is better at analyzing documents, retrieving data, or automating client support. For companies that operate in Russian, it is particularly important to have an objective comparison tool that takes the specifics of the language and of business documents into account,” said Denis Filippov, CEO, MWS AI.

The benchmark’s code is published on GitHub, and the dataset is available on the Hugging Face platform, where companies can test both proprietary and third-party models. As of now, Gemini 2.5 Pro, Claude Sonnet 4.5, and ChatGPT-4.1 mini have achieved the best results.
