Many things have happened in the past two weeks while you were waiting for the new issue of #InfocusAI. DeepMind has investigated how efficient LLMs are at data compression, Microsoft has developed an MLLM for machine reading of text-heavy images, and MIT has figured out how to make it easier to personalise models for 3D printing. Also, Google, OpenAI and Amazon have revealed updates to their bots, China has made some noise about a new technology for semiconductor chip manufacturing, and Russia has found out how its businesses are using AI. Read on for the details.
AI-focused digest – news from the AI world
Issue 26, September 14-28, 2023
DeepMind experts uncovered LLMs’ impressive abilities in data compression
DeepMind suggests taking a closer look at large language models as powerful data compressors, VentureBeat reports. It has long been established that predictive models can be turned into lossless compression algorithms and vice versa, and since LLMs are exceptionally good predictors, they should also make exceptionally good compressors. DeepMind researchers demonstrate this through a series of experiments with the Chinchilla 70B model. Unsurprisingly, a model trained mostly on text compresses text very well – down to 8.3% of the original size. What is truly impressive is that it also outperformed PNG and FLAC, formats designed specifically for compression, on images and audio. In particular, Chinchilla 70B compresses ImageNet (annotated image database) patches to 43.3% of their size, versus 58.5% for PNG, while on audio (LibriSpeech samples) Chinchilla achieves 16.4% against FLAC’s 30.3%. You will find more numbers and commentary in this research paper.
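The prediction–compression link the researchers build on is Shannon’s: a model that assigns probability p to the next symbol can, via arithmetic coding, encode that symbol in roughly -log2(p) bits, so better prediction means shorter codes. Here is a minimal sketch of the idea, using a toy unigram character model as a stand-in for an LLM’s next-token probabilities – all function names are illustrative, not from the paper:

```python
import math
from collections import defaultdict

def ideal_code_length_bits(text, model):
    """Sum of -log2 p(symbol | context): the length in bits an
    arithmetic coder driven by `model` would need, up to rounding."""
    bits = 0.0
    context = ""
    for ch in text:
        bits += -math.log2(model(context, ch))
        context += ch
    return bits

def make_unigram_model(text):
    """Order-0 frequency model with Laplace smoothing, built from the
    text itself -- purely illustrative; an LLM would condition on the
    context instead of ignoring it."""
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    total, vocab = len(text), len(counts)
    def model(context, ch):
        return (counts[ch] + 1) / (total + vocab)
    return model

text = "abracadabra"
bits = ideal_code_length_bits(text, make_unigram_model(text))
ratio = bits / (8 * len(text))  # compare against raw 8-bit characters
print(f"{bits:.1f} bits, {ratio:.1%} of raw size")
```

A stronger predictor (say, a character-level LLM) would assign higher probabilities to each symbol and drive the bit count, and hence the compression ratio, lower – which is exactly the effect the Chinchilla experiments measure at scale.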
Microsoft developed an MLLM for machine reading of text-intensive images
Researchers from Microsoft present Kosmos-2.5, their multimodal large language model (MLLM) for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, the model, according to the developers, successfully tackles tasks such as generating spatially aware text blocks by assigning spatial coordinates to text within images (document-level text recognition) and producing structured markdown output from images (image-to-markdown). They also note that Kosmos-2.5 can be fine-tuned for other text-intensive image understanding tasks. Read more details here.
MIT figured out how to personalise models for 3D printers without sacrificing functionality
Scientists at the Massachusetts Institute of Technology have found a better way to pair aesthetics with functionality when designing personalised 3D-printable objects, MIT News reports. Modern technology allows almost anyone to design and fabricate an object, but its purpose is often overlooked: for example, in pursuit of beauty and originality, the base of a vase can be altered so that it constantly tips over. Style2Fab, the tool proposed by MIT, helps avoid such mistakes. It is built on generative artificial intelligence algorithms that automatically partition 3D models into functional and aesthetic segments. This lets users restyle their objects as they see fit while preserving their functionality. The hardest part of this work is classifying an object’s segments by their effect on functionality; this preprint describes how the scientists solved the problem.
Google, OpenAI and Amazon announced significant upgrades of their AI assistants
Google, OpenAI and Amazon have revealed which new features are coming, or have just arrived, for their respective assistants – Bard, ChatGPT and Alexa.

Let’s start with Google. Bard can now draw on information from other Google services – Gmail, Docs, Drive, Google Maps, YouTube, etc – if you install the appropriate extensions. Moreover, users can now continue conversations started by other users, for example to ask follow-up questions (you need a public link to the conversation). But the most valuable feature, perhaps, is double-checking answers: if a statement by the bot is verifiable, you just click “Google it” to pull information from Google Search and see whether the statement contradicts the sources found.

OpenAI focused on the voice and image capabilities of ChatGPT. In a couple of weeks, Plus and Enterprise users will be able to talk to ChatGPT by voice and show it pictures as they do so. For instance, you could show the bot a photo from a trip and discuss what’s notable about it aloud, or take a picture of your fridge and ask for a step-by-step dinner recipe from what you have.

Meanwhile, Alexa is getting a raft of updates (expected before the end of the year) aimed at creating a more accessible and safe home environment and making creativity easier. Among the most interesting: you’ll be able to trigger pre-set actions with your gaze (a feature essential for people with speech and mobility limitations), get calls translated from a foreign language in real time, and use only your voice and imagination to create AI-generated images on Fire TV. Learn more on the corresponding blogs via the links at the beginning of this post.
China is exploring a new technology for semiconductor chip production
There has been a great deal of discussion in China’s media about a new technology that uses a particle accelerator as the photon source for the photolithography employed in semiconductor chip production. The South China Morning Post, citing scientists, enthusiastically reports that it is a breakthrough that will allow China to bypass US sanctions and soon become a leader in semiconductor chip production. It also notes that negotiations are already under way to build a huge particle accelerator to put the technology into practice. However, not everyone shares this enthusiasm. In particular, Wang Jie (汪诘), a very influential science blogger in China, argues that the technology, which was discussed as early as 2010, is still at the verification stage, and that establishing the production of lithography machines is a very complex and science-intensive undertaking. By his estimate, it will take at least 13 years of effort. He lays out his reasoning in this article (in Chinese).
65% of organisations in Russia use AI in test mode
The HSE Institute for Statistical Studies and Economics of Knowledge has published the results of its monitoring of the spread and development of AI in Russia. A survey of over 2,300 businesses in 36 regions of the Russian Federation revealed that approximately 65% of them use AI in test mode, checking whether the technology really delivers significant benefit to the business. In most cases, AI applications are combined with other digital solutions: various industrial software, communication services for marketing and customer interaction, and so on. The products most in demand among Russian companies are those based on computer vision and speech synthesis.