This issue of #InFocusAI looks at how AI is helping researchers decode dolphin communication and interpret cellular data for biology and medicine. We also discuss Anthropic's research on how AI expresses values in dialogues with humans, MIT's new method for LLM detoxification, and a short-term forecast for the development of AI technologies.
AI-focused digest – News from the AI world
Issue 64, April 10 – 24, 2025
Google and Wild Dolphin Project learn dolphin language with AI
Last week, Google and researchers at the Wild Dolphin Project (WDP) presented DolphinGemma, an AI model that learns the structure of dolphin vocalizations and can generate dolphin-like sound sequences. DolphinGemma was trained on WDP's extensive acoustic database of wild Atlantic spotted dolphins. It processes sequences of naturally produced dolphin sounds to identify patterns and structures and, ultimately, to predict probable subsequent sounds, much as large language models predict the next word or token in a sentence. Scientists hope the new tool will help them more quickly understand the structure and potential meaning of natural sound sequences in these marine mammals and find signals that may prove the animals "speak" a language. This is AI's contribution to interspecies communication. More details about the project are available on Google's blog.
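To make the next-token analogy concrete, here is a toy sketch of the general audio-as-tokens approach (our own illustration, not Google's code): a sound clip is discretized into tokens from a fixed codebook, and a small causal transformer learns to predict the next token. The codebook size and model dimensions are arbitrary placeholders.

```python
import torch
import torch.nn as nn

VOCAB = 1024  # hypothetical size of the audio-token codebook

class TinyAudioLM(nn.Module):
    """A miniature causal language model over discrete audio tokens."""
    def __init__(self, d=128):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):  # tokens: (batch, time)
        # Causal mask so each position only attends to earlier sounds
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(self.emb(tokens), mask=causal)
        return self.head(h)  # logits over the next audio token

model = TinyAudioLM()
clip = torch.randint(0, VOCAB, (1, 64))      # stand-in for a tokenized dolphin clip
next_probs = model(clip)[0, -1].softmax(-1)  # distribution over the next sound token
```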
Anthropic explores how AI expresses values in dialogues with humans
Researchers at Anthropic have recently released a paper outlining their approach to studying how artificial intelligence expresses values in real-world conversations with users. They analyzed over 700,000 anonymized conversations between humans and the Claude model and identified 3,307 distinct values that the model expressed or recognized in its interlocutors. These values were grouped into five categories: practical, epistemic, social, protective and personal. The most interesting finding was that the values expressed by the model change with the context of the dialogue. For example, when Claude is asked for relationship advice, it prioritizes values such as healthy boundaries and mutual respect, while in conversations about history it emphasizes factual accuracy. To learn more about Anthropic's taxonomy of AI values and other insights, read the article.
AI development forecast until 2027
Researchers from several influential AI institutes have recently published a short-term forecast for the development of artificial intelligence technologies through 2027. The prediction is based on current scientific achievements, AI-related trends and expert assessments. As early as 2025, AI agents for coding and research are expected to start transforming professions; by the beginning of 2027 they are expected to become capable of continuous learning and to reach the level of top human experts, and around Q3 2027, the authors predict, we will begin talking about artificial superintelligence (ASI). Not everyone will agree with this forecast…
MIT develops novel method of LLM self-detoxification
MIT and IBM Research have figured out how to prevent a model from generating harmful outputs without resorting to additional reward models or retraining. They called their LLM self-detoxification method "self-disciplined autoregressive sampling", or SASA. In a nutshell (a toy sketch of the idea follows the list):
- The decoding algorithm learns a boundary between toxic and non-toxic subspaces within the LLM's own internal representation.
- At each generation step, it assesses the toxicity of the partially generated phrase, i.e., the tokens accepted so far together with each candidate next token.
- It then selects a token that keeps the phrase in the non-toxic subspace.
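Below is a minimal, hypothetical sketch of this kind of re-weighted sampling (our simplification, not the authors' code). It assumes a Hugging Face-style causal LM and a linear toxicity classifier with weights `w` and bias `b`, trained offline on the model's own sentence embeddings so that a positive margin means the phrase sits on the non-toxic side of the boundary.

```python
import torch

def sasa_style_step(model, input_ids, w, b, beta=5.0, top_k=50):
    """Sample the next token, shifting probability mass toward candidates
    that keep the partial phrase on the non-toxic side of the hyperplane."""
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # next-token logits
        top = torch.topk(logits, top_k)
        margins = []
        for tok in top.indices:
            cand = torch.cat([input_ids, tok.view(1, 1)], dim=1)
            # Last-layer hidden state of the extended phrase as its embedding
            h = model(cand, output_hidden_states=True).hidden_states[-1][0, -1]
            margins.append(w @ h + b)  # signed distance to the toxicity boundary
        margins = torch.stack(margins)
        # Re-weight the original logits by the margin, then sample
        probs = torch.softmax(top.values + beta * margins, dim=-1)
        choice = top.indices[torch.multinomial(probs, 1)]
    return torch.cat([input_ids, choice.view(1, 1)], dim=1)
```

An actual implementation would batch or approximate the per-candidate evaluations; the explicit loop here is kept only for readability.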
Testing was performed on Llama-3.1-8B-Instruct*, Llama-2-7B* and GPT-2 Large using the RealToxicityPrompts, BOLD and AttaQ benchmarks. SASA significantly reduces the toxicity of generated sentences while performing on par with other detoxification methods. More details are available in the article.
*Llama is a family of large language models released by Meta, an organization deemed extremist and banned in the Russian Federation.
New Google models to help interpret cell data
And finally, one more piece of news from Google. In collaboration with several universities, the company has developed a new family of large language models, Cell2Sentence-Scale (C2S-Scale for short), that can interpret cellular data and reason about biology at the cellular level. C2S-Scale transforms complex single-cell data into simple, readable "cell sentences" so that researchers can ask the model questions about specific cells, such as "Is this cell cancerous?" or "How will the cell respond to drug X?", and receive clear, biologically grounded answers in natural language. The new models are expected to accelerate drug development, help researchers better understand and prevent diseases, and contribute to the democratization of science. More details can be found in the Google Research blog.
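As a rough illustration of the "cell sentence" idea (our own toy example, not Google's code, with made-up gene names and counts): rank a cell's genes by expression level and emit the top gene names as a short text sequence a language model can consume.

```python
# Toy illustration of turning an expression profile into a "cell sentence":
# genes are ranked by expression and their names emitted as plain text.
def cell_to_sentence(expression: dict[str, float], top_n: int = 8) -> str:
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(gene for gene, level in ranked[:top_n] if level > 0)

# Hypothetical expression profile (gene -> normalized count)
cell = {"CD3D": 9.1, "CD8A": 7.4, "GZMB": 5.2, "IL7R": 0.0, "MKI67": 2.8}
print(cell_to_sentence(cell))  # -> "CD3D CD8A GZMB MKI67"
```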