Sleeper Agents and New Abilities of GPTs

fgfg Picture

In this issue of #InfocusAI you’ll learn about whether scientists have been able to defeat “sleeper agents” inside LLMs, how GPTs can control neural activity in the human brain, and what else has been invented for large model self-learning. Also: how the aviation experience has been useful for regulating AI in healthcare and which industries benefit the most from computer vision.  

AI-focused digest – News from the AI world

Issue 34, January 11-25, 2024

“Sleeper agents” in LLMs persist through standard safety training

Experts at Anthropic and several other research centres have run a series of experiments with the introduction of “sleeper agents” (malicious functions that can be embedded in AI models and activated by a special command) and came to a disappointing conclusion: such agents can be difficult to detect and even more difficult to neutralise. At least, the standard methods for this may not work. First, the scientists “implanted” a harmful behaviour into an LLM – to write secure code if the current year in the prompt is 2023 or earlier, and vulnerable code – if 2024 and later. Then, they tried to make the model unlearn this by applying various methods: supervised fine-tuning, reinforcement learning and adversarial training. In certain cases, the “sleeper agents” have shown an impressive persistence against neutralisation. Moreover, the researchers found that the model can learn to effectively mask such malicious behaviour and create the illusion of safety. The consequences of that could be unimaginably dire, so developers will have to find new approaches to protect AI from malicious users. Read the Anthropic study at this link

Human brains can be influenced using GPTs

It seems there has been another breakthrough at the nexus of AI and neurobiology. An interdisciplinary team of scientists has shown that generative transformers (GPT) can be used not only to create natural language texts, but also to predict and even control reactions in the parts of the human brain responsible for speech. The findings were published in Nature Human Behaviour (subscription access). The essence is this: scientists created a GPT-based encoding model that successfully predicted the responses of the brain’s language network to particular sentences. The same model was equally successful in selecting suggestions to produce the desired responses – suppressing or conversely stimulating brain activity. The conclusion is that using neural network models it is possible to non-invasively influence neural activity in the language network of the human brain. 

LLMs will be able to learn without humans

Researchers at Meta (recognized as extremist and banned in the Russian Federation) and New York University developed a new technique that allows large language models to be trained without humans. Right now, LLM training is usually based on reward models  built on people’s responses, and this has its limitations. In particular, people can’t answer every possible question. Besides, those reward models are kind of “frozen” and can’t self-improve during LLM training. The scientists’ idea is simple: teach large language models to generate rewards for themselves during training and improve in it from generation to generation. It’s a sort of training for new generations of LLMs on the basis of previous ones. To put the idea into practice, the researchers developed a new iterative learning methodology. After trying it out on the Llama 2 70B, in three iterations they got a model that outperformed Claude 2, Gemini Pro and GPT-4 0613. Read more about the approach and its testing in the article.

MIT: computer vision will become more beneficial by 2030

According to MIT researchers, in the US, so far only 3% of approximately 1,000 visually-assisted tasks can be automated cost-effectively using computer vision technology. However, this number could rise to 40% by 2030 if the cost of collecting, storing and processing data falls and its quality improves, as Bloomberg writes. According to the agency citing the research, right now the implementation of computer vision is most cost-effective and beneficial in segments such as retail, transportation and warehousing. The list also includes healthcare. Here’s the link to the 45-page MIT paper with explanations. 

Aviation experience turned out to be helpful for AI regulation in healthcare

An international team of scientists has found inspiration in aviation for developing regulations for AI in healthcare. They discuss their thoughts on how the experience of the aviation industry, which has evolved from an extremely dangerous industry to one of the safest, can be used to mitigate the risks of artificial intelligence in healthcare in a research paper titled Taking Off with AI: Lessons from Aviation for Healthcare. In particular, they were able to derive three useful lessons that are essential for improving medical AI. Interested? Then you’ll have to read the article. 

Latest Articles
See more
Media about MTS AI
MTS AI Unveils New LLM Specifically for Business Use
Team news
MTS AI employee joins AI Alliance Science Council
AI Trends
Biothreat Protection and LLM under the Dragon Sign
AI Trends
Sleeper Agents and New Abilities of GPTs
AI Trends
LLM Jailbreak and AI for Training Robots
AI Trends
Word of the Year and Math Discoveries of LLM
AI Trends
AI Governance and OpenAI Competitors’ Alliance
AI Trends
Doc Producers vs GenAI and LLM Harmlessness Test
Team news
MTS AI Engineer Wins Big at AI Journey Conference Competitions
AI Trends
AI Success in the Turing Test and Weather Forecasting