16.11.2023

AI Success in the Turing Test and Weather Forecasting

In this issue of #InfocusAI we’ll discuss why scientists are in no rush to announce that GPT-4 has passed the Turing test, whether AI possesses emotional intelligence, and which models lead the field in computer vision tasks. You will also learn about an LLM from China capable of summarising long texts and about Google DeepMind’s ground-breaking AI tool for weather forecasting.

AI-focused digest – News from the AI world

Issue 29, October 26 – November 16, 2023

GPT-4 scored 41% in the Turing test, but researchers are hesitant to give it the crown

Researchers at the University of California, San Diego subjected GPT-4 to the Turing test and found that the world’s most advanced AI model successfully passed for human 41% of the time. This is an impressive result, considering that the best score of the previous version, GPT-3.5, was 14%, and that 30% is often considered a “passing score” for the test. However, the researchers are not sure that 30% is an appropriate criterion for success. A 50% threshold seems more natural: at that level, interrogators would on average be no better than chance at distinguishing an AI from a human. Still, even reaching the 50% mark could be interpreted as a fluke. Based on the results of their experiments, the caveats above and some limitations of the study, the scientists cautiously state that they have found no evidence of GPT-4 passing the Turing test. The model reached neither the 50% chance level nor the 63% baseline set by human participants, although both seem achievable. Another important conclusion of the experiment is that intelligence alone is not enough to pass the Turing test; emotional intelligence is also needed. This follows from the fact that participants judged whether they were talking to a human or a machine mainly on the language style and the social and emotional characteristics of the responses. Read the details in this article. There is another interesting study on the topic, claiming that LLMs like GPT-4 have a grasp of emotional intelligence and that emotional stimuli in prompts can improve their performance.

Researchers from the US tested the abilities of different types of pre-trained models in solving CV tasks

Developers can now choose more easily among the plethora of pre-trained models available for various computer vision tasks. Researchers from several US universities and Meta AI Research (Meta is recognised as extremist and banned in the Russian Federation) conducted a large-scale study comparing a wide range of pre-trained models for CV solutions. All of them were tested on classification, object detection, out-of-distribution classification and object detection, and image retrieval. The researchers’ key findings are outlined in the article Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks. The researchers also released the raw data and code on GitHub, which developers can use to test their own models. The major finding of this research is that, while vision transformers and self-supervised learning are gaining popularity, the best performance on most computer vision tasks is still demonstrated by convolutional neural networks pre-trained in a supervised manner on large datasets.

Chinese LLM outperformed the Anthropic model in summarising long texts

Chinese AI start-up Baichuan has launched a new version of its large language model capable of processing documents, and even entire novels, of up to 350,000 Chinese characters. According to the South China Morning Post, which cites the developer, the model has a bigger context window and surpasses Anthropic’s Claude 2 in the quality of responses as well as in understanding and summarising long texts. It seems that Chinese AI is close to catching up with US AI.

Google DeepMind’s AI model breaks records for speed and accuracy in weather forecasting

The Google DeepMind team recently presented GraphCast, an AI model for weather forecasting that works with unprecedented accuracy and speed. The quality of its medium-range forecasts surpasses the so-called “gold standard”, the HRES system of the European Centre for Medium-Range Weather Forecasts (ECMWF). GraphCast’s efficiency and speed are also unrivalled: it takes less than a minute to make a 10-day forecast and needs only one Google TPU v4. By comparison, HRES requires several hours of supercomputer calculations to do this. GraphCast can also forecast extreme weather conditions that could pose a danger to people. Other features and capabilities of the new model from Google DeepMind are detailed in the journal Science and on the company’s blog.

CB Insights published The GenAI Bible

If you’re not too tired of long reads yet, here’s another one. Analysts at CB Insights have prepared a 120-page document titled “The Generative AI Bible”. It briefly describes the current landscape of the generative AI industry, assesses the activities of leading technology companies in this field, and outlines potential applications of generative technology in medicine, the financial sector and retail. You’ll also find a list of 50 GenAI start-ups to keep an eye on and a list of trends worth keeping in mind. You can download the document by filling out the form at this link.
