Welcome to another edition of AI News.
The Gap Between Open and Proprietary LLMs
"The False Promise of Imitating Proprietary LLMs" by researchers from Berkeley suggests that open-source alternatives can't match proprietary models like ChatGPT in terms of capabilities, regardless of the high ratings received from crowd workers after imitation training. They suggest that focusing on enhancing base capabilities through scaling and pretraining could be more impactful than gathering more imitation data. Read more
Evaluating Factual Precision
"FactScore: Fine-grained atomic evaluation of factual precision in long form text generation" offers an automated way to assess the factual precision of language models. The authors observe that factual error rates are higher when generating content about rare entities or facts mentioned later in the text. Among LLMs, GPT-4 exhibits substantially higher factual accuracy than StableLM. Read more
Gorilla: API calls with LLMs
A collaborative effort between Berkeley and Microsoft has led to the creation of "Gorilla: Large Language Model Connected with Massive APIs". Fine-tuned from a LLaMA base model, Gorilla surpasses GPT-4 at writing correct API calls. Read more
Evaluating Extreme Risks
Model evaluation isn't just about performance; it is also crucial for managing extreme risks. A study led by DeepMind, "Model evaluation for extreme risks", sheds light on how to evaluate the potential extreme dangers posed by advanced models. Read more
Chatbot Arena Leaderboard
In the chatbot domain, the Chatbot Arena leaderboard has seen some interesting updates, with PaLM 2 lagging behind models from OpenAI, Anthropic, and LMSYS Org. There are caveats, however: the PaLM 2 available through the API is likely not the latest version, and it often underperforms because it refuses to answer. Read more
GPT-4 Surpassing RL Algorithms
"SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning" suggests that GPT-4 can significantly outperform Reinforcement Learning (RL)-based approaches in open-ended games such as Crafter. Read more
Making Finetuning More Efficient
The paper "QLoRA: Efficient Finetuning of Quantized LLMs" shows how to dramatically cut memory usage when finetuning LLMs without sacrificing performance: gradients are backpropagated through a frozen, 4-bit-quantized base model into lightweight LoRA adapters, enough to finetune a 65B-parameter model on a single 48GB GPU. Read more
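For readers who want to try this, here is a hedged sketch of a QLoRA-style setup using the transformers, peft, and bitsandbytes stack; the checkpoint name and LoRA hyperparameters are illustrative choices, not the paper's exact recipe.

```python
# Sketch of a 4-bit QLoRA-style finetuning setup (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # 4-bit NormalFloat, introduced by QLoRA
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",            # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
# Freeze the 4-bit base model and train only small LoRA adapters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```

The memory savings come from holding the base weights in 4 bits while only the tiny adapter matrices receive gradients and optimizer state.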
Improving Factuality through Multiagent Debate
"Improving Factuality and Reasoning in Language Models through Multiagent Debate" explores an innovative method of enhancing factual validity and mathematical reasoning in LLMs by leveraging multiple LLM instances in a debate format. Read more
Hallucinations in LLMs
"How Language Model Hallucinations Can Snowball" investigates how early mistakes made by LLMs can lead to 'hallucination snowballing', causing a cascade of errors in longer responses. Read more
AlignScore: A New Metric for Factual Consistency
Lastly, "AlignScore: A metric for evaluating factual consistency" introduces a unified alignment function to evaluate the factual consistency of generated content. Read more
Quality Diversity through AI Feedback
CarperAI's latest piece, "Quality Diversity through AI Feedback", introduces a new approach to producing diverse, high-quality solutions across a design space. In particular, it proposes using language models to assess both the quality and the diversity characteristics of candidate solutions, as sketched below. The approach is put to the test on generating poetry and movie reviews, offering interesting insights into AI-assisted creativity.
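The mechanics resemble a MAP-Elites-style quality-diversity loop in which language model calls replace hand-written fitness functions and behaviour descriptors. A rough sketch, with `mutate`, `judge_quality`, and `judge_bin` as stand-ins for LM calls (my naming, not CarperAI's):

```python
# Rough sketch of quality-diversity search with AI feedback:
# mutate proposes a variation, judge_quality scores it, and
# judge_bin maps it to a diversity bin. All three stand in for LM calls.
import random

def qd_with_ai_feedback(seed: str, mutate, judge_quality, judge_bin, steps: int = 100):
    archive: dict[str, tuple[float, str]] = {}   # bin -> (best score, solution)
    archive[judge_bin(seed)] = (judge_quality(seed), seed)
    for _ in range(steps):
        # Mutate a random elite and keep the child if it improves its bin.
        _, parent = random.choice(list(archive.values()))
        child = mutate(parent)
        score, bin_ = judge_quality(child), judge_bin(child)
        if bin_ not in archive or score > archive[bin_][0]:
            archive[bin_] = (score, child)
    return archive  # one high-quality solution per diversity bin
```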
Data-constrained Language Models
As models continue to scale, a natural question arises: what happens when we run out of training data? "Scaling Data-constrained Language Models" sets out to answer this by training over 400 models and fitting a new data-constrained scaling law that generalizes the Chinchilla scaling law to repeated data. The study offers intriguing insights into data-usage efficiency and the returns on additional compute.
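As a toy illustration of the qualitative finding (the decay constant here is invented for illustration, not the paper's fitted value): if each additional epoch over the same tokens is worth exponentially less, the effective amount of data saturates as repetitions grow.

```python
# Toy illustration only: repeated epochs contribute exponentially
# decaying "effective" tokens, so effective data saturates.
import math

def effective_tokens(unique_tokens: float, epochs: int, decay: float = 15.0) -> float:
    # Epoch e is worth exp(-(e - 1) / decay) of a fresh epoch.
    return unique_tokens * sum(math.exp(-(e - 1) / decay) for e in range(1, epochs + 1))

print(f"{effective_tokens(1e9, 1):.3g}")    # 1 epoch: full value, 1e+09
print(f"{effective_tokens(1e9, 4):.3g}")    # 4 epochs: a bit under 4e+09
print(f"{effective_tokens(1e9, 40):.3g}")   # 40 epochs: far less than 4e+10
```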
In Other News...
ChatGPT and the Legal Realm
An unusual case saw a lawyer apologize for submitting fake court citations generated by ChatGPT. CNN's report underlines the need for careful AI use and regulation in professional fields.
Tiny Corp Takes on NVIDIA
George Hotz's Tiny Corp is aiming to rival NVIDIA by leveraging the tinygrad stack. More details here.
Open, Active, and Responsible AGI Research
As calls to pause AI research grow louder, Forbes reports on LAION's petition to governments to keep AGI research open, active, and responsible.
AI & Leadership
UK Prime Minister Rishi Sunak met with AI leaders from Anthropic, DeepMind, and OpenAI to discuss future developments. More here.
Nvidia Joins the $1 Trillion Club
Bolstered by booming demand for AI hardware, Nvidia has briefly joined the trillion-dollar valuation club.
Mitigating AI Risk
A single-sentence statement organized by the Center for AI Safety has attracted many signatures from the AI research community. The statement reads:
"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
I signed the statement.
It is probably fair to say that there remains significant disagreement within the ML/AI community about the level of risk posed by AI.
Useful Resources
The safetensors library, which provides a safe, pickle-free format for storing model weights, has undergone an external security audit commissioned by Hugging Face, Stability AI, and EleutherAI. It will soon become the default serialization format in the transformers library.
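For context, safetensors stores raw tensor data plus a small JSON header, so loading a checkpoint cannot execute arbitrary code the way pickle-based formats can. A quick round-trip example:

```python
# Round-trip a PyTorch state dict through the safetensors format.
import torch
from safetensors.torch import save_file, load_file

weights = {"embedding.weight": torch.randn(10, 4), "lm_head.weight": torch.randn(4, 10)}
save_file(weights, "model.safetensors")     # raw tensors + JSON header, no pickle
restored = load_file("model.safetensors")
assert torch.equal(weights["lm_head.weight"], restored["lm_head.weight"])
```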
OpenAI's Andrej Karpathy gave an insightful overview at Microsoft Build of how GPT models are currently trained. Find the full video here.
Book recommendation
This week, I recommend "The Structure of Scientific Revolutions" by Thomas S. Kuhn. A classic account of how science advances through paradigm shifts, it is a must-read for anyone interested in the dynamics of scientific discovery.
Filtir - fact-checking AI outputs
Lastly, I'm working with colleagues on a project called Filtir with the goal of catching AI hallucinations. If you’re interested in finding out more, we’re on Discord here.
If you prefer video summaries, you can find a video version of the newsletter here: