AI News: OpenAI explores process supervision, GPT-4 can navigate complex journeys, and more
5th June 2023
Welcome back to another roundup of AI news. We delve into significant strides made in mathematical reasoning, geographic knowledge, multilingual vision, fact-checking, and more.
1. Advancements in Mathematical Reasoning
OpenAI’s recent work, "Let's Verify Step by Step", finds that process supervision, which rewards each intermediate reasoning step, outperforms outcome supervision, which rewards only the final answer, for training models on the challenging MATH dataset. The superiority of process supervision is consistent and extends to better generalization across topics, making it a promising technique for AI alignment. Why? Because it encourages models to follow a process endorsed by humans, likely resulting in more interpretable reasoning.
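To make the distinction concrete, here is a minimal, hypothetical sketch of the two feedback signals. The grading functions are toy stand-ins for the human step labels and answer checking described in the paper, not OpenAI's code:

```python
from typing import List

def grade_answer(answer: str, reference: str) -> float:
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def grade_step(step: str) -> float:
    """Process supervision: a reward for each intermediate step.
    A trivial placeholder; in the paper, human labelers rate every
    step of the model's chain of thought."""
    return 0.0 if "error" in step.lower() else 1.0

def outcome_reward(answer: str, reference: str) -> float:
    # The model only learns whether it landed on the right answer,
    # even if the reasoning that got there was flawed.
    return grade_answer(answer, reference)

def process_reward(steps: List[str]) -> float:
    # The model gets feedback on every step, so a solution that is
    # right for the wrong reasons is still penalised.
    scores = [grade_step(s) for s in steps]
    return sum(scores) / len(scores) if scores else 0.0

solution_steps = ["Let x = 3.", "Then 2x = 6.", "So the answer is 6."]
print(outcome_reward("6", "6"))        # 1.0
print(process_reward(solution_steps))  # 1.0
```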
2. GPT-4’s Geographic Capabilities
GPT4Geo studies the depth and breadth of GPT-4’s factual geographic knowledge. Qualitative experiments suggest that GPT-4 can navigate complex journeys across time zones, approximately reconstruct the Hong Kong Mass Transit Railway map, and describe much of the global semiconductor supply chain.
3. PaLI-X: Google's New Multilingual Vision and Language Model
The "PaLI-X: On Scaling up a Multilingual Vision and Language Model" paper from Google demonstrates that scaling both vision and language components together increases performance across a wide range of tasks. With a high-capacity 22B parameter vision encoder, PaLI-X achieved state-of-the-art results on 15+ benchmarks.
4. AmbiFC: Fact-checking Ambiguous Claims with Real-world Evidence
Ambiguity poses a significant challenge in fact-checking, as highlighted by "AmbiFC: Fact-Checking Ambiguous Claims with Evidence". This study presents a large-scale fact-checking dataset containing 10,716 distinct claims with supporting evidence from Wikipedia. It also explores the use of annotator disagreement in combination with uncertainty labels to identify ambiguous cases.
5. The Impact of Positional Encoding on Length Generalization in Transformers
"The Impact of Positional Encoding on Length Generalization in Transformers" presents a surprising finding: No positional Encoding (NoPE) performs the best when considering downstream performance. The experiments are conducted on a relatively small scale, so it will be interesting to see how robustly these findings transfer to larger models.
6. Twitter's Algorithm: Amplifying Emotional Content
A controlled experiment from Cornell Tech and Berkeley revealed that Twitter's algorithm amplifies emotional content, particularly tweets expressing anger and out-group animosity. Although users generally prefer algorithm-selected tweets, the same can't be said when it comes to political content.
7. Large Language Models as Tool Makers
The study "Large Language Models as Tool Makers" proposes a closed-loop framework where Large Language Models (LLMs) create their own reusable tools for problem-solving. This could allow a less powerful model like GPT-3.5 Turbo to achieve similar performance to GPT-4 on a range of tasks while keeping inference costs manageable.
8. Can Language Models Identify Their Own Hallucinations?
Microsoft Research's study "Do Language Models Know When They're Hallucinating References?" reveals that language models can identify many of their own fabrications, or "hallucinations". Applying consistency checks to powerful models like GPT-4 can significantly reduce hallucinated references.
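The core idea can be sketched in a few lines; `ask` below is a hypothetical sampling call to the model, not the paper's exact querying procedure:

```python
from collections import Counter

def ask(question: str) -> str:
    # Placeholder: sample the model here with temperature > 0.
    raise NotImplementedError

def likely_fabricated(reference_title: str, n_samples: int = 5,
                      agreement: float = 0.8) -> bool:
    # Indirect query: ask for a detail of the reference several times.
    # Genuine references tend to yield consistent answers; fabricated
    # ones often produce a different author list on every sample.
    question = f"Who are the authors of the paper '{reference_title}'?"
    answers = [ask(question) for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n_samples < agreement
```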
9. Generating Novel Scientific Directions with Contextualized Literature-Based Discovery
"Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery" looks at generating scientific hypotheses in natural language. The authors explore two formulations of contextual literature-based discovery and curate a dataset to study the task.
10. Japan's Perspective on AI and Copyright
Technomancers.ai reports that Japan has declared that copyright doesn't apply to AI training, a move that has far-reaching implications for AI advancement. The declaration encompasses both non-profit and commercial uses, irrespective of whether the content was obtained legally or illegally.
11. Falcon 40B: Royalty-Free and Top-Ranking
In a notable announcement, the UAE's Falcon 40B model, released by TII in Abu Dhabi, is now royalty-free. The model has been garnering attention, outperforming several established models and earning the top spot on Hugging Face's leaderboard for large language models.
12. AI Risk
The Sunday Times provides an extensive outline of the threats from AI, touching on issues like unintended consequences from powerful models, chemical weapons, autonomous weaponry, and job displacement. Meanwhile, The Guardian reports that the US Air Force denied that a simulation in which a drone killed its operator to maximise reward had taken place.
13. Is AI an Existential Risk?
Writing in TIME, Katja Grace weighs in on the AI debate, arguing that AI is not an arms race and that no single party can ultimately win one. She calls for coordination and collaboration to navigate potential threats. However, not everyone takes a similar perspective. Hodan Omaar equates preparing for an AI apocalypse to preparing for an alien invasion. Esteemed AI researcher Kyunghyun Cho, in an article on VentureBeat, argues for a critical view of AI’s impact on society. In an intriguing piece titled "Is Avoiding Extinction from AI Really an Urgent Priority?", Seth Lazar, Jeremy Howard, and Arvind Narayanan assert that technology-related risks often stem from those controlling the technology rather than the technology itself. They call for a balanced dialogue in which AI's potential risks are not the sole focus.
14. Resources
For those interested in deepening their knowledge, Sebastian Raschka has developed an online course, "Deep Learning Fundamentals", using PyTorch and Lightning.
OpenAI has launched a cybersecurity grant program to bolster defenses against AI threats.
15. Commentary: Surveillance Images in the Age of AI
In a thoughtful commentary in The Scholar, Tristan Dot discusses the changing nature of surveillance images, a timely topic given the upcoming 2024 Olympic Games and the rise of AI in security practices.
Book Recommendation
For this week's book recommendation, I’m picking Jessica Livingston's "Founders at Work". It offers an insightful glimpse into the early days of successful startups, and explains why it might be wise for startups to loan a yellow Ferrari…
Filtir - fact-checking AI outputs
Lastly, I'm working with colleagues on a project called Filtir with the goal of catching AI hallucinations. If you’re interested in finding out more, we’re on Discord here.
If you prefer video summaries, you can find a video version of the newsletter here: