12 Comments
Alexander Clinton

This writing is so good it makes me wish I wrote it myself.

Jackie

Sitting here considering the ramifications of evaluation and defining generalist intelligence; it's a downstream effect I hadn't considered yet. I need more coffee.

Fantastic post, I bookmarked almost every link - cheers!

Neural Foundry

Brilliant breakdown of how the compute thesis keeps proving itself. The part about evaluation teams measuring the 'expanding coastline' while AIs get more general really captures the headache we're all feeling in 2025. I've spent way too many late nights debugging test suites that models solved in ways we didn't anticipate, so the 'wrong answer is quick, novel right answer is violence' comparison made me laugh out loud. The futarchy angle for Britain is interesting, but dunno if prediction markets can outpace the regulatory tangle alone.

Uzay

great stuff

Adam Gidwitz

As a fellow lover of bewildered walks in University Parks and a writer who is related to AI people but not an AI person myself, I thank you for the writing and the clarity of thought. Here in the US, we have a somewhat better balance, and I fear protecting .0083 salmon—but I also fear killing 8.3 million salmon. I think we will need to use compute to help us identify where the line is, and then of course use our human hearts to choose our threshold for salmon death. (We don’t get to smoke and put on bagels the ones who die of What Is Love). I work with state governments and we are in the early stages of trying to figure out how to use this new compute power to rationalize our laws and regulations to allow for growth and salmon. I know Deloitte has already started a consulting practice based on this, which I trust as much as I trust the last sausage roll in the case at 4:55pm. I’ll keep visiting this space now that I’ve found it, and if you’ve got any recommendations on using compute for rationalizing governance, I would love to see them. Also, again, thanks for the prose.

David

Great read! Makes me think about the future of evals.

As we shift toward longer-horizon, team-scale task evaluation, do traditional prompt-and-generation benchmarks still serve a purpose?

And relatedly, as task complexity grows, how do we avoid manual review becoming the limiting factor? We’d need either (1) AI-as-judge approaches that scale but risk measurement circularity, or (2) new evaluation methodologies I’m not seeing yet. Curious what your instinct is on this.

Oliver Bruce

This is phenomenal writing. Thank you.

I too remain a hopeful believer in the everything theory of compute and hope that futarchy comes to rule - the ossification of Westminster democracies combined with stagnant growth is fraying the social contract. The challenge will be how to navigate the transition without disaffected hordes defaulting us back to Luddite strongmanism.

Samuel Albanie

Thanks!

Chris

Some wonderful lines.

Like you, I am disappointed with our recent run rate.

The white heat of Technology did not get Britain out of a rut and nor will AI.

A system so inflexible that a PM cannot override a decision on a football game in Birmingham is a temporarily broken system. It will be revived when sovereignty returns to Parliament.

Samuel Albanie

Thanks, and touché on Wilson's white heat.

But I remain optimistic. I think we do our best work with the difficulty set to "hard".

To quote my favourite Henry V speech: "The fewer men, the greater share of honour. God's will! I pray thee, wish not one man more."

Ben Schulz

I would only add that the compute is in service to the Platonic space.

See Michael Levin (the biologist), who has most recently taken up the mantle of a universal geometric latent ideal being utilized.

Samuel Albanie

Thanks for reading, and for the Levin reference (I wasn't familiar, but had fun reading about his work).

Still, even if we take the view of the Demiurge and treat the tech tree as a pre-existing, perfect piece of foliage waiting to be uncovered, the "unreasonable effectiveness" of compute in performing the topiary (and at a somewhat predictable speed) is the part I spend the most shower thoughts on.

To quote the great philosopher of the ascent Miley Cyrus, "it's the climb".

Also, sorry it took a while to reply. I found this https://www.theguardian.com/music/2023/mar/02/miley-cyruss-20-greatest-songs-ranked article arguing that the climb is only Miley's 4th best contribution, and got somewhat distracted by the process of determining if I agree. To keep it reasonably fair but reasonably efficient, at Θ(n log n) comparisons, I sketched out a quick Swiss-style tournament rather than a full round-robin. TL;DR: 4th place is within the margin of error, though I have serious concerns about their ranking of Midnight Sky.
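
For the curious, here is a minimal sketch of what a Swiss-style ranking could look like. It is an illustration under stated assumptions, not the sketch referred to above: the names `swiss_rank`, `prefer`, and `round_robin_matches` are hypothetical, items are assumed to be unique, and the pairing is simplified (no rematch avoidance, and with an odd count the last item just sits the round out). With about log2(n) rounds of n/2 matches each, the total is Θ(n log n) head-to-head comparisons, against n(n-1)/2 for a full round-robin.

```python
import math
from typing import Callable, Sequence


def swiss_rank(
    items: Sequence[str], prefer: Callable[[str, str], bool]
) -> tuple[list[str], int]:
    """Rank unique items with a simplified Swiss-style tournament.

    prefer(a, b) is a hypothetical judge: it returns True when a beats b.
    Returns (items ordered best-first, number of matches played).
    """
    n = len(items)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    scores = {item: 0 for item in items}
    order = list(items)
    matches = 0
    for _ in range(rounds):
        # Stable sort by current score, so items with similar records meet.
        order.sort(key=lambda item: -scores[item])
        # Pair neighbours; with an odd count the last item sits out.
        for i in range(0, n - 1, 2):
            a, b = order[i], order[i + 1]
            winner = a if prefer(a, b) else b
            scores[winner] += 1
            matches += 1
    # Final standings: most wins first, ties kept in their last pairing order.
    order.sort(key=lambda item: -scores[item])
    return order, matches


def round_robin_matches(n: int) -> int:
    """Number of matches when every item meets every other item once."""
    return n * (n - 1) // 2
```

For a 20-item list this simplified scheme plays 5 rounds of 10 matches, so 50 comparisons instead of the 190 a round-robin would need. The price is that mid-table positions are only loosely resolved, which is one way to read "within the margin of error".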
