Dario just released his new essay on the risks of AI. His thinking and writing are clear, compelling, and balanced. Since I broadly agree with his points and could not articulate them more clearly than he has, I will instead cover some important gaps and offer some perspective on how labs could limit labor disruption while also improving alignment. My points build on each other and culminate in some solutions that feel like a hopeful path forward.
1. When Capability Becomes Liability
Dario characterizes the individuals who can cause large-scale destruction as those who have both capability and motive, and points out that these traits are typically negatively correlated:
"The kind of person who has the ability to release a plague is probably highly educated: likely a PhD in molecular biology, and a particularly resourceful one, with a promising career, a stable and disciplined personality, and a lot to lose."
He is concerned that the widespread distribution of AI:
"...will break the correlation between ability and motive: the disturbed loner who wants to kill people but lacks the discipline or skill to do so will now be elevated to the capability level of the PhD virologist, who is unlikely to have this motivation."
However, the other side of the coin may present more risk, namely the risk from currently high-capability people who become dispossessed by the AI era. There will be tremendous numbers of people who have worked for years or decades on the skills they use to support themselves. They will have to watch AI surpass their abilities and perhaps take their employment.
What will happen to those people? Is there a place for those people in this new society?
This question leads directly to the next issue—the risk that negative public perception of AI becomes so strong that we effectively cede the race to China.
2. The SF Bubble
Agents don't vote. People do.
I've heard it said that "SF" is a state of mind. I like this idea, as it's frequently the state my mind occupies. However, I also live outside of physical SF and interact mostly with people who think poorly of SF—if they think of it at all. The SF bubble massively overestimates how sold the public is on the idea that the benefits of AI outweigh the costs.
This has not yet become a meaningful problem because the issue has not been forced. This will obviously change.
Many current benefits (code generation, streamlined research, automation) are diffuse, most directly realized in the hands of those who use AI, while many downsides (job loss, sophisticated targeted scams, bot takeover of social media) can be felt directly—particularly by those who do not use AI. Given these different experiences, what should people think about when they think about AI?
Will targets of AI-driven layoffs care that the geniuses might cure cancer eventually? Will fresh graduates appreciate the potential abundance of affordable food if they can't participate in the game?
AI is clearly not yet ready to solve all the world's problems. But it may need to start producing clearly positive externalities that the public can see and appreciate.
This is challenging because solved problems tend to disappear from people's views and expectations, but I could see a two-pronged approach:
- Here's the fallback to keep you going if AI takes your job. Here's how we have dropped the costs of the essentials needed to live. Here's how you can pay for a place to live, eat good food, and take care of your children.
- Here is the moonshot. Here is how AI is solving a visible, pernicious problem in a highly agreeable way. It should be ambitious—if you could do it without AI, there is little point—yet also tractable. Curing disease is great for getting broad public approval, but progress in that area is uneven, and gains are opaque to the average person.
Whether this approach is taken or not, it seems essential that the visible good from AI clearly outweighs the bad during this transitional period, or there is a real risk that it will become highly popular for the government to meaningfully hamper AI progress.
3. The Question of Control
In "The odious apparatus," Dario calls out the potential for authoritarian control from various parties: the CCP, competitive democracies, non-democratic countries, AI companies.
This is a significant concern, but, as Dario is well aware, there is another: if datacenters can manufacture consent and create unbeatable drones, they are the new superpower.
Part of why economists and political scientists often seem to disagree is that economics is a non-zero-sum game, but political power is a zero-sum game. The United States or any other governing authority is likely to be excited about the economic potential of AI while it plays the game of economics. But that game will eventually become political. Will the United States government allow an internal power that is greater than its own?
At the risk of sounding like an LLM—it's not that this level of power could fall into the wrong hands. It's that this power MUST be driven by someone or something. Who will that be?
- Government officials?
- Private companies?
- Individuals?
- A national population?
- Will the systems govern themselves?
It is not difficult to imagine the likely downsides of each of those options.
It seems unlikely that the infinite abundance of post-scarcity will change human nature itself and eliminate the desire for power or need for status games. How do those games play out if there is a centralized power that trivializes all other powers?
It is insufficient to care about the wrong group gaining access to the technology; the challenge is that whatever group controls it could end up being "the wrong group."
What if the problems described here are actually pieces of the solution?
4. Building Resilience
Everyone knows that prediction is very difficult—"especially if it's about the future," as Niels Bohr reportedly quipped.[1] Significantly, there is also the Nassim Taleb spin:
"The inability to predict outliers implies the inability to predict the course of history."[2]
"What is nonmeasurable and nonpredictable will remain nonmeasurable and nonpredictable ... no matter how much hate mail I get."[3] — Nassim Nicholas Taleb
Notably, Taleb points out that the best response to inherent unpredictability is not to develop better forecasting but to focus on building resilience (or even antifragility).
One path towards resilience is building tooling to support the observability and alignment of the work produced by the datacenter geniuses. Monitoring the alignment of the models themselves is the obvious step Anthropic works hard at, but the tools for interacting with these models also need to present information about the work that is done in a clear way.
A slightly misaligned Claude could conceivably write a backdoor into every software system it has access to, and the current observability tooling is likely insufficient for the human in the loop to catch it. Certainly there will be systems of models watching models, and tracking the work output of a million geniuses seems daunting. Yet AI watching AI, while a useful approach, is a highly jagged interaction to stake humanity on. Human observability of the work AI produces is an essential frontier.
Dario notes that humans may be unable to offer work of meaningful value to the AIs. My alternative view is that human observation of the alignment of AI work output is of effectively unlimited economic value; proper observation of the alignment of AI's work could mean that we realize an infinitely growing economy, while failure to do so could have existential consequences.
As one example, Claude Code is a tremendous tool for producing code, but it is challenging to track its work from within the tool itself. Code scrolls by at rates faster than a human can comprehend, and it is often presented context-free. The natural desire to run multiple instances means that users are likely not even viewing the stream. Clearly there are some mechanisms for addressing these issues, but they are learned operational behaviors rather than exposed parts of the tooling itself.
Interestingly, this is a design space with a long history. During the 1960s, significant debates took place within NASA around the appropriate level of flight automation. Too much manual control might mean the astronauts would become overwhelmed during critical phases. Too much automation might mean the astronauts would become disengaged and fail to intervene when necessary.
Some thoughts on this dynamic come from the study of Situation Awareness. Endsley (1995) provides a taxonomy of errors:[4]
Level 1: Failure to Correctly Perceive the Situation
- Data not available
- Hard to discriminate or detect data
- Failure to monitor or observe data
- Misperception of data
- Memory loss
Level 2: Improper Integration or Comprehension of the Situation
- Lack of or poor mental model
- Use of incorrect mental model
- Overreliance on default values
Level 3: Incorrect Projection of Future Actions of the System
- Lack of or poor mental model
- Overprojection of current trends
There are reasonable arguments to be made that the design of Claude Code makes it easy for users to fall prey to errors at each of those levels. There is a large difference between designing systems for humans, designing systems for machines, and designing systems for human-machine cooperation.
Claude's constitutional principles offer a way for Claude to enter any situation with core principles that are likely to provide good results. However, those principles will be tested against the complexities of reality at increasing scale. Feedback from broad groups of highly skilled humans can provide the data needed to continue to refine the core principles and potential special exceptions.
While not every automated AI action across the economy needs constant monitoring, we will want to be able to actually check any section of the economy with ergonomics appropriate for the task. The sooner we can build and start getting feedback on these observational systems that keep humans engaged productively in the loop, the better those systems will be when they are needed most.
Building these observational systems would also offer the dispossessed capable meaningful work. An advanced biology degree would still be valuable; really, more valuable than ever, given the level of work it could allow the datacenter geniuses to produce.
Despite the challenges listed here, I remain hopeful that we can produce an AI future that steers towards flourishing for all.
References
[1] This quote is often attributed to Niels Bohr, though its origin is uncertain. Various forms appear in Danish folklore and have been attributed to multiple figures including Yogi Berra and Mark Twain. See: Quote Investigator, "It's Difficult to Make Predictions, Especially About the Future."
[2] Taleb, N. N. (2007). The black swan: The impact of the highly improbable. Random House.
[3] Taleb, N. N. (2012). Antifragile: Things that gain from disorder (p. 138). Random House.
[4] Endsley, M. R. (1995). A taxonomy of situation awareness errors. In R. Fuller, N. Johnston, & N. McDonald (Eds.), Human factors in aviation operations: Proceedings of the 21st conference of the European Association for Aviation Psychology (EAAP), Vol. 3 (pp. 287-292). Avebury Aviation.