AI for AI for Epistemics
This article was created by Forethought. See the original on our website.
We are conscious that rapid AI progress could transform all sorts of cause areas. But we haven’t previously analysed what it means for AI for epistemics, a field close to our hearts. In this article, we attempt to rectify that oversight.
Summary
AI-powered tools and services that help people figure out what’s true (“AI for epistemics”) could matter a lot.
As R&D is increasingly automated, AI systems will play a larger role in the process of developing such AI-based epistemic tools. This has important implications. Whoever is willing to devote sufficient compute will be able to build strong versions of the tools, quickly. Eventually, the hard part won’t be building useful systems, but making sure people trust the right ones, and making sure that they are truth-tracking even in domains where that’s hard to verify.
We can do some things now to prepare. Incumbency effects mean that shaping the early versions for the better could have persistent benefits. Helping build appetite among socially motivated actors with deep pockets could enable the benefits to come online sooner, and in safer hands. And in some cases, we can identify particular things that seem likely to be bottlenecks later, and work on those directly.
Background: AI for epistemics
AI for epistemics — i.e. getting AI systems to give more truth-conducive answers, and building tools that support the epistemics of their users — seems like a big deal to us. We have written about the topic several times before.
These past articles mostly take the perspective of “how can people build AI systems which do better by these lights?”. But maybe we should be thinking much more about what changes when people can use AI tools to do increasingly large fractions of the development work!
The shift in what drives AI-for-epistemics progress
Right now, AI-for-epistemics tools are constrained by two main bottlenecks: the quality of the underlying AI systems, and whether people have invested serious development effort in building the tools to use those systems.
The balance of bottlenecks is changing. Two years ago, the quality of underlying AI systems was the central bottleneck. Today it is much less so — many useful tools could probably be built on current LLMs. Model quality still constrains how good the tools can be, and will remain a constraint for a while even as the underlying models get stronger, but it is no longer a fundamental blocker. Development investment has therefore become the bigger bottleneck — there are a number of applications which we are pretty confident could be built to a high level of usefulness today, and just haven’t been (yet).
But bottlenecks will continue to shift. AI is increasingly driving research and software development. As AI systems get stronger, it may become possible to turn a large compute budget into a lot of R&D. This could include product design, engineering, experiment design, direction-setting, etc. Actors with lots of compute could direct this towards building epistemic tools.
Therefore, as AI-driven R&D accelerates, other inputs to AI for epistemics are more likely to become key bottlenecks:
Compute. Automated R&D may require a lot of compute. This could be for inference (running the analogues of human researchers); for running experiments; and perhaps for training specialized AI systems. This means the actors who can build the best epistemic tools may be those with deep pockets.
Adoption and trust. Even very good tools don’t help if nobody uses them, or if the wrong people use them and the right people don’t. Adoption is partly a function of trust, and trust is partly a function of adoption — early tools shape what people come to rely on.
Ground truth evaluation. To make an epistemic tool good, you need some signal for what “good” means. This already shapes AI applications a lot — part of the reason coding agents are so good is that there’s great access to ground truth about what works.
For some epistemic applications this is relatively straightforward (e.g. forecasting accuracy). For others it’s hard (e.g. what makes a conceptual clarification actually clarifying, rather than just satisfying?).
Most tools can probably reach a certain degree of usefulness without running into this problem, just piggybacking on base models making generally sensible judgements.
We can expect it to bite when you try to make them very good: if you don’t have a way of assessing quality, it could be hard to push to objectively excellent levels.
One basic solution is to rely on human judgement: either via humans providing labels and demonstrations to train against, or via human developers exercising their judgement in other parts of the process (such as when defining scaffolds). But this becomes disproportionately more expensive as R&D becomes more automated.
These basic points are robust to whether R&D is fully automated, or “merely” represents a large uplift to human researchers. But the most important bottlenecks will vary across applications and will continue to shift over time.
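Where ground truth does exist, the evaluation signal can be very simple. As an illustration (the forecasts and outcomes below are hypothetical), a forecasting tool’s probabilistic predictions can be scored against resolved outcomes with the Brier score — lower is better:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts and binary
    resolved outcomes (0.0 = perfect; 0.25 = always saying 50%)."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical resolved questions: predicted probability vs. what happened.
forecasts = [0.9, 0.2, 0.7, 0.5]
outcomes = [1, 0, 1, 0]

print(brier_score(forecasts, outcomes))  # 0.0975
```

In domains like conceptual clarification, there is no analogous resolved-outcome column to score against — which is exactly where the ground truth bottleneck bites.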
What this unlocks
Automated R&D means that strong “AI for epistemics” tools could come online on a compressed timeline.
This is an exciting opportunity! Upgrading epistemics could better position us to avoid existential risk and navigate through the choice transition well.
If everything is moving fast, the exact sequence in which capabilities arrive may matter a lot. It may therefore be crucial to make serious investments in building these powerful applications now, rather than waiting until they are trivially cheap.
Risks from rapid progress in AI for epistemics
There are also a number of ways that rapid (and significantly automated) progress in AI-for-epistemics applications could go wrong. We need to be tracking these in order to guard against them.
In our view, the two biggest risks are:
Epistemic misalignment: because of ground truth issues, powerful tools steer our thoughts in directions other than those which are truth-tracking, in ways that we fail to detect
Trust lock-in: if a lot of people buy into trusting tools or ecosystems that don’t deserve that trust, this might be self-perpetuating if these continue to recommend themselves
Epistemic misalignment
Depending on when they bite, ground truth problems as discussed above could be bottlenecks, or active sources of risk. They are bottlenecks if they prevent people from building strong versions of tools. They could become risks if the methods are good enough to allow for bootstrapping to something strong, but end up pointing in the wrong direction. This is essentially Goodhart’s law — we might get something very optimized for the wrong thing (and without even knowing how to detect that it’s subtly wrong).
In the limit, this could lead to humans or AI systems making extremely consequential decisions based on misguided epistemic foundations. For example, they might give over the universe to digital minds that are not conscious — or in the other direction, fail to treat digital minds with the dignity and moral seriousness they deserve. Wei Dai has written about this concern in terms of the importance of metaphilosophy. We agree that there is a crucial concern here.
This could come separately from or together with risks from power-seeking misaligned AI. Epistemic tools could be systematically misleading without being power-seeking. But if some AI systems are misaligned and power-seeking, there’s an additional concern where AI systems could mislead us in ways specifically designed to disempower us whenever we are unable to check their answers.
Some approaches to the ground truth problem may involve using AI systems to make judgements about things. This introduces a regress problem: how can we ensure that subtle errors in the first AI systems shrink rather than compound into worse problems as the process plays out? (We return to this in the interventions section below.)
Trust lock-in
Trust and adoption tend to reinforce each other — people adopt tools they trust, and widely-adopted tools accumulate trust. This is normally fine. It could become a problem if the tools that win early trust don’t deserve it, but incumbency effects make them hard to displace.
This could happen in several ways. An actor with a particular agenda could build something that purports to function as a neutral epistemic aid but is shaped to further their agenda by manipulating others. Or, less perniciously but perhaps more likely, an early-but-mediocre tool could accumulate trust and adoption before better alternatives exist, reinforced by commercial incentives which mean it talks itself up and rival tools down. In either case, the result could be an epistemic ecosystem that’s hard to dislodge even once better options are available.
Other risks
Those two risks are not the only concerns. We are also somewhat worried about epistemic power concentration (where whoever has the best epistemic tools leverages their information advantage into better financial or political outcomes, and continues to stay ahead epistemically), and epistemic dependency (where people relying on AI tools gradually atrophy in their critical reasoning — exacerbating other risks). There may be more that we are not tracking.
Interventions
What should people who care about epistemics be doing now, in anticipation of a world where AI-driven R&D can be directed at building epistemic tools?
Build appetite for epistemics R&D among well-resourced actors
If you need big compute budgets to build great epistemic tools, you’ll ideally want support from frontier AI companies, major philanthropic funders, or governments. But they may not currently see this as a priority. Building the case that this matters, and helping these actors develop good taste about which tools to prioritize and how to design them well, could shape what gets built when automated R&D becomes powerful enough to build it.
Anticipate future data needs
Some epistemic tools will need training data that doesn’t yet exist and may not be trivial to generate. There are three strategies here:
Collecting or creating data or training environments now for future use
E.g. if you think you want access to a lot of human judgements about what wise decisions look like, you could go out and curate that dataset.
Establishing pipelines to collect data over time
E.g. if you want to automate a certain type of research, you could record internal discussions from researchers working on this.
Designing processes for automated data creation.
E.g. if you could design a self-play loop where we have good reason to believe that scaling up compute will lead to genuinely truth-tracking performance, this could set the stage for later rapid improvement at the core capability.
The first two are especially great to work on now because they involve actions at human time-scales. (They may not be proportionately sped up by having more AI labor available.) The third is great to work on because there’s some chance that models will become capable of growing a lot from the right self-play loop before they become capable enough to come up with the idea themselves.
Figure out what could ground us against epistemic misalignment
If powerful epistemic tools could be subtly misaligned with truth-conduciveness in ways we can’t easily detect, we should figure out what this could look like! We expect this might benefit from a mix of theoretical work (what does it even mean for an epistemic tool to be well-calibrated in domains without clear ground truth?1) and practical work (studying how current tools fail, building evaluation methods). Ultimately we don’t have a clear picture of what the solutions look like, but this seems like an important topic and we are keen for it to get more attention soon.
Drive early adoption where adoption is the key bottleneck
For some applications, we might expect that the main constraint on impact will be whether anyone uses them. In these cases, getting early versions into use — even if they’re not yet very good — could build familiarity and surface real-world feedback. (This could also drive appetite for further development.)
In theory, this could be in tension with avoiding bad trust lock-in. But in practice, it’s not clear that bad trust lock-in becomes any likelier if tools in a specific area are developed earlier rather than later. Some tool is still going to get the first-mover advantage.2
Support open and auditable epistemic infrastructure
To guard against trust lock-in, we want to make it easy for people to distinguish between tools which are genuinely doing the good trustworthy thing, and tools which may not be (but claim to be doing so). To that end, we want ways for people and communities to audit different systems — understanding their internal processes and measuring their behaviours. The goal is that if disputes arise about which tools are actually trustworthy, there’s an inspectable audit trail that can resolve them. In turn, this should reduce the incentives to create misleading tools in the first place.
Support development in incentive-compatible places
The incentives of whoever builds epistemic tools could matter — through thousands of small design decisions, through choices about what to optimize for, and through decisions about access and pricing. Development in organizations whose incentives are aligned with the public good (rather than with engagement, profit, or political influence) reduces the risk that tools are subtly shaped to serve the builder’s interests.
Ideally, you’d spur development among actors who are both well-resourced (as just discussed) and whose incentives are aligned with the public good. In practice, it may be difficult to find organizations that are excellent on both. A plausible compromise is for less-resourced organizations with better incentives to focus on publicly available evaluation of epistemic tools. This could be cheaper than producing them from scratch, and it could create better incentives for the larger actors.
Examples
Forecasting
Automated R&D will probably be able to improve forecasting tools without severe ground truth problems, so epistemic misalignment is less of a concern.3 Appetite for investment probably already exists, and adoption should be significantly helped by the ability of powerful tools to develop an impressive, legible track record.
The most useful near-term investment might be in data infrastructure. For instance, LLMs trained with strict historical knowledge cutoffs could enable much better science of forecasting by allowing methods to be tested against questions whose answers the system genuinely doesn’t know.
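A minimal sketch of the data infrastructure this implies (the question records, field names, and cutoff date are all hypothetical): to test a forecasting method fairly against a model with a strict knowledge cutoff, keep only questions that resolved after that cutoff, so the model cannot simply recall the answer.

```python
from datetime import date

# Hypothetical resolved forecasting questions.
questions = [
    {"q": "Will X happen by 2023?", "resolved": date(2023, 6, 1), "outcome": 1},
    {"q": "Will Y happen by 2024?", "resolved": date(2024, 3, 1), "outcome": 0},
    {"q": "Will Z happen by 2025?", "resolved": date(2025, 1, 15), "outcome": 1},
]

def backtest_pool(questions, cutoff):
    """Keep only questions that resolved strictly after the model's
    knowledge cutoff, so the answer cannot be in its training data."""
    return [q for q in questions if q["resolved"] > cutoff]

pool = backtest_pool(questions, cutoff=date(2024, 1, 1))
print(len(pool))  # 2 — only the post-cutoff questions remain
```

A stricter version would also require the question itself to have been written after the cutoff, to rule out leakage from discussion of the question.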
Misinformation tracking
Trust lock-in is the central concern. A tool that becomes widely trusted for adjudicating what’s true has enormous influence, and if that trust is misplaced it could be very hard to dislodge. Open and auditable approaches are especially important here.
Because of the trust lock-in concern, the automation of R&D may exacerbate challenges. Currently, building good misinformation-tracking tools requires editorial judgement and domain expertise — things responsible actors tend to have more of. Automation shifts the bottleneck towards compute, which is more symmetrically available. This could increase the urgency of getting started on these tools and driving adoption early.
Automating conceptual research
This is the case where epistemic misalignment is most concerning. Ground truth is extremely hard — what makes a conceptual clarification actually clarifying rather than just satisfying? Humans are poor judges of this in real time, so e.g. a training process that rewards outputs humans find helpful could easily optimize for persuasiveness rather than truth-tracking.
One plausible direction here is to research training regimes (such as self-play loops) that we have some reason to believe should ground to truth-tracking, with specific attention to how they could go wrong. Adoption could be an issue, but we’re also worried about the other direction, with adoption coming too easily before we have good ways of evaluating whether the tools are actually helping.
1. Epistemic misalignment issues may also appear in areas where ground truth is well-defined but hard to access, such as very long-run forecasts. Theoretical work also seems valuable for such areas (because it’s unclear how to evaluate and train for good performance by default).
2. In fact, it might be bad if people who are worried about bad trust lock-in select themselves out of getting that first-mover advantage.
3. Although at some quality level, we have to start worrying about self-affecting prophecies. AI forecasters will have to be very trusted indeed before that becomes a serious issue, which gives us a lot of time to figure out how best to handle it.