Will the Need to Retrain AI Models from Scratch Block a Software Intelligence Explosion?
Once AI fully automates AI R&D, there might be a period of fast and accelerating software progress – a software intelligence explosion (SIE).
One objection to this is that it takes a long time to train SOTA AI systems from scratch. Would retraining each new generation of AIs stop progress from accelerating during an SIE? If not, would it significantly delay one? Tom’s new post investigates this objection.
Here are his tentative bottom lines:
Retraining won’t stop software progress from accelerating over time.
Suppose that, ignoring the need for retraining, software progress would accelerate over time due to the AI-improving-AI feedback loop.
A simple theoretical analysis suggests that software progress will still accelerate once you account for retraining. Retraining won’t block the SIE.
Retraining will cause software progress to accelerate somewhat more slowly.
But that same analysis suggests that software progress will accelerate more slowly than it would without retraining.
Quantitatively, the effect is surprisingly small – acceleration only takes ~20% longer.
However, if acceleration would otherwise have been extremely fast, retraining slows things down by more.
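The first two bottom lines can be illustrated with a toy discrete model (my own sketch with assumed parameters, not the analysis in Tom’s post): research speed scales with the current software level, each generation multiplies that level by a constant factor, and retraining adds serial time that also shrinks as software improves. Under these assumptions, the total time to “infinite” capabilities stays finite with retraining included; retraining just stretches the timeline by a constant proportion.

```python
# Toy model of an SIE with and without retraining. All parameters below are
# illustrative assumptions, not figures from Tom's post.

GROWTH = 2.0          # assumed capability multiplier per AI generation
RESEARCH_WORK = 1.0   # months of serial research needed at software level 1
RETRAIN_WORK = 0.25   # assumed extra months of retraining at software level 1

def time_to_blowup(retrain: bool, generations: int = 60) -> float:
    """Total calendar time for `generations` generations of AI-improving-AI.

    Per-generation time is (serial work) / (software level), so it shrinks
    geometrically and the sum converges: capabilities reach "infinity" in
    finite time, with or without retraining.
    """
    level, total = 1.0, 0.0
    for _ in range(generations):
        work = RESEARCH_WORK + (RETRAIN_WORK if retrain else 0.0)
        total += work / level   # faster software -> faster next generation
        level *= GROWTH
    return total

no_retrain = time_to_blowup(retrain=False)   # converges to 2.0 months
with_retrain = time_to_blowup(retrain=True)  # converges to 2.5 months
```

In this sketch retraining makes the explosion take 25% longer but does not block it: the sum still converges. The real analysis is subtler (retraining delays when each improvement can be deployed), but the qualitative conclusion is the same.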
Retraining means that we’re unlikely to get an SIE in <10 months, unless either training times become shorter before the SIE begins or improvements in runtime efficiency and post-training enhancements are large.
Today it takes ~3 months to train a SOTA AI system.
As a baseline, we can assume that the first AI to fully automate AI R&D will also take ~3 months to train.
With this assumption, simple models of the SIE suggest that it is unlikely to complete within 10 months. (In the model, the SIE is “completed” when AI capabilities approach infinity – in reality, of course, they would hit some other limit sooner.)
BUT if by the time AI R&D is fully automated, SOTA training runs have already shortened to one month or less, then completing an SIE in less than 10 months is much more plausible.
And improvements in runtime efficiency and other post-training enhancements (which are not modelled in the analysis) could potentially allow very fast takeoff without needing to retrain from scratch.
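A back-of-envelope version of this timing argument (again my own sketch under assumed parameters, not the model in the post): if every generation must be retrained from scratch, the SIE’s duration is bounded below by the sum of the successive training runs. If software progress shortens each run by a constant factor, that sum is a geometric series whose limit depends heavily on the length of the first run.

```python
# Lower bound on SIE duration from serial training runs alone.
# The 1.5x per-generation speedup is an illustrative assumption.

def sie_duration(first_run_months: float,
                 speedup_per_gen: float = 1.5,
                 generations: int = 40) -> float:
    """Sum of training-run lengths when each run is `speedup_per_gen`
    times shorter than the last (a convergent geometric series)."""
    total, run = 0.0, first_run_months
    for _ in range(generations):
        total += run
        run /= speedup_per_gen
    return total

# Starting from today's ~3-month runs: limit = 3 / (1 - 1/1.5) = 9 months.
print(round(sie_duration(3.0), 2))  # → 9.0
# If runs have already shortened to 1 month: limit = 3 months.
print(round(sie_duration(1.0), 2))  # → 3.0
```

This matches the qualitative picture above: with 3-month training runs at the start, completing an SIE in under 10 months is a stretch, but if runs have already shrunk to ~1 month, much faster timelines open up. Runtime-efficiency gains and post-training enhancements would loosen this bound further, since they deliver capability without a from-scratch run.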