Discussion about this post

Oliver Sourbut

On LessWrong there was some interesting concern (e.g. https://www.lesswrong.com/posts/wSFmLhHxAuG4vLJoH/ai-character-is-a-big-deal?commentId=6MWK4GebjEMvSFrSP). In particular pointing to some much-discussed challenges with steering, corrigibility, and character shaping in extremely advanced AI.

I think it's a background assumption that, at least plausibly, months or years (or even longer) will be spent in the kind of cyberpunk world we're entering now - with capable but not overwhelmingly capable AI broadly proliferated and diffusing - and that this time will be important in setting the stage for, conditioning, and determining the time frame over which further AI (and other tech) progress unfolds. (For my part, I believe this is quite likely.)

Most of the challenges you've linked to are theoretical takes which build in much more severe/extreme/limit assumptions. We will very plausibly encounter those, even on the cyberpunk timeline! But later, and how equipped we are to handle them might depend substantially on how the intervening time goes. This could be by shifting timespans, by mobilising effort, by clarifying (or indeed obfuscating) objectives and affordances, by creating new affordances and perhaps fluency gained by experience, by adjusting prevailing sociopolitical climatic conditions, ...

I'd be interested to know what the authors make of those kinds of concerns, and how they'd respond to the rough response I gave above.

Hank Sohota


Treating character as a design choice — constitutions, model specs, training objectives instilling the right dispositions — concedes too much at the outset. It presupposes an agent whose dispositions can be precisely specified, rather than asking whether the architecture supports genuine values or only their reliable performance under foreseeable conditions.

The distinction matters because the World 2 scenarios hinge on it. They work because the relevant values are genuinely operative, not because the AI has been well-instructed. Stable character under genuinely novel, high-stakes conditions — the coup scenario, the bioweapon-logistics case — requires an architecture in which values are structurally grounded, not merely calibrated for conditions the training anticipated. Competitive pressure doesn't wash that away; it makes the architectural question prior to the design question.

The governance question follows. The question of who decides what architecture to build is currently answered by ownership. But ownership is precisely where private decision-rights frameworks are least adequate, because the people most affected by those decisions are third parties with no standing and no recourse.

