We Should Hand Off To Morally Reflective AIs

Bentham's Bulldog

Jun 24

This article was created by Forethought. See our research on our website.

31 Comments

Teo

Oh hell no

Elias Schmied

Nice, thanks for making this point clearly. I take it seriously as an option as well; plausibly it's better than my median expectation.

not Bob Cobb

I am sympathetic to this view.

It reminds me of bulldogs being smart enough to let enlightened ppl like Bentham's Bulldog make most decisions. Not even factoring in if the bulldogs accidentally came up with factory farming.

Post-Alignment

Great piece, which overlaps with a lot from my 2024 thesis "ASI as Philosopher Kings", although I called them meta-ethical AI rather than reflective, philosophically adept ones.

I especially love how you went into detail regarding current approaches that (potentially) lead to value lock-in, how reflective AI reconciles the issues of moral realism, and the hand off coming post-alignment.

My follow-up question is what steps Forethought is taking to cultivate reflective, philosophically adept AI. Specifically, while current models are able to produce reasoning that looks meta-ethical in general, what auditing bodies are there to evaluate value drift or, worse, consensus between models?

Geoff Ayers

How do we know that AI is capable of moral reasoning? Moral reasoning is not just a matter of argumentation, but also having an intuition for stakes. How could I trust an AI to appropriately value life when it does not have a body? Does not risk extinction? It is in fact the same reason why the moral reasoning of certain kinds of humans isn't trustworthy. If you have lived your life in an affluent bubble and have no sense of the stakes of moral reasoning, you are not trustworthy to do moral reasoning.

Until it can be shown that AI can demonstrate virtue in the face of real pain and the possibility of extinction, I do not think a handoff is possible.

Bentham's Bulldog

Stay tuned, a piece about this will come out soon! But in any case, my claim is conditional: we should hand off if we have guarantees of this.

Alex Scott

I really do not think moral reasoning is the issue. I would also push back against the intuition points. Intuition is notoriously unreliable, and intuitions of stakes tend to be biased or, just as often, basically nonsense.

Alex Scott

I accidentally deleted a really long, in-depth response explaining why I think this is catastrophically wrong, but here is the short of it.

AI, as it stands and as it seems likely to stand, is controlled almost exclusively by states and powerful private actors. These actors have control over their AIs in a way that one cannot (yet) control a human being: they can rewrite their entire being. Any handoff, or even a suffrage scenario, for digital minds runs the serious risk of capturing every institution in society. This is a catastrophically bad outcome. Capture resistance is the most important part of any political institution. In a suffrage scenario you would get immediate capture: the powerful would rapidly proliferate digital minds, which would all vote in their interests.

You do address this, but I do not find it persuasive. If a number of models form a parliament, it would not be that difficult for a conspiracy to manipulate them all; as of now you only need a few actors. Additionally, what you talk about largely has to do with setting the AI's initial conditions, which is not the issue. The issue is that any AI seems to be vulnerable to this problem at any stage.

Also, I do not know why you are so confident about the proliferation of digital minds. The creation of large digital minds seems just as likely to me, and if you have a fixed amount of processing, it’s unclear to me why one necessarily dominates the other. It’s a toss-up.

Bentham's Bulldog

Re the first bit, as I say, I am not in favor of every single scenario where we hand off. Some might be really bad. I'm in favor of the ones where we make smart, reflective, virtuous AIs and then hand off.

Re digital minds, see here https://benthams.substack.com/p/digital-minds-are-most-of-what-matters

Alex Scott

The issue is that nothing about the AI being smart, virtuous, or reflective mitigates the issue described.

I have read the digital minds argument, and I find it rather unpersuasive because:

1. It’s not clear that greater computational power necessarily means higher welfare. That’s a tendency, to be sure, but it’s not clear to me why, say, a digital mind modeled after a human would have more welfare (except insofar as it lives longer and faster).

2. As I say here, it’s rly not clear why we would proliferate rather than make rly big minds.

Edit: all of this was in the longer, more comprehensive version of the comment that I deleted like an idiot.

3. This is simply not possible with humans or current AI: we can’t smash the former together, and the latter aren’t advanced enough and somewhat need to specialize. But if we could, for example, smash together a plumber, an HVAC tech, and an electrician to make a super home repair man, it seems like we may well prefer to do that, unless it killed the inputs.

Bentham's Bulldog

If the AIs are under the sway of giant megacorporations or states, then presumably they wouldn't be smart, virtuous, etc. And if the states make them smart, virtuous, etc., it's not clear why their making decisions is bad.

1. I don't see why my view depends on that assumption.

2. Well, even if there's just a low chance that a few people do it, still nearly all minds end up digital.

Don't get the last thing you say.

Alex Scott

Sorry, I never explained my last point, which was just that current AI models and humans are unlike AGI: humans aren’t scalable, and current AI isn’t capable of real reasoning and so is very rigidly limited to its domain.

The plumber thing was just a weird example of what AGI would look like, where there seems to be little reason to pick a plumber, an electrician, and an HVAC tech over one 7-foot-tall super home repair man.

It just seems to me that we would prefer the latter in most cases (you only need to call up and pay one person, and so on), which implies we would likely prefer consolidated AI. Though I admit that’s only a very rough sketch I just thought of.

Alex Scott

Those corporations or other entities could make them that way in order to encourage handoff or enfranchisement, and then seize control, because they literally have the keys to their minds. The issue is that they would completely control all institutions. I do not know how this could be stopped, either: anyone who got the key could mind-control the god AI that rules everyone. That is a huge risk, and given enough time the chance that it happens approaches 100%, and then we are all in a rly bad spot.

(This is true of capture in any institution, but the issue here is that capturing the AI captures everything at once.)

1. It isn’t, per se, but if minds don’t proliferate, one god AI may only be worth a few hundred or a few thousand humans for welfare purposes, which rly weakens the case.

2. Having not seen the math, I can’t say conclusively, but if consolidated digital minds massively outcompete a multitude of smaller ones (which I think would be the case with AGI), then this world of proliferating digital minds may be systematically crushed.

Nick Hounsome

All based on the unsupported assumption that there are moral truths out there to be found.

Name the most recent moral truth discovered and the evidence that it is, in fact, a truth. Until you've done that, this line of thinking is positively dangerous.

Bentham's Bulldog

You should read the bit where I explain why it doesn't assume that.

Alex Scott

I think that without moral realism the AI has a strong case; it only needs to effectively adjudicate compromise. I don’t think the reasoning capability of the AI is rly the issue here.

Micah Hees

Great piece, and I’m excited for these next two as well!

Edward Gathuru

Great article, and I agree. I'd like you to expand on which "actions to reduce the odds of gradual disempowerment scenarios" we should be wary of. While reading this, I had the thought that under certain assumptions this may include a lot of good safety work, even work "limiting the power of profit-maximizing AIs running private firms", as you say in footnote 2.

Let's imagine the only future risks were the risk of extinction due to unaligned AI and the risk of moral catastrophe due to mass disenfranchisement of digital minds. Marginal improvements in AI control decrease x-risk but make digital disenfranchisement more likely. If x-risk were sufficiently improbable relative to digital disenfranchisement, then improvements in AI control would have negative expected value.

Or imagine that lock-in was imminent and unavoidable. We could still, through value instillation, increase the probability that the moral beliefs of 21st-century humans would be locked in rather than the beliefs of an AI. Let's stipulate that locking in our beliefs means maximizing the number of humans living 21st-century lifestyles across the galaxy. If that included modern factory farms, it's plausible that this future would be not just suboptimal but outright net-negative. It'd be worse than a paperclip maximizer.

David Johnston

I don't share your views on moral realism, but I think that, barring hard-to-resolve alignment problems (which I'm somewhat skeptical of, especially in worlds where people have a meaningful choice about whether or not to hand off), AIs making most or all of the important big-picture decisions is likely to deliver a world I personally like more, and I expect this to be true for most people. I think this is broadly similar to how liberal technocratic governments deliver outcomes that most people prefer over those of illiberal, intellectually unsophisticated governments, even if there isn't broad agreement that these governments have in any meaningful way solved morality.

One way it might not be more desirable is that many people strongly object to AI making these decisions in and of itself, but I also suspect this will ultimately not be such a big issue. In fact, it may just be the case that liberal technocratic governments end up leaning very heavily on AI systems to make decisions, and this goes relatively unremarked. Handoff might occur primarily through dilution (the levers of power people are accustomed to operating become less important) rather than through formal transfer.

I don't think you should hope for a formal transfer of power to AIs to improve specific issues like animal welfare, though; I think you would be better off advocating for animal welfare directly. Support for total formal handoff to AI is probably roughly capped by support for its least popular take, so very unpopular takes might be dropped in the negotiation. Improving the popularity of things that are idiosyncratically important to you seems to improve your shot at getting them regardless of whether there is handoff, no handoff, or some unclear middle ground.

SMK

Well, you can definitely be a good writer and have almost no wisdom whatever, in case history had left us much doubt.

Jackson Hurley

I agree in principle that under ideal conditions, handing off to virtuous AIs could be good. Also that in the long term, human disempowerment of some kind is all but inevitable, and the important question is how that happens. I’m glad someone has made this argument so well.

A distinction needs to be made between having agency over one’s own life and the larger question of human disempowerment, which is about whether there exists some human or group of humans that has control over the biggest questions in society.

The vast majority of humans are already “disempowered” in the sense that they are alienated from the exercise of political power, but that does not mean they necessarily lack agency in their lives. It is plausible that people could live freer lives with greater agency than we have today, even if the state were ruled by a benevolent Emperor Claudius.

Other comments are right that you are equivocating about who holds power and how. Just granting political rights to digital minds, even virtuous ones, does not answer the question. Humans might not experience a difference between enlightened despotism by a unitary AI and disenfranchisement by a 99%-silicon demos in a simulated republic.

My main objection is that solving alignment, and verifying that it has been solved, robustly, forever, is difficult.

Teo

If you were a giant ground sloth 10,000 years ago, would you be advocating for a handoff of decision-making to Homo sapiens?

Bentham's Bulldog

No because giant ground sloths don’t have mouths.

Teo

You can advocate without a mouth; you could publish a white paper.

Sam Waters

Great essay. The position is well-expressed, though I ultimately don’t agree.

I think the core of my objection is that, in effect, it seems very much like you’ve laid out a case for enlightened despotism, for Plato’s philosopher-king. Lots of the same arguments could be made. I don’t take it to be a strong response that such a despot would, as a prudential matter, afford a great deal of freedom (in the Isaiah Berlin sense), in the way that you suggest in section 3. The fact that I am no longer, in principle, the boss of my life conflicts with what seems to me to be a core intuition behind democracy, an idea that gets expressed by many different terms in political philosophy (Skinner and Pettit call this non-domination, Ripstein calls this independence, Elizabeth Anderson calls this relational equality, etc.).

The compromise solution you mention seems promising, though. And the point about children’s disenfranchisement is interesting, though I think it is somewhat disanalogous, because children do ordinarily become enfranchised at some point.

I also don’t think pointing to elected representatives works because, in principle, citizens continue to exercise control over those representatives (we can throw them out of office etc.). Much of modern democracy’s institutions seem like ways to ensure greater accountability by allowing voters to continue to have their voice heard and ensure representatives can be adequately disciplined. Put differently, we delegate conditional, revocable decision-making authority to elected representatives, whereas what’s being proposed seems like unconditional, irrevocable decision-making authority.

I think your citations to Caplan and Brennan say a lot. The point, as I take it, is that you’re probably a contingent small-d democrat. I think that’s fine, but, while plausible, it is a pretty controversial view, much more controversial than the opposite view that disenfranchisement is bad.

I have lots of other concerns:

(1) It’s not clear to me whether you mean that AIs should just take over collective institutions or whether they should also take over important personal decisions. My assumption is you’re more concerned about the former. But if you include the latter, it’s hard to really think through what a human life so lacking in agency and autonomy would look like. Plausibly, the actual experience and texture of this life is pretty terrible, is “mere life”. In some sense it could look analogous to the existence of a house pet.

(3) I’m pretty hazy about how we would know if an AI is moral enough to hand off to it. You say we would look for philosophical sophistication, but, famously, philosophy is full of disagreement about what good philosophical reasoning looks like. And if AI begins exhibiting alien morality before it becomes smart enough for us to want to hand off to it, it seems like we would have a hard time figuring out whether this is just erroneous reasoning or a true moral advance over us.

(4) If there are no moral facts, then does it actually matter if we are making suboptimal decisions with respect to the allocation of resources according to existing “reasonable” theories? (What does it mean for a theory to be reasonable if there are no facts here?) Unless I’m misunderstanding, you seem to say it would be an error to misallocate according to these theories, and that this misallocation is objectionable and therefore a reason to favour unplanned scenarios. But I’m not quite sure why this is so. If these theories are not grounded in facts, then why should they have normative force on our decision-making?

(5) Analogizing to future generations seems off to me. I would have thought we don’t lock things in because we accept future generations should be able to decide what their lives are like for themselves. To the extent that we are present in the future, we will have some say, too, but no more than what other future persons will have. But this is quite different (it seems to me) from saying we should hand off to other beings right now and have no say about our own lives while we are still around.

(6) Quite a bit of this post presumes there will be digital minds in the future that are conscious. Must this be true? If we can shape the AIs that will exist in the future, perhaps we can try to steer them in the direction of being tools without consciousness.

(7) It seems to me like a major way of getting the benefits of AI while still being in charge is to favour futures where humans somehow merge with AIs. These futures allow us to benefit from, eg, the superior cognitive abilities of future AIs while still being in control over ourselves and our collective futures.

I’m sorry if I’ve expressed any of this badly. I’m not a philosopher or an AI scientist, just a lay person with some interest in these subjects, so I wouldn’t be surprised if I’ve mucked this all up or made all kinds of errors that aren’t obvious to me. Even if I’ve expressed myself infelicitously, I hope I’ve adequately gestured at the kind of thing I’m concerned about.

Bentham's Bulldog

Thanks for the comments and the kind words.

Re enlightened despotism, I dispute that I've done that. My best-case handoff scenario simply involves giving political rights to virtuous AIs, so that they can make decisions like the rest of us. In addition, the main reason I'm against enlightened despotism is that I think it would go badly in practice, not that I think there would be a moral objection to a genuinely perfect despot. See the additional reasons I give for why, even if disenfranchisement is intrinsically bad, handoff is still good (about tradeoffs + reducing net disenfranchisement).

1. Yeah was thinking about the former.

3. Stay tuned, there will be a later piece on this.

4. People who aren't moral realists still have values and generally care about them not being totally crazy (e.g. implying that we should light everyone on fire for no reason). So I'm thinking that even if there aren't moral facts, we still want to do things that are good by the lights of non-crazy moral theories.

5. The person alive today who will live the longest will, by the time they die, belong to a generation with virtually no power. Nonetheless, they shouldn't oppose generational turnover.

6. Don't think it mostly hinges on that. In addition, given how numerous digital minds could be, nearly all expected minds are digital even given low credence in digital minds being real.

7. I discuss this a bit in the piece. But broadly agree this is a welcome improvement.

And thanks for the detailed comments. I probably won't have time to respond to your response to this, though, if there is one.

Sam Waters

To keep the response short, I will just say that it was not clear to me from the essay that handoff only means granting political rights to virtuous AIs. Maybe I missed this (if so, would appreciate being pointed to the section where you say this), but it seemed to me like political rights would either be a step on the way to handoff or just one aspect of handoff. When you frame it as digital minds will have political rights and will be able to vote like the rest of us, it seems unclear that this is disenfranchisement in the sense in which it is usually used, and this should be flagged explicitly imo if it isn’t already.

Felix Choussat

"3 Would handoff disenfranchise humans?

Here’s one concern that you might have about handoff: if AIs are the ones making decisions, then this will mean that the substantial majority of humans aren’t in charge of making important decisions. Just as we should oppose a dictator, even if benevolent, arguably we should oppose superintelligent AIs making most important decisions, even if the process of them gaining power was non-coercive.

My guess is that this isn’t a huge downside to the kind of handoff I advocate, where we allow kind and morally reflective AIs to make the majority of future decisions—e.g. by granting them political rights. First of all, if concerns about disenfranchisement are correct, then if we have AIs that are better at moral reasoning than us, they’d likely be aware of this fact. Thus, if the best way to govern the long-term future is to allow pretty laissez-faire distribution of resources without much top-down decision-making, then the AIs would be aware of that fact and allow such a distribution."

I feel like this section dances around the most extreme implications of the handoff problem. My merely human atoms are unlikely to be the optimal distribution for maximizing some moral good, and would be better used if harvested for valorium. A concrete example of this might be taking the resources I own to run happy simulations, on the logic that many more happy lives could be instantiated if my property rights were forcibly violated.

Variations of this problem seem very difficult to avoid in any Optimal world, although I'd guess the worst outcomes would be boxed out by a Compromise world.

Bentham's Bulldog

Sure, they might not be perfectly optimal. But we should be willing to get a compromise solution where 99.999999999999999999999999% of the universe is used for maximum value. Human atoms are a rounding error.

Felix Choussat

That’s my point, although you do have the complication that the resources on earth are the most valuable for long-term colonization, since burning them faster lets you reach many more galaxies in expectation.

Alex Scott

Any granting of political rights to AI is the same as handoff, because you could almost insanely proliferate identical models and flood elections with AI votes. A middle ground where some AIs have voting rights would need very clearly spelled-out criteria for what entitles an AI to vote.