Superintelligence Risk Project Update

Kaufman, Jeff T.

July 10th, 2017
airisk, ea

I've now been working on my project of assessing risk from superintelligence for a little over a week, though I was traveling for the end of last week. To keep me motivated, and let other people understand how I'm approaching this, here's what I've done so far:

More reading. The three most useful pieces were probably:
- Daniel Dewey's My current thoughts on MIRI's "highly reliable agent design" work.
- The comments on Ben Hoffman's OpenAI makes humanity less safe, such as this one of Paul's and this thread under Sarah's.
- Luke Muehlhauser's Replies to people who argue against worrying about long-term AI safety risks today and links.
More talking to people:
- Five ML researchers (two I know through EA and AI safety, two I went to school with, and a friend of a friend).
- Four ML practitioners (two I know through EA and AI safety, two I went to school with).
- Daniel Dewey, Open Phil program officer for risks from AI.
- Owen Cotton-Barratt, Research Fellow at FHI.
Wrote up notes from two conversations:
- Dario Amodei (work at the intersection of ML and AI safety is very valuable from both perspectives)
- Michael Littman (AGI is too far off for us to work on now)

I currently see three main views:

1. AGI is too far away for us to tell what it will be like, and we can't make progress on it now. The approach laid out in Concrete Problems (pdf) is good ML, but it's not additionally valuable from a superintelligence risk perspective. I think this is the view of most ML researchers (ex) and a high fraction of ML practitioners.
2. AGI may happen soon with systems similar to current ones, so we should improve their alignment, transparency, and robustness. Or AGI is farther off but what we learn on current systems is likely to be pretty transferable. This seems especially common among AGI researchers (ex), less common among general ML researchers, and uncommon among ML practitioners.
3. Making AGI safe requires a solid understanding of what intelligence is, how to make decisions, how to handle logical uncertainty, and other questions. We need to build a theoretical foundation for provably aligned AGI. This view is primarily associated with MIRI and is relatively popular within EA.

The three places where I see people disagreeing the most are:

1. Will AGI look like what we have now? The more similar you think it will be to what we have now, the more likely work on it today is to transfer. This seems to be the main difference between the "too soon to work on it" and "work on making current systems safer" groups.
2. Does progress require applicability? Can we advance our understanding with a theory-only approach that we only apply much later, or do we need to be constantly testing ideas in real systems? This seems like the main reason ML people are skeptical of theoretical-foundation style approaches.
3. Does safety require proof? Can we make a system we trust where we only have observational evidence that it's doing the sort of things we want it to do? This seems like the main reason theoretical-foundation people are skeptical of Concrete Problems style approaches.

Comparing this to my list when I was getting started:

1. How likely is it that current approaches are all we need for AGI with relatively straightforward extensions and a lot of scaling? Pretty much the same question as #1 above.
2. How valuable is it to work on solving problems that are probably not the right ones? ... I think some of the disagreement may be EAs being more comfortable valuing work on things that they're pretty sure won't pay off but that will be very valuable if they do. Not at the Pascal's Wager level, but at levels like 15%. But it's also just an open question, and people have pretty different senses of how much work now is likely to transfer.
3. How useful is it to have a strong theoretical foundation, vs just understanding the technology enough from an engineering perspective that we can make it do things for us? Pretty much my #3 above.
4. How similar is this to normal engineering? How much should we expect companies' desires that their AI systems do what they want them to do to work out? Talking to Dario convinced me this wasn't the right way to be thinking about it: "Dario's response was that transparency and safety are difficult research areas and, while they do pay off in the short run, they pay off more in the long run, so will tend to be underinvested in. There are also many more promising research directions than researchers right now, so what ends up getting explored is highly dependent on what researchers are interested in."
5. As we get closer to AGI, how likely is the ML community to take superintelligence risk seriously? I no longer think this is a cause of disagreement. People in the ML community who think we're close generally do take it seriously and many work on it.
