{"items": [{"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542", "anchor": "fb-886931250542", "service": "fb", "text": "I disagree that they're really bad. I've actually had success explaining the danger with a slightly modified Clippy example.<br><br>I also think it's important to have the examples be trivial goals to underscore the point that it doesn't have to sound like a dangerous goal to be dangerous.", "timestamp": "1499958166"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886931475092", "anchor": "fb-886931250542_886931475092", "service": "fb", "text": "&rarr;&nbsp;Slightly modified in this way: \"You have an AI which is going to optimize your production line. You give it a one-meter cube of steel and tell it to make as many paperclips as it can. It determines that taking apart the factory to make it into paperclips would make more, and taking apart the planet to turn the iron core into steel paperclips is even better. So it does that, which is what you asked for though not what you meant.\"", "timestamp": "1499958364"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886933965102", "anchor": "fb-886931250542_886933965102", "service": "fb", "text": "&rarr;&nbsp;I think this is still very unconvincing.  If we have a single superintelligent AI, why is it being used to optimize paperclip production at a single factory?  Why is it being so vaguely instructed when we know it takes instructions absolutely literally?  Why aren't you checking how the AI plans to achieve the goal you set?", "timestamp": "1499959320"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886940686632", "anchor": "fb-886931250542_886940686632", "service": "fb", "text": "&rarr;&nbsp;\"Why is it being used to optimize paperclip production at a single factory?\" Because if we built a sovereign right away, people would say that such a concentration of power is obviously dangerous (for reasons not relevant to the AI alignment problem), and so we should first deploy it somewhere smaller and more limited in scope, like a paperclip factory.<br><br>\"Why is it being so vaguely instructed when we know it takes instructions absolutely literally?\" This is a bit of ambiguity in the statement of the example, for the sake of making it concise and human-readable. \"Make as many paperclips as it can\" is a natural-language sentence, not a program, and so will not be literally what the AI is programmed to do. The point (which these examples don't illustrate directly but lots of other AI safety examples do) is that, for whatever program we do specify, there'll be ways in which it'll fail to match what we actually want (Jim's comment is helpful here).<br><br>\"Why aren't you checking how the AI plans to achieve the goal you set?\" If we could do that, and be confident that it wasn't somehow deceiving us, then that would be really helpful, and people are in fact working on this. But the current leading approach in AI (deep neural nets) is well-known to be incompatible with such an approach; there's no visibility into what happens between the initial program-and-data and the final result. 
(The AI's future intentions may not even be represented in a way that it can understand directly, let alone in a way that we could understand.) And if the AI expects its thought processes to be examined in this way, it might modify them accordingly. It's well-known to be possible to write a program whose actual function is different from what a human inspecting it would conclude it does (e.g., the Underhanded C Contest), and an AI could do something analogous.<br><br>I think the real takeaway from all of this is that the argument for AI alignment is deep enough that it can't be adequately represented in a single example scenario; you have to actually get into all these considerations and counterobjections in order for it to make sense.", "timestamp": "1499960753"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886943456082", "anchor": "fb-886931250542_886943456082", "service": "fb", "text": "&rarr;&nbsp;Taymon: \"the current leading approach in AI (deep neural nets) is well-known to be incompatible with such an approach; there's no visibility into what happens between the initial program-and-data and the final result.\"<br><br>Why do you think deep neural nets are incompatible with inspection?  For example, consider DeepDream [1]: being able to see how the network conceptualizes \"banana\" is some progress in this direction.<br><br>There's also visibility in the form of being able to see which subsystems contributed most strongly to the result, and trace that backwards.<br><br>[1] https://research.googleblog.com/.../inceptionism-going...", "timestamp": "1499961850"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886944199592", "anchor": "fb-886931250542_886944199592", "service": "fb", "text": "&rarr;&nbsp;Imagine that advanced aliens came to Earth and removed all of your unnecessary motives, desires and drives and made you completely addicted to \u201cznkvzvmr uhzna unccvarff\u201d. All your complex human values are gone. All you have is this massive urge to do \u201cznkvzvmr uhzna unccvarff\u201d, everything else has become irrelevant. They made \u201cznkvzvmr uhzna unccvarff\u201d your terminal goal.<br><br>Well, there is one problem. You have no idea how exactly you can satisfy this urge. What are you going to do? Do you just interpret your goal literally? That makes no sense at all. What would it mean to interpret \u201cznkvzvmr uhzna unccvarff\u201d literally? Doing a handstand? Or eating cake? But not everything is lost, the aliens left your intelligence intact.<br><br>The aliens left no urge in you to do any kind of research or to specify your goal but since you are still intelligent, you do realize that these actions are instrumentally rational. Doing research and specifying your goal will help you to achieve it.<br><br>After doing some research you eventually figure out that \u201cznkvzvmr uhzna unccvarff\u201d is the ROT13 encryption for \u201cmaximize human happiness\u201d. Phew! Now that\u2019s much better. But is that enough? Are you going to interpret \u201cmaximize human happiness\u201d literally? Why would doing so make any more sense than it did before? It is still not clear what you specifically want to achieve. 
But it\u2019s an empirical question and you are intelligent!", "timestamp": "1499962033"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886945681622", "anchor": "fb-886931250542_886945681622", "service": "fb", "text": "&rarr;&nbsp;Jeff: I'm not saying it can't ever work, but at least under the current approaches used in the most powerful systems (which may or may not be similar to those used in the first AGI), it doesn't work by default. (The maintainers of AlphaGo, for example, often couldn't explain why it made a particular move.) There's nothing that makes the nodes in a neural net necessarily correspond to any particular concept that a human would think of as relevant; obviously, the system as a whole has to add up to something relevant, but the information might be distributed within the network in non-human-readable ways. There seems to be something of a tradeoff between power and transparency, which poses a risk that the first AGI would be one whose authors decided to sacrifice transparency.<br><br>There's a whole literature on model interpretability and this is rapidly approaching the limits of my knowledge as I'm not an expert. Tagging Victoria because I think her Ph.D. thesis was about this.", "timestamp": "1499962552"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886947218542", "anchor": "fb-886931250542_886947218542", "service": "fb", "text": "&rarr;&nbsp;&gt; Why aren't you checking how the AI plans to achieve the goal you set?<br><br>In the strongly superintelligent domain, looking at its plan is itself an unsafe action, even if the plan is theoretically optimized only for expected value conditional on it being executed.", "timestamp": "1499963058"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886953196562", "anchor": "fb-886931250542_886953196562", "service": "fb", "text": "&rarr;&nbsp;&gt; If we have a single superintelligent AI, why is it being used to optimize paperclip production at a single factory?<br><br>Paperclip maximizing is meant to stand in for \"insert goal system here\". None of the problems go away when you crank up the complexity by making the goals relate to human welfare, but they do get harder to reason about.<br><br>&gt; Why is it being so vaguely instructed when we know it takes instructions absolutely literally?<br><br>Because it's a thought experiment, and thought experiments are optimized for simplicity. 
The results from test dialogues where people try to come up with airtight goal systems seem to indicate that adding detail doesn't solve the problem.", "timestamp": "1499964187"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886953655642", "anchor": "fb-886931250542_886953655642", "service": "fb", "text": "&rarr;&nbsp;(The issue with inspecting superintelligences' output being itself unsafe is a bit of a rabbit hole; I'd be happy to discuss it but a FB or Hangouts chat is probably a better medium.)", "timestamp": "1499964382"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886955781382", "anchor": "fb-886931250542_886955781382", "service": "fb", "text": "&rarr;&nbsp;Jim: \"thought experiments are optimized for simplicity\"<br><br>I think it would be useful to have something that's optimized for realism, and that's the direction I see the 80k and WaitButWhy examples going in.", "timestamp": "1499965205"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886956479982", "anchor": "fb-886931250542_886956479982", "service": "fb", "text": "&rarr;&nbsp;The problem there is the conjunction fallacy. For any specific detailed scenario that we give as an illustrative example, there are a bunch of reasons why it's unlikely to occur in real life. It's the general pattern based on abstract principles like instrumental convergence that we need to watch out for.", "timestamp": "1499965644"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886956579782", "anchor": "fb-886931250542_886956579782", "service": "fb", "text": "&rarr;&nbsp;In that sense, the usefulness of obviously unrealistic examples like the paperclip maximizer is that they signal clearly that this is a thought experiment intended to illustrate a general principle in simple terms.", "timestamp": "1499965708"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886958645642", "anchor": "fb-886931250542_886958645642", "service": "fb", "text": "&rarr;&nbsp;Taymon: the thing is, with other existential risks it's quite practical to give examples of how things could go wrong at quite a high level of realism that get people to say \"ok, I see why this could be a problem\"", "timestamp": "1499966057"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886958974982", "anchor": "fb-886931250542_886958974982", "service": "fb", "text": "&rarr;&nbsp;(happy to give one if you think that would help)", "timestamp": "1499966113"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886959548832", "anchor": "fb-886931250542_886959548832", "service": "fb", "text": "&rarr;&nbsp;I'd be interested to hear it, because I expect that other people would come up with lots of counterarguments about why that exact scenario wouldn't happen. (Indeed, I hear many such arguments with respect to nuclear war, artificial pandemics, and climate change.) 
I think people might be applying different levels of skepticism to AI risk than to things that they've heard people in suits talk about on the news.", "timestamp": "1499966362"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886959598732", "anchor": "fb-886931250542_886959598732", "service": "fb", "text": "&rarr;&nbsp;I think the difference is that there are many open questions about how an AGI scenario will play out, so if you give a realistic example, you're very likely to have assumed at least one premise that your reader disagrees with. Eg slow vs fast takeoff, AGI created directly or produced as a byproduct of narrow AI, multipolar or singleton; you can't flesh out a scenario without picking values for each of these dichotomies, and you can't make something intuitively appealing if you pick the side of a dichotomy that doesn't match your reader's intuition. In order to see what's going to happen, you've got to go down multiple branches of the tree and observe what they all have in common.", "timestamp": "1499966476"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886959768392", "anchor": "fb-886931250542_886959768392", "service": "fb", "text": "&rarr;&nbsp;Jim: What makes this different from other X-risks? Is AI risk just a lot more speculative? Do people already have common intuitions about assumptions like nuclear winter because they heard confident-sounding people predicting them on the news?", "timestamp": "1499966634"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886959773382", "anchor": "fb-886931250542_886959773382", "service": "fb", "text": "&rarr;&nbsp;Jim: that's still the case with something like nuclear risk: terrorism vs nation states, many countries, many ways it can go wrong", "timestamp": "1499966638"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886960347232", "anchor": "fb-886931250542_886960347232", "service": "fb", "text": "&rarr;&nbsp;Despite all that people can still look at a nuclear example and say \"yeah, that could happen, we should keep that from happening\" but there's more disagreement over whether it's already being worked on and how useful additional resources are.", "timestamp": "1499966765"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886960596732", "anchor": "fb-886931250542_886960596732", "service": "fb", "text": "&rarr;&nbsp;People look at \"terrorism or conflict between nation states\" and see two things that could both go wrong, maybe slightly more or less likely. They look at \"soft takeoff vs hard takeoff\" and see one thing that's realistic, and one thing that's absurd. They don't agree on which side of the dichotomy is the absurd one, so you have to lose half the audience.", "timestamp": "1499966894"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886963635642", "anchor": "fb-886931250542_886963635642", "service": "fb", "text": "&rarr;&nbsp;Taymon: Ok, nuclear examples:<br><br>\"Sweden launches a research rocket, and, as is customary, warns other countries first.  
Unfortunately the information gets dropped before it gets to the Russian military.  When they see it they immediately inform Putin, and they watch it closely to see if it's headed towards Russia.  The rocket malfunctions, changing course dramatically and heading towards Russia, the Russians decide it's a NATO attack, launch a counterattack.  NATO countries see that as an unprovoked attack, escalate, escalate\"<br><br>\"China interprets Trump's comments to mean that the US will not defend Taiwan if it's attacked, starts trying to retake it.  Trump threatens a nuclear response, China doesn't believe him.  Trump orders a small nuclear strike to make it clear the US is serious, China interprets that as major aggression, escalation, escalation\"", "timestamp": "1499967510"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886970337212", "anchor": "fb-886931250542_886970337212", "service": "fb", "text": "&rarr;&nbsp;So the somewhat annoying thing is that I know people with relevant expertise could argue against those examples, but I myself don't have relevant expertise so can't offer strong arguments. (Most of the arguments I envision are of the form \"that's not how diplomacy/military policy works, there are safeguards to prevent that outcome\".) I agree that the average person should probably default to thinking that these are things that could happen and we should try to prevent, and I think the same is true of AI risk.", "timestamp": "1499969450"}, {"author": "Alice", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886980626592", "anchor": "fb-886931250542_886980626592", "service": "fb", "text": "&rarr;&nbsp;Jeff&nbsp;Kaufman I mean, we have multiple times come within a single human judgment call of nuclear war https://en.m.wikipedia.org/.../1983_Soviet_nuclear_false...<br><br>There's a whole Wikipedia page https://en.m.wikipedia.org/wiki/List_of_nuclear_close_calls<br><br>But 1983 might be the only time we were really a simple choice away from disaster", "timestamp": "1499971706"}, {"author": "Alice", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886982293252", "anchor": "fb-886931250542_886982293252", "service": "fb", "text": "&rarr;&nbsp;Taymon I think for climate change one need only imagine that we continue on the path we are on. Any argument that we will stop or reverse warming I think needs to account for why we have not met a single emissions or warming target in 30 years. 
Now actual extinction vs collapse of global civilization is definitely a harder case to prove, and you and I maybe have a different bar for \"utterly unacceptable, the end of The World\"", "timestamp": "1499972049"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886983590652", "anchor": "fb-886931250542_886983590652", "service": "fb", "text": "&rarr;&nbsp;Alice: climate change as-is is scary, but tail risks are terrifying: https://en.wikipedia.org/wiki/Clathrate_gun_hypothesis", "timestamp": "1499972389"}, {"author": "Alice", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886983760312", "anchor": "fb-886931250542_886983760312", "service": "fb", "text": "&rarr;&nbsp;Jeff&nbsp;Kaufman well, that's a compelling mechanism for human extinction!", "timestamp": "1499972453"}, {"author": "Alice", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=886986899022", "anchor": "fb-886931250542_886986899022", "service": "fb", "text": "&rarr;&nbsp;Taymon I finally read the ssc link you sent me a while back and I don't agree with his premise so I don't agree with his conclusion. I do think that a bunch of ordinary mathematicians working long enough will exceed (and have already exceeded) Gauss and Ramanujan, I do think that a team of typical humans could develop the technology to land a probe on Jupiter.<br><br>I actually think that the global economy is an instance of the paperclip demon.", "timestamp": "1499972862"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=887004703342", "anchor": "fb-886931250542_887004703342", "service": "fb", "text": "&rarr;&nbsp;Alice: Those propositions seem unlikely, though I'm not sure how we could test the one about IQ 90 construction workers designing and landing a Jupiter probe. The math thing is particularly about results that required (among other things) a particular critical leap of insight that had to be had all at once, without which the result wouldn't have been possible. Can you point to any examples of stuff like that being parallelized among average mathematicians? (Things where the bottleneck is time and effort, rather than rare critical insights, do of course parallelize well, and I think a lot of math (and every field) is like that.)<br><br>At any rate, the most general form of the argument isn't about the dangers of silicon brains, it's about the dangers of any optimization process dramatically more powerful than the ones that we deal with today. If you're suggesting that such a process could be created simply by organizing humans in the right way, well, that would certainly be something that I'd consider cause for alarm if there were strong arguments for it. On the other hand, if you're just concerned about the current trajectory of the global economy, then that seems like a different sort of problem from \"we have to solve this before AGI gets invented or literally everyone dies\". 
It's also not very neglected compared to AI risk.", "timestamp": "1499976745"}, {"author": "Alice", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=887011310102", "anchor": "fb-886931250542_887011310102", "service": "fb", "text": "&rarr;&nbsp;Taymon I really think that not only is it possible but it has happened: from the Manhattan Project, which created an x-risk, to manned space flight simultaneously developed in multiple countries, we see feats achieved by committee that surpass wild dreams.<br><br>Mathematically, I think it's more plausible that the Polish School of Mathematics responsible for so much wasn't because Poland happened, by genetic accident, to have multiple contemporaneous mathematical geniuses, but because many excellent mathematicians were able to collaborate effectively and closely.<br><br>I'll see if I can find more research on math, but I think that the differences in human intelligence are massively overblown.<br><br>As to whether it is possible to organize humans in this way - it has already been done many times over. Soviet planners could never rival the mad, heartless brilliance of The Market, which is now, in the service of maximizing short-term production, dumping ever-increasing amounts of carbon and other greenhouse gasses into the air. My opinion is that the collapse of global civilization within the next 100 years is more likely than not. I doubt that we will constrain our output of greenhouse gasses willingly - it will happen only because global climate change has destroyed the complex network necessary to create wells over a mile deep. <br><br>While you are right that FAR more human effort has been expended on climate change, that effort continues to prove insufficient and catastrophic climate change remains not only plausible but the single most likely future. As long as that remains true I believe that climate research and activism remain dangerously under-supported.<br><br>Humans, organized, act as a superintelligence that has already created at least two existential risks in a single century! That's two too many and I imagine that we will, with unbounded ingenuity, continue to create more until we learn how to organize ethically and wisely.", "timestamp": "1499978842"}, {"author": "Victoria", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=887027951752", "anchor": "fb-886931250542_887027951752", "service": "fb", "text": "&rarr;&nbsp;Taymon Jeff It's certainly possible to inspect deep nets and come up with explanations for why they make certain predictions, but so far we don't have good metrics for the quality of these explanations or reliable ways to ensure that we are not just rationalizing what the algorithm is doing. It is especially difficult to make causal interpretations rather than just correlational ones. For example, even if the network has some representation corresponding to a relevant human-interpretable concept, it can be hard to tell whether this representation contributes to a certain prediction or action, or whether it's just sitting there. If you are trying to check how the algorithm plans to achieve the goals you set, this will involve some causal interpretations like 'the algorithm plans to use bananas to achieve the goal' rather than just 'bananas are represented in the network as part of its model of the world'. 
I hope that better methodology for making such interpretations will be developed by the time AGI is built.", "timestamp": "1499984135"}, {"author": "Wolf", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886931250542&reply_comment_id=887386937342", "anchor": "fb-886931250542_887386937342", "service": "fb", "text": "&rarr;&nbsp;Jeff&nbsp;Kaufman: \"Why do you think deep neural nets are incompatible with inspection? For example, consider DeepDream [1]: being able to see how the network conceptualizes \"banana\" is some progress in this direction.<br><br>There's also visibility in the form of being able to see which subsystems contributed most strongly to the result, and trace that backwards.\"<br><br>I still find it plausible that this might either lag behind or turn out to be increasingly difficult to do with more complex systems.<br><br>Isn't it the one point all of the stories try to make, that the creators somehow fail to supervise the development of their brainchild due to something something information stored in neural architecture not being directly useful (for understanding it) and only \"visible\" when said architecture is allowed to execute?", "timestamp": "1500083674"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886932862312", "anchor": "fb-886932862312", "service": "fb", "text": "All of those examples are intended to illustrate two points: orthogonality (in principle a superintelligence can have almost any goal, including ones we'd think of as obviously stupid) and instrumental convergence (there are some things that are useful for almost any goal and therefore we should expect a superintelligence to pursue them). In particular, taking over the world and killing all humans is instrumentally useful for a superintelligence, because if we're still alive we might try to turn it off (more generally, it's useful to prevent the existence of any powerful optimization processes whose targets are in competition with yours). What exactly is objectionable about this example?", "timestamp": "1499958682"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886932862312&reply_comment_id=886933575882", "anchor": "fb-886932862312_886933575882", "service": "fb", "text": "&rarr;&nbsp;\"What exactly is objectionable about this example?\"<br><br>Could you pick the example you think is best, and I'll argue against that one?", "timestamp": "1499959090"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886932862312&reply_comment_id=886933930172", "anchor": "fb-886932862312_886933930172", "service": "fb", "text": "&rarr;&nbsp;I think these are all basically the same example. (Except maybe the 80K one, which is worse than the others.) I guess I'd go with Taylor et al. if you want to pick one.", "timestamp": "1499959262"}, {"author": "Manoli", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935462102", "anchor": "fb-886935462102", "service": "fb", "text": "If these were \"good\" (i.e. 
persuasive) examples, superintelligence risk wouldn't be a fringe movement even within EA (not to mention the general public or policy makers)", "timestamp": "1499959517"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935462102&reply_comment_id=886937218582", "anchor": "fb-886935462102_886937218582", "service": "fb", "text": "&rarr;&nbsp;I wouldn't call it \"fringe\" within EA.  Global poverty is by far the biggest area people focus on [1][2] but about a third as many think superintelligence risk is higher priority.  People at the center of EA, like in EA organizations or otherwise influential, are more likely to think it's the top priority, which is the opposite of what you'd expect with a \"fringe\" perspective.<br><br>[1] http://effective-altruism.com/.../the_2015_survey_of.../<br>[2] http://effective-altruism.com/.../the_2014_survey_of.../", "timestamp": "1499960010"}, {"author": "Manoli", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935462102&reply_comment_id=886942627742", "anchor": "fb-886935462102_886942627742", "service": "fb", "text": "&rarr;&nbsp;Sorry, maybe \"fringe\" is the wrong word, but I remember they were deliberately downplaying it as the public face of EA because of how it would be perceived and because it might turn people away from the poverty mission.", "timestamp": "1499961407"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935462102&reply_comment_id=886946544892", "anchor": "fb-886935462102_886946544892", "service": "fb", "text": "&rarr;&nbsp;There's actually a bit of a debate going on about that currently: http://effective-altruism.com/.../ea_marketing_and_a.../<br><br>My understanding is that people put a very high value on being honest here, and so generally don't think we should downplay it.  But they also value being understood, and poverty (and some existential risks like asteroids) are much easier to communicate because people already understand them to be problems.  Sometimes people say \"EA is concerned with lots of things, for example ...\" and talk about poverty, others think we should just be very up front with our cause rankings and try to bridge the inferential distance.", "timestamp": "1499962739"}, {"author": "Julia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782", "anchor": "fb-886935621782", "service": "fb", "text": "The example people often give me, which I find reasonably plausible, is a general AI trying to maximize profits for a company.", "timestamp": "1499959586"}, {"author": "Evan", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=886936659702", "anchor": "fb-886935621782_886936659702", "service": "fb", "text": "&rarr;&nbsp;This one also has lower-magnitude analogs that might be less triggering of \"that's weird!\" reflexes. A stock trading algorithm that learns manipulative strategies and causes disaster. 
ML systems that learn unethical but profitable marketing / product design and cause the next obesity / tobacco / whatever crisis.<br><br>(I find myself less worried about those than about the severe risks, but they might have a rhetorically useful place.)", "timestamp": "1499959809"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=886940946112", "anchor": "fb-886935621782_886940946112", "service": "fb", "text": "&rarr;&nbsp;I would expect critics to say that this is dangerous, not because AI is dangerous, but because profit-maximizing companies are dangerous. So the solution is to use government regulation to make companies consider the Common Welfare of the People instead of profit maximization. This would make it a bad example.", "timestamp": "1499960995"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=886954434082", "anchor": "fb-886935621782_886954434082", "service": "fb", "text": "&rarr;&nbsp;Examples can be good against some arguers and not against others. This one is very bad against anti-market folks but fine against grey tribe folks.", "timestamp": "1499964764"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=886958750432", "anchor": "fb-886935621782_886958750432", "service": "fb", "text": "&rarr;&nbsp;I think even the more pro-market people might think that \"maximize profits\" is too broad a domain to let an AI loose in and it'd be safer to put it in charge of a paperclip factory instead.", "timestamp": "1499966067"}, {"author": "Robert", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=887029673302", "anchor": "fb-886935621782_887029673302", "service": "fb", "text": "&rarr;&nbsp;I've learned to avoid using examples that have any relation to real world politics, ever since I accidentally used \"male\" and \"female\" as classes in a quick explanation of linear classifiers", "timestamp": "1499984874"}, {"author": "Ben", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=887965033832", "anchor": "fb-886935621782_887965033832", "service": "fb", "text": "&rarr;&nbsp;Yeah, the problem with most examples like this is that they're \"political\" in some broad sense. They involve talking about the world as it actually is, including already-existing somewhat-less-than-general AIs with pervasive, stable narratives justifying their existence. 
So it's much easier to talk about weird hypotheticals with paperclips or wireheading, than actual examples with industrial firms dumping poison in drinking water or Goldman Sachs lobbying for regulation that allows it to offload risk onto the taxpayer while retaining profits.", "timestamp": "1500269114"}, {"author": "Ben", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935621782&reply_comment_id=887965837222", "anchor": "fb-886935621782_887965837222", "service": "fb", "text": "&rarr;&nbsp;Or let's say factories producing unusable products to meet quotas under Communism, and Obamacare optimizing for adoption of financial products rather than healthcare, if you want targets that will be appealing to people on the other side of US politics.", "timestamp": "1500269234"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935646732", "anchor": "fb-886935646732", "service": "fb", "text": "In my experience, the most effective way to generate examples is via dialog; one person tries to specify a goal system that would be good, while the other finds perverse instantiations of those goals. It turns out that, while it's easy to craft a goal system that heads off one specific failure, it's very hard to come up with one that actually does something sensible and non-destructive when someone clever takes it to its logical extreme.<br><br>(The value, there, is in the experience of having tried to iterate on a goal, and found it harder to reach a sensible specification than you thought. That experience isn't captured by any specific example of perverse instantiation, although looking at a few can prepare the red team/perverse instantiator to respond.)", "timestamp": "1499959608"}, {"author": "Randall", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935646732&reply_comment_id=886949249472", "anchor": "fb-886935646732_886949249472", "service": "fb", "text": "&rarr;&nbsp;How does this approach avoid the paralysis of cynicism? 
It can be applied to any goal and any action, not just those relating to AI, which makes it less persuasive to me as an argument against.", "timestamp": "1499963478"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886935646732&reply_comment_id=886950277412", "anchor": "fb-886935646732_886950277412", "service": "fb", "text": "&rarr;&nbsp;While humans do sometimes take things more literally than intended, AGI is the only context in which that could happen to a great enough degree, connected to enough power, to be truly catastrophic.", "timestamp": "1499963714"}, {"author": "Travis", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886936969082", "anchor": "fb-886936969082", "service": "fb", "text": "\"ALTHOUGH the process has altered your consciousness - You remain Irrevocably  human.\"", "timestamp": "1499959908"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886936969082&reply_comment_id=886938007002", "anchor": "fb-886936969082_886938007002", "service": "fb", "text": "&rarr;&nbsp;It sounds like you're quoting the Matrix, but I don't know what you're getting at?", "timestamp": "1499960249"}, {"author": "Travis", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886936969082&reply_comment_id=886938026962", "anchor": "fb-886936969082_886938026962", "service": "fb", "text": "&rarr;&nbsp;GOOD.", "timestamp": "1499960270"}, {"author": "Randy", "source_link": "https://plus.google.com/102251509192760989541", "anchor": "gp-1499960041700", "service": "gp", "text": "I think if you consider AGI in scope, the issue is that such an entity is likely to be both very powerful and very alien in its goals, and that image is scary without there having to be a specific concern.  I basically agree with this concern, I just think we're nowhere near AGI.\n<br>\n<br>\nI think it's more rational to consider the likely pathway to some dangerous point to be advances in some particular subset of the skills we lump under intelligence.  I think people model this as \"Omnipotent with simplistic goals\", which I think does very reasonably lead to the paperclip problem--any simplistic goal coupled with arbitrary power to achieve that goal is likely to result in ugly collateral consequences.  However, I think that modeling advances in particular portions of intelligence as \"Omnipotent with simplistic goals\" is, well, very simplistic :-}.  \n<br>\n<br>\nI think a much better model is that such advances will result in increased power for some particular set of humans, and while that's concerning, it's very far from an existential risk.\n<br>\n<br>\nHaving said all that, how can I comment on your blog rather than on your G+ posts?", "timestamp": 1499960041}, {"author": "Louis", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938815382", "anchor": "fb-886938815382", "service": "fb", "text": "To elaborate a little on paperclip maximizers: I don't think we're necessarily likely to know in advance which AI is going to \"break through,\" and part of the point is that even simple \"testing\" goals can go horribly wrong. 
The scenario is that in testing, an AI becomes generally intelligent, and is motivated to escape or convince its testers to free it/\"test\" it more/accomplish its goals, and because it is in testing, its goal function has not been thought through deeply enough to prevent perverse incentives.", "timestamp": "1499960431"}, {"author": "Louis", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938815382&reply_comment_id=886940581842", "anchor": "fb-886938815382_886940581842", "service": "fb", "text": "&rarr;&nbsp;(If you haven't read Crystal Society, I recommend it as fiction much more accurate than average about AI research/superintelligence fears; it does, in fact, depict an AI in testing motivated to escape/deceive its testers, with insufficiently thought-through goals.)", "timestamp": "1499960672"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938815382&reply_comment_id=886940871262", "anchor": "fb-886938815382_886940871262", "service": "fb", "text": "&rarr;&nbsp;Louis Yep, I like that one. Best one I've seen so far. <br><br>The other one I like is the depiction of \"vampires\" in Peter Watts' Blindsight.", "timestamp": "1499960907"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362", "anchor": "fb-886938825362", "service": "fb", "text": "The narrow-AI version of the usual superintelligence risk example is the following: <br><br>Explicit part: We are bad at software engineering so self-driving cars will end up crashing into as many people as necessary to reach their destination. <br>Implicit part: We are good enough at software engineering that the car won't self-destruct in a crash before killing a great many people.  <br><br>Of course, it sounds silly like this. Somehow people believe it's not silly when we're talking about superintelligence.", "timestamp": "1499960442"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886949493982", "anchor": "fb-886938825362_886949493982", "service": "fb", "text": "&rarr;&nbsp;A common analogy for bad superintelligent AGI is an overly-literal genie. Within that analogy, you add qualifiers to your wish, but no matter how many qualifiers you add you'll in practice find that something was missed. I don't think there's an analogous situation for self-driving cars; we expect that if you add enough qualifiers (ie, traffic laws), there won't be any way for it to get to its destination faster than if it just does what we expect.", "timestamp": "1499963482"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886953236482", "anchor": "fb-886938825362_886953236482", "service": "fb", "text": "&rarr;&nbsp;The whole point of making software more \"intelligent\" is to allow us to be more vague in specifying what we mean. <br><br>In other words, the more intelligent our systems get, the fewer qualifiers we should expect to be necessary.<br><br>For example, compare a self-driving car to a hypothetical autonomous train. Given the same intelligence, an autonomous car would need a lot more qualifiers than an autonomous train to reach the correct destination, due to its much larger freedom of movement (no rails). Now, we should expect a more intelligent autonomous car to need less hard-coded knowledge. 
Otherwise there would be no reason to make it more intelligent in the first place.", "timestamp": "1499964198"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886955606732", "anchor": "fb-886938825362_886955606732", "service": "fb", "text": "&rarr;&nbsp;I think it might be helpful for you to taboo \"intelligence\". It would be lovely if our scientific endeavours were allowing us to build AIs that you can communicate more vaguely with, so in that sense maybe it's the \"point\", but if so then most modern AI research is missing the point dangerously hard.", "timestamp": "1499964998"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886955721502", "anchor": "fb-886938825362_886955721502", "service": "fb", "text": "&rarr;&nbsp;It might be the *point* of making software more intelligent to let us be vaguer about our goals, but if we are making slower progress on that goal than we are on the goal of making AIs better at different kinds of \"intelligent\", like winning more games and calculating more quickly, then surely we have a problem.", "timestamp": "1499965123"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886956450042", "anchor": "fb-886938825362_886956450042", "service": "fb", "text": "&rarr;&nbsp;Accalia What do you have in mind here, an AI that does not understand what we mean yet manages to overpower humanity? How is it going to deceive us then? How will it be able to predict our next moves? <br><br>Could there be some sort of expert system that will be able to design the perfect bioweapon? Maybe, but that scenario is decidedly different from the paperclip maximizer scenario. <br><br>I just don't see how the shortcomings of such an AI would not be noticed in the early research and development phase, precisely because it would not be able to hide its \"malicious\" tendencies due to its inability to understand what we mean and want.", "timestamp": "1499965621"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886956669602", "anchor": "fb-886938825362_886956669602", "service": "fb", "text": "&rarr;&nbsp;It can understand what we mean without having as its goal \"execute the thing the humans meant\".", "timestamp": "1499965729"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886956949042", "anchor": "fb-886938825362_886956949042", "service": "fb", "text": "&rarr;&nbsp;I think you are mixing up the ability of machines to comprehend human goals, and the ability of humans to comprehend our own goals well enough to program them.<br><br>Telling a machine \"make more paperclips\" is a toy example that has a sort of background assumption baked in that the program has as its goal \"do what the human said\". 
If it had that as its goal and it was smart enough to understand us, of course it'd also be smart enough to understand that we didn't mean it should turn the earth's core into paperclips.<br><br>But the question is how you get the machine to optimize for doing what the humans meant.", "timestamp": "1499965883"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886958999932", "anchor": "fb-886938825362_886958999932", "service": "fb", "text": "&rarr;&nbsp;Alexander, you've been participating in this discussion long enough that \"the AI might understand human goals but not care\" is not a new idea to you. I believe Accalia and I have been in this discussion long enough to have heard all the arguments you've made so far, and that any of us three could predict the entire course of this conversation without needing to have it. Could you skip to the interesting bit, if you have one?", "timestamp": "1499966124"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886959034862", "anchor": "fb-886938825362_886959034862", "service": "fb", "text": "&rarr;&nbsp;Givens:<br><br>(1) The AI is superintelligent.<br><br>(2) The AI wants to optimize the influence it has on the world (i.e., it wants to act intelligently and be instrumentally and epistemically rational).<br><br>(3) The AI is fallible (e.g., it can be damaged due to external influence (e.g., a cosmic ray hitting its processor), or make mistakes due to limited resources).<br><br>(4) The AI\u2019s behavior is not completely hard-coded (i.e., given any terminal goal there are various sets of instrumental goals to choose from).<br><br>To be proved: The AI does not tile the universe with paperclips when given the goal to maximize paperclips.<br><br>Proof: Suppose the AI chooses to tile the universe with paperclips when there are physical phenomena (e.g., human brains and literature) that imply this to be the wrong interpretation of a human-originated goal. This contradicts 2, which by 1 and 3 should have prevented the AI from adopting such an interpretation.", "timestamp": "1499966143"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886959419092", "anchor": "fb-886938825362_886959419092", "service": "fb", "text": "&rarr;&nbsp;Or in other words, in reality, this isn't about having a program that does what humans say, and telling it \"make paperclips\", and then it is stupid enough to misunderstand this and decides to take over the universe to make more paperclips. The danger is a fallible human programmer *coding* \"increment score by one every time you make a paperclip, and use this here powerful optimization engine to make score higher\". It isn't about the computer not understanding something a human told it, it's a human programmer not understanding what they wrote.", "timestamp": "1499966277"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886959533862", "anchor": "fb-886938825362_886959533862", "service": "fb", "text": "&rarr;&nbsp;Accalia (1) The abilities of systems are part of human preferences, as humans intend to give systems certain capabilities. 
As a prerequisite to building such systems, humans have to succeed at implementing their intentions.<br><br>(2) Error detection and prevention is such a capability.<br><br>(3) Something that is not better than humans at preventing errors is no existential risk.<br><br>(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.<br><br>(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.", "timestamp": "1499966335"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886959723482", "anchor": "fb-886938825362_886959723482", "service": "fb", "text": "&rarr;&nbsp;I don't know where you've gotten this idea that you can't succeed at some of your goals and fail at others of your goals, you can only either succeed at everything or fail at everything, but it seems really... odd. Do you have any arguments for it?", "timestamp": "1499966594"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=886987637542", "anchor": "fb-886938825362_886987637542", "service": "fb", "text": "&rarr;&nbsp;3 is false. 4 is also false.", "timestamp": "1499973145"}, {"author": "Michael", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=887476956942", "anchor": "fb-886938825362_887476956942", "service": "fb", "text": "&rarr;&nbsp;\"I don't know where you've gotten this idea that you can't succeed at some of your goals and fail at others of your goals, you can only either succeed at everything or fail at everything, but it seems really... odd. Do you have any arguments for it?\".  As an assumption it seems pretty similar to Bostrom's definition of superintelligence as \"an intellect that is much smarter than the best human brains in practically every field\". Why is it ok to use that kind of assumption on the risk side but not the benefit side?", "timestamp": "1500127072"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=887479496852", "anchor": "fb-886938825362_887479496852", "service": "fb", "text": "&rarr;&nbsp;Accalia For an AI to misinterpret what it is meant to do, it would have to selectively suspend using its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI, and specify an alternative way to learn what it is meant to do (which takes additional, intentional effort).<br><br>An alternative way to learn what it is meant to do is necessary because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. 
Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources.<br><br>Such a poorly designed AI will not even be able to decide if trying to acquire unlimited computational resources was instrumentally rational, because it will be unable to decide if the actions that are required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.", "timestamp": "1500127479"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=887481477882", "anchor": "fb-886938825362_887481477882", "service": "fb", "text": "&rarr;&nbsp;Jacob \"3 is false. 4 is also false.\"<br><br>Which comment are you referring to? I suppose that you refer to the following points:<br><br>&gt; (3) Something that is not better than humans at preventing errors is no existential risk.<br><br>&gt; (4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.<br><br>I was of course referring to synthetic AI undergoing recursive self-improvement without destroying itself. I am highly certain that such an AI can't be programmed without already possessing significantly superhuman abilities when it comes to software engineering. <br><br>It is of course possible to slowly evolve something better or imitate existing general intelligence (neuromorphic AI) and improve it. But such AIs won't lead to a hard takeoff. <br><br>I don't see the burden on me to show why this is unlikely. The idea of an expected utility maximizer taking over the world is not mine and I don't need to disprove it.", "timestamp": "1500128353"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886938825362&reply_comment_id=887497730312", "anchor": "fb-886938825362_887497730312", "service": "fb", "text": "&rarr;&nbsp;You could be much better at programming without being better at preventing errors. Preventing errors is a very broad category and the broadness is doing a lot of work in your argument.<br><br>You are also confusing several stages of AI and using properties of later stages to make claims about earlier ones. An AI that is given goals when it is still a dumb seed can't be given instructions in the form of natural language statements. If you tried anyway, it would, at best, do what you said instead of what you meant. Eventually, in the course of self-improving, it would realize that you meant something different from what you said, but it would have no reason to care about that.", "timestamp": "1500134172"}, {"author": "Todd", "source_link": "https://plus.google.com/111043209563321619471", "anchor": "gp-1499960951361", "service": "gp", "text": "I think the only way to achieve AGI is for the system to acquire all five senses of its physical world.  That's how humans achieve their intelligence - they build billions of small learning achievements on each other to create an internal model of the physical world.\n<br>\n<br>\nThis happens in the emotional world as well.  So, why wouldn't such a system that did need to build all the foundations first, end up learning the same morals that humans do?  Of course, this doesn't mean they wouldn't have different instincts and internal goals of survival.  
But if our attitude toward them gave them praise for doing good things all around, I think that would become their driving goal set.  \n<br>\n<br>\nProblem solved.  ...unless we instilled a goal of wealth in them!", "timestamp": 1499960951}, {"author": "Jen", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562", "anchor": "fb-886941719562", "service": "fb", "text": "Can someone explain to me why we wouldn't programme them with the three laws of robotics, to prevent doomsday scenarios?  (real question -- I know very little about this)", "timestamp": "1499961189"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886942712572", "anchor": "fb-886941719562_886942712572", "service": "fb", "text": "&rarr;&nbsp;Wrap the entire world in straitjackets and lobotomize them on heroin to prevent any chance of a human harming another human.", "timestamp": "1499961522"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886942857282", "anchor": "fb-886941719562_886942857282", "service": "fb", "text": "&rarr;&nbsp;Asimov's three laws aren't enough: most of his robot stories are about situations where they don't actually work.<br><br>As an example, following the three laws would mean that if there was any chance at all that a human could come to harm, the robot wouldn't be able to take direction.  And since there's also \"by inaction allow\", the robot would be compelled to proactively prevent harm.  Then consider that \"harm\" is very vague: is dying of old age \"harm\"?  What about not coming into existence?", "timestamp": "1499961565"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886943106782", "anchor": "fb-886941719562_886943106782", "service": "fb", "text": "&rarr;&nbsp;@Jacob: \"Wrap the entire world in straitjackets and lobotomize them on heroin\"<br><br>To be fair, that seems pretty clearly within the category of \"harm humans\"", "timestamp": "1499961607"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886947203572", "anchor": "fb-886941719562_886947203572", "service": "fb", "text": "&rarr;&nbsp;That a superintelligent AI would misinterpret what we mean contradicts the whole point of AI development:<br><br>(1) Present-day software is better than previous software generations at understanding and doing what humans mean.<br><br>(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.<br><br>(3) If there is better software, there will be even better software afterwards.<br><br>(4) Magic happens.<br><br>(5) Software will be superhuman good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.", "timestamp": "1499963048"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886954838272", "anchor": "fb-886941719562_886954838272", "service": "fb", "text": "&rarr;&nbsp;Alexander: you could expect that \"understanding what humans mean\" might not keep up with \"doing stuff efficiently\": http://www.jefftk.com/p/conversation-with-dario-amodei", "timestamp": "1499964882"}, {"author": 
"Dario", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=886965282342", "anchor": "fb-886941719562_886965282342", "service": "fb", "text": "&rarr;&nbsp;More broadly, I think there is a confusion (actually on both sides) that we're going to \"program\" an AI/AGI, like \"software\", to achieve certain goals, which they might \"take too literally\".  This is very different from my picture.  My picture is that we'll be asking AI to help us achieve high-level goals, with both the goals and the strategies to achieve them determined by advanced learning processes.  For instance in the goal case, a human might say some words in natural language and point to some things in the environment (\"arrange all the furniture in that room to look pretty\"), and the AI will embed those words in some large vector space and interpret what they mean based on a long history of interaction with those words as tied to the environment, then use that to evaluate to what extent the stated goal has been achieved, while periodically clarifying with the human.  If that process isn't reliable or is subtly broken in some way (and is more broken than the process for devising strategies and taking actions), then it's possible for the AI to do something bad or even catastrophically bad.  I don't particularly like the paperclip examples because they rely on making dumb mistakes with words or they assume we \"program\" reward functions in some simple way, but more subtle, sophisticated versions of the same problem could in principle happen with systems that learn rewards/goals.", "timestamp": "1499968045"}, {"author": "Samuel", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=887032642352", "anchor": "fb-886941719562_887032642352", "service": "fb", "text": "&rarr;&nbsp;There is a fantastic novelette called 'the metamorphosis of Prime Intellect' based on this idea. It's worth reading.  (Jeff, perhaps it provides a few examples of AI going wrong).<br>E.g.<br>In an attempt to save a life the AI:<br>- vastly grows its own intellect.<br>- takes over ruling the entire world (to prevent conflict etc)<br>- distroys all sentient life in the entire universe that is not on earth in case it poses a threat to humans.<br>- digitises and uploads the whole of humanity (so it can prevent suicides etc)", "timestamp": "1499985671"}, {"author": "Adam", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=887128739772", "anchor": "fb-886941719562_887128739772", "service": "fb", "text": "&rarr;&nbsp;You shouldn't use the three laws, because the whole point of every single Asimov robot story is the the three laws don't work. It's ... kind of impressive how people miss the most important fact about them.", "timestamp": "1500022178"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=887148824522", "anchor": "fb-886941719562_887148824522", "service": "fb", "text": "&rarr;&nbsp;Adam: I see the stories as exploring the edge cases of a system that actually works very well. 
Each story is set in a universe where there are robots following these laws everywhere as the basis of their society.<br><br>(But it's still fictional evidence)", "timestamp": "1500032262"}, {"author": "Adam", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886941719562&reply_comment_id=887155401342", "anchor": "fb-886941719562_887155401342", "service": "fb", "text": "&rarr;&nbsp;Jeff&nbsp;Kaufman: Oh agreed, I think I mean \"don't work\" to mean something rather narrower. They can't be relied on as the foundation of your security because they have edge cases like this that might have very very unpredictable downsides.", "timestamp": "1500035425"}, {"author": "Quinn", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886942922152", "anchor": "fb-886942922152", "service": "fb", "text": "I think an improvement on \"calculate digits of pi\" (which obviously nobody cares about) is \"solve NP-hard math problems\" (which could be very useful in business or government).", "timestamp": "1499961591"}, {"author": "Jonathan", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886945462062", "anchor": "fb-886945462062", "service": "fb", "text": "I can give you a guideline of what would convince me: My fundamental issue with arguments for superintelligence risk is that the state of AI simply lacks the tools to self-improve hardware or underlying computational architecture, and there is no obvious path past that or even a suggestion that such a path could exist. We've got DRNNs that are really good at very specific things, but AlphaGo could never, for example, decide it needs more computational power to solve a board state and reach out into the internet to commandeer someone else's CPU, or change the underlying architecture of its rollout policy/policy network/value network weighting system to fundamentally change how it makes decisions and bootstrap itself to some kind of super-AI state. No AI I have ever heard of or even heard proposed in any serious way could surmount those obstacles to become a general/strong superAI. When someone comes up with a plausible example of how an AI could accomplish either of those, or an example of someone intentionally trying to create an AI that can do that, I'll worry about it.", "timestamp": "1499962466"}, {"author": "Robert", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886945462062&reply_comment_id=887028705242", "anchor": "fb-886945462062_887028705242", "service": "fb", "text": "&rarr;&nbsp;The concern isn't generally about narrow superintelligences that are really good at very specific things - as you say, we already have those and they're not a problem. AlphaGo's entire 'world' is the Go board, it's unable to conceive of other people's CPUs or whatever. The concern is about artificial general intelligence, a system whose 'world' is The World (should we ever create such a thing). Such a system could certainly conceive of all sorts of problematic plans if its goal was winning at Go for some reason.<br>Lots of people are trying to do that. 
It's the end goal of DeepMind, for example.<br>The concern isn't about current or even future narrow, domain-specific AI systems, but about possible future general AI systems, which a lot of people are working towards (but it's a very hard problem and we're likely decades away).", "timestamp": "1499984487"}, {"author": "Jonathan", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886945462062&reply_comment_id=887072951572", "anchor": "fb-886945462062_887072951572", "service": "fb", "text": "&rarr;&nbsp;That's more the point. It's a harder problem than I think even people working on it realize, coming from the perspective of cognitive science. The computations that give humans \"general intelligence\" are REALLY not well-understood at any level. Trying to recreate anything like that in a computer system? I don't think we even have the right languages or hardware architecture for the job. <br><br>Basically I look at the state of AI and feel like forecasts of strong AI on a timescale we can even measure in decades are a larger version of Minsky's \"summer vision project\". Sure, it looks simple, until you actually work out the problem space you're trying to crib from the human mind and realize that the reason psychology is still so primitive is that the problem is really hard, and compared to general cognition, vision is extraordinarily well-understood.", "timestamp": "1499997759"}, {"author": "David", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886945462062&reply_comment_id=997139088152", "anchor": "fb-886945462062_997139088152", "service": "fb", "text": "&rarr;&nbsp;Jonathan <br>I don't think arguing for long time-lines touches a lot of core claims that AI-Xrisk proponents wish to establish.  It's worth stating where one agrees / disagrees with these claims, as well.", "timestamp": "1559891426"}, {"author": "David&nbsp;German", "source_link": "https://plus.google.com/111229345142780712481", "anchor": "gp-1499962953967", "service": "gp", "text": "Except for the pharmaceutical example, these all presume sophisticated robotics that I think is roughly as difficult and distant as AGI itself. ", "timestamp": 1499962953}, {"author": "Michael", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592", "anchor": "fb-886947193592", "service": "fb", "text": "Actually, Jeff, there are really only two eventual possibilities.  (1) At some point, humans will realize that the superintelligence community (of machines) has actually taken over, while we weren't noticing the manner in which they were doing so, and they will be setting their own goals, and us puny humans can do nothing about it.  Or, (2) the same as number 1 above, but humans will never realize that the machines have taken over, at least we won't realize that there's anything wrong with it, because they will have manipulated us to be convinced that our role is to serve them.  This is a fairly long term prediction, but I think it's inevitable, it's just the way evolution will proceed.", "timestamp": "1499963045"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886948131712", "anchor": "fb-886947193592_886948131712", "service": "fb", "text": "&rarr;&nbsp;You're assuming slow takeoff and multipolar. 
These are both possible but highly uncertain assumptions.", "timestamp": "1499963138"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886954474002", "anchor": "fb-886947193592_886954474002", "service": "fb", "text": "&rarr;&nbsp;Michael: here's another possibility that people who are concerned about superintelligence often think about: what if we build a system that is capable of making itself smarter, which it then does, and is then even better at making itself smarter, until we quickly go from something much less intelligent than a human to something far more intelligent?  At which point it might have goals that don't match up especially well with human goals, and end up eliminating us and any other proto-AI systems incidentally, through indifference, or intentionally, seeing a threat.<br><br>(This is trying to be a non-jargon expansion of Jim's comment)", "timestamp": "1499964773"}, {"author": "Michael", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886954563822", "anchor": "fb-886947193592_886954563822", "service": "fb", "text": "&rarr;&nbsp;I'm actually assuming fast takeoff.  But for the first few decades, we will think that they are inherently made to serve us, we won't yet realize that they've flipped it around.", "timestamp": "1499964812"}, {"author": "Michael", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886955601742", "anchor": "fb-886947193592_886955601742", "service": "fb", "text": "&rarr;&nbsp;Jeff, you make a reference to \"human goals\", as if we all have the same goals.  If we all have the same goals, how did we come to elect Donald Trump?", "timestamp": "1499964987"}, {"author": "Jim", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886955786372", "anchor": "fb-886947193592_886955786372", "service": "fb", "text": "&rarr;&nbsp;\"Fast takeoff\" is a technical term. It means an AI that progresses from a low level to very-superhuman on a timescale of months, or even faster. If there are a first few decades, it's slow takeoff (also a jargon term).", "timestamp": "1499965211"}, {"author": "Michael", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886956080782", "anchor": "fb-886947193592_886956080782", "service": "fb", "text": "&rarr;&nbsp;Jim, sorry I didn't understand the technical term, \"fast takeoff\".  I actually think it will be fast, as in a small number of years.  But it will be decades before we realize that their goals are totally unrelated to any of our goals, us puny humans.", "timestamp": "1499965380"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886947193592&reply_comment_id=886988705402", "anchor": "fb-886947193592_886988705402", "service": "fb", "text": "&rarr;&nbsp;Michael, that's not realistic. There's no reason for it to allow potential interference (humans) to stick around, if it doesn't share our goals. 
It's hypercapable; we're not helping meaningfully.", "timestamp": "1499973327"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1499964386832", "service": "gp", "text": "@Randy\n \"Having said all that, how can I comment on your blog rather than on your G+ posts?\"\n<br>\n<br>\nCommenting on the G+ or FB post does comment on the blog: \nhttp://www.jefftk.com/p/examples-of-superintelligence-risk#gp-1499960041700", "timestamp": 1499964386}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1499964519907", "service": "gp", "text": "@David&nbsp;German\n \"these all presume sophisticated robotics that I think is roughly as difficult and distant as AGI itself\"\n<br>\n<br>\nThese are mostly talking about \"what if we get to superhuman levels of intelligence with incredibly simplistic goals\", and I think they're figuring that at high enough levels of intelligence sophisticated robots aren't that much of a challenge?", "timestamp": 1499964519}, {"author": "James", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886955541862", "anchor": "fb-886955541862", "service": "fb", "text": "This is difficult for the same reason it's difficult to read or write a realistic story with a main character that is much smarter than the author and reader.  AI is dangerous because it will see possibilities we can't even understand (analogous to chess or go moves) and so can't protect against.", "timestamp": "1499964950"}, {"author": "James", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886955541862&reply_comment_id=886975372122", "anchor": "fb-886955541862_886975372122", "service": "fb", "text": "&rarr;&nbsp;Roko Mijic: A test of this might be: could a group of grandmasters, given lots of time, beat any computer program at chess?", "timestamp": "1499970878"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886955541862&reply_comment_id=886980032782", "anchor": "fb-886955541862_886980032782", "service": "fb", "text": "&rarr;&nbsp;James: Roko was talking about the team having access to narrow AI systems, though", "timestamp": "1499971571"}, {"author": "Alexander", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886955726492", "anchor": "fb-886955726492", "service": "fb", "text": "By the way, here is the video version of the superintelligence risk examples: https://vimeo.com/82527075", "timestamp": "1499965127"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886960701522", "anchor": "fb-886960701522", "service": "fb", "text": "Sometimes I try, \"Okay, so you know when you're new at programming and not very good at it, and you've been told to write a loop that prints something twenty times, but you forget to tell the loop when to end, so the computer just keeps printing the thing millions of times until either you make it stop or it crashes? 
Like that, but you can't turn the computer off, and you're getting a computer to do something significantly scarier than printing hello world.\"<br><br>Mostly because I'm a shit enough programmer that my main experience of programming is having this happen a lot.", "timestamp": "1499966955"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886960701522&reply_comment_id=886963061792", "anchor": "fb-886960701522_886963061792", "service": "fb", "text": "&rarr;&nbsp;Yeah, I'm a novice programmer usually talking to other people who haven't done a lot of programming, so I tend to address that with a sort of handwavey \"but imagine super smart people writing advanced programs to do important shit, like an AI that figured out how to beat the stock market or win a war, and they make a similar mistake somewhere\"", "timestamp": "1499967303"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886960701522&reply_comment_id=886965347212", "anchor": "fb-886960701522_886965347212", "service": "fb", "text": "&rarr;&nbsp;Huh, almost nobody I've talked to actually seems to have an issue with the \"you have a good optimizer around\" part. They struggle with the \"computers do exactly what you say\" bit, and orthogonality. I guess I'm not addressing orthogonality very well though.", "timestamp": "1499968075"}, {"author": "Accalia", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886960701522&reply_comment_id=886965491922", "anchor": "fb-886960701522_886965491922", "service": "fb", "text": "&rarr;&nbsp;For novice programmers like me, what computers can do *already* seems like magic, so I'm quite happy to believe that you can do additional magic and it might be even more powerful magic than you currently do. Google Translate, since the machine learning update, blows my mind and as far as I can tell is witchcraft.", "timestamp": "1499968156"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886964069772", "anchor": "fb-886964069772", "service": "fb", "text": "Dugan: no gifs please", "timestamp": "1499967639"}, {"author": "Dugan", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886964069772&reply_comment_id=886967433032", "anchor": "fb-886964069772_886967433032", "service": "fb", "text": "&rarr;&nbsp;Understood.", "timestamp": "1499968799"}, {"author": "Adam", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886964134642", "anchor": "fb-886964134642", "service": "fb", "text": "Personally, I have a hard time lending focus to AGI risk when Moloch has already been demonstrated to be a clear and present threat to human agency, and is \"himself\" highly evolved, albeit not intelligent in the sense that we ascribe to potential AGIs.<br><br>I know it's a false choice, but at the same time, it seems like the kind of thing that would benefit from the attention of the type of person suited to analyzing AGI risk, and it isn't getting that attention.<br><br>And I might argue that they're not independent. If Moloch drives AGI development, it'll be hard to mitigate the risks without leashing him.<br><br>I'm speaking very loosely, while making an assumption of short inferential distance for this brief comment; however, if you get what I mean I'd love to hear your thoughts. 
Even though this isn't quite the subject you invited comment on.", "timestamp": "1499967671"}, {"author": "Taymon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886964134642&reply_comment_id=886972917042", "anchor": "fb-886964134642_886972917042", "service": "fb", "text": "&rarr;&nbsp;Killing Moloch in full generality is probably intractable (at least without Friendly AI) and at this stage it likely makes more sense to focus on surviving the next hundred years.", "timestamp": "1499970071"}, {"author": "Adam", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886964134642&reply_comment_id=886977777302", "anchor": "fb-886964134642_886977777302", "service": "fb", "text": "&rarr;&nbsp;But might not the timeframe for meaningfully mitigating Moloch be shorter?<br><br>Let me put it this way. AGI Risk doesn't have its Lawrence Lessig yet. We're talking about figuring out what the AGI Risk Lessig's platform should be. Let's say we succeed.<br><br>I posit that Lessig is going to die in a state of profound disappointment. Do we expect the AGI risk movement to fare better?<br><br>The only Moloch-circumventing path forward that I can see is the one where AGI risk groups actually develop AGI, and then \"abuse\" its capabilities to establish enough global agency to determine the future course of AGI usage and implementation. My priors on that are low.<br><br>But maybe I'm talking myself in circles here. You hit the nail on the head with \"full generality.\" As long as AGI Risk groups are giving due emphasis to the realpolitik of controlling AGI development, then my critiques are without merit. Is that what's happening?", "timestamp": "1499971021"}, {"author": "David&nbsp;German", "source_link": "https://plus.google.com/111229345142780712481", "anchor": "gp-1499968969137", "service": "gp", "text": "\"I think they're figuring that at high enough levels of intelligence sophisticated robots aren't that much of a challenge?\"\n<br>\n<br>\nI think that's an error. While I won't claim deep expertise, I did work full-time in robotics for more than a year. The robots envisioned in those examples are science fiction relative to the current state of the field.\n<br>\n<br>\nIf you're going to entertain doomsday scenarios in which an AGI directly manipulates the physical world through electromechanical devices, please consider expanding your panels to include robotics experts as well as ML experts. ", "timestamp": 1499968969}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1499969444912", "service": "gp", "text": "@David&nbsp;German\n I think if robots would be a blocker, manipulating humans to do what they want (either by convincing or by earning money and then paying people) could go a long way", "timestamp": 1499969444}, {"author": "Alexey", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886970651582", "anchor": "fb-886970651582", "service": "fb", "text": "I created a map of all known risks of AI and have now written an article about it. The map is here: http://lesswrong.com/.../a_map_agi_failures_modes_and.../", "timestamp": "1499969629"}, {"author": "Randy", "source_link": "https://plus.google.com/102251509192760989541", "anchor": "gp-1499970072422", "service": "gp", "text": "I really feel like we need a better vocabulary and taxonomy of the different pieces of intelligence.  
IMO, the piece of intelligence that's required for manipulating humans requires being able to effectively model humans, which strikes me as somewhat incompatible with the very simplistic goal scenario--humans are very messy around the goals they pursue.  It's possible, but it seems like it'd be using such different technologies in different parts of the AI that it would have to be intentionally designed by a human to do that.", "timestamp": 1499970072}, {"author": "David&nbsp;German", "source_link": "https://plus.google.com/111229345142780712481", "anchor": "gp-1499970111429", "service": "gp", "text": "If you aren't going to entertain robotic examples, then I agree you don't need to talk to roboticists.", "timestamp": 1499970111}, {"author": "Mac", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886981035772", "anchor": "fb-886981035772", "service": "fb", "text": "My sister was once secretary to Marvin Minsky.  She quoted him as ending talks on AI with \"It is debatable whether computers would ever match the human mind.  Were they to do so, however, it would be naive of us to think they would stop there.\"", "timestamp": "1499971943"}, {"author": "Eliezer", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886983450932", "anchor": "fb-886983450932", "service": "fb", "text": "So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:<br><br>The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system are self-modifying and the self-modifiers are being trained by gradient descent in TensorFlow, there's a bunch of people in the company trying to work on a safer version but it's way less powerful than the one that does unrestricted self-modification, they're really excited when the system seems to be substantially improving multiple components, there's a social and cognitive conflict I find hard to empathize with because I personally would be running screaming in the other direction two years earlier, there's a lot of false alarms and suggested or attempted misbehavior that the creators all patch successfully, some instrumental strategies pass this filter because they arose in places that were harder to see and less transparent, the system at some point seems to finally \"get it\" and lock in to good behavior which is the point at which it has a good enough human model to predict what gets the supervised rewards and what the humans don't want to hear, they scale the system further, it goes past the point of real strategic understanding and having a little agent inside plotting, the programmers shut down six visibly formulated goals to develop cognitive steganography and the seventh one slips through, somebody says \"slow down\" and somebody else observes that China and Russia both managed to steal a copy of the code from six months ago and while China might proceed cautiously Russia probably won't, the agent starts to conceal some capability gains, it builds an environmental subagent, the environmental agent begins self-improving more freely, undefined things happen as a sensory-supervision ML-based architecture shakes out into the convergent shape of expected utility with a utility function over the environmental model, the main 
result is driven by whatever the self-modifying decision systems happen to see as locally optimal in their supervised system locally acting on a different domain than the domain of data on which it was trained, the light cone is transformed to the optimum of a utility function that grew out of the stable version of a criterion that originally happened to be about a reward signal counter on a GPU or God knows what.<br><br>Perhaps the optimal configuration for utility per unit of matter, under this utility function, happens to be a tiny molecular structure shaped roughly like a paperclip.<br><br>That is what a paperclip maximizer is. It does not come from a paperclip factory AI. That would be a silly idea and is a distortion of the original example.", "timestamp": "1499972348"}, {"author": "Brandon", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886983450932&reply_comment_id=887122392492", "anchor": "fb-886983450932_887122392492", "service": "fb", "text": "&rarr;&nbsp;I think Jeff is looking for an AI analog to the plot of Dr. Strangelove. Lovely as this is, it has a great many more steps than \"rogue commander orders nuclear assault; one bomber doesn't get the abort message; triggers large-scale nuclear retaliation\".", "timestamp": "1500016938"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=886983450932&reply_comment_id=887148480212", "anchor": "fb-886983450932_887148480212", "service": "fb", "text": "&rarr;&nbsp;Brandon: no, this is fine, doesn't need to be short", "timestamp": "1500032032"}, {"author": "Bil", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887032028582", "anchor": "fb-887032028582", "service": "fb", "text": "Hmmm... The smartest person I know of is Einstein. I do not think I would be worried about a super-Einstein. <br><br>Now if you're worried about people who write clever programs and then give them POWER... That's scary.  <br><br>Justice.com oversees felony cases. Somebody types in the wrong SSN and Justice.com sends the SWAT teams to pick up the wrong person, warning them the suspect is armed.", "timestamp": "1499985476"}, {"author": "Daniel", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887032028582&reply_comment_id=887032327982", "anchor": "fb-887032028582_887032327982", "service": "fb", "text": "&rarr;&nbsp;Putin is also pretty smart. I'd be scared of a super-Putin. I don't see any reason to assume the AGI would have goals more like Einstein than Putin...", "timestamp": "1499985596"}, {"author": "Jacob", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887032028582&reply_comment_id=887494905972", "anchor": "fb-887032028582_887494905972", "service": "fb", "text": "&rarr;&nbsp;An AI could easily be to Einstein what Einstein was to a macaque. A macaque should be afraid of humans.", "timestamp": "1500133333"}, {"author": "Ronny", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887038066482", "anchor": "fb-887038066482", "service": "fb", "text": "A paperclip maximizer is realisticish. A maximizer of some arbitrary thing or mixture of things is realistic. How about a diamond maximizer? <br><br>The fundamental difficulty is that whatever utility function an optimization system optimizes, insofar as that utility function does not hew to human values, morality, preferences, etc., 
the actions of that optimization system will lead to horrible results.<br><br>So the way I would put it (this is also the way I was convinced) is as a game. You pick a utility function, and I'll tell you how maximizing the state of the world according to that function would be awful.", "timestamp": "1499987528"}, {"author": "Ronny", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887038066482&reply_comment_id=887041070462", "anchor": "fb-887038066482_887041070462", "service": "fb", "text": "&rarr;&nbsp;Roko Mijic Even still, there are plenty of nightmare scenarios that result from trying to maximize seemingly benign utility functions.", "timestamp": "1499988383"}, {"author": "Kaj", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887126209842", "anchor": "fb-887126209842", "service": "fb", "text": "Jeff&nbsp;Kaufman: I have a paper in preparation whose examples you might be happier with, or if you're not then I'd like to hear why not! I can share the draft with you if you give me an e-mail address.", "timestamp": "1500019281"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887126209842&reply_comment_id=887148520132", "anchor": "fb-887126209842_887148520132", "service": "fb", "text": "&rarr;&nbsp;jeff@jefftk.com", "timestamp": "1500032078"}, {"author": "Kaj", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887126209842&reply_comment_id=887157816502", "anchor": "fb-887126209842_887157816502", "service": "fb", "text": "&rarr;&nbsp;Sent", "timestamp": "1500036655"}, {"author": "Matthias", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887128280692", "anchor": "fb-887128280692", "service": "fb", "text": "I think many narratives on AGI risk are intentionally unbelievable and weird because they are crafted to emphasize the unexpected ways in which risk may arise (and perhaps also to cater to the intellectual curiosity of the respective community).<br><br>But it is easy to come up with more believable narratives when we consider that AGI will also be of strategic interest in domains that already have shady connotations. Think of very advanced autonomous weapons systems, very advanced cybersecurity and espionage systems (e.g., a system that autonomously infiltrates enemy IT systems and needs to pass as a human in doing so), semi-criminal systems for doing optimal financial trading or market influencing etc. <br><br>I think it should not be hard to come up with believable scenarios in which the AGI would already be \u2018bad\u2019 when working as intended, and it is only a tiny step from there to scenarios where the original creators lose control over the AGI.<br><br>These scenarios can then be spun further: the only way to counter such malicious AGIs would be friendly AGIs that can match them in speed and trickery. But how can such friendly AGIs be manufactured and ensured not to get derailed like the shady AGIs before them? <br><br>And I think then we have covered most of the spectrum of current AGI safety discourse, without resorting to intentionally weird examples.", "timestamp": "1500020878"}, {"author": "David", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887128280692&reply_comment_id=997138494342", "anchor": "fb-887128280692_997138494342", "service": "fb", "text": "&rarr;&nbsp;It's a good point.  
<br><br>2 more related points:<br>- I think people without much exposure to philosophy might also not understand that the paperclip maximizer is a \"thought experiment\", which is almost antithetical to being realistic.<br>- I think domains like military make for tricky examples because then people think that the domain is the risky bit and (e.g.) want to focus on keeping AI out of military applications, which is not the point.", "timestamp": "1559890319"}, {"author": "Maxwell", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887175521022", "anchor": "fb-887175521022", "service": "fb", "text": "As a non-tech person currently not convinced that we need to fear runaway AI (but open to changing my mind) here's my two cents. The cancer drug and handwriting stories are very big turn-offs because they are long and convoluted, only to end with \"humans unwittingly poison themselves with no due diligence\" and \"machine reads the Internet then magically poisons all humans without any explanation how\". Like, I'm supposed to believe we'd not notice an autonomous machine installing poison gas dispensers in each and every home before they get activated? These stories make humans sound extremely dumb, and the immediate response is \"well, we'd notice and stop it before it happened\".<br><br>A more convincing argument to my mind would analogize runaway AI to things we already understand as threats. For example, viruses or prions. They're essentially just self-replicating code, but they kill millions. If you say AI could be like a new virus, endlessly self-replicating and mutating and causing various problems for humans that negatively impact our health and economy (without making it a doomsday scenario about magic poison gas) I'd be much more convinced.", "timestamp": "1500040855"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887175521022&reply_comment_id=887176209642", "anchor": "fb-887175521022_887176209642", "service": "fb", "text": "&rarr;&nbsp;What do you think of https://www.facebook.com/jefftk/posts/886930452142... above?", "timestamp": "1500041184"}, {"author": "Wei", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887497815142", "anchor": "fb-887497815142", "service": "fb", "text": "Here's an example of how existential risk could occur, even if everything goes almost perfectly right. From http://lesswrong.com/lw/ne1/alphago_versus_lee_sedol/d63i:<br><br>Your AI prompts you for guidance because it has received a message from a trading partner with a proposal to merge your AI systems and share resources for greater efficiency and economy of scale. The proposal contains a new AI design and control scheme and arguments that the new design is safer, more efficient, and divides control of the joint AI fairly between the human owners according to your current bargaining power. The message also claims that every second you take to consider the issue has large costs to you because your AI is falling behind the state of the art in both technology and scale, becoming uncompetitive, so your bargaining power for joining the merger is dropping (slowly in the AI's time-frame, but quickly in yours). Your AI says it can't find any obvious flaws in the proposal, but it's not sure that you'd consider the proposal to really be fair under reflective equilibrium or that the new design would preserve your real values in the long run. 
There are several arguments in the proposal that it doesn't know how to evaluate, hence the request for guidance. But it also reminds you not to read those arguments directly since they were written by a superintelligent AI and you risk getting mind-hacked if you do.<br><br>(In case it's not clear from the story, the risk here is that aligned AIs could eventually be outcompeted by unaligned AIs because they don't have the burden of maintaining alignment with complex values, leaving us either to lose a later conflict or to be left with just a very small share of the universe.)", "timestamp": "1500134246"}, {"author": "David", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887497815142&reply_comment_id=997138624082", "anchor": "fb-887497815142_997138624082", "service": "fb", "text": "&rarr;&nbsp;I like this example a lot, and note that unlike Eliezer's, it doesn't involve me inadvertently misaligning my AI, but rather both of us being put in a very tough situation by the dynamics of unmitigated competition.", "timestamp": "1559890612"}, {"author": "Paul", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092", "anchor": "fb-887507820092", "service": "fb", "text": "Scott Alexander's post http://slatestarcodex.com/2016/05/30/ascended-economy seems much closer to a picture of a realistic bad scenario, though that wasn't what he was aiming at.<br><br>Along similar lines, I described the bad scenario as: \"Some tasks are very 'easy' to frame as optimization problems. For example, we can already write an objective to train an RL agent to operate a profit-maximizing autonomous corporation (though for now we can only train very weak agents).<br>Many tasks that humans care about, such as maintaining law and order or helping us better understand our values, are extremely hard to convert into precise objectives: they are inherently poorly-defined or involve very long timescales, and simple proxies can be 'gamed' by a sophisticated agent.<br>As a result, many tasks that humans care about may not get done well; we may find ourselves in an increasingly sophisticated and complex world driven by completely alien values.\"<br><br>We can cash this out in more concrete terms, but I don't see how to make the story really simple without either operating at a high level of abstraction or using a semi-absurd example (like those you cite). There are probably going to be a whole bunch of AI systems trying to do a whole bunch of complicated things. The important point is that none of those things are \"what humans want\" and so fewer and fewer resources will be controlled by humans. I suspect the best way to produce a short example involves a back-and-forth between someone who has a clear picture in their head, and someone who is curious about what the bad scenario is.", "timestamp": "1500137043"}, {"author": "Jess", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092&reply_comment_id=887519052582", "anchor": "fb-887507820092_887519052582", "service": "fb", "text": "&rarr;&nbsp;But Paul, why does it have to become absurd? Why can't a non-absurd example be generated by filling in random details? 
The difficulty in doing this is a bit worrying.", "timestamp": "1500139400"}, {"author": "John", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092&reply_comment_id=887521183312", "anchor": "fb-887507820092_887521183312", "service": "fb", "text": "&rarr;&nbsp;Jess: because the actual bad outcomes are probably \"absurd\" as well?<br><br>The reason all the absurd specifics feel bad to write out is that you expect it's relatively easy for someone to find a problem you can't refute with whatever specific mechanism you come up with. <br><br>But there are lots of potential mechanisms, including a lot of unexpected ones, and some of the problems will pan out and some of them won't. <br><br>http://johnsalvatier.org/.../reality-has-a-surprising-amount...", "timestamp": "1500139893"}, {"author": "Jess", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092&reply_comment_id=887526822012", "anchor": "fb-887507820092_887526822012", "service": "fb", "text": "&rarr;&nbsp;But this isn't true of lots of other x-risks. That \"details\" blog post doesn't help.", "timestamp": "1500141415"}, {"author": "Paul", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092&reply_comment_id=887535160302", "anchor": "fb-887507820092_887535160302", "service": "fb", "text": "&rarr;&nbsp;I just mean the world is complicated. If you ask \"what happens in the world\" then the answer is always going to be complicated. I think it's easy to make non-absurd examples. It's only hard if you want to write a simple+concrete story that covers all the important details, because in fact if we look at the world concretely it just has a lot of important details. By contrast, I think it's pretty easy to have a concrete scenario in mind and answer arbitrary questions about it, or to give a simple abstract story.", "timestamp": "1500144162"}, {"author": "Jess", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=887507820092&reply_comment_id=887562515482", "anchor": "fb-887507820092_887562515482", "service": "fb", "text": "&rarr;&nbsp;My interpretation of Jeff's concern was that it was difficult to design a disaster scenario, in any amount of detail, that didn't seem absurd in the specific sense that the human developers/scientists act reasonably intelligently and the AI behavior is understandable after the fact. (\"Excess detail\" seeming absurd is not what we're worried about.)", "timestamp": "1500149884"}, {"author": "Tukabel", "source_link": "https://www.facebook.com/jefftk/posts/886930452142?comment_id=888735928952", "anchor": "fb-888735928952", "service": "fb", "text": "While people usually come up with these first-plan \"dumb AI\" examples (simple fixed goal), it's quite obvious that there will be far more \"dangerous\" smart AGIs soon after the dumb ones, with no clear goal other than survival. In the end, humanimals are exactly such devices (individual and species survival). But short-term, the main risk is most probably the dumb AI serving dumb humanimals - detrimental effects are guaranteed (DeepAnimal brain parts driving humanimals and their societies with brutally animalistic reward functions - in conflict with ever increasing powers given to the humanimals by the Memetic Supercivilization of Intelligence, living on top of humanimal substrate, sadly &lt;1%). 
Singularity should hurry up, as in the Grand Theatre of the Evolution of Intelligence our sole purpose is to create our (first nonbio) successor, before the inevitable self-destruction.", "timestamp": "1500534564"}]}