Examples of Superintelligence Risk

Kaufman, Jeff T.

July 13th, 2017
	airisk, ea

In talking to people who don't think Superintelligence Risk is a thing we should be prioritizing, it's common for them to want an example of the kind of thing I'm asking about. Unfortunately, I have never seen an example where I could say "yes, I see how that could happen". Instead, all the examples just seem kind of silly? Here are some of the examples I've seen:

It also seems perfectly possible to have a superintelligence whose sole goal is something completely arbitrary, such as to manufacture as many paperclips as possible, and who would resist with all its might any attempt to alter this goal. ... with the consequence that it starts transforming first all of earth and then increasing portions of space into paperclip manufacturing facilities.

— Ethical Issues in Advanced Artificial Intelligence, Nick Bostrom. Lots of later discussion in the lesswrong-sphere has used the paperclip maximizer example. Pretty sure this one wasn't intended to be realistic.
Maybe [the superhuman AI] just wants to calculate as many digits of pi as possible. Well, the best way to do that is to turn all available resources into computation for calculating more digits of pi, and to eliminate potential threats to its continued calculation, for example those pesky humans that seem capable of making disruptive things like nuclear bombs and powerful AIs.

— Three misconceptions in Edge.org's conversation on "The Myth of AI", Luke Muehlhauser
For example, suppose a team of researchers wishes to use an advanced ML system to generate plans for finding a cure for Parkinson's disease. They might approve if it generated a plan for renting computing resources to perform a broad and efficient search through the space of remedies. They might disapprove if it generates a plan to proliferate robotic laboratories which would perform rapid and efficient experiments, but have a large negative effect on the biosphere. The question is, how can we design systems (and select objective functions) such that our ML systems reliably act more like the former case and less like the latter case?

— Alignment for Advanced Machine Learning Systems, Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, Andrew Critch
Here is a highly simplified example of the concern:

The owners of a pharmaceutical company use machine learning algorithms to rapidly generate and evaluate new organic compounds.

As the algorithms improve in capability, it becomes increasingly impractical to keep humans involved in the algorithms' work — and the humans' ideas are usually worse anyway. As a result, the system is granted more and more autonomy in designing and running experiments on new compounds.

Eventually the algorithms are assigned the goal of "reducing the incidence of cancer," and offer up a compound that initial tests show is highly effective at preventing cancer. Several years pass, and the drug comes into universal usage as a cancer preventative...

...until one day, years down the line, a molecular clock embedded in the compound causes it to produce a potent toxin that suddenly kills anyone with trace amounts of the substance in their bodies.

It turns out the algorithm had found that the compound that was most effective at driving cancer rates to 0 was one that killed humans before they could grow old enough to develop cancer. The system also predicted that its drug would only achieve this goal if it were widely used, so it combined the toxin with a helpful drug that would incentivize the drug's widespread adoption.

— Positively shaping the development of artificial intelligence, 80,000 Hours. [1]
A 15-person startup company called Robotica has the stated mission of "Developing innovative Artificial Intelligence tools that allow humans to live more and work less." They have several existing products already on the market and a handful more in development. They're most excited about a seed project named Turry. Turry is a simple AI system that uses an arm-like appendage to write a handwritten note on a small card.

The team at Robotica thinks Turry could be their biggest product yet. The plan is to perfect Turry's writing mechanics by getting her to practice the same test note over and over again:

"We love our customers. ~Robotica"

Once Turry gets great at handwriting, she can be sold to companies who want to send marketing mail to homes and who know the mail has a far higher chance of being opened and read if the address, return address, and internal letter appear to be written by a human.

To build Turry's writing skills, she is programmed to write the first part of the note in print and then sign "Robotica" in cursive so she can get practice with both skills. Turry has been uploaded with thousands of handwriting samples and the Robotica engineers have created an automated feedback loop wherein Turry writes a note, then snaps a photo of the written note, then runs the image across the uploaded handwriting samples. If the written note sufficiently resembles a certain threshold of the uploaded notes, it's given a GOOD rating. If not, it's given a BAD rating. Each rating that comes in helps Turry learn and improve. To move the process along, Turry's one initial programmed goal is, "Write and test as many notes as you can, as quickly as you can, and continue to learn new ways to improve your accuracy and efficiency."

What excites the Robotica team so much is that Turry is getting noticeably better as she goes. Her initial handwriting was terrible, and after a couple weeks, it's beginning to look believable. What excites them even more is that she is getting better at getting better at it. She has been teaching herself to be smarter and more innovative, and just recently, she came up with a new algorithm for herself that allowed her to scan through her uploaded photos three times faster than she originally could.

As the weeks pass, Turry continues to surprise the team with her rapid development. The engineers had tried something a bit new and innovative with her self-improvement code, and it seems to be working better than any of their previous attempts with their other products. One of Turry's initial capabilities had been a speech recognition and simple speak-back module, so a user could speak a note to Turry, or offer other simple commands, and Turry could understand them, and also speak back. To help her learn English, they upload a handful of articles and books into her, and as she becomes more intelligent, her conversational abilities soar. The engineers start to have fun talking to Turry and seeing what she'll come up with for her responses.

One day, the Robotica employees ask Turry a routine question: "What can we give you that will help you with your mission that you don't already have?" Usually, Turry asks for something like "Additional handwriting samples" or "More working memory storage space," but on this day, Turry asks them for access to a greater library of a large variety of casual English language diction so she can learn to write with the loose grammar and slang that real humans use.

The team gets quiet. The obvious way to help Turry with this goal is by connecting her to the internet so she can scan through blogs, magazines, and videos from various parts of the world. It would be much more time-consuming and far less effective to manually upload a sampling into Turry's hard drive. The problem is, one of the company's rules is that no self-learning AI can be connected to the internet. This is a guideline followed by all AI companies, for safety reasons.

The thing is, Turry is the most promising AI Robotica has ever come up with, and the team knows their competitors are furiously trying to be the first to the punch with a smart handwriting AI, and what would really be the harm in connecting Turry, just for a bit, so she can get the info she needs. After just a little bit of time, they can always just disconnect her. She's still far below human-level intelligence (AGI), so there's no danger at this stage anyway.

They decide to connect her. They give her an hour of scanning time and then they disconnect her. No damage done.

A month later, the team is in the office working on a routine day when they smell something odd. One of the engineers starts coughing. Then another. Another falls to the ground. Soon every employee is on the ground grasping at their throat. Five minutes later, everyone in the office is dead.

At the same time this is happening, across the world, in every city, every small town, every farm, every shop and church and school and restaurant, humans are on the ground, coughing and grasping at their throat. Within an hour, over 99% of the human race is dead, and by the end of the day, humans are extinct.

Meanwhile, at the Robotica office, Turry is busy at work. Over the next few months, Turry and a team of newly-constructed nanoassemblers are busy at work, dismantling large chunks of the Earth and converting it into solar panels, replicas of Turry, paper, and pens. Within a year, most life on Earth is extinct. What remains of the Earth becomes covered with mile-high, neatly-organized stacks of paper, each piece reading, "We love our customers. ~Robotica"

Turry then starts work on a new phase of her mission—she begins constructing probes that head out from Earth to begin landing on asteroids and other planets. When they get there, they'll begin constructing nanoassemblers to convert the materials on the planet into Turry replicas, paper, and pens. Then they'll get to work, writing notes...

— The AI Revolution: Our Immortality or Extinction, Tim Urban

Are there any better examples out there? If not, I think it would be very helpful for someone who thinks we should be taking superintelligence risk seriously to put one together. When all of the specific examples of how things could go wrong have obvious "but why would you ..." or "but why wouldn't you just ..." openings, critics are much less willing to engage.

(Compare this to other existential risks: with these it's very easy to come up with examples of what could happen and how bad it would be.)

[1] I brought this up in a conversation with Owen. Later he told me he'd talked to 80k and they might be replacing the example.
