

A poetry-writing AI has just been unveiled. It’s ... pretty good.

You can try out OpenAI’s controversial language AI for yourself.

Kelsey Piper is a senior writer at Future Perfect, Vox’s effective altruism-inspired section on the world’s biggest challenges. She explores wide-ranging topics like climate change, artificial intelligence, vaccine development, and factory farms, and also writes the Future Perfect newsletter.

This spring, OpenAI, the AI research lab co-founded by Elon Musk, made a splash with an AI system that generates text. It can write convincing fake reviews, fake news articles, and even poetry.

Now the public has a chance to give it a try — at least, a limited version of it. Initially, the company had released an extremely restricted version of the system, citing concerns that it’d be abused. This month, OpenAI released a more powerful version (though still significantly limited compared to the whole thing). You can check it out for yourself.

The way it works is amazingly simple. A user gives the system, called GPT-2, a prompt — a few words, a snippet of text, a passage from an article, what have you. The system has been trained, on data drawn from the internet, to “predict” the next words of the passage — meaning the AI will turn your prompt into a news article, a short story, or a poem. (You can give the newest version of GPT-2 a try on a private site hosted by machine learning engineer Adam King.)
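For the technically curious, here is roughly what that prompt-and-continue loop looks like in code. This is a minimal sketch that assumes the open-source Hugging Face transformers library and the small public GPT-2 model; it is not OpenAI’s own tooling or the system behind King’s site, and the prompt is just an example.

```python
# A minimal sketch of prompt-based generation with the small public GPT-2 model.
# Assumes the open-source Hugging Face `transformers` library; this is not
# OpenAI's own tooling, and the prompt is just an example.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A major snowstorm hit the Northwest this weekend,"
# The model repeatedly predicts a plausible next word (token) given
# everything written so far, until it hits the length limit.
result = generator(prompt, max_length=60, do_sample=True, top_k=40)
print(result[0]["generated_text"])
```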

The results can be quite sophisticated. When I tested it, I fed GPT-2 the beginnings of stories about snowstorms in the Northwest, about college students, and about GPT-2 itself. The system then took it from there, inventing imaginary scientists to quote and imaginary organizations to cite (and it even enthused about the rapid progress of AI).

OpenAI initially decided not to release the full system to the public, out of fears it could be used by malicious actors to swamp us all with fake news. Instead, it released smaller and less capable versions — a staggered rollout that OpenAI hopes will allow researchers to explore the system and learn from it, while still keeping the potential risks at bay.

AI is getting more sophisticated — and that’s a big deal. It has the potential to assist us in tackling some of the biggest problems of our day, from drug development to clean energy. But researchers worry it can have unintended consequences, increase inequality, and, when systems get powerful enough, even pose real danger. We’re still figuring out how to balance AI’s benefits against its potential hazards.

People used to say AI couldn’t be creative. Now it can.

Even the smaller, less capable version of GPT-2 is powerful enough to compose interesting poetry and fiction, and it’s easy to see how the more powerful versions write such convincing fake news.

Here are some excerpts from poems written by GPT-2 (the smallest public version), courtesy of Gwern Branwen, a researcher who trained the model specifically on poetry using a large corpus of poems as data.

In their little room with the door ajar

And the candle hanging on the wall ajar,

I have come across the word “Rise”

With a face as grave and flat as you please.

The one thing I remember of “Rise”

Is the way it makes you feel — so bad, so bad.

And I’ve come across many words to-night

That are so like “Rise” — so like — so vague, so vague.

“Elegance,” and “Artistic Vigour,”

But “Rise” is far above the rest,

And I cannot hear — or see — the word,

I will just stop here (I’ll stop if I can).

If you don’t know what “Rise” means, try.

Here’s another one:

And, ere the cloud of the tempest blew,

His soul was with the world at play.

He looked to the stars, and the stars smiled,

And the moon in the heaven looked;

And, as he looked, he beheld her light,

And all the heaven smiled with him.

When winds and tempests fly,

When floods and fires fail,

As their wake doth meadow and fen,

Tis the man-child’s heart that craves.

And I — I shall be bound,

With the hoary-headed, strong, old,

To earth, and the graves of the dead,

Whose feet are mowed down, as they lie;

And I shall rest my weary head,

In the silence of Eternity,

In the peaceful arms of God.

These are ... not bad! But that doesn’t mean the AI can really understand poetry, right? That’s mostly true — but it does depend on how you think about it.

One explanation of how humans understand the world is that we build a web of associations between related concepts and ideas, an understanding that lets us predict what will happen next. That sounds eerily close to what GPT-2 is doing.

Of course, the system is fundamentally very limited — it just works with text, it gets less coherent as it goes on, and it frequently produces nonsensical silliness. But even within those limits, its output is fascinating. As AI systems get more sophisticated, it gets harder to say things like “only humans can be creative” or “only humans can truly understand things.”

We’re seeing the potential of “unsupervised” learning

We’ve made huge strides in natural language processing over the past decade. Machine translation has improved to the point that you can read news articles written in other languages. Google demonstrated last summer that Google Assistant can make phone calls and book appointments while sounding just like a human (though the company promised it won’t use deceptive tactics in practice).

AI systems are seeing similarly impressive gains outside natural language processing. New techniques and more computing power have allowed researchers to generate photorealistic images, excel at two-player games like Go, and compete with the pros in strategy video games like StarCraft and Dota 2.

But even for those of us who are used to seeing fast progress in this space, it’s hard not to be awed when playing with GPT-2.

Until now, researchers trying to get state-of-the-art results on language tasks would “fine-tune” their models to perform well on the specific task in question — that is, the AI would be trained separately for each task.

OpenAI’s GPT-2 needed no fine-tuning: It turned in record-setting performances on many of the core tasks we use to judge language AIs, without ever having seen those tasks before and without being specifically trained to handle them. It also started to demonstrate some talent for reading comprehension, summarization, and translation with no explicit training in those tasks.

GPT-2 is the result of an approach called “unsupervised learning.” Here’s what that means: The predominant approach in the industry today is “supervised learning.” That’s where you have large, carefully labeled data sets that contain desired inputs and desired outputs. You teach the AI how to produce the outputs given the inputs.

That can get great results, but it requires building huge data sets and carefully labeling each bit of data. And it’s worth noting that supervised learning isn’t how humans acquire skills and knowledge. We make inferences about the world without the carefully delineated examples from supervised learning.

Many people believe that advances in general AI capabilities will require advances in unsupervised learning — that is, where the AI just gets exposed to lots of data and has to figure out everything else by itself. Unsupervised learning is easier to scale since there’s lots more unstructured data than there is structured data, and unsupervised learning may generalize better across tasks.
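To make the distinction concrete, here’s a toy sketch — my own illustration, not OpenAI’s code — of how a language-modeling objective manufactures its own training examples from raw text, with no human labeling required.

```python
# A toy illustration (not OpenAI's code) of why language modeling counts as
# "unsupervised": the training targets come for free from raw text.
raw_text = "the trophy does not fit in the suitcase because it is too big".split()

# Supervised learning would need hand-labeled (input, output) pairs.
# A language model just treats each word as the "label" for the words before it.
examples = [(raw_text[:i], raw_text[i]) for i in range(1, len(raw_text))]

for context, next_word in examples[:3]:
    print(f"given {' '.join(context)!r} -> predict {next_word!r}")
```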

Learning to read like a human

One task that OpenAI used to test the capabilities of GPT-2 is a famous test in machine learning known as the Winograd schema test. A Winograd schema is a sentence containing a pronoun that is ambiguous on the page but not ambiguous to humans — because we have the real-world context to interpret it.

For example, take the sentence: “The trophy doesn’t fit in the brown suitcase because it’s too big.”

To a human reader, it’s obvious that this means the trophy is too big, not that the suitcase is too big, because we know how objects fitting into other objects works. AI systems, though, struggle with questions like these.
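Here’s a rough sketch of how researchers typically score a Winograd schema with a language model: spell out both readings of the pronoun and ask which one the model finds more probable. The code below assumes the Hugging Face transformers library and the small public GPT-2 model; it illustrates the idea rather than reproducing OpenAI’s exact evaluation.

```python
# A rough sketch of scoring a Winograd schema with a language model: spell out
# both readings and pick the one the model finds more probable. Assumes the
# Hugging Face `transformers` library; this is not OpenAI's evaluation code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_log_prob(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns the average next-token loss.
        loss = model(ids, labels=ids).loss
    # Convert the average loss into a total log-probability (higher = more plausible).
    return -loss.item() * (ids.shape[1] - 1)

readings = [
    "The trophy doesn't fit in the brown suitcase because the trophy is too big.",
    "The trophy doesn't fit in the brown suitcase because the suitcase is too big.",
]
print("Model prefers:", max(readings, key=total_log_prob))
```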

Before GPT-2, OpenAI says, the best AI systems got Winograd schemas right 63.7 percent of the time. (Humans almost never get them wrong.) GPT-2 gets them right 70.7 percent of the time. That’s still well short of human-level performance, but it’s a striking gain over what was previously possible.

GPT-2 set records on other language tasks, too. LAMBADA is a task that tests a computer’s ability to use context mentioned earlier in a story in order to complete a sentence. The previous best performance had 56.25 percent accuracy; GPT-2 achieved 63.24 percent accuracy. (Again, humans get these right more than 95 percent of the time, so AI hasn’t replaced us yet — but this is a substantial jump in capabilities.)
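Here’s a made-up example in the spirit of LAMBADA — not an item from the actual dataset — to show why the task is hard: the missing word is obvious to a human who has read the whole passage, but nearly impossible to guess from the final sentence alone.

```python
# A made-up LAMBADA-style example (not from the actual dataset). The blank is
# easy for a human who read the whole passage, but nearly impossible to fill
# in from the final sentence alone.
passage = (
    "Anna handed the keys to her brother before boarding the train. "
    "Hours later, standing outside the locked apartment, he was glad "
    "she had remembered to give him the ___"
)
target = "keys"  # recoverable only by remembering the first sentence
print(passage, "->", target)
```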

Sam Bowman, who works on natural language processing at NYU, explained over email why there’s some skepticism about these advances: “models like this can sometimes look deceptively good by just repeating the exact texts that they were trained on.” For example, it’s easy to have coherent paragraphs if you’re plagiarizing whole paragraphs from other sources.

But that’s not what’s going on here, according to Bowman: “This is set up in a way that it can’t really be doing that.” Since it selects one word at a time, it’s not plagiarizing.

Another skeptical perspective on AI advances like this one is that they don’t reflect “deep” advances in our understanding of computer systems, just shallow improvements that come from being able to use more data and more computing power. Critics argue that almost everything heralded as an AI advance is really just incremental progress from adding more computing power to existing approaches.

The team at OpenAI contested that. GPT-2 uses a neural network design called the Transformer, invented 18 months ago by researchers at Google Brain. Some of the gains in performance are certainly thanks to more data and more computing power, but they’re also driven by powerful recent innovations in the field — as we’d expect if AI as a field is improving on all fronts.

“It’s more data, more compute, cheaper compute, and architectural improvements — designed by researchers at Google about a year and a half ago,” OpenAI researcher Jeffrey Wu told me. “We just want to try everything and see where the actual results take us.”
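For the curious, here’s a bare-bones sketch of the scaled dot-product “attention” operation at the heart of the Transformer design, written in NumPy. It shows only the core math; the real GPT-2 stacks dozens of these layers with learned weights, multiple attention heads, and a mask that keeps each word from peeking at the words that come after it.

```python
# A bare-bones illustration of the scaled dot-product "attention" operation
# behind the Transformer. Not GPT-2's implementation: the real model stacks
# dozens of these layers with learned weights and many attention heads.
import numpy as np

def attention(queries, keys, values):
    # Each position asks, "which other positions are relevant to me?"
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    # ...then takes a relevance-weighted blend of those positions' information.
    return weights @ values

# Toy example: a 4-word sentence, each word represented by an 8-number vector.
x = np.random.randn(4, 8)
print(attention(x, x, x).shape)  # (4, 8)
```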

By not releasing the system, OpenAI courted controversy

OpenAI’s announcement that they were restricting the release of the system produced mixed reactions — some people were supportive, others frustrated.

OpenAI has been active in trying to figure out how to limit the potential for misuse of AI, and it has concluded that in some cases, the right solution is limiting what it publishes.

With a tool like this, for example, it’d be easy to spoof Amazon reviews and pump out fake news articles in a fraction of the time a human would need. A slightly more sophisticated version might be good enough to let students generate plagiarized essays and spammers improve their messaging to targets.

“I’m worried about trolly 4chan actors generating arbitrarily large amounts of garbage opinion content that’s sexist and racist,” OpenAI policy director Jack Clark told me. He also worries about “actors who do stuff like disinformation, who are more sophisticated,” and points out that there might be other avenues for misuse we haven’t yet thought of. So OpenAI is keeping the most powerful versions of the tool offline for now, while everyone can weigh in on how to use AIs like these safely.

But critics feel that holding back the largest versions of the model wouldn’t reduce the risks much. “I’m confident that a single person working alone with enough compute resources could reproduce these results within a month or two (either a hobbyist with a lot of equipment and time, or more likely, researchers at a tech company),” Bowman wrote me. “Given that it is standard practice to make models public, this decision is only delaying the release of models like this by a short time.”

Other critics complained that staggering the release of the model mostly serves to get OpenAI more publicity, by stoking what they see as unreasonable fears about what the model could do.

People point out that other AI labs have developed programs just as sophisticated and released them without an extended release process or calls for a conversation about safety. That’s true as far as it goes, but I think there’s a strong case that those other labs aren’t being cautious enough — and that they, too, should try to prompt a conversation about the downsides and dangers of their new inventions before unleashing them on the internet.

That’s not to say that all AI research should proceed in secret from here — or even that the larger GPT-2 models shouldn’t be released. So far, people haven’t been using GPT-2 for spam; they’ve been using it for poetry. As AI grows more sophisticated, figuring out how to enable the good uses without the bad ones will be one of our biggest challenges.

