The Robots Are Coming! Oh, They’re Here.

Sportswriting has long played host to some of the most glorious, along with the most cliché-ridden, exercises in journalism. As anybody on a sports beat can tell you, making game stories interesting day in and day out can be a brutal challenge, but for those with lesser ambitions, it can be something that you do in your sleep. Now some kids at the Intelligent Information Laboratory at Northwestern University are suggesting that an average game day story can be bolted together without human intervention.

Crawlers, using tried and true language, bolt in language and quotes automatically and before you know it, a story is, um, written. The project, called Stats Monkey, might have some sportswriters reaching for the bottle of Tums at a time when newspapers are reluctant to send reporters on the road to cover a team. The premise, from Stats Monkey:

Imagine that you could push a button, and magically create a story about a baseball game. That’s what the Stats Monkey system does. Given information commonly available online about many games — the box score and the play-by-play — the system automatically generates the text of a story about that game that captures the overall dynamic of the game and highlights the key plays and key players. The story includes an appropriate headline and a photo of the most important player in the game.

If that sounds like a recipe for robotic, lifeless journalism, you’re probably right, but the weird thing about Stats Monkey is how not-that-terrible the stories are.

By way of demonstration, the group plugged in stats from the Oct. 11 playoff game between the Angels and the Red Sox:

BOSTON — Things looked bleak for the Angels when they trailed by two runs in the ninth inning, but Los Angeles recovered thanks to a key single from Vladimir Guerrero to pull out a 7-6 victory over the Boston Red Sox at Fenway Park on Sunday.

Guerrero drove in two Angels runners. He went 2-4 at the plate.

“When it comes down to honoring Nick Adenhart, and what happened in April in Anaheim, yes, it probably was the biggest hit (of my career),” Guerrero said. “Because I’m dedicating that to a former teammate, a guy that passed away.”

Guerrero has been good at the plate all season, especially in day games. During day games Guerrero has a .794 OPS. He has hit five home runs and driven in 13 runners in 26 games in day games.

After Chone Figgins walked, Bobby Abreu doubled and Torii Hunter was intentionally walked, the Angels were leading by one when Guerrero came to the plate against Jonathan Papelbon with two outs and the bases loaded in the ninth inning. He singled scoring Abreu from second and Figgins from third, which gave Angels the lead for good.

The Angels clinched the AL Division Series 3-0.

Angels starter Scott Kazmir struggled, allowing five runs in six innings, but the bullpen allowed only one runs and the offense banged out 11 hits to pick up the slack and secure the victory for the Angels.

J.D. Drew drove in two Red Sox runners. He went 1-4 at the plate.

Drew homered in the fourth inning scoring Mike Lowell.

“That felt like a big swing at the time,” said Drew. “I stayed inside the ball and put a good swing on it. I was definitely going to be ready to battle again tomorrow, but it didn’t work out.”

Drew has been excellent at the plate all season, especially in day games. During day games Drew has a .914 OPS. He has hit five home runs and driven in 17 runners in 36 games in day games.

Papelbon blew the game for Boston with a blown save. Papelbon allowed three runs on four hits in one inning.

Reliever Darren Oliver got the win for Los Angeles. He allowed no runs over one-third of an inning. The Los Angeles lefty struck out none, walked none and surrendered no hits.

Los Angeles closer Brian Fuentes got the final three outs to record the save.

Juan Rivera and Kendry Morales helped lead the Angels. They combined for three hits, three RBIs and one run scored.

Four relief pitchers finished off the game for Los Angeles. Jason Bulger faced four batters in relief out of the bullpen, while Kevin Jepsen managed to record two outs to aid the victory.

According to the Web site for Stats Monkey: “The system is based on two underlying technologies. First, it uses baseball statistical models to figure out what the news is in the story: By analyzing changes in Win Probability and Game Scores, the system can pick out the key plays and players from any baseball game. Second, the system includes a library of narrative arcs that describe the main dynamics of baseball games (as well as many other competitions): Was it a come-from-behind win? Back-and-forth the whole way? Did one team jump out in front at the beginning and then sit on its lead? The system uses a decision tree to select the appropriate narrative arc. This then determines the main components of the game story and enables the system to put them together in a cohesive and compelling manner. The stories can be generated from the point of view of either team.”
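The two steps described there can be sketched in a few lines of code. What follows is only an illustration, not the Stats Monkey implementation: the function names, the arc labels and the thresholds (three lead swings, a 25 percent low point) are assumptions made up for this example, and the input is assumed to be the eventual winner's win probability after each play.

```python
from typing import List


def count_lead_swings(win_prob: List[float]) -> int:
    """Count how often the eventual winner crosses the 50% win-probability line."""
    # Readings of exactly 0.5 favor neither side, so they are skipped.
    sides = [p > 0.5 for p in win_prob if p != 0.5]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def key_play_index(win_prob: List[float]) -> int:
    """Return the index of the reading with the largest change from the one before it."""
    if len(win_prob) < 2:
        raise ValueError("need at least two win-probability readings")
    changes = [abs(b - a) for a, b in zip(win_prob, win_prob[1:])]
    return changes.index(max(changes)) + 1


def select_narrative_arc(win_prob: List[float]) -> str:
    """A tiny hand-written decision tree; every threshold here is hypothetical."""
    if not win_prob:
        raise ValueError("need at least one win-probability reading")
    swings = count_lead_swings(win_prob)
    if swings >= 3:
        return "back-and-forth"
    if min(win_prob) < 0.25:
        return "come-from-behind"
    if swings == 0:
        return "early-lead"
    return "close-game"


# Made-up series in which the winner is nearly beaten before one big late play.
comeback = [0.5, 0.4, 0.2, 0.1, 0.15, 0.9, 1.0]
arc = select_narrative_arc(comeback)
big_play = key_play_index(comeback)
```

A real system would presumably tune such a tree against many games rather than hard-code it, but the shape is the same: find the biggest swing to identify the key play, then classify the overall curve to choose the story template.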

That all sounds pretty great and pretty creepy at the same time, and these wunderkinds say that there are no limits to what the technology might do: “The Machine Generated Sports Stories system could be employed by news organizations or directly by organizations which wish to publish information about their activities, such as college sports teams or businesses.”

One not-so-small quibble: The robot did a fine job of reporting out the to-ing and fro-ing of the game, but it missed one implication, or at least buried the lead: The Angels swept the Sox and went on to play the Yankees for the league championship. Guess algorithms can’t do everything.


I was one of the professors for the class where the StatsMonkey project was first demonstrated. For my take on whether journalists should be threatened by software like this, check out:

//www.pbs.org/idealab/2009/10/machine-generated-news-a-threat-to-journalists-i-think-not292.html

Why can’t the algorithm just be improved to not bury that lede?

I think the gamer-writing robot is a great thing. My critique of the example piece is it’s not robotic enough. Give me a story about what happened in the game. Describe the action for me. Leave the judgments and the quotes out. There is an army of hacks, pro and am, who can give me the sappy quote about Nick Adenhart.

And “Guerrero has been good at the plate all season, especially in day games. During day games Guerrero has a .794 OPS” is bad sportswriting because it’s bad judgment. (Though there is also an error.)

Robotically, a .794 OPS is “good,” because it’s higher than league average, .764. But for Vladimir Guerrero, who has a career OPS of .954, there is nothing good about an OPS of .794, 92 points lower than his previous full-season career low, last year. It’s a symptom of his rapid decline. The story on Vlad is he isn’t really Vlad anymore.

Just give me the facts, robot. And get them right. Guerrero’s overall OPS was .794. His daytime OPS was .850 — better, and “good” for most players, but bad for Vlad.
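The point about baselines can be made concrete with the numbers quoted in this comment. This is a rough sketch only: the function names and labels are invented for illustration, and nothing here reflects how Stats Monkey actually chooses its adjectives.

```python
from typing import Optional


def points_diff(ops: float, baseline: float) -> int:
    """Difference between two OPS figures in 'points' (thousandths)."""
    return round((ops - baseline) * 1000)


def judge_ops(ops: float, league_avg: float, career_ops: Optional[float] = None) -> str:
    """Label an OPS figure; the labels are hypothetical."""
    # With a career figure available, judge the player against himself first.
    if career_ops is not None and points_diff(ops, career_ops) < 0:
        return "below his own standard"
    return "good" if ops > league_avg else "below average"


LEAGUE_AVG = 0.764       # league-average OPS cited above
GUERRERO_CAREER = 0.954  # career OPS cited above

robot_view = judge_ops(0.794, LEAGUE_AVG)                    # league baseline only
writer_view = judge_ops(0.794, LEAGUE_AVG, GUERRERO_CAREER)  # adds the career baseline
```

Measured against the league alone, .794 comes out as "good"; measured against his own career, the same figure is 160 points short, and even the .850 daytime mark is 104 points short.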

What makes me less impressed by this than I’d normally be is that back, like a million years ago, when I was in college, one of my professors had built a storytelling machine called BRUTUS1.

StatsMonkey and BRUTUS1 seem to be very similar outside of the type of story they are trying to tell.

//www.cogsci.rpi.edu/homeless/research/brutus.html

Selmer Bringsjord’s book on BRUTUS1 was published around 1999, so the software portion is over 10 years old at this point.

I think this software is a great thing, but it will not completely replace journalists. However, it can be very useful for them. If I were a journalist, I would use Stats Monkey to create a draft and then edit it. Journalists are always under time pressure, so they could save time.