If Google Could Search Twitter, It Would Find Topsy

Photo: A screen shot of a Topsy search.

There is no question that real-time social networks like Twitter have become an important forum for public conversation, whether the discussion is about chemical weapons in the Middle East or the dance moves of Miley Cyrus at the MTV Video Music Awards.

But good luck trying to search and analyze the fire hose of information flowing through the real-time social network.

Sure, Twitter offers a search function, but its algorithms favor recent tweets and what it considers the most important items over the ones that might be most relevant to the searcher. For example, a search on Tuesday night for Bill de Blasio, who is leading the polls in New York’s Democratic mayoral primary, pulled up lots of tweets repeating his poll stats but virtually nothing on the night’s debate among the Democratic candidates. And search results for older material are quite limited.

Twitter’s weaknesses in search have left an opening for a start-up called Topsy, which has built a niche offering real-time indexing, search and analysis of the Twitter stream.

On Wednesday, the San Francisco company announced that it has now indexed every Twitter message since the first tweet was posted in 2006 — about 425 billion pieces of content when you include photos, pages linked from Twitter, and other related material. (Previously, its complete archive only went back to 2010.)

And the database is free for the public to search at Topsy.com. Want to see what people are saying about President Obama and the Syria vote in Congress? A quick search pulls up what Topsy’s algorithm thinks are the most relevant results, factoring in retweets and the past influence of the tweeter. You can narrow down results by time frame, search for tweets in 10 languages, and see a graph with the volume of tweets over time and an indicator of the general sentiment, positive or negative.

“How do you make sense of 400 billion pieces of content?” said Vipul Ved Prakash, Topsy’s co-founder and chief technology officer. “One, by ranking it. We do that ranking by looking at how much a particular piece of content is being cited by other people.”

Vipul Ved Prakash, Topsy’s co-founder and chief technology officer.

The system is similar to what Google does for Web search. (For a brief time, Google had a deal with Twitter that allowed it to index tweets. But after that deal expired in 2011, Google became pretty much useless for searching the real-time social conversation.)

Topsy makes its money from more sophisticated tools aimed at marketers, media companies, political operations and hedge funds, which require a subscription that starts at $12,000 a year. Those tools allow searches that compare different terms, narrow down results by geography and surface the specific tweets with the most influence on the social conversation.

“When Sandy hit, I used them for tracking down information,” said Danny Sullivan, founding editor of Search Engine Land, referring to the 2012 storm that devastated much of the East Coast. “I think they’re a great resource.”

But Mr. Sullivan, who has followed the development of search technology since its infancy, questioned whether Topsy’s powerful tools are more than most Twitter users want.

“People aren’t turning to Twitter search the way they are with Google,” he said. “The people who are really into Twitter search are journalists.”

Mr. Prakash acknowledged that most of Topsy’s users work for businesses, such as Visa and USA Today, that need to mine Twitter for valuable information. Competitors like DataSift and Gnip also offer access to the Twitter archive, he said, although their ability to deliver real-time information is more limited.

But Topsy knows it’s doing something right when Twitter itself uses the company’s tools, including for its Twitter Political Index that tracked voter sentiment during the 2012 presidential election and for its Twitter Oscars Index, which tried to predict this year’s Academy Award winners based on Twitter chatter.

“What we are doing is creating new products from the data,” Mr. Prakash said. “It becomes very complementary to the products that Twitter is providing.”