Slipstream

When 2+2 Equals a Privacy Question

TIME to revisit the always compelling — and often disconcerting — debate over digital privacy. So, what might your movie picks and your medical records have in common?

How about a potentially false sense of control over who can see your user history?

While Netflix and some health care concerns say they have been able to offer study data to researchers stripped of specific personal details like your name, phone number and e-mail address, in some cases researchers may be able to re-identify you by correlating anonymous information with the digital trail that you’ve left on blogs, chat rooms and Twitter.

Of course, you may be fine with that. On the other hand, you may not want complete strangers rummaging around in your history of movie selections or medical needs.

For example, contestants in Netflix’s competition to improve its recommendation software received a training data set containing the movie preferences of more than 480,000 customers who had, as they say in the trade, been “de-identified.” But as part of a privacy experiment, a pair of computer scientists at the University of Texas at Austin decided to see if it was possible to re-identify those unnamed movie fans.

By comparing the film preferences of some anonymous Netflix customers with personal profiles on imdb.com, the Internet movie database, the researchers said they easily re-identified some people because they had posted their e-mail addresses or other distinguishing information online.

Vitaly Shmatikov, an associate professor of computer science at the University of Texas at Austin and a co-author of the “de-anonymization” study, says the researchers were able to analyze users’ public postings and connect them to their Netflix preferences, including how a person may have rated films with controversial themes. Those are choices a person may or may not want to make public, Mr. Shmatikov said.
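To make the idea concrete, here is a minimal sketch of this kind of cross-referencing. It is not the Texas researchers' actual method; the scoring rule, the function names (`overlap_score`, `best_match`) and all of the data are invented for illustration. It assumes that a public profile and an unnamed record probably belong to the same person when enough of their film ratings agree.

```python
# Toy data, invented for illustration only. Nothing here is real Netflix or IMDb data.
anonymous_ratings = {
    "record_17": {"Film A": 5, "Film B": 1, "Film C": 4, "Film D": 2},
    "record_42": {"Film A": 3, "Film E": 5, "Film F": 4},
}
public_profiles = {
    "alice@example.com": {"Film A": 5, "Film C": 4, "Film D": 2},
    "bob@example.com": {"Film E": 2, "Film F": 1},
}


def overlap_score(anon, public, tolerance=1):
    """Count films rated in both sets whose ratings differ by at most `tolerance`."""
    shared = set(anon) & set(public)
    return sum(1 for film in shared if abs(anon[film] - public[film]) <= tolerance)


def best_match(public, records, min_score=3):
    """Return the record id that best fits a public profile.

    Returns None when the evidence is weak (fewer than `min_score`
    agreeing films) or ambiguous (a tie for the top score).
    """
    scored = sorted(
        ((overlap_score(ratings, public), rid) for rid, ratings in records.items()),
        reverse=True,
    )
    top_score, top_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    if top_score >= min_score and top_score > runner_up:
        return top_id
    return None


print(best_match(public_profiles["alice@example.com"], anonymous_ratings))  # record_17
print(best_match(public_profiles["bob@example.com"], anonymous_ratings))    # None
```

In this toy case, three matching ratings are enough to tie an e-mail address to an unnamed record, while a profile with no agreeing ratings is left unmatched.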

Steve Swasey, a Netflix spokesman, disputed the study’s conclusions, saying the customers were not re-identifiable because Netflix had altered the data set before sending it to contestants.

“There is no way with certainty that anyone could link a Netflix member with the data Netflix has disclosed by linking it with any publicly available data,” he said. “The anonymity of the information is comparable to the strictest federal standards for anonymizing personal health information.”

Nevertheless, the Texas researchers say they were indeed able to positively identify Netflix customers, and some privacy advocates say their study raises questions about whether newly strengthened laws governing the security of electronic health records, which contain information on diagnoses and treatments entered by health care providers, may offer incomplete privacy protection. Leaked movie preferences might embarrass or stereotype you, the advocates said. But information extracted from medical records and then linked back to you has the potential to cause social, professional and financial harm.

“Movie records can be sensitive in some cases; it could be embarrassing for someone to find out I like romantic comedies,” Mr. Shmatikov, the computer scientist, said in a recent phone interview. “But definitely for health records, this is a huge issue.”

And you don’t need records containing a person’s name and address to figure out to whom the records belong, he said. “As our research shows, pretty much any information that distinguishes one person from another can be used to re-identify records.”

Credit: Darren Hauck for The New York Times

The idea of an entirely paperless medical system holds the promise of more efficient and cost-effective care. And, with the incentive of stimulus package money, many companies are rushing to sell clinical information systems to streamline services like patient scheduling, sample tracking, and billing at hospitals and clinics.

In some cases, the same companies that sell data management systems to hospitals and physicians also store that information and then repackage it to make money on other services.

The clinical information systems market in the United States has sales of $8 billion to $10 billion annually, and about 5 percent of that comes from data and analysis, according to estimates by George Hill, an analyst at Leerink Swann, a health care investment bank.

But by 2020, when a vast majority of American health providers are expected to have electronic health systems, the data mining component alone could generate sales of up to $5 billion, Mr. Hill said. Demand for the data is likely to be robust. Policy makers and hospitals will want to dig into it to analyze physician practices and glean information about patient health trends.

Big players like the Cerner Corporation, which maintains electronic health systems for 8,000 clients, including large hospitals and retail clinics, and smaller players like Practice Fusion, which offers its Web-based health record systems free to health care providers, say they make use of patient data collected from their clients.

A spokeswoman for Cerner, whose Web site promotes its “data mining of our vast warehouse of electronic health records,” said the company shares de-identified patient data with researchers or drug companies looking for patients to participate in clinical trials. The patient records are “double scrubbed,” she said, explaining that the company removes personal data like names and addresses before it runs a search using a numbered code for each patient.

Other sensitive information, like mental health records, might be removed before the patient data is sent out, she said.
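A rough sketch of what such scrubbing could look like follows. It is not Cerner's actual process, which the company did not describe in detail. The field names, the list of direct identifiers and the `scrub` function are assumptions made for illustration.

```python
# Assumed list of direct identifiers; a real system would define its own.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def scrub(records, drop_fields=DIRECT_IDENTIFIERS):
    """Strip direct identifiers and tag each record with a numbered code.

    Returns the scrubbed records plus a private lookup table mapping
    each code back to the original name, which the data holder keeps.
    """
    scrubbed = []
    code_to_name = {}
    for code, record in enumerate(records, start=1):
        clean = {key: value for key, value in record.items() if key not in drop_fields}
        clean["patient_code"] = code
        scrubbed.append(clean)
        code_to_name[code] = record.get("name")
    return scrubbed, code_to_name


# Invented example record.
patients = [
    {"name": "Jane Roe", "address": "1 Main St", "zip": "00001",
     "birth_date": "1950-01-02", "diagnosis": "condition X"},
]
released, private_key = scrub(patients)
print(released[0])
```

Note what survives in the released record: the ZIP code, birth date and diagnosis are untouched. Fields like those are what make re-identification possible.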

The Web site of Practice Fusion, meanwhile, quotes Ryan Howard, the chief executive, as saying that the company subsidizes its free record-keeping systems by selling de-identified data to insurance groups, clinical researchers and pharmaceutical companies. In an interview, however, Mr. Howard said Practice Fusion had not yet started selling patient information but that it intended to do so.

NEW regulations require notifying patients if their personally identifiable medical information gets loose, and they prohibit selling protected health records. But privacy advocates said electronic health records remain vulnerable because no federal law now forbids the sale of de-identified health care data.

In 1997, for example, a researcher identified the medical records of William Weld, then the governor of Massachusetts, by matching birth dates, ZIP codes and gender in voter registration rolls against information published by the state’s Group Insurance Commission.
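The mechanics of such a match are simple. The sketch below joins a named list with a de-identified one on three shared fields. All names, dates and diagnoses are invented, and the `link` function and field names are hypothetical; this is not the 1997 researcher's code or data.

```python
from collections import defaultdict

# Fields that appear in both data sets and can act as a join key.
QUASI_IDENTIFIERS = ("birth_date", "zip", "gender")


def link(deidentified, named, keys=QUASI_IDENTIFIERS):
    """Pair each de-identified record with a name when exactly one
    person in the named list shares all of the key fields."""
    index = defaultdict(list)
    for person in named:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique combination pins down one person
            matches.append((candidates[0], record))
    return matches


# Invented data standing in for a voter roll and a released medical file.
voters = [
    {"name": "Pat Example", "birth_date": "1950-01-02", "zip": "00001", "gender": "M"},
    {"name": "Sam Sample", "birth_date": "1960-03-04", "zip": "00002", "gender": "F"},
    {"name": "Lee Sample", "birth_date": "1960-03-04", "zip": "00002", "gender": "F"},
]
medical = [
    {"birth_date": "1950-01-02", "zip": "00001", "gender": "M", "diagnosis": "condition X"},
    {"birth_date": "1960-03-04", "zip": "00002", "gender": "F", "diagnosis": "condition Y"},
]

for name, record in link(medical, voters):
    print(name, "->", record["diagnosis"])
```

Here only the first medical record is re-identified. The second matches two voters, so it stays ambiguous. The rarer the combination of birth date, ZIP code and gender, the more likely a record points to exactly one person.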

There are no current federal laws against re-identification, said Dr. Deborah Peel, a psychiatrist who is a director of Patient Privacy Rights, a nonprofit watchdog group in Austin, Tex.

“Once personal health data gets out there, it’s like the Paris Hilton sex tape,” Dr. Peel said. “It is going to be out there forever.”

A version of this article appears in print in Section BU, Page 4 of the New York edition with the headline: When 2+2 Equals a Privacy Question.
