
Updated: How Verizon found child pornography in its cloud

Scanned files using hashes of known child pornography images.

This story has been updated with information from the National Center for Missing and Exploited Children and the Baltimore County Police Department.

Cloud-based storage services are no doubt useful. They can back up your personal data and keep it from being lost if your system crashes. They can share your data across multiple computers. But cloud-based services are increasingly checking user-uploaded data for illegal content—particularly child pornography.

When Congress passed the PROTECT Our Children Act of 2008 mandating that service providers report suspected child pornography in the content that their customers surf and store, the law gave providers an out: if they couldn't check, they wouldn't know, and they wouldn't have to report it. But while checking is still voluntary, the National Center for Missing and Exploited Children has been pushing providers to use image-matching technology to help stop the spread of child pornography.

William Albaugh found this out the hard way when he backed up his home computer to Verizon's online backup service. The 67-year-old deacon of a Catholic church in Baltimore County didn't realize he was giving away his secret: after he allegedly uploaded pornographic images and videos of children to his Online Backup and Sharing cloud account, they were scanned by a Verizon partner using technology that automatically checks images and videos for children known to be the victims of pornographers.

Since the passage of the PROTECT Act, sponsored by then-Senator Joseph Biden Jr., service providers have been required to register with the NCMEC's Cyber Tipline, operated in coordination with federal, state, and local law enforcement. Providers have a "duty to report" to the NCMEC if their users access or store child pornography; in the last six months of 2012, the Cyber Tipline handled 113,009 reports of child pornography from electronic service providers.

Verizon officials would not go into the particulars of how the company scans customers' content. "All we do is follow the law," said Verizon spokesperson Linda Laughlin. But they acknowledged that the company uses a database of mathematical fingerprints of known images, generated by the National Center for Missing and Exploited Children.

To serve and protect

Update: John Shenan, executive director of NCMEC's exploited children division, said that the database shared with service providers contains data for about 16,000 images, all submitted by service providers themselves. Each of the 16,000 images in the commercial provider database has been checked against three criteria: it depicts children who are prepubescent or infants, those children are being subjected to sexual abuse in the photos, and the children have been previously identified by law enforcement as victims. In other words, the child pornography caught by the service providers' scanners is what Shenan described as the "worst of the worst," content that has largely already been in wide distribution.

The provider database, while based on the same underlying technology, is separate from the image database compiled by NCMEC for law enforcement as part of the Child Victim Identification Program. For that database, the group reviewed over 17.3 million such files in 2011 alone.

"Those images do not go into this program," Shenan said. "If we were to extract images from the law enforcement database, we would have Fourth Amendment issues." By keeping the databases separate, NCMEC prevents defendants caught by service provider searches from claiming that the providers were acting as agents of law enforcement.

Neither NCMEC database contains the images themselves. Instead, each holds a collection of image "fingerprints" created in one of two ways. By sharing the hashed fingerprints of images in which children have been identified performing sexual acts, NCMEC makes it possible for law enforcement officials, cloud storage services, and hosting providers to check large volumes of files for matches without having to keep copies of the offending images themselves.

The original hash database method, which is still used for some detection applications, takes an MD5 hash of each known bad file to create a unique identifier for it. The second, added just over a year ago, uses a technology called PhotoDNA, which was donated by Microsoft. PhotoDNA creates hash values based not on the files themselves but on visual information within the photos and videos. As a result, in theory, PhotoDNA-based scanning software can recognize images even when they've been resized or cropped.
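The MD5 approach described above amounts to exact-match lookup: hash each uploaded file and check the digest against a set of known fingerprints. The sketch below illustrates the idea; the placeholder digest and the `KNOWN_BAD_HASHES` set are hypothetical stand-ins for the NCMEC-supplied database, and chunked reading is used so large backup files never have to fit in memory.

```python
import hashlib

# Hypothetical stand-in for the NCMEC-supplied fingerprint database.
# The entry below is simply the well-known MD5 of an empty file,
# used here as a harmless placeholder.
KNOWN_BAD_HASHES = {
    "d41d8cd98f00b204e9800998ecf8427e",
}

def md5_of_file(path, chunk_size=65536):
    """Compute a file's MD5 hex digest, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_flagged(path):
    """Return True if the file's digest matches a known fingerprint."""
    return md5_of_file(path) in KNOWN_BAD_HASHES
```

The weakness of exact matching is that changing a single byte of a file (recompressing, resizing, cropping) produces a completely different digest, which is why the perceptual approach of PhotoDNA was added alongside it.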

Shenan said that both PhotoDNA hashes and MD5 hashes are provided to service providers who voluntarily sign up for content scanning; after signing a memorandum of understanding with NCMEC, providers also receive the raw PhotoDNA matching code to incorporate into their own networks. Only about a dozen commercial organizations are part of the voluntary NCMEC program, and only two have identified themselves openly: Microsoft and Facebook, both early adopters of PhotoDNA.
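PhotoDNA itself is proprietary and its internals aren't public, but the robustness it aims for can be illustrated with a much simpler perceptual hash. The toy "average hash" below, a hypothetical stand-in operating on a grid of grayscale values, thresholds each pixel against the image mean, so a uniform brightness change leaves the fingerprint intact even though an exact byte-level hash like MD5 would change completely.

```python
def average_hash(pixels):
    """Toy perceptual hash: emit one bit per grayscale pixel,
    1 if the pixel is brighter than the image mean, else 0.
    A stand-in for PhotoDNA, whose real algorithm is proprietary."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

image = [[10, 200],
         [30, 220]]           # tiny 2x2 grayscale "image"
brighter = [[p + 20 for p in row] for row in image]  # uniform brightness shift

# Shifting every pixel by the same amount shifts the mean by the same
# amount, so every threshold comparison, and thus the hash, is unchanged.
assert average_hash(image) == average_hash(brighter)
```

Real perceptual hashes also downsample and compare fingerprints by Hamming distance rather than exact equality, which is what lets them survive resizing and cropping.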

Verizon doesn't provide cloud services itself; it contracts with cloud storage providers who operate data centers to provide the backend for its Online Backup and Sharing service for FiOS and its other cloud storage services. Laughlin said that for security reasons, Verizon would not discuss which vendors were involved in scanning customers' files—or how frequently that scanning happened. But Verizon's own terms of service documents say that the Online Backup and Sharing service is provided by Digi-Data Corporation of Broomfield, Colorado.

Crypto clearance

It's Digi-Data that actually performs the scan of users' content; the company reports possible "hits" to Verizon's security team, which in turn associates those hits with a specific account and passes them to the NCMEC Cyber Tipline. So when Albaugh's computer allegedly uploaded the videos and images he had stored on its hard drive, they traversed Verizon's network to a third party's data center. It was there that a scan detected images of children who were known to be victims of child pornography. Verizon submitted the details as a tip through NCMEC, which in turn passed them to the Baltimore County Police Department.

Update: Elise Armacost, the Director of the Office of Media for the Baltimore County Police Department, said in a phone interview that the tip resulted in a search warrant for Albaugh's residence on the morning of March 1. The forensics results of that search are still preliminary, she said, but Albaugh has so far been charged with one count of possession of child pornography and released on $75,000 bond. Further charges could be added based on the results of the investigation.

If Albaugh had been a bit more technically aware, he might have encrypted his data locally, which would have kept him from being caught so easily. While the data passes over Verizon's network encrypted, it must be either stored unencrypted at the data center or decrypted there with a local key for the PhotoDNA hash scan to detect anything. More likely, user backups are stored encrypted at rest using AES or a similar scheme and then decrypted programmatically for scanning and for transmission back to the customer.

Verizon explicitly warns users in its terms of service that it may scan content for violation of its policies. And it explicitly calls out child pornography: "Verizon reserves the right to access your Storage Service account at any time with or without prior notice to you and to disable access to or remove content which in our sole discretion is or reasonably could be deemed unlawful...Verizon is required by law to report any facts or circumstances reported to us or that we discover from which it appears there may be a violation of the child pornography laws. We reserve the right to report any such information, including the identity of users, account information, images and other facts to law enforcement personnel."

Verizon isn't the only cloud provider that performs some level of scanning of its content. Dropbox, for example, spells out in its terms and conditions the many things users aren't allowed to do with the service, including "Don't share 'unlawfully pornographic' material." The company will cancel your account, or worse, if you do. Dropbox also says it "may collect" information on "all the files you upload or download."

And like all cloud providers, Dropbox and Verizon (and others) must be able to provide files stored in the cloud to law enforcement—in some cases without a warrant. The Electronic Communications Privacy Act Amendments Act of 2012, which would have offered cloud-based storage greater privacy protections, failed to get out of the Senate last year, so the "stored communications" that are your personal files will be open to scrutiny for the foreseeable future.

 

Listing image by Baltimore County Police Department
