NSA Mimics Google, Pisses Off Senate

In 2008, a small team of coders inside the National Security Agency started reverse-engineering the database that ran Google. They closely followed the Google research paper describing BigTable -- the sweeping database that underpinned many of the Google's online services, running across tens of thousands of computer servers -- but they also went a little further. In rebuilding this massive database, they beefed up the security. After all, this was the NSA.
Image may contain Human Person Crowd Indoors Room Jury and Audience
jim.greenhill/Flickr

In 2008, a team of software coders inside the National Security Agency started reverse-engineering the database that ran Google.

They closely followed the Google research paper describing BigTable -- the sweeping database that underpinned many of Google's online services, running across tens of thousands of computer servers -- but they also went a little further. In rebuilding this massive database, they beefed up the security. After all, this was the NSA.

Like Google, the agency needed a way of storing and retrieving massive amounts of data across an army of servers, but it also needed extra tools for protecting all that data from prying eyes. They added "cell level" software controls that could separate various classifications of data, ensuring that each user could only access the information they were authorized to access. It was a key part of the NSA's effort to improve the security of its own networks.

But the NSA also saw the database as something that could improve security across the federal government -- and beyond. Last September, the agency open sourced its Google mimic, releasing the code as the Accumulo project. It's a common open source story -- except that the Senate Armed Services Committee wants to put the brakes on the project.

In a bill recently introduced on Capitol Hill, the committee questions whether Accumulo runs afoul of a government policy that prevents federal agencies from building their own software when they have access to commercial alternatives. The bill could ban the Department of Defense from using the NSA's database -- and it could force the NSA to meld the project's security tools with other open source projects that mimic Google's BigTable.

The NSA, you see, is just one of many organizations that have open sourced code that seeks to mimic the Google infrastructure. Like other commercial outfits, the agency not only wants to share the database with other government organizations and companies, it aims to improve the platform by encouraging other developers to contribute code. But when the government's involved, there's often a twist.

The U.S. government has a long history with open source software, but there are times when policy and politics bump up against efforts to freely share software code -- just as they do in the corporate world. In recent years, the most famous example is NASA's Nebula project, which overcame myriad bureaucratic hurdles before busting out of the space agency in a big way, seeding the popular OpenStack platform.

That said, the Accumulo kerfuffle is a little different. In trying to determine whether Accumulo duplicates existing projects, the bill floated by the Senate Armed Services committee uses such specific language, some believe it could set a dangerous precedent for the use of other open source projects inside the federal government.

The NSA at 'Internet Scale'

Originally called Cloudbase by the NSA, Accumulo is already used inside the agency, according to a speech given last fall by Gen. Keith Alexander, the director of the NSA. Basically, it allows the NSA to store enormous amounts of data in a single software platform, rather than spread it across a wide range of disparate databases that must be accessed separately.

Accumulo is what's commonly known as a "NoSQL" database. Unlike a traditional SQL relational database -- which is designed to run on a single machine, storing data in neat rows and columns -- a NoSQL database is meant for storing much larger amounts of data across a vast array of machines. These databases have become increasingly important in the internet age, as more and more data streams into modern businesses -- and government agencies.

With BigTable, Google was at the forefront of the NoSQL movement, and since the company published its paper describing BigTable in 2006, several organizations have built open source platforms mimicking its design. Before the NSA released Accumulo, a search outfit called Powerset -- now owned by Microsoft -- built a platform called HBase, while social networking giant Facebook fashioned a similar platform dubbed Cassandra.

And this is what bothers the Senate Armed Services Committee.

The Senate Armed Services Committee oversees the U.S. military, including the Department of Defense and the NSA, which is part of the DoD. With Senate bill 3254 -- National Defense Authorization Act for Fiscal Year 2013 -- the committee lays out the U.S. military budget for the coming year, and at one point, the 600-page bill targets Accumulo by name.

The bill bars the DoD from using the database unless the department can show that the software is sufficiently different from other databases that mimic BigTable. But at the same time, the bill orders the director of the NSA to work with outside organizations to merge the Accumulo security tools with alternative databases, specifically naming HBase and Cassandra.

The bill indicates that Accumulo may violate OMB Circular A-130, a government policy that bars agencies from building software if it's less expensive to use commercial software that's already available. And according to one congressional staffer who worked on the bill, this is indeed the case. He asked that his name not be used in this story, as he's not authorized to speak with the press.

At this point, the staffer says, the committee isn't concerned with the man power the NSA required to build the database. But it doesn't want the government using Accumulo if there are larger, more active communities developing projects such as a HBase and Cassandra. He says that the committee encouraged the NSA to build its security controls into existing open source projects, but that the agency declined to do so.

The NSA press office could not immediately provide someone to officially discuss the matter. But for Gunnar Hellekson -- the chief technology strategist in U.S. Public Sector group at Red Hat, the open source software outfit -- the committee has gone too far. He was pleased to see a senate bill that has such intimate knowledge of open source software -- a rarity on Capitol Hill -- but he argues that since Accumulo has already been built and open sourced, the committee has no business intervening.

"When Accumulo was written, it was definitely doing new work," he tells Wired. "Some of its differentiating features are being handled by other pieces of software. But other core concepts are unique, including the cell-level security.... That's an incredibly important feature, and to do it properly is incredibly complicated."

Not All Open Source Projects Are Created Equal

The bill benefits HBase and Cassandra -- two very popular open source projects. But it certainly undermines the progress of Accumulo, and that's a particular worry for Oren Falkowitz, one of the developers of the database, who has left the NSA to start Sqrrl, a company that seeks to build a business around Accumulo in much the same way Red Hat built one around the Linux operating system.

Like Hellekson, Falkowitz argues that since Accumulo already open source -- and its backed by the Apache Software Foundation, a major open source steward -- it doesn't violate government policy. "The launch of sqrrl validates the success of Apache Accumulo as a project," he says, pointing out that sqrrl has received funding from two well-known venture capital firms. "Accumulo's technical strengths are not limited to government use cases, and already, we've seen interest and adoption of Accumulo by financial, healthcare, and a broad range of other commercial firms."

He also argues that Accumulo is still quite different from other BigTable mimics. BigTable and other similar database splits massive amounts of data into tiny pieces and spreads them across potentially tens of thousands of servers. But unlike any other platform, Falkowitz says, Accumulo lets you tag each tiny piece of data so that it can only be accessed by certain outside servers. This is useful not only to the NSA, he says, but to other government organizations and health care outfits legally required to separate data in this way.

"Basically, each [data object] has an extra label that's attached to it, and you can use that to authenticate and authorize users against each object," Falkowitz says. "Most systems do that at the columns or the rows level of the database."

Red Hat's Hellekson -- who has blogged about the issue on multiple occasions -- goes further, arguing that the bill could undermine the progress of open source projects well beyond Accumulo. The bill doesn't just ask that the DoD prove that the Accumulo project is no more costly than the likes of HBase and Cassandra. It wants proof that Accumulo is a "successful Apache Foundation open source database with adequate industry support and diversification."

"It doesn’t take much imagination to see that same 'adequacy criteria' applied to all open source software projects," Hellekson writes. "Got a favorite open source project on your DoD program, but no commercial vendor? Inadequate. Only one vendor for the package? Lacks diversity. Proprietary software doesn’t have a burden like this."

If the bill passed with the current Accumulo language intact, the onus is on the chief information officer of the Department of Defense to determine whether Accumulo can be used within the department. But whatever the verdict, it would not bar the NSA from using the database -- just the rest of the DoD.

Open source is a complicated thing. Especially inside the government.