Hadoop has been the industry standard for scalable data processing applications for several years, so why does a search for “Hadoop” on USAJOBS.gov return zero results?
One reason could be that, given the current budget environment, hiring for IT projects might be suspended. The budget is certainly a factor, but it cannot be the only one: listings for SQL, Java, and even COBOL developers can still be found.
Another reason might be that the federal government is simply contracting out this work. Again, this might explain part of the situation, but if so, it reflects poor planning by government agencies. Given the massive amount of information the federal government collects, stores, and processes, these skills will only become more critical, and agencies should be cultivating this talent in-house.
A more likely reason is that government agencies have not fully embraced “big data” because government leaders still do not fully understand what it can do or how it can help them operate more efficiently. For example, text mining can be applied to financial fraud detection, research paper classification, student sentiment analysis, and smarter search engines for all manner of government records. Machine learning can power decision support systems for healthcare, model generation for climate science, speech recognition for security, and mobile data entry across agencies.
In addition, the uptake of data science in some government agencies has been slowed by a shortage of qualified engineers in the public sector and by compliance concerns associated with storing data on public cloud servers.
Although fairly intuitive at a high level of abstraction, Hadoop remains difficult to work with in practice: even basic administrative tasks such as installation and configuration are cumbersome, and considerable ingenuity is required to translate complicated algorithms that were not designed for parallel computing into the MapReduce style Hadoop uses. EMC CTO John Roese stated in May that “It’s a challenge to find 1,000 Hadoop experts.” In government, where pay is often lower than in the private sector, this shortage is even more acute.
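To give a sense of what that translation involves, the sketch below is the canonical word-count example written against Hadoop's Java MapReduce API (class and package names follow the standard Apache tutorial). It is meant only to illustrate the programming model, in which even a simple computation must be recast as an independent map phase and a reduce phase keyed on intermediate values, not as a recipe for any particular agency workload.

```java
// Canonical word count in Hadoop's MapReduce model: the computation is
// expressed as a stateless map phase followed by a reduce phase that
// aggregates all values sharing a key.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output paths are supplied on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Even this trivial job requires boilerplate, cluster configuration, and a mental shift away from sequential code; expressing a join, an iterative graph algorithm, or a statistical model in this form is considerably harder, which is where the scarce expertise comes in.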
Another challenge for adoption is concern about security. Although public cloud storage providers offer broad guarantees on information security, and often better security controls than are found in other environments, some federal CIOs have expressed reservations about storing highly sensitive data, such as classified information, in a public cloud. Private cloud infrastructure may be an option in these situations, but because it can be more costly than the pay-per-use model of a public cloud, perceived security concerns still present a potential obstacle in some scenarios, especially for risk-averse government agencies.
But even with these barriers, the applications of massively parallel data processing are many, and the federal government should not be sitting on the sidelines. Just as government leaders need economic literacy to make good policy, they also need data literacy. Policymakers should develop a broader understanding of the technical capabilities of data science so they can ensure the public sector reaps more of the same cost and efficiency benefits the private sector has enjoyed for years.
Update (8/7/2013): USAJOBS.gov now shows one listing for a Hadoop-related software development job with the Office of Financial Research in the U.S. Department of the Treasury. Interestingly, Congress recently held a hearing to investigate the Treasury’s use of consumer data.