Why Are There No Jobs for Hadoop in the Federal Government?

by Daniel Castro August 6, 2013

Hadoop has been the industry standard for scalable data processing applications for several years, so why does a search for “Hadoop” on USAJOBS.gov return zero results?

One reason could be that given the current budget environment, hiring for IT projects might be suspended. The budget is certainly a factor, although it cannot be the only one as jobs for SQL, Java, and even COBOL developers can still be found.

Another reason might be that the federal government is simply contracting out this work. Again, this might explain part of the situation, but if so, it reflects poor planning by government agencies. These skills will be increasingly critical to the federal government given the massive amount of information it collects, stores and processes, and agencies should be cultivating this talent.

A more likely reason is that government agencies have not fully embraced “big data” because government leaders still do not fully understand what it can do or how it can help them operate more efficiently. For example, text mining can be applied to financial fraud detection, research paper classification, student sentiment analysis and smarter search engines for all manner of government records. Machine learning, likewise, can be used for decision support systems for healthcare, model generation for climate science, speech recognition for security and mobile data entry across agencies.

In addition, the uptake of data science in some government agencies has been slowed by a shortage of qualified engineers in the public sector and compliance concerns associated with storing data in public cloud-based servers.

Although fairly intuitive at a high level of abstraction, Hadoop remains hard to work with in practice. Even basic administrative tasks such as installation and configuration are difficult, and it takes ingenuity to translate complicated algorithms that were not designed for parallel computing into the style Hadoop uses. EMC CTO John Roese stated in May that “It’s a challenge to find 1,000 Hadoop experts.” In government, where pay is often lower than in the private sector, this shortage is even more acute.
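To give a sense of what "the style Hadoop uses" means, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across many machines and handle the grouping itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (a stand-in for what the framework does)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    records = ["fraud alert fraud", "alert cleared"]
    print(word_count(records))
```

Counting words fits this pattern naturally because each line can be processed independently. The difficulty described above arises with algorithms that depend on shared state or many passes over the data, which have to be restructured into independent map and reduce steps before they can run this way.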

Another challenge for adoption is concern about security. Although public cloud storage providers offer broad guarantees on information security, and often better security controls than those found in other environments, some federal CIOs have expressed reservations about storing highly sensitive data, such as classified information, in a public cloud. Private cloud infrastructure may be an option in these situations, but to the extent that it is more costly than a pay-per-use model on a public cloud, perceived security concerns do present a potential obstacle in some scenarios, especially in risk-averse government agencies.

But even with these barriers, the applications of massively parallel data processing are many, and the federal government should not be sitting on the sidelines. Just as government leaders need economic literacy to make good policy, they also need data literacy. Policymakers should develop a broader understanding of the technical capabilities of data science so they can ensure the public sector reaps more of the same cost and efficiency benefits the private sector has enjoyed for years.

Update (8/7/2013): USAJOBS.gov now shows one listing for a Hadoop-related software development job with the Office of Financial Research in the U.S. Department of the Treasury. Interestingly, Congress recently held a hearing to investigate the use of consumer data by the Treasury.

Daniel Castro

Daniel Castro is the director of the Center for Data Innovation and vice president of the Information Technology and Innovation Foundation. Mr. Castro writes and speaks on a variety of issues related to information technology and internet policy, including data, privacy, security, intellectual property, internet governance, e-government, and accessibility for people with disabilities. His work has been quoted and cited in numerous media outlets, including The Washington Post, The Wall Street Journal, NPR, USA Today, Bloomberg News, and Businessweek. In 2013, Mr. Castro was named to FedScoop’s list of “Top 25 most influential people under 40 in government and tech.” In 2015, U.S. Secretary of Commerce Penny Pritzker appointed Mr. Castro to the Commerce Data Advisory Council. Mr. Castro previously worked as an IT analyst at the Government Accountability Office (GAO) where he audited IT security and management controls at various government agencies. He contributed to GAO reports on the state of information security at a variety of federal agencies, including the Securities and Exchange Commission (SEC) and the Federal Deposit Insurance Corporation (FDIC). In addition, Mr. Castro was a Visiting Scientist at the Software Engineering Institute (SEI) in Pittsburgh, Pennsylvania where he developed virtual training simulations to provide clients with hands-on training of the latest information security tools. He has a B.S. in Foreign Service from Georgetown University and an M.S. in Information Security Technology and Management from Carnegie Mellon University.
