The Center for Data Innovation spoke with Dr. Joshua New, research and development staff at Oak Ridge National Laboratory (ORNL), the largest U.S. Department of Energy (DOE) science and energy laboratory. Dr. New discussed how supercomputing helped develop advanced modeling tools to improve building energy efficiency and how investing in supercomputing resources can offer huge benefits to public and private sector operations.
Strangely enough, the world of big data is small: Joshua New of the Center for Data Innovation and Joshua New of ORNL share the same name. For the sake of clarity, Joshua New of ORNL is listed in the conversation as Dr. New.
This interview has been lightly edited.
Joshua New: One of your main projects at ORNL is Autotune, a collection of open-source tools for improving the energy efficiency of residential and commercial buildings by improving the accuracy of their models. Could you explain how Autotune works, and why the Department of Energy thought it was necessary?
Dr. Joshua New: DOE’s Building Technologies Office has an overarching goal to reduce our nation’s $431 billion-per-year utility bill for buildings by 30 percent by 2030, compared to a 2010 baseline energy use intensity. One method that can be scalably employed to assess our 125.1 million buildings is software-based retrofit analysis, which can identify how to make a particular building more energy efficient while optimizing return on investment. However, software models suffer from a credibility problem when simulation-based predictions do not match reality. Standards organizations have meet-or-beat criteria that define the accuracy limit for a useful software model of a building, but manually calibrating a model to match utility bill data is currently an expensive art that depends on the subjectivity and time constraints of an energy professional.
The Autotune project’s goal was to define an objective, scientifically defensible, scalable, fully automated process for model calibration: one that changes a building energy model’s inputs so that simulation output actually matches measured data from the real building. The process also had to satisfy ASHRAE Guideline 14, the industry standard for measuring energy savings. In this project, we employed eight supercomputers, developed the fastest buildings simulator in the world, assembled several software packages into a suite of machine learning tools, and ran several machine learning and multi-objective optimization algorithms in parallel on supercomputers to determine which performed most robustly, with the best accuracy in the least computational time. In a benchmark involving calibration of 20,000 buildings, the best performer was an evolutionary algorithm, with an average error rate of 3.65 percent, far surpassing the guideline requirement of 30 percent. It works by successively evolving building models, via evolutionary operators such as crossover and mutation, over generations in which each model’s similarity to the utility data determines its likelihood of passing its building properties on to the next generation. The core algorithm, instructions, website, web service, and virtual machine image are available on GitHub.
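The evolutionary loop Dr. New describes can be sketched in miniature. This is an illustrative toy, not the Autotune implementation: the simulate function, its weights, the population sizes, and the error metric are all invented for the example; only the overall crossover/mutation/selection pattern follows the description above.

```python
import random

random.seed(0)
N_PARAMS = 8   # Autotune tuned 156 inputs; 8 keeps the demo small

# Toy stand-in for an EnergyPlus run: maps a parameter vector to
# twelve monthly energy totals via fixed random weights.
WEIGHTS = [[random.random() for _ in range(N_PARAMS)] for _ in range(12)]

def simulate(params):
    return [sum(p * w for p, w in zip(params, month)) for month in WEIGHTS]

TRUE_PARAMS = [random.random() for _ in range(N_PARAMS)]
UTILITY_DATA = simulate(TRUE_PARAMS)   # plays the role of measured monthly bills

def error(params):
    """Mean absolute percent difference between simulation and 'utility' data."""
    pred = simulate(params)
    return 100 * sum(abs(p - m) / m for p, m in zip(pred, UTILITY_DATA)) / 12

def crossover(a, b):
    cut = random.randrange(1, N_PARAMS)
    return a[:cut] + b[cut:]

def mutate(params, rate=0.2, scale=0.1):
    return [min(1.0, max(0.0, p + random.gauss(0, scale)))
            if random.random() < rate else p
            for p in params]

# Evolve a population of candidate building models: models whose simulated
# output is closer to the utility data are likelier to pass their
# properties on to the next generation.
pop = [[random.random() for _ in range(N_PARAMS)] for _ in range(40)]
initial_best = min(error(p) for p in pop)
for generation in range(60):
    pop.sort(key=error)
    survivors = pop[:10]                          # elitist selection
    children = [mutate(crossover(random.choice(survivors),
                                 random.choice(survivors)))
                for _ in range(30)]
    pop = survivors + children

final_best = min(error(p) for p in pop)
print(f"calibration error: {initial_best:.2f}% -> {final_best:.2f}%")
```

Because the fittest candidates survive each generation unchanged, the best error can only improve over time; the real project's contribution was finding, via supercomputer-scale search, which variant of this family of algorithms does so most robustly on real buildings.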
New: Autotune builds on EnergyPlus, a whole-building energy simulation program DOE developed, with analysis performed on ORNL’s Titan supercomputer. Can you give an overview of the EnergyPlus program and explain how it was only possible thanks to supercomputing?
Dr. New: EnergyPlus is DOE’s flagship whole-building simulation engine, in which DOE has invested $65 million since 1995; it is also open source and available on GitHub. In order to evolve better models that more closely match monthly utility bills, Autotune uses software called OpenStudio to quickly define high-level details of a baseline building model, and then runs EnergyPlus simulations as it creates a more accurate building model during the automated calibration process. There are many algorithms for doing this, and many approximation techniques that trade off accuracy for time. Using supercomputers, we were able to test hundreds of thousands of algorithm variations simultaneously, find one that performed very well, and then deploy that best algorithm so it can run on a laptop or through a website.
I once heard science described as a search through the space of models to find one that sufficiently describes observables. Where a sufficient simulation engine and computational resources exist, science can be largely automated by employing optimization algorithms. Automating science and the discovery process at scale requires world-class computing resources.
New: The Obama Administration launched its National Strategic Computing Initiative in 2015, which aims to ensure that the United States remains at the cutting edge of high performance computing (HPC), in particular by accelerating the development of HPC systems 100 times more powerful than what we have today. What would that kind of analytical capacity mean for the projects you work on? What problems would you be able to tackle that you can’t today?
Dr. New: Let me put this in the context of the Autotune project. To calibrate a building model, there are 3,000 inputs for the average building that can be varied. We worked with subject matter experts to define the 156 inputs most important for energy consumption in a residential building, such as the thermal resistance of attic insulation, and then came up with a discrete set of physically realistic values for each. Even with this subset, there are 5×10^52 possible building models. We used the Titan supercomputer in this project, currently the world’s second fastest. If we had the entire machine to ourselves, running full-tilt around the clock, it would take Titan 4.1×10^28 lifetimes of the known universe to calibrate just one building model by brute force. Using Titan with new advances in machine learning and large-scale search allowed Autotune to deploy its best algorithm, which can calibrate a building in one hour on consumer-grade hardware.
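The scale of that search space can be checked with back-of-the-envelope arithmetic. This is a sketch based only on the figures quoted above; the per-input value count and the implied simulation rate are derived quantities, not numbers from the interview.

```python
# 156 calibration inputs, each restricted to a small discrete set of
# physically realistic values; the quoted aggregate is ~5x10^52 models.
n_inputs = 156
search_space = 5e52

# The aggregate implies only a couple of discrete values per input
# on average (156th root of 5e52):
avg_values = search_space ** (1 / n_inputs)

# Age of the known universe in seconds (~13.8 billion years):
universe_age_sec = 13.8e9 * 365.25 * 24 * 3600

# Working backward from the quoted 4.1e28 universe lifetimes, Titan
# would have to sustain millions of building simulations per second
# and still never finish a brute-force sweep:
implied_sims_per_sec = search_space / (4.1e28 * universe_age_sec)

print(f"~{avg_values:.2f} discrete values per input on average")
print(f"~{implied_sims_per_sec:.2e} simulations/second implied")
```

Even at that assumed throughput, the combinatorics make exhaustive search hopeless, which is why the project turned to machine learning and evolutionary search instead.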
My five-year vision is to create a calibrated building energy model for every building in America. This requires gathering and processing information from multiple data sources, and then calibrating each model against what can be observed using energy data from the building. Only supercomputing can meet the time-to-solution requirements for creating and calibrating over 125 million building energy models. Once such a data set exists, a building model provided to the homeowner for sharing with industry would allow the private sector to immediately scale up efforts to make America’s buildings energy efficient.
New: You are also involved in DOE’s Lab-Corps program, which focuses on helping national lab researchers better understand the business implications of their technology. Why is this kind of tech-transfer program beneficial? Has anything interesting come out of it so far?
Dr. New: Honestly, I believe this type of training should be mandatory for national lab researchers. As researchers, we are often very good at making and testing hypotheses about a phenomenon being studied in a lab. However, I believe few ever realize that industry and the public simply do not care about our technology; they only care about what it can do for them. Of the researchers who participate in tech-to-market activities, few know what it takes to define a business model around a technology, and even fewer have interviewed market stakeholders to validate the many hypotheses that go into such a model. As part of Lab-Corps, our three-person team interviewed 86 people in six weeks to refine 54 hypotheses and establish a business model canvas that defines the dollar value of the technology based on real market needs.
I am a practical researcher. While there is a role for fundamental research that is seemingly disconnected from immediate application, individuals and organizations underwritten by the American taxpayer are also expected to be responsible stewards who provide value back in terms of improved quality of life, economic competitiveness, and sustainability for future generations. I believe national labs could better develop their technologies and research programs if the research staff had a clearer understanding of market needs. There were 14 teams in our Lab-Corps cohort, and several have identified sufficient business needs to commercialize national lab technology that might not have been commercialized otherwise.
New: You have told me that you think the public sector can make better use of its own HPC resources, which anyone in the public or private sector can compete for. Could you elaborate on this? What applications would HPC systems be better suited for that many agency project managers may not consider?
Dr. New: HPC resources and cloud resources both have an important role to play in government. Cloud computing has helped commoditize computing and given the public sector the ability to dynamically scale to meet market demand for its services. The cloud can respond relatively quickly, whereas jobs submitted to a supercomputer may run more quickly but take days to begin due to high demand for access. Leadership-class HPC facilities like Titan are research instruments created for the most important needs of society, used by the brightest scientists, and running the best software capabilities. HPC can deliver big science if the need is sufficient to warrant an award on these machines. To prepare for an award, there are allocation request procedures, which vary by supercomputer provider, for granting time on the machine and demonstrating that you’re ready for a major allocation. It is a common requirement that the information derived from an allocation be published in the open literature. These systems are free to the user, which can make things feasible that would not be feasible on cloud computing resources. As an example from the Autotune project, we used Titan to run over 8 million EnergyPlus simulations and store over 200 terabytes of simulation data; on cloud resources, this would have cost $78,000 in compute time and $8.2 million per year in database storage. Autotune would not exist in its current form if there were not a publicly funded HPC infrastructure to support development of this open-source public good for the private sector.
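For a sense of scale, the per-unit rates implied by those figures work out as follows. This is simple arithmetic on the totals quoted above; the per-simulation and per-terabyte rates are derived, not from the interview.

```python
# Totals quoted in the interview:
simulations = 8_000_000          # EnergyPlus runs on Titan
storage_tb = 200                 # terabytes of simulation data
cloud_compute_cost = 78_000      # dollars, estimated cloud equivalent
cloud_storage_cost = 8_200_000   # dollars per year, estimated cloud equivalent

# Derived per-unit rates (roughly a penny per run, $41,000 per TB-year):
compute_per_sim = cloud_compute_cost / simulations
storage_per_tb_year = cloud_storage_cost / storage_tb

print(f"${compute_per_sim:.4f} per simulation")
print(f"${storage_per_tb_year:,.0f} per terabyte-year of storage")
```

The compute side alone is modest per run; it is the year-over-year storage of hundreds of terabytes that dominates the hypothetical cloud bill, which is the asymmetry Dr. New's example highlights.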