John Elder is founder and CEO of Elder Research Inc., a data and text mining consultancy founded in 1995 that serves a variety of federal agencies and private sector companies, and the author of a number of books on data analysis. I asked John to talk with me about how organizations are using data mining, especially text mining and analytics, to make better decisions.
Castro: You’ve written quite a bit about text mining and text analytics. How does “turning text into numbers” help business leaders make better decisions?
Elder: A great deal of the data useful to a business is in text form, and on many of our text mining projects it far outweighs numerical data in value. Automatically extracting useful information from text is a real challenge compared to numbers, since there are so many different ways words can express a particular idea or fact. Techniques for mining text are still less sophisticated than those for numbers, but they are getting better fast. And since the industrial processing of text data is novel, the patterns to be found are “low-hanging fruit” that can be harvested by the first ones to get there.
Having an idea of what customers and other stakeholders are “saying” about your product and company is extremely valuable for heading off problems, discovering new niches, or sparking ideas for marketing and product development.
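As a minimal sketch of what “turning text into numbers” can look like, the Python example below counts word occurrences in a few customer comments, a simple bag-of-words representation. The comments and the `tokenize` and `bag_of_words` helpers are invented for illustration and are assumptions, not a description of the methods Elder's team uses.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical customer comments, invented for illustration.
comments = [
    "The battery died after one week",
    "Great battery, great screen",
    "Screen cracked after one drop",
]

vocabulary, matrix = bag_of_words(comments)
print(vocabulary)
for row in matrix:
    print(row)
```

Each comment becomes a row of counts over a shared vocabulary, which is the kind of numeric table that standard mining techniques can work with. Real projects would likely go further, for example by weighting rare words more heavily or handling synonyms.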
Castro: What are some examples of how federal agencies have begun using these tools to better deliver government services?
Elder: Federal agencies are at the forefront of mining text to succeed at their mission. Our team, for instance, has helped the SSA grade disability applications faster and more accurately, the DHS identify potentially dangerous cargoes, and the DoD discover outbreaks of infectious diseases. Text mining is working! And it can be used together with numerical mining. An exciting trend is to combine data from multiple channels to get a 360-degree view of your customer, constituent, or workforce.
Data mining is particularly well suited to “needle in a haystack” challenges, like fraud detection, insider threat, or drug discovery, where very dangerous (or valuable) cases are hidden deep in a pile of… junk. A model built using predictive analytics can score the data and rank cases by interest. It “looks at” all the cases, gathering many small clues about each case to create an equation summarizing which factors, over the known data, affect risk. Alone, the model is somewhat useful, but paired with an expert analyst (who provides context and judgment), it can phenomenally enhance productivity. For instance, a model of ours for a large federal agency improved fraud detection by a factor of 25. And we’re getting great feedback on a powerful mining and visualization tool we built for the USPS-OIG that ~1,000 analysts are using to prioritize and track cases to discover multiple kinds of fraud and non-compliance.
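To make the idea of scoring and ranking concrete, here is a minimal Python sketch, assuming a logistic-style equation over a handful of yes/no clues. The clue names, weights, intercept, and cases are all hypothetical; in a real project the weights would be estimated from known cases, and this is not the model described in the interview.

```python
import math

# Hypothetical weights for illustration; in practice they would be fit to known cases.
WEIGHTS = {"amount_over_limit": 1.8, "new_vendor": 0.9, "weekend_filing": 0.4}
INTERCEPT = -3.0

def risk_score(case):
    """Combine a case's clues (0 or 1) into a score between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * case.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def rank_cases(cases):
    """Return cases sorted from highest to lowest score, for analyst review."""
    return sorted(cases, key=risk_score, reverse=True)

# Hypothetical cases, invented for illustration.
cases = [
    {"id": "A", "amount_over_limit": 0, "new_vendor": 0, "weekend_filing": 1},
    {"id": "B", "amount_over_limit": 1, "new_vendor": 1, "weekend_filing": 1},
    {"id": "C", "amount_over_limit": 1, "new_vendor": 0, "weekend_filing": 0},
]

for case in rank_cases(cases):
    print(case["id"], round(risk_score(case), 3))
```

No single clue decides the outcome; each one nudges the score, and the ranked list lets an analyst start with the cases most worth their time.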
Castro: If you read some of the trade press, you might believe that data mining is always the answer. When is data mining not effective?
Elder: Wait, there are times it’s not? Actually, even having fancy software and relevant data is no guarantee of success. There are so many ways to go astray by, for example, missing the real problem, using data from the future, or “torturing the data until it confesses”. I’ve written a popular book chapter on the major types of analytic mistakes (see www.tinyurl.com/bookERI) after learning many of them the hard way! It’s wise to use, at the very least, the advice and oversight of expert data mining practitioners to get the greatest return on your investment.
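One common guard against “using data from the future” is to split records by time, so that a model is built only on what was known before a cutoff date and is evaluated on what came after. The Python sketch below illustrates that idea under stated assumptions: the records, the cutoff, and the `time_based_split` helper are hypothetical and are not taken from the book chapter mentioned above.

```python
from datetime import date

def time_based_split(records, cutoff):
    """Train only on records dated before the cutoff; evaluate on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test

# Hypothetical records, invented for illustration.
records = [
    {"date": date(2012, 1, 15), "label": 0},
    {"date": date(2012, 3, 2), "label": 1},
    {"date": date(2012, 5, 20), "label": 0},
    {"date": date(2012, 8, 9), "label": 1},
    {"date": date(2012, 11, 30), "label": 0},
]

cutoff = date(2012, 6, 1)
train, test = time_based_split(records, cutoff)
print(len(train), "training records,", len(test), "evaluation records")
```

A random split of the same records could let later events leak into the training set, which tends to make a model look better in the lab than it will perform in production.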
Castro: What capabilities do organizations need to be able to do data mining effectively?
Elder: You have to have relevant data, a motivated business champion, experienced analytic professionals, and good software. Oh, and a budget! But the budget isn’t as big a problem as it usually is, since you should be able to recoup it very quickly with gains from the analysis. I mentioned the business champion (a determined client with a goal) before even the analytic professional. A motivated and visionary champion is essential to seeing that the model makes it all the way to implementation. Many powerful models have never made it out of the lab for lack of shepherding. A good movie (and book) illustrating this is Moneyball. There, the nerd (analyst) with a great new way of scoring baseball skill would never have made an impact if the manager (business champion) hadn’t risked his career on the models’ outcome. (We took the whole office to see that movie when it came out. Now I just need to take all my clients!) It resulted, of course, in a phenomenal success, relative to the team’s budget constraints.
Castro: Do you think organizations will change how they use data mining over the next few years?
Elder: Data is increasing in Velocity, Volume, and Variety, so the tools with which we tame it must get better with time. First, it’s already getting easier for people to visualize their data. No computer will likely ever be as good at seeing the big picture or finding novel insights as a motivated analyst aware of the context of the data and able to explore it through fast visualization! Second, combining multiple strands of different kinds of data is going to become more feasible. Third, more practitioners will build multiple competing models (ensembles) to obtain the greatest known accuracy (www.tinyurl.com/book2ERI).
Lastly, the most effective organizations will foster a “learning environment” where they nurture analytic talent, maintain a pipeline of projects ranging from conceptual to production, and create a budget to experiment with some of their decisions – to generate new results that might not be short-term optimal, but are a great long-term investment in learning about their business landscape. It often takes a group several years to go from its first data mining project to where it becomes a “learning” organization, fully utilizing the power of predictive analytics. But it’s well worth doing; there still exists a “first mover” advantage. So, take every opportunity to be the first organization in your field to get there!