The Center for Data Innovation spoke with Anne Kao, Senior Technical Fellow at Boeing Research and Technology. Kao discussed how she uses machine learning to analyze maintenance reports and how philosophy shapes her approach to data science.
Joshua New: You’ve been with Boeing since 1990. How have you seen the use of data analytics change in the aerospace industry over the past nearly three decades? What were the game-changing developments?
Anne Kao: In the mid-90s, I had to explain to our leadership team what text mining meant because data mining had just started catching the attention of a few people in various industries. Gradually, data analytics focused on better integrating multiple sources in multiple formats, which, together with a growth in machine learning methods and a recognition of the importance of leveraging domain knowledge, is creating much better results. Because of aerospace requirements, we need to focus on interpretable methods, which have begun to make a comeback after years of dominance by deep learning and artificial neural net methods, which are typically non-interpretable. Boeing now has an entire initiative, known as Boeing AnalytX, devoted to this and other data analytics efforts. It focuses on turning raw data into actionable insights.
New: You developed a tool called PANDA, short for Part Name Discovery Analytics, to help identify part names in maintenance reports and other records. Why was a tool like this necessary? How does it work?
Kao: The aerospace industry holds a huge wealth of data in free text. Part information, typically in the form of nomenclature, is the most crucial piece for understanding key issues such as quality, reliability, supply chain management, and airplane health management. Part name extraction, though a type of entity extraction, is less well understood than traditional entity extraction, and it is super challenging because the majority of part names consist of multiple words and there is no “standard” list anywhere. In fact, a standard list would quickly become obsolete as technology changes and adapts. Our method leverages a combination of linguistic knowledge and machine learning, and requires very little training data, unlike typical deep learning methods.
New: You also developed a method for normalizing part names called UNAMER. Why is normalization so important?
Kao: Free text remains a primary means of communication and documentation. The vast majority of text is not professionally authored and, as I mentioned, there is no standardized list of part names. Trying to get authors (mechanics and engineers all over the world) to write standardized text, including standardized part names, is impractical if not impossible. To unlock the value in free text data, the ability to process noisy text, such as text full of non-dictionary words and non-standard usage, is a must. The challenge does not stop there. Unless we normalize the extracted key information, that is, unify the different names used for the same part, condition, or action, we will not be able to form a complete picture of the trends or groupings in the data, and we’d fall short in using the results for decision support.
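Editor's illustration: the interview does not describe how UNAMER works internally, so the following Python sketch is not Boeing's method. It only shows, under simple assumptions, why unifying variant names matters for counting. The abbreviation table, the function name `normalize_part_name`, and the sample report strings are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical abbreviation table; a real one would be far larger and
# would come from domain experts or be learned from data.
ABBREVIATIONS = {"assy": "assembly", "hyd": "hydraulic", "vlv": "valve"}


def normalize_part_name(raw: str) -> str:
    """Map a noisy part mention to a simple canonical key.

    Lowercases the text, drops punctuation, and expands known
    abbreviations token by token.
    """
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


# Made-up mentions of the same two parts, written five different ways.
reports = [
    "HYD pump assy",
    "hydraulic pump assembly",
    "Hyd. Pump Assy",
    "fuel vlv",
    "Fuel Valve",
]

raw_counts = Counter(reports)
normalized_counts = Counter(normalize_part_name(r) for r in reports)

print(len(raw_counts), "distinct raw names")
print(len(normalized_counts), "distinct normalized names")
```

Before normalization, each spelling is counted as a separate part, so no trend is visible. After normalization, the five mentions collapse into two parts, which is the kind of grouping Kao describes as necessary for decision support.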
New: It seems like both of these tools have broad applicability outside the aerospace and vehicle maintenance spaces. How easy would it be to adapt these tools to other contexts?
Kao: PANDA is highly useful for any industry that needs to deal with a large number of parts, or any domain, such as the bio-industry, that needs to deal with a large number of new terms with complex structures. The use of UNAMER is even broader: it can benefit any situation where people need to deal with lots of non-standard spelling or usage in text data. Requiring subject matter experts to manually label a lot of data for machine learning is often not practical. Since our approach does not require a lot of labeled training data, it has a big advantage in adapting to different contexts.
New: Before your career in data science, you received your PhD in philosophy studying the philosophy of language. How has this influenced your work in data science?
Kao: My approach to philosophy of language is very data driven. I follow Ordinary Language philosophy à la the late Wittgenstein and J. L. Austin. Instead of assuming we can capture the meaning of language via a fixed mapping between language and the real world, as provided, for example, by a grammar and a dictionary, this approach recognizes that language is evolving constantly to serve the goal of communication. As Wittgenstein noted: “The meaning of a word is its use in the language.”