Problem tractability refers to whether a proposed problem can actually be solved using machine learning methods. Assessing tractability may involve asking whether the data set encodes sufficient signal to predict a specific outcome, whether humans could solve the problem given the right data, or whether a solution is possible given the fundamental physics involved.
A first step in machine learning problem selection is ensuring that the problem can be solved. This involves thinking through the premise and formulation of the problem, including analysis of the available data. Supervised learning models, for instance, typically require at least hundreds of labels for training, and often thousands or even millions, to learn to predict outcomes accurately. Are sufficient historical data available, and are there sufficient data signals and labels for an algorithm to be trained successfully? For many supervised learning problems, the number and quality of available labels become a key limiting factor.
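One way to make the label question concrete is a quick audit of how many labeled examples exist per outcome class. The following is a minimal sketch, not a prescribed method: the function name `audit_labels`, the `min_per_class` threshold of 100 (loosely echoing the "hundreds of labels" rule of thumb above), and the sample data are all illustrative assumptions.

```python
from collections import Counter


def audit_labels(labels, min_per_class=100):
    """Summarize label availability for a supervised learning problem.

    `labels` holds one entry per historical record; None marks an
    unlabeled record. `min_per_class` is an assumed rule-of-thumb
    threshold, not a universal requirement.
    """
    labeled = [y for y in labels if y is not None]
    counts = Counter(labeled)
    return {
        "n_rows": len(labels),
        "n_labeled": len(labeled),
        "class_counts": dict(counts),
        # Classes with too few examples for a model to learn from reliably
        "scarce_classes": sorted(c for c, n in counts.items() if n < min_per_class),
    }


# Hypothetical history: many normal records, two failures, a few unlabeled rows
history = ["ok"] * 5000 + ["failure"] * 2 + [None] * 10
report = audit_labels(history)
print(report)
```

A non-empty `scarce_classes` list does not prove a problem is intractable, but it is a signal to revisit the problem formulation or look for additional labels before investing in model development.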
Consider, for example, trying to predict the failure of a very expensive and complex bespoke machine. The machine may be only partially instrumented, with few input signals, and may have only one or two historical failures from which an algorithm could learn. This is an example of a data-poor and label-poor environment in which the tractability of the problem formulation is unclear. Ultimately, this may not be a tractable problem for a supervised machine learning algorithm. (Depending on the historical data and instrumentation available, other ML techniques, such as unsupervised anomaly detection methods, may be applicable.)
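To illustrate what an unsupervised approach can look like when failure labels are scarce, here is a minimal sketch that flags unusual sensor readings using a robust (median-based) z-score. This is just one simple anomaly detection technique chosen for illustration; the function names, the sample readings, and the 3.5 cutoff (a commonly used convention for this score) are assumptions, and a real system would likely need a multivariate method.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the
    median absolute deviation (MAD). No labels are required."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to scale against, so nothing can be scored as unusual
        return [0.0 for _ in values]
    # 0.6745 rescales MAD to be comparable to a standard deviation
    # for normally distributed data
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the indices of readings whose robust z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]


# Hypothetical temperature readings from a single sensor
readings = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 84.5, 70.1, 69.7, 70.0]
anomalies = flag_anomalies(readings)
print(anomalies)
```

Flagged readings are candidates for expert review rather than confirmed failures; whether they correspond to real problems still has to be validated against domain knowledge.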
C3 AI Application Platform and C3 AI Applications provide numerous tools and capabilities to aggregate, explore, and analyze available data sets prior to model development, helping data scientists, developers, and analysts assess problem tractability. In addition, drawing on C3 AI’s extensive experience helping organizations solve large-scale problems with machine learning, we have codified best practices for assessing problem tractability, which we make available to organizations.