When Should Data Scientists Try a New Technique?
hubie writes:
If a scientist wanted to forecast ocean currents to understand how pollution travels after an oil spill, she could use a common approach that looks at currents traveling between 10 and 200 kilometers. Or, she could choose a newer model that also includes shorter currents. This might be more accurate, but it could also require learning new software or running new computational experiments. How can she know whether it will be worth the time, cost, and effort to use the new method?
A new approach developed by MIT researchers could help data scientists answer this question, whether they are looking at statistics on ocean currents, violent crime, children's reading ability, or any number of other types of datasets.
The team created a new measure, known as the "c-value," that helps users choose between techniques based on the chance that the new method is more accurate for their specific dataset. It answers the question, "Is it likely that the new method is more accurate for this data than the common approach?"
Traditionally, statisticians compare methods by averaging a method's accuracy across all possible datasets. But just because a new method is better on average across all datasets doesn't mean it will actually provide a better estimate for one particular dataset. Averages are not application-specific.
So, researchers from MIT and elsewhere created the c-value, which is a dataset-specific tool. A high c-value means it is unlikely the new method will be less accurate than the original method on a specific data problem.
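To get a feel for what a dataset-specific comparison looks like, here is a minimal Python sketch. It is an illustration of the general idea only, not the c-value computation from the researchers' paper: the two estimators (a plain sample estimate and a James-Stein-style shrinkage estimate), the noise model, and the parametric-bootstrap scheme are all assumptions chosen for the example.

```python
# Illustrative sketch: for ONE observed dataset, estimate how often a "new"
# estimator (shrinkage) beats the "default" estimator (raw observations)
# under a parametric bootstrap. Not the c-value method itself.
import numpy as np

rng = np.random.default_rng(0)

# One specific dataset: noisy observations of a 20-dimensional mean vector.
true_mean = rng.normal(0, 1, size=20)
y = true_mean + rng.normal(0, 1, size=20)

def default_estimate(y):
    # The common approach: use the raw observations as the estimate.
    return y

def shrinkage_estimate(y):
    # The "new" approach: James-Stein-style shrinkage toward zero.
    d = y.size
    factor = max(0.0, 1.0 - (d - 2) / np.sum(y ** 2))
    return factor * y

# Parametric bootstrap: treat the default estimate as a stand-in for the
# truth, resample datasets, and count how often shrinkage has lower error.
wins = 0
n_boot = 2000
for _ in range(n_boot):
    y_star = default_estimate(y) + rng.normal(0, 1, size=y.size)
    err_default = np.sum((default_estimate(y_star) - default_estimate(y)) ** 2)
    err_shrink = np.sum((shrinkage_estimate(y_star) - default_estimate(y)) ** 2)
    wins += err_shrink <= err_default

print(f"Shrinkage beat the default in {wins / n_boot:.1%} of bootstrap replicates")
```

The point of the sketch is that the comparison is made on the dataset actually in hand, rather than averaged over all datasets a method might ever see.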