Is Machine Learning The Future Of Coffee Health Research?

If you’ve been a reader of Sprudge for any reasonable amount of time, you’ve no doubt read multiple articles by now about how coffee is potentially beneficial for some particular facet of your health. The stories generally go like this: “a study finds drinking coffee is associated with a X% decrease in [bad health outcome],” followed shortly by “the study is observational and does not prove causation.”

In a new study in the American Heart Association’s journal Circulation: Heart Failure, researchers found a link between drinking three or more cups of coffee a day and a decreased risk of heart failure. But there’s something different about this observational study. This study used machine learning to get to its conclusion, and it may significantly alter the utility of this sort of study in the future.

As reported by the New York Times, the new study isn’t exactly new at all. Led by David Kao, a cardiologist at the University of Colorado School of Medicine, researchers re-examined data from the Framingham Heart Study (FHS), “a long-term, ongoing cardiovascular cohort study of residents of the city of Framingham, Massachusetts” that began in 1948 and has grown to include over 14,000 participants.

Most research starts out with a hypothesis that it then seeks to prove or disprove, an approach that can produce false relationships depending on the sort of variables researchers choose to include or exclude in their data analysis. Kao et al. instead approached the FHS with no intended outcome. They used “a powerful and increasingly popular data-analysis technique known as machine learning” to find any potential links between patient characteristics captured in the FHS and the odds of the participants experiencing heart failure.

Machine learning can analyze massive amounts of data in a short amount of time, and it can be programmed to handle uncertainties in the data, like whether a reported cup of coffee is six ounces or eight ounces. From there, it can begin to identify and rank which variables are most associated with incidents of heart failure, giving even observational studies more explanatory power in their findings. And indeed, when the results of the FHS machine learning analysis were compared to two other well-known studies, the Cardiovascular Health Study (CHS) and the Atherosclerosis Risk in Communities study (ARIC), the algorithm was able “to correctly predict the relationship between coffee intake and heart failure.”

But, of course, there are caveats. Machine learning algorithms are only as good as the data being fed to them. If the scope of that data is too narrow, the results may not translate more broadly, and the algorithm’s real-world predictive utility is significantly decreased. The New York Times offers facial recognition software as an example: “Trained primarily on white male subjects, the algorithms have been much less accurate in identifying women and people of color.”

Still, the new study shows promise, not just for the health benefits the algorithm uncovered, but for how we undertake and interpret this sort of analysis-driven research.

Zac Cadwalader is the managing editor at Sprudge Media Network and a staff writer based in Dallas. Read more Zac Cadwalader on Sprudge.