Researchers find that data leakage in scientific machine learning calls the credibility of published results into question
Machine learning has become a significant research tool, and its use is expanding quickly because of its efficiency: it helps researchers make predictions by finding patterns in their data.
A pair of researchers at Princeton University in New Jersey predict a reproducibility crisis in the use of machine learning in science. Although machine learning is being sold to researchers as a tool for their studies, many of those researchers lack proper training in applying it, so their results may well be flawed, and the tool's credibility suffers as a result.
Reproducibility means that others, given the same data and methods, can replicate an experiment and obtain similar results. Machine learning complicates this process: if researchers misuse the tool, their analysis is flawed and so are their results. Others then cannot reproduce the findings, and the errors in data analysis diminish the experiment's credibility.
When applied to areas such as health and criminal justice, errors in machine learning algorithms and flaws in data models pose real risks. A leading cause of such errors is data leakage, in which information the model should not have access to improperly influences its training. To address it, the researchers suggest that scientists include evidence in their manuscripts, following a template, to show that their models are free of each of eight types of leakage.
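To make the idea concrete, here is a minimal sketch (not from the Princeton template; the numbers are invented for illustration) of one canonical leakage mode: computing a preprocessing statistic, such as a mean for centering, on the full dataset before splitting it, so that information from the held-out test point leaks into the training data.

```python
# Illustrative sketch of one common form of data leakage:
# a preprocessing statistic (here, the mean used for centering)
# is computed on the full dataset before the train/test split,
# so the test point influences how the training data is transformed.

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # last value is the held-out test point
train, test = data[:4], data[4:]

# Leaky: the centering statistic sees the test point.
leaky_mean = sum(data) / len(data)       # 22.0, dominated by the test value
leaky_train = [x - leaky_mean for x in train]

# Correct: the statistic is fit on the training data only.
train_mean = sum(train) / len(train)     # 2.5
clean_train = [x - train_mean for x in train]

print(leaky_mean, train_mean)  # the two statistics differ sharply
```

The leaky pipeline produces training features that already encode information about the test set, which can inflate the model's apparent performance; fitting all preprocessing on the training split alone avoids this.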
Although machine learning is still new to many fields, researchers should act sooner rather than later, implementing steps to avoid data leakage and head off the looming crisis.
Ameena Pathan works in strategic marketing and content development at That's Nice, LLC and is a contributor at Pharma's Almanac. She specializes in professional social media marketing, pharmaceutical marketing, online content production, web design, copywriting, and project management. She holds an MBA in international business from the Miami Herbert Business School.