Tools of the Data Detective: A Review of Statistical Methods to Detect Data Anomalies in Psychology

Abstract

In psychology, it is largely assumed that when researchers carry out studies, they collect real data and analyze them honestly. Unfortunately, this assumption has not always held true. There have been numerous high-profile cases of data fraud in the social sciences, and many studies have been retracted on the grounds of data fraud (see the Retraction Watch Database). Moreover, meta-scientific work has estimated that data fraud occurs in about 0.82 per 10,000 studies, a rate that seems small on the surface but still implicates thousands of research studies. Because data integrity is fundamental to psychological research, it is important that there be ways of detecting potential data anomalies in the literature. To this end, two main classes of statistical tools exist for detecting data anomalies: raw data tools and summary statistics tools. These tools were created to 1) detect and remove papers containing falsified or fabricated data from the literature, 2) discourage and deter potential data fabrication and falsification from occurring, and 3) investigate suspected instances of papers that may contain anomalous data or analyses. The aim of this talk is to introduce these two classes of tools, explain how they work, and consider whether they are appropriate to apply to psychological research articles suspected of fraud. I then discuss the potential applicability of these tools as a screening measure for data anomalies and emphasize how crucial it is that they be used thoughtfully, critically, and responsibly.
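
As an illustration of the summary statistics class of tools, the sketch below implements a simple granularity check in the spirit of the GRIM test (Brown & Heathers, 2017), which asks whether a reported mean is arithmetically possible given the sample size when the underlying responses are integers. The talk abstract does not name specific tools, so the function name, rounding convention, and example values here are illustrative assumptions, and a production implementation would handle rounding edge cases more carefully.

import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    # GRIM-style check (illustrative sketch): for integer item responses,
    # the sum of scores must be a whole number, so only certain means are
    # possible for a given n. Test the integer sums nearest to mean * n and
    # see whether either reproduces the reported mean at its reported precision.
    target_sum = reported_mean * n
    for candidate_sum in (math.floor(target_sum), math.ceil(target_sum)):
        if round(candidate_sum / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# Example: with n = 25 integer responses, a mean of 3.48 is possible (sum = 87),
# but a reported mean of 3.47 cannot arise from any integer-valued sum.
print(grim_consistent(3.48, 25))  # True
print(grim_consistent(3.47, 25))  # False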

Gabriel Crone
MA Student in Quantitative Methods