It is debatable whether or not science is progressive.1 Evidence of “p-hacking” and scientific bias exists.2,3 However, we can increase the likelihood that science remains or becomes progressive by increasing transparency and by adopting practices that reduce the chance of scientific errors, such as unsound interpretation of data. Specifically, we would like to discuss the importance of data visualization and data transparency, an area in which our expectations of contributions to the International Journal of Sports Physiology and Performance (IJSPP) need to evolve. Over the years, many fields have highlighted the importance of improving how scientists present data. In 2015, after reviewing over 700 articles published in top physiology journals, Weissgerber et al4 identified many problems with data visualization. Their recommendation to “encourage more complete presentation of data” is equally, if not more, important for journals like IJSPP, which often publish studies with small sample sizes, such as those involving elite athletes. Further, readers of studies on elite athletes are often interested in individual performance, or n = 1 analysis, alongside the performance of a team or the group response.
There are numerous reasons why authors have not adopted more transparent or complete data visualization practices. As noted by Weissgerber et al,4 a lack of specific scientific training in data presentation is a likely reason for the problems identified in published research. Another principal barrier to the uptake of better data visualization practice has been the systemic challenge of accessing data visualization software, a challenge that is being mitigated in the era of open-source and open-access software. Hence, the aim of this editorial is to increase awareness and to provide brief recommendations on data visualization in IJSPP, including the ongoing need for skill development that supports transparent storytelling with data. The intention is not to vilify authors whose visualizations could have been improved, or to assume that an uninformative or inappropriate bar plot was used intentionally. Instead, this editorial is written in the spirit of Kaizen, the value of striving for continual improvement.
Show Me the Data!
Rather than discussing data visualization in the abstract to convince researchers of the merits of enhanced data presentation, we should visualize the point! The following two examples use randomly generated data that produce similar Pearson correlation coefficients (Figure 1) and similar group mean changes from before to after an intervention (Figure 2). Both analyses are common in IJSPP articles.
Figure 1. Randomly generated data fitted to the same magnitude of correlation (r = .70 [95% confidence interval, .37–.87]) in (A) and (B). In (B), the strength of the correlation is largely driven by a single data point. Data were produced in R (version 4.0.2) (Supplementary Material [available online]).11
Citation: International Journal of Sports Physiology and Performance 15, 10; 10.1123/ijspp.2020-0813
Figure 2. Bar graphs and univariate scatterplots, made with a free template,4 of two versions of postintervention data with similar means and SDs. The univariate scatterplots reveal (A) a uniform response to the intervention versus (B) a divergent response.
Without data visualization, the interpretation of the data in Figure 1 is shaped by the large magnitude of the effect, r = .70 (95% confidence interval, .37–.87). However, when the individual responses within the team are shown, Figure 1B reveals an interesting outlier, a data point far from the “team response.” Without this data point, the strength of the correlation changes substantially (see the Figure 1B annotation “data to notice”). A visualization that shows the individual response is valuable because it changes our level of certainty in, and the generalizability of, our conclusion. Most importantly, by increasing transparency in data visualization, we help other researchers to contextualize our findings and to judge the likely reproducibility of the result in future studies (or practical applications), a key tenet of the scientific process. There is nothing inherently right or wrong about a conclusion based on a correlation coefficient, but without proper data visualization, dubious conclusions may be drawn. Further, as in many scientific pursuits, the outlier, or atypical result, often piques our curiosity, leading to deeper insights and better questions. This is a strength, not a weakness, of telling a transparent story with data.
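To make the point about Figure 1B concrete, the following minimal Python sketch (Python is our choice for illustration; the editorial's figure was produced in R) computes Pearson's r for a hypothetical team whose points show no linear relation at all, with and without one hypothetical athlete far from the team response. The data are invented and are not the Figure 1 data. The `fisher_ci` helper is our own addition and assumes the common Fisher z approximation for a 95% confidence interval; the editorial does not state how its interval was computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z transform (an assumed method)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical "team": a 3 x 3 grid of points with no linear relation at all...
team_x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
team_y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
# ...plus one hypothetical athlete far from the team response.
x = team_x + [8]
y = team_y + [8]

r_all = pearson_r(x, y)              # about 0.84, driven by a single point
r_team = pearson_r(team_x, team_y)   # 0.0 once that point is removed
lo, hi = fisher_ci(r_all, len(x))

print(f"r with the outlier:    {r_all:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"r without the outlier: {r_team:.2f}")
```

In this sketch, r is about 0.84 with the outlier and 0.00 without it, and neither r nor its confidence interval hints that a single athlete is responsible; only a plot of the individual points does.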
Intervention-based inquiry is a key part of the scientific research process,5 especially in applied settings, which are of great interest to the IJSPP readership. Figure 2 shows how bar plots can be misleading in intervention research, especially when the sample size is small. Bar plots that show only the mean group response prevent us from seeing the individual response, and there is considerable scientific merit in examining the individual response in intervention-based studies. First, the notion of extreme responders, average responders, and nonresponders to an intervention is well established, and visualizations that show the individual response allow us to attend to this possibility in our data. Second, from a frequentist perspective, an outlying response can occur by chance alone. Visualizations that show the individual response allow readers to identify the outlier and judge for themselves whether the scientific inference stands. Finally, as mentioned previously, it is often the outlier that leads us to bigger and more fruitful scientific questions; an unexpected finding often drives scientific progress.
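The same idea can be shown numerically for Figure 2. In this minimal Python sketch, the six pre- and postintervention scores are hypothetical and are not the Figure 2 data. Reversing the order of the postintervention values assigns the same numbers to different athletes, so the group mean and SD, which are all a bar plot displays, are identical in both scenarios, yet the individual changes range from uniform to divergent. The `summarize` helper is a hypothetical name used only for this illustration.

```python
from statistics import mean, stdev

# Hypothetical scores for six athletes (invented, not the Figure 2 data).
pre = [50, 52, 54, 56, 58, 60]
post_uniform = [55, 57, 59, 61, 63, 65]        # (A) every athlete improves by 5
post_divergent = list(reversed(post_uniform))  # (B) same values, different athletes


def summarize(label, pre, post):
    """Print what a bar plot shows (group mean and SD of post scores)
    and what it hides (each athlete's pre-to-post change)."""
    changes = [after - before for before, after in zip(pre, post)]
    print(f"{label}: post mean = {mean(post):.1f}, post SD = {stdev(post):.2f}, "
          f"mean change = {mean(changes):.1f}")
    print(f"  individual changes: {changes}")
    return changes


changes_a = summarize("(A) uniform", pre, post_uniform)
changes_b = summarize("(B) divergent", pre, post_divergent)

# Count hypothetical athletes in (B) whose score did not improve.
nonresponders_b = sum(1 for c in changes_b if c <= 0)
print(f"Athletes in (B) who did not improve: {nonresponders_b}")
```

Both scenarios report the same post mean, SD, and mean change of 5 points, but in (B) two of the six hypothetical athletes get worse, which is what a plot of individual responses would show and a bar plot would hide.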
Resources for Data Visualization
Simply saying “do better data visualization” would be contrary to the ethos of Kaizen. We want to advance our science, not condemn our scientists. Therefore, in the spirit of a shared commitment to continual improvement in our research and publication practices, we direct researchers to the following resources:
1. Templates for visualization in Microsoft Excel (Microsoft Corp, Redmond, WA), as used in Figure 2,4 and several general articles and resources on data visualization6–9;
2. For interactive visualizations of the concepts shown in Figure 1, the website by Kristoffer Magnusson,10 an excellent resource on the importance of data visualization and the interpretation of statistical results;
3. Fundamentals of Data Visualization, an open-source book by Claus Wilke that is not tied to any one programming language or software8;
4. Finally, for those interested in getting started with the open-source R programming language, as used in Figure 1,11 the free online book ggplot2: Elegant Graphics for Data Analysis, which reduces the financial barrier to entry for all scientists.12
The recommendations in this editorial may change “the way we have always done things” or require “an old dog to learn new tricks.” However, in our opinion, continuous change, growth, and personal advancement, the principles of Kaizen, pave the path to progressive science. Changing the standards for how we visualize data is not the “groundbreaking” part; rather, it is the conduit to a shift in scientific thinking and in how we interpret our data and make scientific inferences. In the 1996 film Jerry Maguire, Rod Tidwell, the character played by Cuba Gooding, Jr., says to Jerry Maguire, played by Tom Cruise: “Show me the money, Jerry!” Money is power, and in science, data is power. We need to recognize that it is not just “you” who needs to see the data but also “me.” Transparent data visualization allows readers (and reviewers) to make informed interpretations of our studies, enhances our understanding, strengthens our conclusions, and might even lead us to a new discovery. There is only one thing left to say: “Show Me the Data, Jerry!”
References
1. Niiniluoto I. Scientific progress. In: Zalta EN, ed. The Stanford Encyclopedia of Philosophy. Stanford, CA: Metaphysics Research Lab, Center for the Study of Language and Information, Stanford University; 2019. https://plato.stanford.edu/archives/win2019/entries/scientific-progress/
2. Nuzzo R. How scientists fool themselves—and how they can stop. Nature News. 2015;526(7572):182. doi:10.1038/526182a
3. Bishop D. Rein in the four horsemen of irreproducibility. Nature. 2019;568(7753):435. doi:10.1038/d41586-019-01307-2
4. Weissgerber TL, Milic NM, Winham SJ, Garovic VD. Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol. 2015;13(4):e1002128. doi:10.1371/journal.pbio.1002128
6. Slutsky DJ. The effective use of graphs. J Wrist Surg. 2014;3(2):67–68. doi:10.1055/s-0034-1375704
7. Allen M, Poggiali D, Whitaker K, Marshall TR, Kievit RA. Raincloud plots: a multi-platform tool for robust data visualization. Wellcome Open Res. 2019;4:63. doi:10.12688/wellcomeopenres.15191.1
8. Wilke CO. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. 1st ed. Sebastopol, CA: O’Reilly Media; 2019. https://clauswilke.com/dataviz/index.html. Accessed October 2, 2020.
9. Weissgerber TL, Winham SJ, Heinzen EP, et al. Reveal, don’t conceal: transforming data visualization to improve transparency. Circulation. 2019;140(18):1506–1518. doi:10.1161/CIRCULATIONAHA.118.037777
10. Magnusson K. Understanding and interpreting correlations—an interactive visualization. Interpreting Correlations. https://rpsychologist.com/d3/correlation/. Accessed October 2, 2020.
11. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2020. https://www.R-project.org/
12. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer Science+Business Media; 2016. https://ggplot2-book.org/. Accessed October 2, 2020.