heRcules

A REPOSITORY WITH SCRIPTS FOR LEARNING DATA ANALYSIS IN R

Keywords: R, Data analysis, Sample size, Plots, GitHub, Hypothesis tests

Abstract

Data analysis is a crucial step in scientific projects and plays a central role in validating and interpreting study results. Before data collection begins, researchers must plan their experiments and analyses systematically to minimize biases that could compromise the validity of the results. This document reports the creation of the "heRcules" repository, a public resource offering template scripts in the R language for scientific data analysis, with a particular focus on the Biological and Health Sciences. The repository provides ready-to-use scripts for essential tasks such as experimental planning, data analysis, result visualization, and hypothesis testing. The initial model, described in this document, includes scripts for sample size calculation, statistical power calculation, spreadsheet import, creation of vectors and data frames, descriptive statistics, file export, graph creation (using both base R and ggplot2), outlier tests, normality tests, and notebook creation with R Markdown. The repository is hosted on GitHub (https://github.com/drhrf/heRcules.git), which makes the resources freely available to the scientific community and open to collaboration. Beyond facilitating the work of individual researchers, the repository aims to promote transparency and reproducibility in scientific research by providing a solid foundation for rigorous data analyses, such as those exemplified in the current model.

Author Biography

Hércules Rezende Freitas, Universidade Federal do Rio de Janeiro

Hércules is a biologist and mathematician with specializations in Phytotherapy, Pharmacology, and Big Data. He holds a master's degree and a doctorate in Biophysics (UFRJ/Universidade de Coimbra) and completed a postdoctoral fellowship in Neuropathology at the University of California. He works as a biostatistician at the Instituto Nacional de Traumatologia e Ortopedia and is a university professor at the Universidade do Grande Rio.

References

CHAMPELY, S. pwr: Basic Functions for Power Analysis. R package version 1.3-0, 2020. Available at: https://link.ufms.br/1gVny. Accessed: 4 Mar. 2004.

DEBASTIANI, V. J. Introdução ao R. [S. l.], 2020. Available at: https://link.ufms.br/jrVkK. Accessed: 21 Dec. 2021.

DRAGULESCU, A.; ARENDT, C. xlsx: Read, Write, Format Excel 2007 and Excel 97/2000/XP/2003 Files. R package version 0.6.5, 2020. Available at: https://link.ufms.br/50ihv. Accessed: 4 Mar. 2004.

GROSJEAN, P.; IBANEZ, F. pastecs: Package for Analysis of Space-Time Ecological Series. R package version 1.3.21, 2018. Available at: https://link.ufms.br/RC3TO. Accessed: 4 Mar. 2004.

KASSAMBARA, A. rstatix: Pipe-Friendly Framework for Basic Statistical Tests. R package version 0.7.0, 2021. Available at: https://link.ufms.br/aOTIi. Accessed: 4 Mar. 2004.

R CORE TEAM. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. Available at: https://link.ufms.br/U0dqv. Accessed: 4 Mar. 2004.

REVELLE, W. psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA, 2021. Version 2.1.9. Available at: https://link.ufms.br/R179A. Accessed: 4 Mar. 2004.

SHAPIRO, S. S.; WILK, M. B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, v. 52, n. 3/4, p. 591-611, 1965. Available at: https://doi.org/10.2307/2333709. Accessed: 4 Mar. 2004.

TORCHIANO, M. effsize: Efficient Effect Size Computation. R package version 0.8.1, 2020. Available at: https://doi.org/10.5281/zenodo.1480624. Accessed: 4 Mar. 2004.

TUKEY, J. W. Comparing individual means in the analysis of variance. Biometrics, v. 5, n. 2, p. 99-114, 1949. Available at: https://doi.org/10.2307/3001913. Accessed: 4 Mar. 2004.

WARING, E.; QUINN, M.; MCNAMARA, A.; LA RUBIA, E. A.; ZHU, H.; ELLIS, S. skimr: Compact and Flexible Summaries of Data. R package version 2.1.3, 2021. Available at: https://link.ufms.br/g9Atv. Accessed: 4 Mar. 2004.

WICKHAM, H. Reshaping Data with the reshape Package. Journal of Statistical Software, v. 21, n. 12, p. 1-20, 2007. Available at: https://doi.org/10.18637/jss.v021.i12. Accessed: 4 Mar. 2004.

WICKHAM, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

WICKHAM, H.; FRANÇOIS, R.; HENRY, L.; MÜLLER, K. dplyr: A Grammar of Data Manipulation. R package version 1.0.7, 2021. Available at: https://link.ufms.br/udQwn. Accessed: 4 Mar. 2004.

WUERTZ, D.; SETZ, T.; CHALABI, Y. fBasics: Rmetrics - Markets and Basic Statistics. R package version 3042.89.1, 2020. Available at: https://link.ufms.br/HOaQj. Accessed: 4 Mar. 2004.

ZHU, H. kableExtra: Construct Complex Table with 'kable' and Pipe Syntax. R package version 1.3.4, 2021. Available at: https://link.ufms.br/UUuNg. Accessed: 4 Mar. 2004.

Published
2024-12-26
How to Cite
FREITAS, H. R. heRcules. Edutec - Education, Digital Technologies, and Teacher Education, v. 4, n. 1, 26 Dec. 2024.
