Methodological note 2. Association and correlation in biostatistics: are they the same?
Evelia Apolinar-Jiménez^a,*, Edna J. Nava-González^b
^a Unidad de Metabolismo y Nutrición, Departamento de Investigación, Hospital Regional de Alta Especialidad del Bajío, Secretaría de Salud, México. ^b Facultad de Salud Pública y Nutrición, Universidad Autónoma de Nuevo León, Monterrey, Nuevo León, México.
When reviewing a scientific article or addressing a research question, it is common to
encounter key terms that appear similar but do not mean the same thing.
Some biostatistical concepts that also form part of everyday language are easily
confused in their meaning and, consequently, in their use and interpretation.
This is particularly true of the terms association and correlation.
The analysis of the association between two or more variables forms part of statistical data
analysis. Association implies that the distribution of one variable's values changes in relation
to the values of another variable. Measures of association include relative risk (RR) and odds
ratio (OR).
The relative risk is calculated by dividing the outcome frequency (risk) in the exposed group
by the outcome frequency in the non-exposed group, so it is typically reported in studies that
follow participants over time, such as cohort studies. The OR, the ratio of the odds of the
outcome between two groups, can be calculated in cohort, case-control, and cross-sectional
studies. In cross-sectional studies, however, the OR is generally discouraged because it is
strongly influenced by prevalence and overstates the association when the outcome is common;
the prevalence ratio (PR) is preferred instead. RR, OR, and PR should all be accompanied by
confidence intervals: the point estimate conveys the direction and strength of the association,
while the interval gives the range of values compatible with the data and therefore the
precision of the estimate.
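As a worked illustration of these measures, the following minimal sketch computes RR and OR with 95% confidence intervals from a 2x2 table. The counts are hypothetical (30 of 100 exposed and 15 of 100 non-exposed participants with the outcome), the function name is illustrative, and the intervals use the common log-scale normal approximation, which is only one of several available methods.

```python
import math


def risk_ratio_and_odds_ratio(a, b, c, d, z=1.96):
    """Return RR, its CI, OR, and its CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: non-exposed with outcome  d: non-exposed without outcome
    z: normal quantile for the interval (1.96 gives about 95%)
    """
    # Relative risk: risk in the exposed divided by risk in the non-exposed.
    rr = (a / (a + b)) / (c / (c + d))
    # Odds ratio: odds in the exposed divided by odds in the non-exposed.
    odds_ratio = (a * d) / (b * c)

    # Standard errors on the log scale (normal approximation).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))
    return rr, rr_ci, odds_ratio, or_ci


# Hypothetical counts, chosen only for illustration.
rr, rr_ci, odds_ratio, or_ci = risk_ratio_and_odds_ratio(30, 70, 15, 85)
print(f"RR = {rr:.2f}, 95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f}")
print(f"OR = {odds_ratio:.2f}, 95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f}")
```

With these counts the RR is 2.0 while the OR is about 2.43. The gap arises because the outcome is common in this hypothetical sample, which is the situation in which the OR overstates the ratio of frequencies.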
In contrast, correlation is a statistical method used to evaluate the potential linear
relationship between two continuous variables. It is symmetric: neither variable is treated as
the exposure or the outcome. The correlation coefficient ranges from -1 to +1; its sign
indicates the direction of the relationship and its absolute value the strength. A coefficient
of zero signifies no linear relationship between the two variables, although a non-linear
relationship may still exist. Negative values indicate an inverse correlation, meaning that as
one variable increases, the other tends to decrease, with -1 representing a perfect inverse
linear relationship. Conversely, positive values indicate a direct correlation, where an
increase in one variable corresponds to an increase in the other, with +1 representing a
perfect direct linear relationship.
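The behaviour of the coefficient at its extremes and at zero can be checked with a small sketch. The data below are invented for illustration and the helper `pearson` is a plain implementation of the usual product-moment formula, not a reference to any particular library. The third data set is included to show that a coefficient of zero only rules out a linear relationship: there, one variable is completely determined by the other.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values, for illustration only.
x = [-2, -1, 0, 1, 2]
y_direct = [2 * xi + 1 for xi in x]    # rises with x along a straight line
y_inverse = [-3 * xi for xi in x]      # falls with x along a straight line
y_curved = [xi ** 2 for xi in x]       # depends on x, but not linearly

r_direct = pearson(x, y_direct)
r_inverse = pearson(x, y_inverse)
r_curved = pearson(x, y_curved)

print(f"direct:  r = {r_direct:.2f}")
print(f"inverse: r = {r_inverse:.2f}")
print(f"curved:  r = {r_curved:.2f}")
```

The first two data sets give coefficients of +1 and -1. The symmetric U-shaped data give a coefficient of zero even though the two variables are clearly related, which is why a scatter plot is usually inspected alongside the coefficient.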
The two most commonly used correlation coefficients are Pearson's and Spearman's. Pearson's
is used when both variables follow a normal distribution, while Spearman's, which is
calculated on the ranks of the observations, is used when one or both variables do not follow
a normal distribution or are measured on an ordinal scale.
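To make the difference between the two coefficients concrete, the sketch below computes Spearman's coefficient as Pearson's coefficient applied to ranks, with tied values sharing their average rank. The helper names and the data are hypothetical. The example uses a relationship that always increases but is not a straight line, so the two coefficients disagree.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's coefficient computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: y always increases with x, but along a curve.
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson  r   = {r_pearson:.3f}")
print(f"Spearman rho = {r_spearman:.3f}")
```

Spearman's coefficient equals 1 here because the ordering of the two variables agrees perfectly, whereas Pearson's coefficient falls below 1 because the points do not lie on a straight line.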
Assessing the association between variables is essential for testing hypotheses and answering
research questions. Using the terms association and correlation correctly is crucial both for
engaging critically with the scientific literature, as a reader or potential reviewer, and for
analysing, describing, and interpreting results as an author of a scientific article.
References:
- Grimes DA, Schulz KF. An overview of clinical research: the lay of the land. Lancet. 2002 Jan 5;359(9300):57-61. doi: 10.1016/S0140-6736(02)07283-5.
- Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012 Sep;24(3):69-71.