16: Association
16.1 Association vs. Relationship
Understanding associations between variables is fundamental to drawing meaningful conclusions in applied statistics. It is important to distinguish between association and relationship: an association refers to a statistical link between two variables, while a relationship may suggest a broader or more conceptual connection. The statistical tests discussed in this chapter are descriptive and exploratory in nature. They help identify whether patterns or associations exist, but they are not intended to explain those patterns or determine their causes. Instead, these tools provide a foundation for further analysis and informed decision-making.
16.2 Correlation
Correlation is a statistical technique used to measure and describe the strength (how closely values move together) and direction (whether they increase or decrease together) of the association between two continuous variables. It provides a single number, called a correlation coefficient, that summarizes the degree to which the variables change in tandem.
The two most common types are Pearson’s correlation and Spearman’s correlation, each suited to different data conditions and assumptions.
16.3 Pearson’s Correlation
Pearson’s correlation coefficient (r) is the most widely used measure of correlation. It assesses the strength and direction of the linear association between two continuous variables and is appropriate when both variables are approximately normally distributed and have a linear relationship.
Assumptions
The key assumptions of Pearson’s correlation include linearity, normality of both variables, and homoscedasticity. Linearity means that the relationship between the two variables should follow a straight-line pattern: when one increases, the other tends to increase or decrease at a consistent rate. Normality means that each variable should follow a bell-shaped curve, with most values clustered around the middle and fewer at the extremes. Homoscedasticity means that the amount of variation in one variable is roughly the same across all values of the other variable. There shouldn’t be areas where the data suddenly becomes much more spread out or tightly clustered.
How To: Pearson Correlation
To run Pearson Correlation in Jamovi, go to the Analyses tab, select Regression, then Correlation Matrix.
- Select the variables you want to analyze and move them to the Variables box.
- Under Correlation Coefficients, check Pearson.
- Under Additional Options, check Flag significant correlations.
- Under Plot, check Correlation Matrix (this produces a scatterplot).
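Jamovi computes the coefficient for you, but the arithmetic is straightforward. As an illustration, here is a minimal Python sketch (with made-up data for hours studied and exam scores) that computes r as the covariance of the two variables divided by the product of their standard deviations:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the
    product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [55, 61, 60, 68, 74, 79]
r = pearson_r(hours, score)
df = len(hours) - 2  # degrees of freedom for reporting: n - 2
print(f"r({df}) = {r:.3f}")
```

With these made-up numbers the coefficient is strongly positive, matching the pattern a scatterplot of the data would show.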
Phrasing Results: Pearson Correlation
Use this template to phrase significant results:
- A Pearson correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
- A [strength], [direction] correlation was found (r([degrees of freedom]) = [correlation coefficient], p < [approximate p-value]).
Use this template to phrase non-significant results:
- A Pearson correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
- The result (r([degrees of freedom]) = [correlation coefficient], p = [exact p-value]) indicated a non-significant relationship between the two variables.
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
16.4 Spearman’s Rank Correlation
Spearman’s Rank correlation (rₛ) is a non-parametric test that measures the strength and direction of the relationship between two ordinal or continuous variables. Unlike Pearson’s correlation, Spearman’s does not assume that the data follow a straight-line pattern or are normally distributed. This makes it useful for data that are ranked, or for continuous data that do not meet the assumptions required for Pearson’s correlation.
Assumptions
Spearman’s rank correlation is a nonparametric test that does not require the same assumptions as Pearson’s correlation. It is a good choice when the data are skewed, contain outliers, or do not follow a linear pattern, conditions that can distort Pearson’s correlation.
How To: Spearman’s Rank Correlation
To run Spearman’s Rank correlation in Jamovi, go to the Analyses tab, select Regression, then Correlation Matrix.
- Select the variables you want to analyze and move them to the Variables box.
- Under Correlation Coefficients, check Spearman (uncheck Pearson).
- Under Additional Options, check Flag significant correlations.
- Under Plot, check Correlation Matrix (this produces a scatterplot).
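To see what correlating ranks means in practice, here is a minimal Python sketch using the shortcut formula rₛ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between each pair of ranks. The data are made up, and the shortcut formula assumes no tied values (with ties, compute Pearson's r on the ranks instead):

```python
def spearman_rho(x, y):
    """Spearman's rho via the shortcut formula, valid when
    neither variable contains tied values."""
    n = len(x)
    rx = {v: i + 1 for i, v in enumerate(sorted(x))}  # rank of each x value
    ry = {v: i + 1 for i, v in enumerate(sorted(y))}  # rank of each y value
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: skewed reaction times vs. ordinal anxiety ratings
times = [210, 250, 300, 480, 900]   # skewed continuous data
anxiety = [1, 2, 4, 3, 5]           # ordinal ratings, no ties
rho = spearman_rho(times, anxiety)
print(f"rs = {rho:.3f}")
```

Because only ranks enter the formula, the extreme value 900 has no more influence than any other observation, which is why Spearman's correlation tolerates skew and outliers.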
Phrasing Results: Spearman’s Rank Correlation
Use this template to phrase significant results:
- A Spearman’s Rank correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
- A [strength], [direction] correlation was found (rₛ([degrees of freedom]) = [correlation coefficient], p < [approximate p-value]).
Use this template to phrase non-significant results:
- A Spearman’s Rank correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
- The result (rₛ([degrees of freedom]) = [correlation coefficient], p = [exact p-value]) indicated a non-significant relationship between the two variables.
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
16.5 Understanding the Chi-Square Statistic
The chi-square statistic (χ²) is a key tool in inferential statistics, especially for analyzing categorical data. It compares the observed frequencies in a dataset to the frequencies expected under a specific hypothesis. This statistic is central to the goodness-of-fit test and the test of independence; the binomial test, covered next, serves a similar purpose for two-category variables but is based on the binomial distribution rather than χ². All of these tests assess whether the distribution of data differs meaningfully from what is expected.
A small chi-square value indicates that the observed and expected frequencies are similar, suggesting little to no difference. A large chi-square value indicates a greater discrepancy between observed and expected counts, which may lead to rejecting the null hypothesis. Understanding how the chi-square statistic works is essential for interpreting results in categorical data analysis.
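As a concrete illustration, the statistic is χ² = Σ(O − E)²/E, summed over all categories. Here is a minimal Python sketch with made-up counts:

```python
def chi_square(observed, expected):
    """chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 observations expected to split evenly across 3 groups
observed = [28, 18, 14]
expected = [20, 20, 20]
stat = chi_square(observed, expected)
print(f"chi-square = {stat:.2f}")
```

When every observed count equals its expected count, the statistic is exactly zero; the further the counts diverge, the larger it grows.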
16.6 Binomial Test
The Binomial Test assesses whether the observed frequency of outcomes in a categorical variable with two possible outcomes differs significantly from what would be expected by chance. It is useful when working with a small number of cases and when you want to test whether the proportion of one outcome matches a specific expected value.
Assumptions
The Binomial Test assumes that each observation is independent, meaning the outcome of one observation does not influence the outcome of another. It also assumes that the probability of each outcome stays the same across all observations.
How To: Binomial Test
To run the Binomial Test in Jamovi, go to the Analyses tab, select Frequencies, then 2 Outcomes Binomial Test.
- Move a nominal variable with two groups to the Variables box.
- Check the Confidence Intervals box.
TIP: The Binomial Test assumes an expected equal proportion between the two groups. You can change the expected proportion in the Test Value box.
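Under the hood, the test sums exact binomial probabilities. Here is a minimal Python sketch with hypothetical data (18 of 20 respondents choosing one option, tested against an expected proportion of 50%); the two-sided p-value adds up the probability of every outcome at most as likely as the observed one:

```python
from math import comb

def binomial_p_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes no more likely than the observed count k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k]
    return sum(q for q in probs if q <= cutoff + 1e-12)

# Hypothetical sample: 18 of 20 respondents chose option A
p_value = binomial_p_two_sided(18, 20)
print(f"p = {p_value:.4f}")
```

Changing the `p` argument corresponds to changing the Test Value in Jamovi.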
Phrasing Results: Binomial Test
Use this template to phrase significant results:
- A Binomial Test showed that the sample’s proportion of [Variable-group 1] and [Variable-group 2] significantly differed (p < [approximate p-value]) from the expected population proportion of 50%.
Use this template to phrase non-significant results:
- A Binomial Test showed that the sample’s proportion of [Variable-group 1] and [Variable-group 2] did not significantly differ (p = [p-value]) from the expected population proportion of 50%.
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
16.7 Goodness-of-Fit Test
The Goodness-of-Fit Test is used to determine whether the observed distribution of a categorical variable matches an expected distribution. It is commonly applied to test whether a variable follows a specific pattern or whether the frequencies across categories are evenly distributed. The expected values can be based on theoretical proportions, prior research, or an assumption of equal probability across categories.
Assumptions
The Goodness-of-Fit Test assumes that each observation is independent, meaning that the classification of one case does not affect the classification of another. It also assumes that the categories being tested are mutually exclusive, so each observation can belong to only one category. Another important assumption is that the expected frequency in each category should be at least 5. If expected counts are too small, the chi-square distribution may not provide an accurate estimate, and the results of the test may be unreliable.
How To: Goodness-of-Fit Test
To run the Goodness-of-Fit Test in Jamovi, go to the Analyses tab, select Frequencies, then N Outcomes Chi-Square Goodness of Fit Test.
- Move a nominal variable with three or more groups to the Variables box.
- Check the Expected Counts box.
TIP: The Goodness-of-Fit Test assumes an expected equal proportion between the groups. You can view and change the expected proportion under Expected Proportions.
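The statistic compares each observed count to the count implied by the expected proportions. Here is a minimal Python sketch with made-up data (90 students across three majors, tested against equal proportions); the value 5.991 is the χ² critical value for df = 2 at α = .05:

```python
def gof_chi_square(observed, expected_props):
    """Goodness-of-fit statistic against hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]  # expected count per category
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 90 students across 3 majors, expected to be equally likely
counts = [40, 30, 20]
stat = gof_chi_square(counts, [1/3, 1/3, 1/3])
df = len(counts) - 1
critical = 5.991  # chi-square critical value for df = 2 at alpha = .05
print(f"chi-square({df}) = {stat:.2f}, significant: {stat > critical}")
```

Changing `expected_props` corresponds to editing the values under Expected Proportions in Jamovi.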
Phrasing Results: Goodness-of-Fit Test
Use this template to phrase significant results:
- A Chi-Square Goodness-of-Fit Test indicated that the sample’s [variable] significantly differed (χ²([degrees of freedom]) = [chi-square statistic], p < [approximate p-value]) from the expected population proportions of [expected proportion].
Use this template to phrase non-significant results:
- A Chi-Square Goodness-of-Fit Test indicated that the sample’s [variable] did not significantly differ (χ²([degrees of freedom]) = [chi-square statistic], p = [p-value]) from the expected population proportions of [expected proportion].
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
16.8 Test of Independence
The Test of Independence, also called the Test of Association, is used to determine whether there is a statistical association between two categorical variables or whether they are independent. It compares the observed frequencies in a contingency table to the frequencies that would be expected if no association existed between the variables.
Assumptions
The Test of Independence relies on several key assumptions to ensure valid results. First, it assumes that the observations are independent, meaning that each individual or case contributes to only one cell in the contingency table. Second, the variables must be categorical, with mutually exclusive groups. Third, the expected frequency in each cell of the table should generally be 5 or more.
How To: Test of Independence
To run the Test of Independence in Jamovi, go to the Analyses tab, select Frequencies, then Independent Samples Chi-Square Test of Association.
- Move one nominal variable into the Row box and another into the Column box.
- Under the Statistics drop-down, select Phi and Cramer’s V.
- Under the Cells drop-down, select Observed counts and Expected counts.
- Optional: Under Percentages, select Row or Column.
- Under the Plots drop-down, select Bar Plot.
NOTE: The Test of Independence is a powerful and complex test with many options that are beyond the scope of this book.
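For readers curious about the arithmetic Jamovi performs, here is a minimal Python sketch with a hypothetical 2×2 table: each expected count is the row total times the column total divided by the grand total, and Cramer’s V is the square root of χ²/(n × min(rows − 1, columns − 1)):

```python
from math import sqrt

def chi_square_independence(table):
    """Chi-square test of independence from a contingency table
    (a list of rows). Returns the statistic, df, and Cramer's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count for cell
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    v = sqrt(stat / (n * min(len(table) - 1, len(table[0]) - 1)))
    return stat, df, v

# Hypothetical 2x2 table: preference (yes/no) by group (A/B)
table = [[30, 10],
         [15, 25]]
stat, df, v = chi_square_independence(table)
print(f"chi-square({df}) = {stat:.2f}, V = {v:.2f}")
```

The returned V is the effect-size value reported as Cramer’s V in the phrasing template.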
Phrasing Results: Test of Independence
Use this template to phrase significant results:
- A Chi-Square Test of Independence was conducted to examine the association between [variable 1] and [variable 2].
- The results indicated a statistically significant, [strength] association (χ²([degrees of freedom]) = [chi-square statistic], p < [approximate p-value], V = [Cramer’s V statistic]) between the two variables.
Use this template to phrase non-significant results:
- A Chi-Square Test of Independence was conducted to examine the association between [variable 1] and [variable 2].
- The results indicated no significant association (χ²([degrees of freedom]) = [chi-square statistic], p = [p-value]) between the two variables.
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
Chapter 16 Summary and Key Takeaways
Four common statistical tests help examine associations between variables. Correlation tests measure the strength and direction of the association between continuous variables. The Binomial Test evaluates whether the frequency of two outcomes in a categorical variable differs from a specified proportion. The Goodness-of-Fit Test compares observed category frequencies to an expected distribution, and the Test of Independence assesses whether two categorical variables are statistically associated or independent. These tests are descriptive and exploratory tools that help identify meaningful patterns in data. However, statistical association does not imply causality. Understanding the assumptions and appropriate applications of each test allows researchers to draw accurate, evidence-based conclusions. All four tests can be easily conducted using Jamovi’s user-friendly interface, making it an accessible tool for exploring associations in both continuous and categorical data.
- Correlation quantifies the strength and direction of the association between continuous variables but does not imply causality.
- The Binomial Test evaluates whether observed binary outcomes differ significantly from a specified proportion.
- The Goodness-of-Fit Test compares observed and expected frequencies to assess how well the data fit a hypothesized distribution.
- The Test of Independence determines whether two categorical variables are statistically associated or independent.
- Association refers to a statistical connection between variables, while a relationship may suggest a broader conceptual or theoretical link.