
18: Association

Chapter 18 Guiding Questions

  1. What does an association between variables indicate?
  2. What conclusions cannot be drawn from association alone?
  3. How do measurement and design shape the interpretation of relationships?
  4. How can associations be misused in applied settings?

18.1 Association vs. Relationship

Understanding associations between variables is fundamental to drawing meaningful conclusions in applied statistics. It is important to distinguish between association and relationship: an association refers to a statistical link between two variables, while a relationship may suggest a broader or more conceptual connection. The statistical tests discussed in this chapter are exploratory in nature. They help identify whether patterns or associations exist, but they are not intended to explain those patterns or determine their causes. Instead, these tools provide a foundation for further analysis and informed decision-making.

18.2 Correlation

Correlation is a statistical technique used to measure and describe the strength (how closely values move together) and direction (whether they increase or decrease together) of the relationship between two continuous variables. It provides a single number, called a correlation coefficient, that summarizes the degree to which the variables change in tandem.

The two most common types are Pearson’s correlation and Spearman’s correlation, each suited to different data conditions and assumptions.

Sample Size and Stability

In addition to meeting statistical assumptions, researchers must consider sample size when interpreting a correlation coefficient. In small samples, correlation estimates are more sensitive to random variation. A moderate or even strong correlation observed in a small sample may fluctuate considerably if the study were repeated with a different group of participants.

As sample size increases, correlation estimates become more stable and precise. Larger samples reduce sampling variability and narrow confidence intervals, providing greater confidence that the observed relationship reflects the underlying population pattern.

For this reason, correlation coefficients should be interpreted cautiously when sample sizes are small. Reporting confidence intervals alongside the correlation coefficient can provide additional context about the precision of the estimate.
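This sensitivity to sample size can be demonstrated with a short simulation outside Jamovi. The sketch below uses Python with NumPy; the hypothetical population correlation of .50, the two sample sizes, and the replication count are all arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_r = 0.50  # hypothetical population correlation

def sample_r(n, reps=500):
    """Draw reps samples of size n from a correlated bivariate
    normal population and return the sample correlations."""
    cov = [[1.0, true_r], [true_r, 1.0]]
    estimates = []
    for _ in range(reps):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        estimates.append(np.corrcoef(x, y)[0, 1])
    return np.array(estimates)

small = sample_r(n=20)
large = sample_r(n=200)

# Estimates from the larger samples cluster far more tightly
# around the true value of .50
print(round(small.std(), 2), round(large.std(), 2))
```

With samples of 20 cases, individual estimates range widely around .50; with 200 cases, they cluster much more tightly. This is exactly why small-sample correlations warrant cautious interpretation.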

18.3 Pearson’s Correlation

Pearson’s correlation coefficient (r) is the most widely used measure of correlation. It assesses the strength and direction of the linear association between two continuous variables and is appropriate when both variables are approximately normally distributed and have a linear relationship.

Assumptions

The key assumptions of Pearson’s Correlation include linearity, normality of both variables, and homoscedasticity. Linearity means that the relationship between the two variables should follow a straight-line pattern: when one increases, the other tends to increase or decrease at a consistent rate. Normality means that each variable should follow a bell-shaped curve, with most values clustered around the middle and fewer at the extremes. Homoscedasticity means that the amount of variation in one variable is roughly the same across all values of the other variable. There shouldn’t be areas where the data suddenly becomes much more spread out or tightly clustered.

How To: Pearson’s Correlation

To run Pearson’s Correlation in Jamovi, go to the Analyses tab, select Regression, then Correlation Matrix.

  1. Select the variables you want to analyze and move them to the Variables box.
  2. Under Correlation Coefficients, check Pearson.
  3. Under Additional Options, check Flag significant correlations.
  4. Under Plot, check Correlation Matrix (this produces a scatterplot).

Understanding the Output

The output from the Pearson’s Correlation is shown below.


Jamovi interface showing a correlation matrix and scatterplots for three personality variables.
Figure 18.1. Pearson’s Correlation Test Results with Scatterplot Matrix

To interpret the Pearson’s Correlation output in Jamovi, begin with the correlation matrix table in the Results panel. Each cell displays the correlation coefficient (r) for a pair of variables. The table is symmetrical, meaning the values above and below the diagonal are identical. The diagonal values are always 1.00 because each variable is perfectly correlated with itself. Therefore, you only need to interpret one half of the matrix.

First, examine the correlation coefficient (r). The sign indicates direction. A positive value means the variables tend to increase together. A negative value means that as one variable increases, the other tends to decrease.

Next, evaluate the magnitude of r to determine strength. As a general guideline:

  • Weak relationship: |r| ≈ .10 to .29
  • Moderate relationship: |r| ≈ .30 to .69
  • Strong relationship: |r| ≥ .70

Values close to 0 indicate little to no linear relationship.

Then review the p-value listed in the table. If the p-value is below your selected alpha level (e.g., .05), the relationship is statistically significant. Jamovi may also display asterisks next to significant coefficients. Always confirm what the symbols represent by checking the note beneath the table.

If degrees of freedom (df) are displayed, remember that for Pearson correlation, df = n − 2. This reflects the number of cases used to compute the test.
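For readers who want to verify Jamovi's numbers outside the interface, the same test can be sketched in Python with SciPy. The data values here are made up for illustration:

```python
from scipy.stats import pearsonr

# Hypothetical paired scores for two continuous variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, p = pearsonr(x, y)
df = len(x) - 2  # degrees of freedom for Pearson's r

print(f"r({df}) = {r:.2f}, p = {p:.3f}")  # r(3) = 0.77, p = 0.124
```

The degrees of freedom come from the five cases minus two, matching the df = n − 2 rule above; here the correlation is strong but, with so few cases, not statistically significant.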

Below the matrix, Jamovi displays a scatterplot matrix. Each panel represents the relationship between two variables. Every point in a panel represents one case in the dataset. Focus on the overall pattern formed by the points:

  • Points clustered closely around an upward-sloping line indicate a stronger positive linear relationship.
  • Points clustered around a downward-sloping line indicate a stronger negative linear relationship.
  • Widely scattered points with a slight upward or downward trend indicate a weaker relationship.
  • A random cloud of points suggests little to no linear association.

Also check for outliers, which appear as points far removed from the general pattern. Outliers can meaningfully influence the correlation coefficient. Finally, confirm that the pattern appears linear, as Pearson’s r measures only linear relationships.

Phrasing Results: Pearson Correlation

Use this template to phrase significant results:

  • A Pearson correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
  • A [strength], [direction] correlation was found (r([degrees of freedom]) = [correlation coefficient], p < [approximate p-value]).

Use this template to phrase non-significant results:

  • A Pearson correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
  • The result (r([degrees of freedom]) = [correlation coefficient], p = [exact p-value]) indicated a non-significant relationship between the two variables.

TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.

18.4 Spearman’s Rank Correlation

Spearman’s Rank Correlation (rₛ) is a non-parametric test that measures the strength and direction of the relationship between two ordinal or continuous variables. Unlike Pearson’s correlation, Spearman’s does not assume that the data follow a straight-line pattern or are normally distributed. This makes it useful for data that are ranked, or for continuous data that do not meet the assumptions required for Pearson’s correlation.

Assumptions

Spearman’s Rank Correlation is a nonparametric test that does not require the same assumptions as Pearson’s correlation. It is a good choice when the data are skewed, contain outliers, or do not follow a linear pattern, conditions that can distort Pearson’s correlation.
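The difference between the two coefficients is easy to see in a small sketch (Python with SciPy; the cubic data are invented to create a monotonic but curved pattern):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical monotonic but curved relationship
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # x cubed: always rising, never linear

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

# Spearman correlates the ranks, so a perfectly monotonic pattern
# yields rho = 1.0 even though Pearson's r falls short of it
print(round(r, 3), round(rho, 3))  # 0.943 1.0
```

Skewed data and outliers are dampened in the same way, since only the rank order of the values enters the calculation.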

How To: Spearman’s Rank Correlation

To run Spearman’s Rank Correlation in Jamovi, go to the Analyses tab, select Regression, then Correlation Matrix.

  1. Select the variables you want to analyze and move them to the Variables box.
  2. Under Correlation Coefficients, check Spearman (uncheck Pearson).
  3. Under Additional Options, check Flag significant correlations.
  4. Under Plot, check Correlation Matrix (this produces a scatterplot).

Understanding the Output

The output from the Spearman’s Rank Correlation is shown below.


Jamovi interface showing a correlation matrix and scatterplots for three personality variables.
Figure 18.2. Spearman’s Rank Correlation Results with Scatterplot Matrix

To interpret the Spearman’s Rank Correlation output in Jamovi, begin with the correlation matrix table in the Results panel. Each cell displays Spearman’s rho (rₛ) coefficient for a pair of variables. The table is symmetrical, meaning the values above and below the diagonal are identical. The diagonal values are not interpreted because each variable is perfectly correlated with itself. Therefore, you only need to interpret one half of the matrix.

First, examine Spearman’s rho (rₛ) coefficient. The sign indicates direction. A positive value means higher values on one variable tend to be associated with higher values on the other. A negative value means higher values on one variable tend to be associated with lower values on the other.

Next, evaluate the magnitude of rₛ to determine strength. As a general guideline:

  • Weak relationship: |rₛ| ≈ .10 to .29
  • Moderate relationship: |rₛ| ≈ .30 to .69
  • Strong relationship: |rₛ| ≥ .70

Values close to 0 indicate little to no overall association.

Then review the p-value listed in the table. If the p-value is below your selected alpha level (e.g., .05), the relationship is statistically significant. Jamovi may also display asterisks next to significant coefficients. Always confirm what the symbols represent by checking the note beneath the table.

If degrees of freedom (df) are displayed, remember that for correlation tests, df = n − 2. This reflects the number of cases used to compute the test.

Below the matrix, Jamovi displays a scatterplot matrix. Each panel represents the relationship between two variables. Every point in a panel represents one case in the dataset. Focus on the overall pattern formed by the points:

  • Points that generally rise from left to right indicate a stronger positive association.
  • Points that generally fall from left to right indicate a stronger negative association.
  • Widely scattered points with only a slight overall pattern indicate a weaker association.
  • A random cloud of points suggests little to no consistent association.

Also check for outliers, which appear as points far removed from the general pattern. Outliers can influence the appearance of the scatterplot and may affect the strength of the reported association. Finally, remember that Spearman’s rho is often used when the relationship is not perfectly straight but still shows a consistent overall direction.

Phrasing Results: Spearman’s Rank Correlation

Use this template to phrase significant results:

  • A Spearman’s Rank correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
  • A [strength], [direction] correlation was found (rₛ([degrees of freedom]) = [correlation coefficient], p < [approximate p-value]).

Use this template to phrase non-significant results:

  • A Spearman’s Rank correlation coefficient was calculated for the relationship between [Variable 1] and [Variable 2].
  • The result (rₛ([degrees of freedom]) = [correlation coefficient], p = [exact p-value]) indicated a non-significant relationship between the two variables.

TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.

18.5 Choosing Pearson or Spearman

Selecting the correct correlation coefficient depends on the type of variables and the pattern observed in the scatterplot.

Use a Pearson’s Correlation when both variables are continuous and the scatterplot shows a clear straight-line pattern. Pearson is appropriate when the relationship is linear, extreme outliers are not present, and assumptions of normality are reasonably met.

Use a Spearman’s Correlation when variables are ordinal, ranked, or not normally distributed. Spearman is also appropriate when the scatterplot shows a consistent upward or downward trend that is not perfectly straight, when outliers could meaningfully distort a Pearson correlation, or when the relationship is directional but curved rather than linear.

Do not rely on a correlation coefficient when the relationship changes direction, such as in U-shaped patterns, when the scatterplot shows no consistent overall trend, or when a nonlinear model would better represent the pattern. Always examine the scatterplot before selecting the appropriate correlation test.

18.6 Understanding the Chi-Square Statistic

The chi-square statistic (χ²) is a key tool in inferential statistics, especially for analyzing categorical data. It compares the observed frequencies in a dataset to the frequencies expected under a specific hypothesis. This statistic is central to tests such as the binomial test, the goodness-of-fit test, and the test of independence, all of which assess whether the distribution of data differs meaningfully from what is expected.

A small chi-square value indicates that the observed and expected frequencies are similar, suggesting little to no difference. A large chi-square value indicates a greater discrepancy between observed and expected counts, which may lead to rejecting the null hypothesis. Understanding how the chi-square statistic works is essential for interpreting results in categorical data analysis.
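The statistic itself is straightforward to compute: for each category, square the difference between the observed and expected counts, divide by the expected count, and sum across categories. A minimal sketch in Python, using made-up counts:

```python
# Hypothetical counts: 25 cases observed in one category and 15 in
# the other, against an expected even 20/20 split
observed = [25, 15]
expected = [20, 20]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # 2.5 -> (25-20)**2/20 + (15-20)**2/20
```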

18.7 Binomial Test

The Binomial Test assesses whether the observed frequency of outcomes in a categorical variable with two possible outcomes differs significantly from what would be expected by chance. It is useful when working with a small number of cases and when you want to test whether the proportion of one outcome matches a specific expected value.

Assumptions

The Binomial Test assumes that each observation is independent, meaning the outcome of one observation does not influence the outcome of another. It also assumes that the probability of each outcome stays the same across all observations.

How To: Binomial Test

To run the Binomial Test in Jamovi, go to the Analyses tab, select Frequencies, then 2 Outcomes Binomial Test.

  1. Move a 2-group nominal variable to the Variables box.
  2. Check the Confidence Intervals box.

TIP: The Binomial Test assumes an expected equal proportion between the two groups. You can change the expected proportion in the Test Value box.

Understanding the Output

The output from the Binomial Test is shown below.


Jamovi interface showing binomial test results for a two-outcome proportion test.
Figure 18.3. Binomial Test Results

To interpret the Binomial Test in Jamovi, begin by identifying the level being tested and reviewing the count and total columns. The count indicates how many cases fall into a particular category, while the total reflects the overall sample size used in the test.

Next, examine the proportion column. This value represents the observed proportion of cases in that category relative to the total sample. Compare this observed proportion to the test value specified in the analysis options. In a two-outcome binomial test, the default comparison value is often 0.50, meaning the test evaluates whether the observed proportion differs from an equal split between the two categories.

Then review the p-value. The p-value indicates whether the observed proportion is statistically different from the hypothesized test value. If the p-value is below the selected alpha level (e.g., .05), you conclude that the observed proportion is significantly different from the comparison value. If the p-value is above the alpha level, there is insufficient evidence to conclude that the proportion differs from the hypothesized value.

Finally, confirm the stated alternative hypothesis in the note beneath the table. In a two-tailed test, the analysis evaluates whether the proportion is not equal to the specified test value.
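The same test can be reproduced outside Jamovi with SciPy's binomtest function; the counts below are hypothetical:

```python
from scipy.stats import binomtest

# Hypothetical data: 18 of 20 respondents chose option A
result = binomtest(k=18, n=20, p=0.5)  # test against an even split

print(18 / 20)                  # observed proportion: 0.9
print(round(result.pvalue, 4))  # 0.0004
```

An observed proportion of .90 is far enough from the test value of .50 that, even with only 20 cases, the two-tailed p-value falls well below .05.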

Phrasing Results: Binomial Test

Use this template to phrase significant results:

  • A Binomial Test showed that the sample’s proportion of [Variable-group 1] and [Variable-group 2] significantly differed (p < [approximate p-value]) from the expected population proportion of 50%.

Use this template to phrase non-significant results:

  • A Binomial Test showed that the sample’s proportion of [Variable-Group 1] and [Variable-Group 2] did not significantly differ (p = [p-value]) from the expected population proportion of 50%.

TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.

18.8 Goodness-of-Fit Test

The Goodness-of-Fit Test is used to determine whether the observed distribution of a categorical variable matches an expected distribution. It is commonly applied to test whether a variable follows a specific pattern or whether the frequencies across categories are evenly distributed. The expected values can be based on theoretical proportions, prior research, or an assumption of equal probability across categories.

Assumptions

The Goodness-of-Fit Test assumes that each observation is independent, meaning that the classification of one case does not affect the classification of another. It also assumes that the categories being tested are mutually exclusive, so each observation can belong to only one category. Another important assumption is that the expected frequency in each category should be at least 5. If expected counts are too small, the chi-square distribution may not provide an accurate estimate, and the results of the test may be unreliable.

How To: Goodness-of-Fit Test

To run the Goodness-of-Fit Test in Jamovi, go to the Analyses tab, select Frequencies, then N Outcomes Chi-Square Goodness of Fit Test.

  1. Move a nominal variable with three or more groups to the Variables box.
  2. Check the Expected Counts box.

TIP: The Goodness-of-Fit Test assumes an expected equal proportion between the groups. You can view and change the expected proportion under Expected Proportions.

Understanding the Output

The output from the Goodness-of-Fit Test is shown below.


Jamovi interface showing chi-square goodness-of-fit results for a multi-category proportion test.
Figure 18.4. Goodness-of-Fit Test Results with Proportions

To interpret a chi-square Goodness-of-Fit Test in Jamovi, begin with the Proportions table. This table lists each category (level), the number of cases observed in each category (count), and the proportion of the total sample represented by each category. Review these values to understand how responses are distributed across categories.

Next, consider the expected proportions specified when setting up the analysis. The Goodness-of-Fit Test evaluates whether the observed distribution differs significantly from those expected proportions. The test does not determine which category is highest or lowest in isolation; rather, it assesses whether the overall pattern of counts differs from what would be expected under the null hypothesis.

Then review the chi-square statistic (χ²), degrees of freedom (df), and p-value. The chi-square statistic summarizes the overall discrepancy between observed and expected counts. Larger differences between observed and expected counts contribute more to the chi-square statistic. The degrees of freedom are based on the number of categories minus one. The p-value indicates whether the difference between observed and expected distributions is statistically significant. If the p-value is below the selected alpha level (e.g., .05), you conclude that the observed distribution differs significantly from the expected distribution. If it is above the alpha level, there is insufficient evidence to conclude that the distributions differ.
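As a cross-check outside Jamovi, the same test can be sketched with SciPy; the counts are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical counts across three categories (n = 120),
# tested against an even expected split of 40 per category
observed = [30, 40, 50]
result = chisquare(f_obs=observed)  # expected counts default to equal

df = len(observed) - 1  # number of categories minus one
print(round(result.statistic, 2), df)  # 5.0 2
print(round(result.pvalue, 3))         # 0.082
```

Here the first and third categories each contribute (10)²/40 = 2.5 to the statistic, while the middle category, which matches its expected count exactly, contributes nothing; the total of 5.0 falls just short of significance at the .05 level.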

Phrasing Results: Goodness-of-Fit Test

Use this template to phrase significant results:

  • A Chi-Square Goodness-of-Fit Test indicated that the sample’s [variable] significantly differed (χ²([degrees of freedom]) = [chi-square statistic], p < [approximate p-value]) from the expected population proportions of [expected proportion].

Use this template to phrase non-significant results:

  • A Chi-Square Goodness-of-Fit Test indicated that the sample’s [variable] did not significantly differ (χ²([degrees of freedom]) = [chi-square statistic], p = [p-value]) from the expected population proportions of [expected proportion].

TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.

18.9 Test of Association

The Test of Association, also called the Test of Independence, is used to determine whether there is a statistical association between two categorical variables or whether they are independent. It compares the observed frequencies in a contingency table to the frequencies that would be expected if no association existed between the variables.

Assumptions

The Test of Association relies on several key assumptions to ensure valid results. First, it assumes that the observations are independent, meaning that each individual or case contributes to only one cell in the contingency table. Second, the variables must be categorical, with mutually exclusive groups. Third, the expected frequency in each cell of the table should generally be 5 or more.

How To: Test of Association

To run the Test of Association in Jamovi, go to the Analyses tab, select Frequencies, then Independent Samples Chi-Square Test of Association.

  1. Move one nominal variable into the Row box and another into the Column box.
  2. Under the Statistics drop-down, select Phi and Cramer’s V.
  3. Under the Cells drop-down, select Observed counts and Expected counts.
  4. Optional: Under Percentages, select Row or Column.
  5. Under Post Hoc Tests, select Standardized residuals (adjusted Pearson).
  6. Under the Plots drop-down, select Bar Plot.

NOTE: The Test of Independence is a powerful and complex test with many options that are beyond the scope of this book.

Understanding the Output

The output from the Test of Association is shown below. The screenshots separate the results for display purposes, but the full output appears in a single Jamovi output window when all test options are selected.


Jamovi interface showing chi-square test of association results with observed and expected counts.
Figure 18.5a. Test of Association Results with Effect Size and Contingency Table


Jamovi interface showing standardized residuals and bar plot for a chi-square test of association.
Figure 18.5b. Test of Association Post Hoc Test Results with Comparative Bar Plot

To interpret a chi-square Test of Association in Jamovi, begin with the contingency table. The table displays the observed counts for each combination of the two categorical variables, along with the expected counts. Observed counts represent the actual number of cases in each cell. Expected counts represent the number of cases that would be expected in each cell if the two variables were independent.

Compare the observed and expected counts within each cell. Large differences between observed and expected values contribute more to the chi-square statistic. If the observed and expected counts are very similar across cells, this suggests little association between the variables.

Next, review the chi-square test results. The chi-square statistic summarizes the overall discrepancy between observed and expected counts. The degrees of freedom are based on the number of rows and columns in the table. The p-value indicates whether the overall association between the two variables is statistically significant. If the p-value is below the selected alpha level (e.g., .05), you conclude that the variables are not independent and that an association exists. If the p-value is above the alpha level, there is insufficient evidence to conclude that an association exists.

Then examine the effect size, typically reported as Cramer’s V for tables larger than 2 × 2. This value indicates the strength of the association. As a general guideline, values around .10 suggest a weak association, around .30 suggest a moderate association, and .50 or higher suggest a strong association. The effect size helps determine practical importance beyond statistical significance.

If post hoc standardized residuals are displayed, use them to identify which specific cells contribute most to the overall chi-square result. Standardized residuals represent the difference between observed and expected counts expressed in standard error units. Values greater than approximately ±2 (highlighted in red) indicate that a cell contributes meaningfully to the overall association. Positive residuals indicate more cases than expected, while negative residuals indicate fewer cases than expected.
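These pieces can be reproduced in a short sketch (Python with SciPy and NumPy). The table counts are hypothetical, Cramer's V is computed from its standard formula, and the residuals use the usual adjusted Pearson formula, which is assumed here to match what Jamovi reports under that label:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 contingency table (e.g., group by yes/no response)
table = np.array([[30, 10],
                  [20, 40]])

# correction=False skips Yates' continuity correction so the result
# matches the plain chi-square formula
chi2, p, df, expected = chi2_contingency(table, correction=False)

# Cramer's V from its standard formula
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(round(chi2, 2), df, round(v, 2))  # 16.67 1 0.41

# Adjusted Pearson residuals: (observed - expected) in standard
# error units; values beyond about +/-2 flag influential cells
row = table.sum(axis=1, keepdims=True) / n
col = table.sum(axis=0, keepdims=True) / n
resid = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
print(np.round(resid, 2))
```

In a 2 × 2 table every cell has the same absolute residual (here about 4.08, well beyond ±2), consistent with the strong overall association (V ≈ .41).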

Finally, review the bar plot to visually confirm the pattern in the contingency table. Differences in bar heights across categories can help illustrate where associations appear strongest.

Phrasing Results: Test of Association

Use this template to phrase significant results:

  • A Chi-Square Test of Association was conducted to examine the association between [variable 1] and [variable 2].
  • The results indicated a statistically significant, [strength] association (χ²([degrees of freedom]) = [chi-square statistic], p < [approximate p-value], V = [Cramer’s V statistic]) between the two variables.

Use this template to phrase non-significant results:

  • A Chi-Square Test of Association was conducted to examine the association between [variable 1] and [variable 2].
  • The results indicated there is no significant association (χ²([degrees of freedom]) = [chi-square statistic], p = [exact p-value]) between the two variables.

TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.

18.10 Correlation Does Not Establish Causation

Correlation and other measures of association quantify the degree to which two variables move together. However, statistical association alone does not establish that one variable causes changes in another. Correlated variables may reflect reverse influence, shared underlying factors, or coincidental patterns within the data. Without experimental control, temporal ordering, or appropriate design safeguards, correlation results indicate relationships rather than causal mechanisms. Interpreting associations as evidence of causal influence therefore overstates what these analyses can demonstrate.

Chapter 18 Summary and Key Takeaways

Four common statistical tests help examine associations between variables. Correlation tests measure the strength and direction of the association between continuous variables. The Binomial Test evaluates whether the frequency of two outcomes in a categorical variable differs from a specified proportion. The Goodness-of-Fit Test compares observed category frequencies to an expected distribution, and the Test of Association assesses whether two categorical variables are statistically associated or independent. These tests are descriptive and exploratory tools that help identify meaningful patterns in data. However, statistical association does not imply causality. Understanding the assumptions and appropriate applications of each test allows researchers to draw accurate, evidence-based conclusions. All four tests can be easily conducted using Jamovi’s user-friendly interface, making it an accessible tool for exploring associations in both continuous and categorical data.

  • Correlation quantifies the strength and direction of the association between continuous variables but does not imply causality.
  • The Binomial Test evaluates whether observed binary outcomes differ significantly from a specified proportion.
  • The Goodness-of-Fit Test compares observed and expected frequencies to assess how well the data fit a hypothesized distribution.
  • The Test of Association determines whether two categorical variables are statistically associated or independent.
  • Association refers to a statistical connection between variables, while a relationship may suggest a broader conceptual or theoretical link.