21: Repeated Measures
Chapter 21 Guiding Questions
What makes repeated measures designs analytically distinct?
How does dependency among observations affect analysis?
What assumptions are unique to repeated measures analyses?
When are repeated measures designs appropriate or inappropriate?
21.1 Measuring Changes Over Time
In quantitative research, especially when examining changes over time or the effects of different conditions on the same participants, repeated measures tests are essential. These tests account for the fact that the same individuals are measured more than once, allowing researchers to examine within-subject variation rather than differences between separate groups. A key advantage of repeated measures designs is that they reduce variability due to individual differences, providing greater statistical power and more sensitive analyses. Repeated measures designs are commonly used in longitudinal studies, clinical trials, and experimental settings where participants are exposed to multiple conditions over time.
21.2 Paired Samples t-Test
The Paired Samples t-Test (also known as the Dependent Samples t-Test) is used to compare the means of two related groups, such as measurements taken before and after an intervention on the same subjects. It tests whether the average difference between paired observations is significantly different from zero. This test is appropriate when the same individuals are measured under two different conditions or at two different time points.
Assumptions
The Paired Samples t-Test relies on several key assumptions. The dependent variable should be continuous and measured at the interval or ratio level. The pairs of observations must be related, meaning each value in one group is meaningfully paired with a value in the other (e.g., the same subject measured twice). The differences between the paired scores should be approximately normally distributed. This assumption is most important when the sample size is small; with larger samples, the test is more robust to violations of normality.
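For readers working outside Jamovi, the same analysis can be sketched in Python with SciPy. The pre/post scores below are hypothetical, and the steps mirror the Jamovi workflow: check normality of the difference scores, run the paired test, and compute Cohen's d as the mean difference divided by the standard deviation of the differences.

```python
# A minimal sketch of a Paired Samples t-Test; the pre/post data are hypothetical.
import numpy as np
from scipy import stats

pre  = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0])
post = np.array([14.0, 17.0, 12.0, 16.0, 13.0, 18.0, 15.0, 16.0])
diff = post - pre

# Shapiro-Wilk on the difference scores, mirroring Jamovi's assumption check.
sw_stat, sw_p = stats.shapiro(diff)

# Paired samples t-test on the two related measures.
t_stat, p_value = stats.ttest_rel(post, pre)

# Cohen's d for paired data: mean difference / SD of the differences.
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"Shapiro-Wilk p = {sw_p:.3f}")
print(f"t({len(diff) - 1}) = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```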
How To: Paired Samples t-Test
To run the Paired Samples t-Test in Jamovi, go to the Analyses tab, select T-Tests, then Paired Samples T-Test.
Move the paired interval variables into the Paired Variable box.
Under Additional Statistics, select: Mean difference, Effect size, Descriptives, and Descriptives plots.
Under Assumption Checks, select Normality test and Q-Q plot.
Understanding the Output
The output from the Paired Samples t-Test is shown below. The screenshots separate the results for display purposes, but the full output appears in a single Jamovi output window when all test options are selected.
Figure 21.1a. Paired Samples t-Test Results with Assumption Tests.
Figure 21.1b. Paired Samples t-Test Results with Descriptives and Mean Plot
Begin by checking the assumption of normality for the difference scores. In Jamovi, this can be evaluated using both the Shapiro–Wilk test and the Q–Q plot. The Shapiro–Wilk test provides a statistical test of whether the difference scores are approximately normally distributed. A non-significant result suggests that the normality assumption has not been violated.
The Q–Q plot provides a visual check of the same assumption. To interpret it, look at whether the plotted points fall close to the diagonal reference line. When the points generally follow the line, the distribution of difference scores is considered approximately normal. Small deviations are common, but large or systematic departures from the line may suggest that normality is questionable.
Once the normality assumption is judged acceptable, interpret the paired samples t-test table. The t statistic tests whether the mean difference between the two related measures is large enough, relative to the variability in the difference scores, to conclude that a difference likely exists in the population. The p-value indicates whether that difference is statistically significant. If the p-value is below the chosen alpha level, the null hypothesis of no average difference is rejected.
The mean difference indicates the direction and size of the difference between the two measures. Its sign shows which measure tends to have the higher average score. The standard error of the difference indicates how precisely that mean difference has been estimated.
The effect size, reported as Cohen’s d, helps evaluate the practical magnitude of the difference. This is useful because it shows how substantial the difference is, beyond whether it is statistically significant.
The descriptive statistics provide additional context by showing the sample size, mean, median, standard deviation, and standard error for each measure. These values help you understand how the scores are distributed in each condition before focusing on the difference between them.
Finally, the mean plot with confidence intervals provides a visual comparison of the two related measures and helps illustrate the direction and relative size of the difference.
Phrasing Results: Paired Samples t-Test
Use this template to phrase significant results:
A Paired Samples t-Test was conducted to compare [DV] between [condition 1] and [condition 2].
A significant difference was found (t([df]) = [t statistic], p < [approximate p-value]) with a [size] practical effect (d = [Cohen’s d]).
Use this template to phrase non-significant results:
A Paired Samples t-Test was conducted to compare [DV] between [condition 1] and [condition 2].
No significant difference was found (t([df]) = [t statistic], p = [p-value]).
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
21.3 Wilcoxon Signed-Rank Test
The Wilcoxon Signed-Rank Test is a non-parametric statistical test used to determine whether there is a significant difference between two related measurements. It works by ranking the absolute differences between paired observations, then analyzing the ranks to determine whether the differences are systematically in one direction, either mostly positive or mostly negative. This test is especially useful when working with ordinal data or continuous data that are not normally distributed.
Assumptions
The Wilcoxon Signed-Rank Test is a non-parametric alternative to the Paired Samples t-Test. It is used to compare two related groups when the assumption of normality for the difference scores is violated.
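The same test can be sketched in Python with SciPy. The data below are hypothetical; the rank-biserial correlation is computed by hand from the signed ranks, since `scipy.stats.wilcoxon` reports only the W statistic and p-value.

```python
# A minimal sketch of the Wilcoxon Signed-Rank Test; the data are hypothetical.
import numpy as np
from scipy import stats

before = np.array([3, 5, 4, 6, 2, 5, 4, 7, 3, 5])
after  = np.array([4, 6, 4, 7, 3, 6, 6, 7, 4, 4])

# Zero differences are dropped by default (the 'wilcox' zero method).
w_stat, p_value = stats.wilcoxon(before, after)

# Matched-pairs rank-biserial correlation as an effect size:
# (sum of positive signed ranks - sum of negative) / total rank sum.
diff = (after - before).astype(float)
diff = diff[diff != 0]
ranks = stats.rankdata(np.abs(diff))
r_rb = (ranks[diff > 0].sum() - ranks[diff < 0].sum()) / ranks.sum()

print(f"W = {w_stat}, p = {p_value:.4f}, rank-biserial r = {r_rb:.2f}")
```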
How To: Wilcoxon Signed-Rank Test
To run the Wilcoxon Signed-Rank Test in Jamovi, go to the Analyses tab, select T-Tests, then Paired Samples T-Test.
Move the paired interval variables into the Paired Variable box.
Under Tests, select: Wilcoxon rank (uncheck Student’s).
Under Additional Statistics, select: Mean difference, Effect size, Descriptives, and Descriptives plots.
Understanding the Output
The output from the Wilcoxon Signed-Rank Test is shown below.
Figure 21.2. Wilcoxon Signed-Rank Test Results with Descriptives and Mean Plot
Begin by examining the Wilcoxon W statistic and the p-value in the test table. The W statistic is calculated from the ranks of the absolute differences between paired observations. The p-value indicates whether the distribution of the paired differences suggests a statistically significant difference between the two related measures. If the p-value is below the chosen significance level (commonly .05), the null hypothesis that the median difference equals zero is rejected.
The mean difference provides an estimate of the direction and magnitude of the difference between the two measures. The sign of this value indicates which measure tends to have larger scores on average. The standard error of the difference reflects the precision of that estimate.
The effect size, reported here as the rank biserial correlation, describes the magnitude of the difference between the two related measures. Larger absolute values indicate a stronger difference between the paired observations, regardless of sample size.
The descriptive statistics table provides additional context by displaying the sample size, mean, median, standard deviation, and standard error for each measure. These values help summarize the central tendency and variability of each set of scores.
Finally, the plot comparing the two measures provides a visual representation of the differences between the paired conditions, allowing readers to see how the central tendency of the scores differs across the two related measurements.
Phrasing Results: Wilcoxon Signed-Rank Test
Use this template to phrase significant results:
A Wilcoxon Signed-Rank Test was conducted to compare [DV] between [condition 1] and [condition 2].
A significant difference was found (W = [W statistic], p < [approximate p-value]) with a [size] practical effect (r_bc = [Rank Biserial Correlation]).
Use this template to phrase non-significant results:
A Wilcoxon Signed-Rank Test was conducted to compare [DV] between [condition 1] and [condition 2].
No significant difference was found (W = [W statistic], p = [p-value]).
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
21.4 McNemar Test
The McNemar Test is a non-parametric test used for analyzing paired binary data. It is typically applied when comparing nominal responses from the same individuals measured at two time points or under two conditions. The test evaluates whether the proportion of cases that change from one category to another is statistically significant. It focuses specifically on individuals whose responses changed between measurements, rather than those who remained in the same category.
Assumptions
The McNemar Test has a few key assumptions. Each pair of observations must be independent of other pairs, meaning that each participant contributes only one paired response and is not represented more than once in the dataset. The test analyzes only the discordant pairs, which are the cases where a subject’s response changes from one condition to the other. Responses that remain the same in both conditions are excluded from the calculation. Because the test statistic is based solely on these discordant cases, it assumes there are enough of them (typically at least 10 to 25) to run the test.
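Because the test statistic depends only on the discordant cells, it can be computed by hand. A minimal sketch with hypothetical 2×2 counts, using the uncorrected chi-square form χ² = (b − c)² / (b + c) with 1 degree of freedom:

```python
# A minimal sketch of the McNemar Test; the 2x2 counts are hypothetical.
# Rows = response at time 1, columns = response at time 2.
import numpy as np
from scipy.stats import chi2

table = np.array([[30, 12],   # stayed "yes"      / changed yes -> no
                  [ 4, 24]])  # changed no -> yes / stayed "no"

# Only the discordant (off-diagonal) cells enter the test.
b, c = table[0, 1], table[1, 0]

# Uncorrected chi-square with 1 degree of freedom; a continuity
# correction, (|b - c| - 1)^2 / (b + c), is sometimes reported instead.
chi_sq = (b - c) ** 2 / (b + c)
p_value = chi2.sf(chi_sq, df=1)

print(f"chi2(1) = {chi_sq:.2f}, p = {p_value:.4f}")
```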
How To: McNemar Test
To run the McNemar Test in Jamovi, go to the Analyses tab, select Frequencies, then McNemar Test under Paired Samples.
Move the first paired 2-group nominal variable into the Row box.
Move the second paired 2-group nominal variable into the Column box.
Optional: Under Percentages, select: Row or Column.
Understanding the Output
The output from the McNemar Test is shown below.
Figure 21.3. McNemar Test Results with Paired Contingency Table
The contingency table displays the number of participants in each combination of responses before and after the measurement. Each row represents the initial response category, and each column represents the response category at the second measurement. The counts in the cells show how many participants remained in the same category across both measurements or changed from one category to another.
When interpreting this table, particular attention is given to the off-diagonal cells, which represent cases where participants changed their responses between the two measurements. These cells show the number of individuals who moved from one category to the other.
The McNemar test evaluates whether the number of participants who changed in one direction differs significantly from the number who changed in the opposite direction. The chi-square statistic (χ²) represents the magnitude of the difference between the two types of changes. The p-value indicates whether the imbalance between these directional changes is statistically significant. If the p-value is below the selected significance level (commonly .05), the null hypothesis that the two types of changes occur equally often is rejected.
Finally, the sample size (N) indicates the total number of paired observations included in the analysis. This value reflects the number of participants whose responses were compared across the two measurements.
Phrasing Results: McNemar Test
Use this template to phrase significant results:
A McNemar’s Test was conducted to compare the proportions of [DV] between [condition 1] and [condition 2].
A significant difference was found (χ²([df]) = [χ² statistic], p < [approximate p-value]).
Use this template to phrase non-significant results:
A McNemar’s Test was conducted to compare the proportions of [DV] between [condition 1] and [condition 2].
No significant difference was found (χ²([df]) = [χ² statistic], p = [p-value]).
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
21.5 Repeated Measures ANOVA
Repeated Measures ANOVA is used when multiple measurements are taken from the same subjects over time or under different conditions. In this design, the independent variable (referred to as the within-subjects factor) represents the repeated conditions or time points measured for each participant. The test evaluates how this factor (or factors) influences a continuous dependent variable and also allows for the analysis of interaction effects when more than one within-subjects factor is included. Because the same individuals are measured repeatedly, the test accounts for within-subject variation, reducing error caused by individual differences.
Repeated Measures ANOVA is similar in purpose to the Paired Samples t-Test, as both are used with related or repeated observations from the same individuals. However, the t-test is limited to comparing only two related conditions or time points. Repeated Measures ANOVA should be used when there are three or more repeated measures. It provides an overall test of whether significant differences exist across conditions, and also allows for the inclusion of interaction effects, between-subjects factors, and covariates when appropriate. This makes Repeated Measures ANOVA a more flexible and comprehensive approach for analyzing repeated-measures data.
Assumptions
Repeated Measures ANOVA has several key assumptions. The dependent variable should be continuous and approximately normally distributed at each level of the within-subjects factor. The observations must be related, since the same participants are measured across all conditions. The test also assumes sphericity, meaning that the variances of the differences between all combinations of repeated measures are approximately equal.
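The logic behind the Within-Subjects Effects table can be sketched by hand for a one-way design: total variability is partitioned into condition, subject, and error components, and the F statistic compares the condition variability to the residual error. The scores below are hypothetical, and eta squared is computed as the condition sum of squares divided by the total.

```python
# A minimal sketch of a one-way Repeated Measures ANOVA; the data are hypothetical.
# Rows are subjects, columns are the three repeated conditions.
import numpy as np
from scipy.stats import f as f_dist

scores = np.array([
    [5.0, 7.0, 9.0],
    [4.0, 6.0, 8.0],
    [6.0, 7.0, 10.0],
    [5.0, 8.0, 9.0],
    [4.0, 5.0, 8.0],
])
n, k = scores.shape
grand_mean = scores.mean()

# Partition the total sum of squares into condition, subject, and error parts.
ss_total = ((scores - grand_mean) ** 2).sum()
ss_cond = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_subj = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj

df_cond, df_error = k - 1, (n - 1) * (k - 1)
f_stat = (ss_cond / df_cond) / (ss_error / df_error)
p_value = f_dist.sf(f_stat, df_cond, df_error)

# Eta squared: proportion of total variability explained by the condition.
eta_sq = ss_cond / ss_total

print(f"F({df_cond}, {df_error}) = {f_stat:.2f}, p = {p_value:.4f}, eta^2 = {eta_sq:.2f}")
```

Removing the subject sum of squares from the error term is what gives the repeated measures design its extra power relative to a between-groups ANOVA on the same scores.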
How To: Repeated Measures ANOVA
To run the Repeated Measures ANOVA in Jamovi, go to the Analyses tab, select ANOVA, then Repeated Measures ANOVA.
In the Repeated Measures Factor box: Enter the RM Factor 1 name to reflect the factor connecting the related variables.
Enter the level names to describe the sub-factor of each variable.
In the Repeated Measures Cells box: Move the matching variable to the named sub-factor and repeat for all levels for the factor.
Under Effect Size, select η² (eta-squared).
Under Assumption Checks, select: all options under Sphericity tests, Homogeneity test (if including a Between-Subjects Factor), and Q-Q plot.
Under Post-Hoc Tests, move the Repeated Measures Factor to the box on the right.
Under Corrections, select: Tukey.
Under Estimated Marginal Means, move the Repeated Measures Factor to the Term 1 box.
Under Output, select: Marginal means plots and Marginal mean tables.
Understanding the Output
The output from the Repeated Measures ANOVA is shown below. The screenshots separate the results for display purposes, but the full output appears in a single Jamovi output window when all test options are selected.
Figure 21.4a. Repeated Measures ANOVA Results with Assumption Tests
Figure 21.4b. Repeated Measures ANOVA Results with Post-Hoc Comparisons
Figure 21.4c. Repeated Measures ANOVA Results with Estimated Marginal Means and Confidence Intervals Plot
Begin by examining the Within-Subjects Effects table. This table contains the primary test of whether the repeated conditions differ from one another. The F statistic compares the variability between the repeated conditions to the variability of the residual error. The p-value indicates whether the differences among the repeated measures are statistically significant. If the p-value is below the chosen significance level (commonly .05), the null hypothesis that all condition means are equal is rejected.
The effect size, reported as eta squared (η²), indicates the proportion of variability in the dependent variable that is associated with the repeated factor. Larger values suggest that the repeated condition explains a greater portion of the variability in the outcome.
The Between-Subjects Effects table reports variability attributable to differences between participants rather than differences between the repeated conditions. This information helps partition the total variability in the model but is not typically the primary focus when interpreting repeated measures effects.
Next, examine the Tests of Sphericity. Sphericity is an assumption of repeated measures ANOVA that requires the variances of the differences between all pairs of conditions to be approximately equal. Mauchly’s test evaluates this assumption. A non-significant result suggests that the assumption of sphericity has not been violated. If the assumption is violated, report the Greenhouse–Geisser correction for stronger violations, or the Huynh–Feldt correction when the violation appears mild (i.e., when epsilon values are closer to 1).
The Q–Q plot provides a visual check of the normality of the residuals. When the points fall close to the diagonal reference line, the residuals are considered approximately normally distributed.
If the overall repeated measures test is statistically significant, post hoc comparisons help determine which specific pairs of conditions differ from one another. These pairwise tests compare the mean difference between each pair of conditions while controlling for multiple comparisons. The p-values indicate whether the difference between each pair of conditions is statistically significant.
Finally, the Estimated Marginal Means table and plot display the average score for each condition along with standard errors and confidence intervals. These values help illustrate the direction and magnitude of differences among the conditions and provide a visual summary of how the outcome changes across the repeated measurements.
Phrasing Results: Repeated Measures ANOVA
Use this template to phrase significant results:
A Repeated Measures ANOVA was conducted to examine differences in [DV] across the levels of [within-subjects factor].
A significant effect of [within-subjects factor] was found, F([df1], [df2]) = [F statistic], p < [approximate p-value], with a [size] practical effect (η² = [eta squared statistic]).
Use this template to phrase the post-hoc results:
A Tukey post-hoc test was conducted to determine the nature of the mean differences between levels of the [within-subjects factor].
This analysis revealed that [condition level 1] and [condition level 2] differed significantly (ΔM = [mean difference], p < [approximate p-value]).
NOTE: Only include the post-hoc results if the Repeated Measures ANOVA produces a significant result.
Use this template to phrase non-significant results:
A Repeated Measures ANOVA was conducted to examine differences in [DV] across the levels of [within-subjects factor].
No significant effect of [within-subjects factor] was found, F([df1], [df2]) = [F statistic], p = [p-value], with a [size] practical effect (η² = [eta squared statistic]).
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
TIP: The Repeated Measures ANOVA test in Jamovi can accommodate independent variables and covariates like other ANOVA tests, allowing for complex analyses.
21.6 Friedman Test
The Friedman Test is a non-parametric statistical test used to detect differences across three or more related conditions or time points in a within-subjects design. It is appropriate when the same subjects are measured repeatedly and the data are either ordinal or not normally distributed. The test ranks the values within each subject across conditions, then analyzes those ranks to determine whether the distribution of ranks differs significantly across the conditions.
Assumptions
The Friedman Test is a non-parametric alternative to Repeated Measures ANOVA, so it can be used when the assumption of normality for repeated measures data is violated.
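The test can be sketched in Python with SciPy, which takes each related measure as a separate argument. The three repeated measures below are hypothetical ordinal ratings from the same eight subjects.

```python
# A minimal sketch of the Friedman Test; the repeated measures are hypothetical.
from scipy.stats import friedmanchisquare

time1 = [3, 2, 4, 3, 2, 3, 4, 2]
time2 = [4, 3, 4, 5, 3, 4, 5, 3]
time3 = [5, 4, 5, 5, 4, 5, 5, 4]

# Ranks each subject's scores across the three conditions, then tests
# whether the rank distributions differ (chi-square with k - 1 = 2 df).
chi_sq, p_value = friedmanchisquare(time1, time2, time3)
print(f"chi2(2) = {chi_sq:.2f}, p = {p_value:.4f}")
```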
How To: Friedman Test
To run the Friedman Test in Jamovi, go to the Analyses tab, select ANOVA, then Friedman under Non-Parametric.
Move all related interval variables into the Measures box.
Under the Variable List box, select: Pairwise comparisons, Descriptives, Descriptive plots (select Median).
Understanding the Output
The output from the Friedman Test is shown below.
Figure 21.5. Friedman Test Results with Pairwise Comparisons, Descriptives, and Descriptive Plot
Begin by examining the Friedman test table. The chi-square statistic (χ²) represents the overall test of whether the distributions of the repeated conditions differ from one another. The p-value indicates whether the observed differences among the conditions are statistically significant. If the p-value is below the chosen significance level (commonly .05), the null hypothesis that the distributions of the conditions are equal is rejected.
If the overall Friedman test is statistically significant, pairwise comparisons help determine which specific conditions differ from one another. These comparisons evaluate the differences between each pair of conditions while adjusting for multiple comparisons. The p-values indicate whether the difference between each pair of conditions is statistically significant.
The descriptive statistics table provides additional context by displaying the mean and median values for each condition. Because the Friedman test is based on ranked data, the median values are often especially helpful for understanding the relative ordering of the conditions.
Finally, the descriptive plot provides a visual representation of the central tendency of each condition. This plot helps illustrate how the outcome variable changes across the repeated measurements and can make it easier to see patterns or differences among the conditions.
Phrasing Results: Friedman Test
Use this template to phrase significant results:
A Friedman Test was conducted to examine differences in [dependent variable] across [condition].
A significant effect of [condition] was found, χ²([df]) = [chi-square value], p < [approximate p-value].
Use this template to phrase the post-hoc results:
Follow-up Durbin-Conover pairwise comparisons indicated that [Measure 1] and [Measure 2] differed significantly, Z = [Statistic value], p < [approximate p-value].
NOTE: Only include the post-hoc results if the Friedman Test produces a significant result.
Use this template to phrase non-significant results:
A Friedman Test was conducted to examine differences in [DV] across [condition].
No significant effect of [condition] was found, χ²([df]) = [chi-square value], p = [p-value].
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
21.7 Cochran’s Q Test
The Cochran’s Q Test is a non-parametric test used for analyzing binary outcomes measured across three or more related conditions or time points. It is an extension of the McNemar Test and is appropriate when binary data are collected from the same subjects under multiple conditions. This test is especially useful when examining whether the proportion of responses differs significantly across repeated measures in a within-subjects design.
Assumptions
The Cochran’s Q Test has several key assumptions. Each subject should contribute one binary response per condition, and the response categories must be consistent across conditions. Additionally, the observations between subjects must be independent, meaning that the response of one subject should not influence the responses of others.
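Because the statistic is a simple function of the row (subject) and column (condition) success totals, Cochran's Q can be sketched by hand. The binary data below are hypothetical; Q is evaluated against a chi-square distribution with k − 1 degrees of freedom.

```python
# A minimal sketch of Cochran's Q; the binary data are hypothetical.
# Rows are subjects, columns are three related conditions (1 = yes, 0 = no).
import numpy as np
from scipy.stats import chi2

x = np.array([
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])
n, k = x.shape
col_totals = x.sum(axis=0)   # successes per condition
row_totals = x.sum(axis=1)   # successes per subject
grand = x.sum()

# Q = (k-1) * [k * sum(Cj^2) - N^2] / [k * sum(Ri) - sum(Ri^2)]
q_stat = (k - 1) * (k * (col_totals ** 2).sum() - grand ** 2) / \
         (k * row_totals.sum() - (row_totals ** 2).sum())
p_value = chi2.sf(q_stat, df=k - 1)

print(f"Q({k - 1}) = {q_stat:.2f}, p = {p_value:.4f}")
```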
How To: Cochran’s Q
Jamovi does not currently provide a direct Cochran’s Q procedure, so the non-parametric repeated-measures (Friedman) module is used to obtain the overall test of differences across related binary conditions.
To run the Cochran’s Q Test in Jamovi, go to the Analyses tab, select ANOVA, then Friedman under Non-Parametric.
Move 3 or more 2-group nominal variables into the Measures box.
Under the Variable List box, select: Pairwise comparisons (This table organizes the pairs needed for post-hoc analysis and is not used for interpretation).
For significant results: run post-hoc analyses through the McNemar Test (explained earlier in the chapter).
Understanding the Output
The output from the Cochran’s Q Test is shown below.
Figure 21.6. Cochran’s Q Test Results with Exploratory Pairwise Comparisons
Begin by examining the overall test table. The chi-square statistic (χ²) represents the overall test of whether the proportions differ across the repeated measurements. The p-value indicates whether the observed differences among the conditions are statistically significant. If the p-value is below the chosen significance level (commonly .05), the null hypothesis that the proportions are equal across all conditions is rejected.
Jamovi also provides pairwise comparisons. Because the Friedman procedure is based on ranked data, these comparisons are not the formal follow-up tests for Cochran’s Q, which is designed for binary outcomes. However, they can still be helpful for identifying which pairs of conditions appear most likely to differ.
To formally evaluate those differences for binary repeated-measures data, follow-up analyses should be conducted using pairwise McNemar tests between the relevant conditions. When conducting multiple McNemar tests, a multiple-comparison correction such as Bonferroni should be applied to control the overall Type I error rate. Bonferroni correction adjusts the significance level when multiple tests are conducted to reduce the risk of false positives. This is done by dividing the chosen alpha level (commonly .05) by the number of comparisons being performed.
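The Bonferroni logic above can be sketched in a few lines. The condition labels and p-values below are hypothetical stand-ins for the results of three pairwise McNemar tests.

```python
# A minimal sketch of a Bonferroni correction for follow-up pairwise
# McNemar tests; the pair labels and p-values are hypothetical.
alpha = 0.05
n_comparisons = 3  # three conditions yield three pairwise comparisons
adjusted_alpha = alpha / n_comparisons

pairwise_p = {"T1 vs T2": 0.004, "T1 vs T3": 0.021, "T2 vs T3": 0.180}
for pair, p in pairwise_p.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{pair}: p = {p:.3f} -> {verdict} at adjusted alpha = {adjusted_alpha:.4f}")
```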
Phrasing Results: Cochran’s Q
Use this template to phrase significant results:
A Cochran’s Q Test was conducted to examine differences in [binary DV] across [condition].
A significant effect of [condition] was found, Q([df]) = [chi-square value], p < [approximate p-value].
Use this template to phrase the post-hoc results:
Follow-up McNemar tests with a Bonferroni correction (adjusted p-value = [adjusted p-value]) were conducted to explore pairwise differences.
Results indicated that [Condition 1] and [Condition 2] differed significantly, p = [corrected p-value].
NOTE: Only include the post-hoc results if the Cochran’s Q Test produces a significant result.
Use this template to phrase non-significant results:
A Cochran’s Q Test was conducted to examine differences in [binary dependent variable] across [condition].
No significant effect of [condition] was found, Q([df]) = [chi-square value], p = [p-value].
TIP: Replace the content inside the brackets with your variables and results, then remove the brackets.
NOTE: The Q statistic is the same as the chi-square value when used for the Cochran’s Q Test.
21.8 Ignoring Repeated Measures Structure
A common mistake occurs when repeated measures data are treated as independent observations. When the same participants are measured multiple times, their responses are inherently related. Treating those observations as independent violates a core statistical assumption and can distort standard errors and test statistics. Analyses such as paired samples t-tests or repeated measures ANOVA are designed to account for this dependency. Proper test selection requires recognizing whether observations are independent or related before choosing an analysis.
Chapter 21 Summary and Key Takeaways
Several statistical tests are available for analyzing repeated measures data. These include the Paired Samples t-Test for comparing the means of two related groups, and the Wilcoxon Signed-Rank Test as a non-parametric alternative when normality is violated. The McNemar Test is used for paired binary responses across two conditions. Repeated Measures ANOVA evaluates the effects of one or more within-subjects factors and supports the inclusion of interaction effects, between-subjects factors, and covariates. When data do not meet the assumptions of normality, the Friedman Test provides a non-parametric solution for comparing three or more related measurements. The Cochran’s Q Test is used to assess binary outcomes across three or more related time points or conditions. Understanding the appropriate use, assumptions, and interpretation of these tests supports sound analysis of within-subjects designs in quantitative research. Each of these analyses can be conducted in Jamovi using user-friendly interfaces that guide researchers through setup, diagnostics, and interpretation.
Paired Samples t-Test and Wilcoxon Signed-Rank Test are used to compare two related groups; Wilcoxon is preferred when data are ordinal or non-normal.
McNemar Test is appropriate for binary data measured at two time points or under two conditions.
Repeated Measures ANOVA supports the analysis of within-subjects factors and allows for testing interactions, covariates, and between-subjects factors.
Friedman Test is a non-parametric method for comparing three or more related measurements when assumptions of normality are violated.
Cochran’s Q Test is used for binary outcomes across three or more related time points or conditions.