
13: Distribution

Chapter 13 Guiding Questions

  1. Why does distribution shape statistical decision-making?
  2. What does normality mean in applied research contexts?
  3. How do skewness and kurtosis affect interpretation?
  4. When should distributional assumptions be questioned or addressed?

13.1 Describing Distributions

Distribution refers to how the values in a dataset are spread or arranged. Understanding the distribution of data is essential for identifying patterns, relationships, and anomalies. It informs key characteristics such as central tendency, variability, and shape, all of which are critical for selecting appropriate statistical methods and drawing valid inferences. Important aspects of distribution include its shape, skewness, kurtosis, percentiles, and outliers. Together, these elements help researchers assess the structure of the data and choose the most suitable analytical tools.

13.2 Common Distribution Shapes

The shape of a distribution refers to the overall pattern of values when data is plotted. Recognizing distribution shape helps researchers understand how data is spread and whether certain statistical methods are appropriate.

A normal distribution (also known as a Gaussian distribution) is symmetric and bell-shaped, with most data points clustered around the mean. It follows the 68-95-99.7 rule, meaning approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three, as shown below.

Normal distribution curve labeled with the 68–95–99.7 empirical rule.
Figure 13.1. Normal Distribution Curve

In a perfectly normal distribution, the mean, median, and mode are equal and in the middle of the distribution.
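
The 68-95-99.7 rule can also be verified with a short simulation outside Jamovi. The Python sketch below (the sample size, seed, mean of 50, and standard deviation of 10 are arbitrary choices for illustration) draws values from a normal distribution and counts the share falling within one, two, and three standard deviations of the mean:

```python
import random
import statistics

# Simulate 100,000 draws from a normal distribution (mean 50, SD 10).
# The seed makes the result reproducible.
random.seed(42)
values = [random.gauss(50, 10) for _ in range(100_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

print(f"within 1 SD: {share_within(1):.3f}")  # close to 0.68
print(f"within 2 SD: {share_within(2):.3f}")  # close to 0.95
print(f"within 3 SD: {share_within(3):.3f}")  # close to 0.997
```

With a sample this large, the three proportions land very close to 68%, 95%, and 99.7%, matching the empirical rule.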

A skewed distribution occurs when data is not evenly distributed around the center. In a positively skewed (right-skewed) distribution, the tail extends to the right; most data points are on the lower end, and the mean is greater than the median, as shown below.

Right-skewed distribution showing mean greater than median.
Figure 13.2. Positive or Right Skew

In contrast, a negatively skewed (left-skewed) distribution has a longer tail to the left, with the mean less than the median, as shown below.

Left-skewed distribution showing mean less than median.
Figure 13.3. Negative or Left Skew

A uniform distribution has all values occurring with equal frequency, resulting in a flat, even shape. A bimodal distribution features two distinct peaks, often indicating that the data comes from two different groups or processes.

13.3 Skewness

Skewness refers to the degree of asymmetry in a distribution. In a positively skewed distribution, the right tail (containing larger values) is longer, and most values are concentrated on the lower end. In a negatively skewed distribution, the left tail (containing smaller values) is longer, and most values are concentrated on the higher end. A normal distribution has a skewness of 0, indicating perfect symmetry. Positive values of skewness indicate a right-skewed shape, while negative values indicate a left-skewed shape.

Skewness affects how central tendency is interpreted. When data is highly skewed, the mean may be misleading, and the median often provides a better representation of the typical value. Skewness also influences the choice of statistical tests, particularly those that assume normally distributed data.
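
For readers who want to see the calculation directly, the Python sketch below computes skewness with the basic moment formula (the average cubed z-score) on a small hypothetical dataset. Jamovi reports a small-sample-adjusted version of this statistic, so its value will differ slightly, but the sign and interpretation are the same:

```python
import statistics

# A small hypothetical right-skewed dataset; the single large value
# stretches the right tail.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD, matching the moment formula

# Moment-based skewness: the average cubed z-score.
skewness = sum(((x - mean) / sd) ** 3 for x in scores) / n

print(f"mean = {mean:.2f}, median = {statistics.median(scores)}")
print(f"skewness = {skewness:.2f}")  # positive: right-skewed
```

Note that the mean (5.1) exceeds the median (4.0), exactly the pattern described above for a positively skewed distribution.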

13.4 Kurtosis

Kurtosis describes how much data is concentrated in the tails (the extreme high and low values) of a distribution, compared to a normal distribution. It helps researchers understand the likelihood of outliers and how sharply peaked the data is.

A distribution with high kurtosis, called a leptokurtic distribution, has a sharper peak and heavier tails, meaning more data is found in the extremes, as shown below.

Leptokurtic distribution showing a sharp central peak and heavier tails.
Figure 13.4. Leptokurtic or Positive Kurtosis

High kurtosis distributions are more prone to outliers.

A distribution with low kurtosis, known as platykurtic, has a flatter peak and thinner tails, indicating fewer extreme values and less variation, as shown below.

A low-kurtosis distribution showing a flatter peak and lighter tails.
Figure 13.5. Platykurtic or Negative Kurtosis

A mesokurtic distribution falls in the middle, resembling a normal distribution with a moderate peak and tails. This type indicates a balanced dataset with a typical amount of variation and few extreme values.

Understanding kurtosis helps in assessing whether data contain unusual or extreme values that may affect interpretation and analysis.
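
The same moment-based approach extends to kurtosis. The sketch below (both datasets are hypothetical, chosen so they have the same center but different tail weight) computes excess kurtosis as the average fourth-power z-score minus 3; again, Jamovi reports a small-sample-adjusted version, so values will differ slightly:

```python
import statistics

def excess_kurtosis(data):
    """Moment-based excess kurtosis: average fourth-power z-score minus 3.
    Positive = leptokurtic (heavy tails); negative = platykurtic."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    n = len(data)
    return sum(((x - mean) / sd) ** 4 for x in data) / n - 3

# Hypothetical datasets with the same center but different tail weight.
heavy_tails = [-9, -1, 0, 0, 0, 0, 0, 0, 1, 9]  # extremes present
flat = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]      # evenly spread, no extremes

print(f"heavy tails: {excess_kurtosis(heavy_tails):.2f}")  # positive
print(f"flat:        {excess_kurtosis(flat):.2f}")         # negative
```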

13.5 Percentiles

Percentiles divide a dataset into 100 equal parts, helping to show the relative standing of individual data points within a distribution. The 25th percentile marks the point below which 25% of the data fall, the 50th percentile (the median) is the midpoint, and the 75th percentile marks the point below which 75% of the data fall. Percentiles are especially useful for comparing individual values to the overall distribution.
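
As a quick illustration outside Jamovi, Python's standard library can compute these same quartile percentiles (the exam scores below are hypothetical; note that different software uses slightly different interpolation methods, so percentile values can differ a little at small sample sizes):

```python
import statistics

# Hypothetical exam scores.
scores = [55, 61, 64, 68, 70, 72, 75, 78, 82, 85, 88, 94]

# quantiles(n=4) cuts the data into quarters, returning the
# 25th, 50th, and 75th percentiles.
q1, median, q3 = statistics.quantiles(scores, n=4)

print(f"25th percentile: {q1}")
print(f"50th percentile (median): {median}")
print(f"75th percentile: {q3}")
```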

13.6 Outliers

Outliers are data points that fall significantly outside the overall pattern of a distribution. These extreme values may result from data entry errors, or they may represent rare but valid observations. Outliers can heavily influence statistical measures such as the mean, variance, and standard deviation, potentially distorting the results of analyses if not properly addressed.

Several methods are commonly used to identify and visualize outliers. Boxplots display potential outliers as individual points beyond the whiskers; values falling more than 1.5 times the interquartile range (IQR) above the third quartile or below the first quartile are typically flagged. Another approach involves converting values to standard scores, or z-scores, which are discussed later in this chapter.
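
The 1.5 × IQR fence rule is straightforward to compute by hand or in code. The Python sketch below (the dataset is hypothetical, with one deliberately extreme value) finds the quartiles, builds the fences, and flags any values beyond them:

```python
import statistics

# Hypothetical dataset with one suspiciously large value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Boxplot fences: values beyond 1.5 * IQR from the quartiles
# are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"potential outliers: {outliers}")
```

Here only the value 45 falls outside the fences, so it would appear as an individual point beyond the whisker in a boxplot.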

13.7 Distribution in Jamovi

Jamovi simplifies distribution analysis with built-in tools that make it easy to explore data shape, spread, and outliers.

How To: Distribution

To calculate skewness, kurtosis, percentiles, and outliers in Jamovi, go to the Analyses tab, select Exploration, then Descriptives.

  1. Move interval variables into the Variables box.
  2. Under the Statistics drop-down, check Percentiles under Percentile Values.
  3. Check Skewness and Kurtosis under Distribution.
  4. Check Most extreme under Outliers.
  5. Optional: Check Mean, Median, Standard deviation, and IQR to compare with the distribution statistics.

Understanding the Output

The output for skewness, kurtosis, percentiles, and outliers is shown below.


Jamovi interface displaying skewness, kurtosis, and extreme values in Descriptives output.
Figure 13.6. Distribution Test Results

When interpreting distribution statistics, begin with skewness, which describes the symmetry of the distribution. Values close to zero suggest the data are approximately symmetric. Negative values indicate a left (negative) skew, meaning the tail of lower scores extends farther than the tail of higher scores; positive values indicate the reverse. To determine whether skewness is meaningfully different from zero, compare it to its standard error: dividing skewness by its standard error provides a rough z-value. If that value is within approximately ±2, the distribution is not substantially skewed.

Next, consider kurtosis, which reflects the peakedness or tail weight of the distribution. Values near zero indicate a shape similar to a normal distribution. Negative values suggest a flatter distribution with lighter tails (platykurtic), whereas positive values indicate a more peaked distribution with heavier tails (leptokurtic). As with skewness, comparing kurtosis to its standard error helps determine whether deviations from normality are substantial.
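
These standard-error checks are simple arithmetic. As a sketch (the statistic and standard-error values below are hypothetical; substitute the numbers from your own Descriptives table):

```python
# Rough normality check from Jamovi's Descriptives output.
# These numbers are hypothetical placeholders.
skewness, skewness_se = -0.35, 0.24
kurtosis, kurtosis_se = 0.41, 0.47

z_skew = skewness / skewness_se
z_kurt = kurtosis / kurtosis_se

# |z| within about 2 suggests the deviation from normality
# is not substantial.
print(f"z(skewness) = {z_skew:.2f}")
print(f"z(kurtosis) = {z_kurt:.2f}")
```

In this hypothetical case both z-values fall within ±2, so neither the skewness nor the kurtosis would be considered a substantial departure from normality.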

The percentiles provide information about how scores are spread across the distribution. The 25th percentile represents the point below which 25% of scores fall, the 50th percentile is the median, and the 75th percentile is the point below which 75% of scores fall. The distance between the 25th and 75th percentiles (the interquartile range) shows where the middle 50% of the data lie and gives insight into variability without being influenced by extreme values.

Finally, the extreme values table identifies the highest and lowest observed scores along with their row numbers in the dataset. This allows you to check for potential outliers or data entry errors and to evaluate whether extreme scores meaningfully influence the distribution.

13.8 Standard Scores (Z-Scores)

A z-score, also called a standard score, indicates how far a value is from the mean in units of standard deviation. Rather than focusing on the raw value itself, a z-score expresses the value’s relative position within the distribution.

  • A z-score of 0 means the value is exactly at the mean.
  • A positive z-score indicates that the value is above the mean.
  • A negative z-score indicates that the value is below the mean.

For example, a z-score of +1 means the value is one standard deviation above the mean, while a z-score of –2 means the value is two standard deviations below the mean. The larger the absolute value of the z-score, the farther the data point is from the center of the distribution.
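
The calculation itself is a single formula: subtract the mean from the value, then divide by the standard deviation. The Python sketch below (the test scores are hypothetical) standardizes a small dataset:

```python
import statistics

# Hypothetical test scores.
scores = [62, 70, 74, 75, 78, 80, 83, 85, 88, 95]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# z-score: distance from the mean in standard-deviation units.
z_scores = [(x - mean) / sd for x in scores]

print(f"mean = {mean:.1f}, sd = {sd:.2f}")
print(f"z for 95: {(95 - mean) / sd:.2f}")  # positive: above the mean
print(f"z for 62: {(62 - mean) / sd:.2f}")  # negative: below the mean
```

A useful property of any standardized variable is that its mean is 0 and its standard deviation is 1, which is the same check described later for Jamovi's computed z-score variable.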

Z-scores are especially useful when working with approximately normal distributions. Because normal distributions follow predictable patterns, standard scores help researchers determine how typical or unusual a value may be. Values with large positive or negative z-scores may indicate extreme observations.

Z-scores are also commonly used to identify potential outliers. Values that are more than three standard deviations from the mean are often considered extreme and may warrant closer examination.

Beyond descriptive analysis, z-scores provide the conceptual foundation for many inferential statistics. Statistical test results often represent standardized distances similar to z-scores. Understanding how standard scores work therefore strengthens interpretation of hypothesis tests, regression results, and other inferential procedures discussed later in this book.

13.9 Z-Scores in Jamovi

Jamovi makes it easy to create standardized variables using its built-in compute function. Z-scores can be generated quickly, allowing researchers to examine how individual values compare to the overall distribution.

How To: Z-Scores

To calculate z-scores in Jamovi, go to the Data tab and create a computed variable.

  • Click Compute to add a new variable.
  • Create a name for the variable.
  • Under the functions menu, double-click Z to add it to the formula box.
  • Double-click the variable you want to standardize to add it to the formula.

Once the formula is entered, Jamovi will create a new variable containing the standardized (z-score) values, similar to the image below.


Jamovi interface displaying computed Z scores in a new variable column.
Figure 13.7. Computing Z Scores

To review the new z-score variable, go to the Analyses tab, select Exploration, then Descriptives, and move the standardized variable into the Variables box. Under the Statistics drop-down, check Mean and Standard deviation to confirm that the standardized variable has a mean close to 0 and a standard deviation close to 1, as shown below.


Jamovi output showing descriptives for a z-score variable.
Figure 13.8. Checking Z Score Descriptives

You might notice the mean isn’t written exactly as 0. It may appear as a very small number in scientific notation (e.g., 1.14e-16). That does not mean something is wrong; it simply reflects the limits of floating-point precision in how computers store decimal values. For practical purposes, it is zero.

13.10 Choosing the Right Distribution Measure

When deciding how to describe a dataset’s distribution, it’s important to choose the measure that best aligns with your analytical goals. Skewness, kurtosis, and percentiles each offer insight into different aspects of a distribution, and selecting the right one depends on what you’re trying to understand or communicate about your data.

If your primary concern is whether the data meet the assumptions of normality, especially in preparation for parametric tests like t-tests or ANOVA, skewness and kurtosis are the most appropriate. Skewness will tell you whether the data are symmetrical or biased to one side, which can impact the validity of statistical tests that assume balanced distributions. Kurtosis, on the other hand, helps assess the presence of extreme values or heavy tails, which may indicate the need for transformations or more robust methods.

When your goal is to compare individual scores or values to the rest of the dataset, percentiles are more useful. They provide context for interpreting relative performance or standing within the distribution, which is especially helpful in applied settings like education, health, or income research. Percentiles are also helpful for describing spread and position when the data are not normally distributed and when you want to avoid relying on the mean.

In some cases, using multiple measures together provides the most insight. For instance, reporting percentiles alongside skewness and kurtosis allows you to describe not only where values fall within the dataset, but also how balanced and extreme the overall shape of the distribution is. Ultimately, the best choice depends on whether you’re preparing for inferential analysis, comparing values, or describing the overall structure of your data.

Chapter 13 Summary and Key Takeaways

Distribution describes how values are spread across a dataset and is essential for interpreting patterns, identifying anomalies, and selecting appropriate statistical methods. Common distribution shapes include normal, skewed, uniform, and bimodal. Skewness measures the direction and degree of asymmetry, while kurtosis reflects how heavily values are concentrated in the tails of the distribution. Percentiles provide information about the relative standing of values, and outliers highlight unusual data points that may influence results. Jamovi makes it easier to calculate and interpret these aspects of distribution.

  • Distribution describes how data points are spread across a dataset, with common shapes including normal, skewed, uniform, and bimodal.
  • Skewness indicates the direction of asymmetry in the distribution and helps determine whether the mean or median is a better measure of center.
  • Kurtosis assesses the presence of extreme values and shows whether a distribution has heavy or light tails compared to a normal distribution.
  • Percentiles divide data into 100 equal parts and help evaluate the relative position of individual values.
  • Outliers are extreme values that fall outside the typical pattern of the data and can distort statistical results.