13: Distribution
13.1 Describing Distributions
Distribution refers to how the values in a dataset are spread or arranged. Understanding the distribution of data is essential for identifying patterns, relationships, and anomalies. It reveals key characteristics such as central tendency, variability, and shape, all of which are critical for selecting appropriate statistical methods and drawing valid inferences. Important aspects of distribution include its shape, skewness, kurtosis, percentiles, and outliers. Together, these elements help researchers assess the structure of the data and choose the most suitable analytical tools.
13.2 Common Distribution Shapes
The shape of a distribution refers to the overall pattern of values when data is plotted. Recognizing distribution shape helps researchers understand how data is spread and whether certain statistical methods are appropriate.
A normal distribution (also known as a Gaussian distribution) is symmetric and bell-shaped, with most data points clustered around the mean. It follows the 68-95-99.7 rule, meaning approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three. In a perfectly normal distribution, the mean, median, and mode are equal.
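The 68-95-99.7 rule can be checked numerically. The sketch below is a minimal illustration using Python's standard library; the helper name `share_within` is made up for this example and is not part of any package.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, and 0.9973.
    print(f"within {k} SD: {share_within(k):.4f}")
```

The same proportions hold for any normal distribution, because the rule is stated in units of standard deviations rather than raw values.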
A skewed distribution occurs when data is not evenly distributed around the center. In a positively skewed (right-skewed) distribution, the tail extends to the right; most data points are on the lower end, and the mean is typically greater than the median. In contrast, a negatively skewed (left-skewed) distribution has a longer tail to the left, and the mean is typically less than the median.
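As a small illustration of how a long right tail pulls the mean above the median, consider the sketch below. The `incomes` values are invented for this example and do not come from real data.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are low, one is very large.
incomes = [22, 25, 27, 30, 31, 33, 35, 38, 40, 250]

sample_mean = mean(incomes)      # pulled upward by the single large value
sample_median = median(incomes)  # midpoint of the ordered values

print(f"mean = {sample_mean}, median = {sample_median}")
```

Here the mean is 53.1 while the median is 32, so the mean sits well above where most of the values lie.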
A uniform distribution has all values occurring with equal frequency, resulting in a flat, even shape. A bimodal distribution features two distinct peaks, often indicating that the data comes from two different groups or processes.
13.3 Skewness
Skewness refers to the degree of asymmetry in a distribution. In a positively skewed distribution, the right tail (containing larger values) is longer, and most values are concentrated on the lower end. In a negatively skewed distribution, the left tail (containing smaller values) is longer, and most values are concentrated on the higher end. A normal distribution has a skewness of 0, indicating perfect symmetry. Positive values of skewness indicate a right-skewed shape, while negative values indicate a left-skewed shape.
Skewness affects how central tendency is interpreted. When data is highly skewed, the mean may be misleading, and the median often provides a better representation of the typical value. Skewness also influences the choice of statistical tests, particularly those that assume normally distributed data.
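One common way to quantify skewness is the moment-based coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition. Statistical software often applies a small-sample correction, so reported values may differ slightly from this one. The function name and sample values are illustrative only.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (undefined if all values are equal)."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])  # second central moment
    m3 = fmean([(x - m) ** 3 for x in values])  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]   # long tail toward larger values
left_skewed = [-10, -3, -2, -2, -1]  # mirror image: long tail toward smaller values

print(skewness(symmetric))     # 0.0 for a perfectly symmetric sample
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```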
13.4 Kurtosis
Kurtosis describes how much of a distribution's data lies in the tails (the extreme high and low values) compared to a normal distribution. It helps researchers gauge how likely outliers are. Although kurtosis is often described in terms of how sharply peaked a distribution is, it is driven mainly by the tails.
A distribution with high kurtosis, called leptokurtic, has heavier tails than a normal distribution, often alongside a sharper peak, meaning more data is found in the extremes. These distributions are more prone to outliers. A distribution with low kurtosis, known as platykurtic, has thinner tails and often a flatter peak, indicating fewer extreme values. A mesokurtic distribution falls in between, with tails similar to those of a normal distribution. Many software packages report excess kurtosis (kurtosis minus 3), so that a normal distribution has a value of 0, positive values indicate heavier tails, and negative values indicate lighter tails.
Understanding kurtosis helps in assessing whether data contain unusual or extreme values that may affect interpretation and analysis.
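Kurtosis can be sketched in the same moment-based way: the fourth central moment divided by the squared second central moment, minus 3 so that a normal distribution scores 0 (excess kurtosis). This is one common definition, and software may apply a small-sample correction. The function name and sample values below are illustrative assumptions.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (undefined if all values are equal)."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])  # second central moment
    m4 = fmean([(x - m) ** 4 for x in values])  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                          # evenly spread, light tails
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]  # mostly central, two extremes

print(excess_kurtosis(flat))          # negative: platykurtic
print(excess_kurtosis(heavy_tailed))  # positive: leptokurtic
```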
13.5 Percentiles
Percentiles divide a dataset, ordered from lowest to highest, into 100 equal parts, helping to show the relative standing of individual data points within a distribution. The 25th percentile marks the point below which 25% of the data fall, the 50th percentile (the median) is the midpoint, and the 75th percentile marks the point below which 75% of the data fall. Percentiles are especially useful for comparing individual values to the overall distribution.
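The sketch below computes the 25th, 50th, and 75th percentiles with Python's `statistics.quantiles` and estimates where a single score stands. The `scores` list and the `percent_below` helper are invented for illustration. Software packages use different interpolation rules for percentiles, so results can differ slightly between tools.

```python
from statistics import quantiles

# Hypothetical test scores.
scores = [55, 60, 62, 68, 70, 74, 78, 85, 90]

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
print(f"25th = {q1}, 50th = {q2}, 75th = {q3}")

def percent_below(values, score):
    """Percentage of values strictly below the given score."""
    return 100 * sum(1 for x in values if x < score) / len(values)

print(f"{percent_below(scores, 78):.1f}% of scores fall below 78")
```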
13.6 Outliers
Outliers are data points that fall significantly outside the overall pattern of a distribution. These extreme values may result from data entry errors, or they may represent rare but valid observations. Outliers can heavily influence statistical measures such as the mean, variance, and standard deviation, potentially distorting the results of analyses if not properly addressed.
Several methods are commonly used to identify and visualize outliers. Boxplots display potential outliers as individual points beyond the whiskers, which typically extend to the most extreme values within 1.5 times the interquartile range (IQR) above the third quartile or below the first quartile. Another approach uses z-scores, with values greater than +3 or less than -3 often flagged as outliers.
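Both rules can be sketched in a few lines. The function names, the sample data, and the choice of the "inclusive" quartile method are assumptions made for this example; other tools may compute quartiles slightly differently.

```python
from statistics import mean, quantiles, stdev

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def z_outliers(values, threshold=3.0):
    """Values whose z-score (using the sample standard deviation) exceeds the threshold."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs((x - m) / s) > threshold]

# Hypothetical sample with one unusually large value.
data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 45]

print(iqr_outliers(data))  # the IQR rule flags 45
print(z_outliers(data))    # the z-score rule flags nothing here
```

In this small sample the two rules disagree: the value 45 inflates the standard deviation enough that its own z-score stays below 3. This is one reason to inspect a boxplot alongside z-scores rather than relying on a single rule.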
13.7 Distribution in Jamovi
Jamovi simplifies distribution analysis with built-in tools that make it easy to explore data shape, spread, and outliers.
How To: Distribution
To calculate skewness, kurtosis, percentiles, and outliers in Jamovi, go to the Analyses tab, select Exploration, then Descriptives.
- Move interval variables into the Variables box.
- Under the Statistics drop-down, check Percentiles under Percentile Values.
- Check Skewness and Kurtosis under Distribution.
- Check Most extreme under Outliers.
- Optional: Check Mean, Median, Standard deviation, and IQR to compare with the distribution statistics.
13.8 Choosing the Right Measure of Distribution
When deciding how to describe a dataset’s distribution, it’s important to choose the measure that best aligns with your analytical goals. Skewness, kurtosis, and percentiles each offer insight into different aspects of a distribution, and selecting the right one depends on what you’re trying to understand or communicate about your data.
If your primary concern is whether the data meet the assumptions of normality, especially in preparation for parametric tests like t-tests or ANOVA, skewness and kurtosis are the most appropriate. Skewness will tell you whether the data are symmetrical or biased to one side, which can impact the validity of statistical tests that assume balanced distributions. Kurtosis, on the other hand, helps assess the presence of extreme values or heavy tails, which may indicate the need for transformations or more robust methods.
When your goal is to compare individual scores or values to the rest of the dataset, percentiles are more useful. They provide context for interpreting relative performance or standing within the distribution, which is especially helpful in applied settings like education, health, or income research. Percentiles are also helpful for describing spread and position when the data are not normally distributed and when you want to avoid relying on the mean.
In some cases, using multiple measures together provides the most insight. For instance, reporting percentiles alongside skewness and kurtosis allows you to describe not only where values fall within the dataset, but also how balanced and extreme the overall shape of the distribution is. Ultimately, the best choice depends on whether you’re preparing for inferential analysis, comparing values, or describing the overall structure of your data.
Chapter 13 Summary and Key Takeaways
Distribution describes how values are spread across a dataset and is essential for interpreting patterns, identifying anomalies, and selecting appropriate statistical methods. Common distribution shapes include normal, skewed, uniform, and bimodal. Skewness measures the direction and degree of asymmetry, while kurtosis reflects how heavily values are concentrated in the tails of the distribution. Percentiles provide information about the relative standing of values, and outliers highlight unusual data points that may influence results. Jamovi makes it easier to calculate and interpret these aspects of distribution.
- Distribution describes how data points are spread across a dataset, with common shapes including normal, skewed, uniform, and bimodal.
- Skewness indicates the direction of asymmetry in the distribution and helps determine whether the mean or median is a better measure of center.
- Kurtosis assesses the presence of extreme values and shows whether a distribution has heavy or light tails compared to a normal distribution.
- Percentiles divide data into 100 equal parts and help evaluate the relative position of individual values.
- Outliers are extreme values that fall outside the typical pattern of the data and can distort statistical results.