20: Prediction
20.1 Prediction as a Statistical Relationship
In applied research, prediction refers to using statistical models to forecast the value of a dependent variable based on one or more independent variables. The goal of prediction is to understand how the independent variables are related to the dependent variable, and to use this relationship to make informed predictions about future or unseen data. Prediction plays a crucial role in fields like economics, psychology, medicine, and social sciences, as it helps researchers make evidence-based decisions and anticipate outcomes.
This chapter will focus on two essential prediction methods: Multiple Linear Regression and Binomial Logistic Regression. Both models predict the value of a dependent variable based on one or more predictors, but they differ in the type of data they handle and the assumptions they make. We will discuss the assumptions for each test and guide how to use them in applied research.
20.2 Multiple Linear Regression
Multiple Linear Regression is used to predict the value of a continuous dependent variable based on two or more independent variables. The relationship between the dependent variable and each independent variable is assumed to be linear, meaning that changes in the independent variables are associated with consistent, proportional changes in the dependent variable. This model is commonly used when researchers aim to understand the relationship between multiple predictors and an outcome.
Handling nominal independent variables in multiple linear regression requires transforming them into a format the model can interpret, usually through dummy coding (also known as one-hot encoding). Dummy coding involves creating binary variables for each category of the nominal variable. For instance, if you have a nominal variable such as “Gender” with two levels (Male, Female), a dummy variable can be created that takes the value of 0 for Male and 1 for Female (or vice versa). This allows the regression model to estimate the effect of each category relative to the reference category.
The assumptions for multiple linear regression include linearity (there should be a linear relationship between the dependent and independent variables), independence of errors (residuals should be independent), homoscedasticity (the variance of residuals should be constant across the levels of the independent variables), and normality of residuals (residuals should be normally distributed). Additionally, there should be no multicollinearity among the independent variables. Solutions like data transformations or using Principal Component Analysis (PCA) to reduce dimensionality can be considered if these assumptions are violated.
In Jamovi, multiple linear regression is run by selecting Analyses > Regression > Linear Regression, choosing the dependent and independent variables, and reviewing the model output, which will provide regression coefficients, R-squared values, p-values, and multicollinearity diagnostics.
How To: Multiple Linear Regression
Type your exercises here.
- First
- Second
Below is an example of the results generated when the steps are correctly followed.
IMAGE [INSERT NAME OF DATASET]
Interpretation
Phrasing Results: Multiple Linear Regression
Type your examples here.
- First
- Second
20.3 Binomial Logistic Regression
Binomial Logistic Regression is used when the dependent variable is binary, meaning it has two possible outcomes, such as 0/1, Yes/No, or Success/Failure. This model predicts the probability that an observation falls into one of the two categories based on one or more independent variables. Logistic regression is widely used when the outcome is categorical and researchers want to understand the likelihood of one outcome given the values of the predictors.
Like multiple linear regression, nominal independent variables in binomial logistic regression must be transformed into binary variables using dummy coding. These dummy variables are then included in the model as independent predictors. The assumptions for binomial logistic regression include a binary dependent variable, independence of observations, and linearity of the logit (the relationship between the independent variables and the log-odds of the dependent variable should be linear). Additionally, there should be no multicollinearity among the independent variables.
If the assumptions are violated, such as with non-linearity of the logit, solutions include applying log transformations or using polynomial logistic regression. For issues with multicollinearity, removing correlated variables or combining them into a single predictor can help.
In Jamovi, binomial logistic regression is run by selecting Analyses > Regression > Logistic Regression, choosing the binary dependent variable, and adding one or more independent variables. The results will provide odds ratios, regression coefficients, p-values, and model fit statistics.
How To: Binomial Logistic Regression
Type your exercises here.
- First
- Second
Below is an example of the results generated when the steps are correctly followed.
IMAGE [INSERT NAME OF DATASET]
Interpretation
Phrasing Results: Binomial Logistic Regression
Type your examples here.
- First
- Second
20.4 Choosing the Right Test for Prediction
Selecting the correct prediction model depends on the nature of your data and the research question. Multiple Linear Regression should be used when the dependent variable is continuous and there are multiple independent variables that you wish to assess. If the dependent variable is binary, Binomial Logistic Regression is appropriate, as it models the probability of the outcome and can handle both continuous and categorical independent variables.
To ensure the accuracy of your predictions, it is important to carefully assess the assumptions for each model before running them. Violations of these assumptions can lead to biased or misleading results, so addressing them properly is essential for effective prediction in research.
Chapter 20 Summary and Key Takeaways
This chapter discussed two important prediction methods used in applied research: Multiple Linear Regression and Binomial Logistic Regression. Multiple Linear Regression predicts a continuous dependent variable from multiple independent variables. In contrast, Binomial Logistic Regression is used for binary dependent variables to predict the probability of an outcome. For each of these tests, we reviewed the key assumptions. We guided how to address violations of these assumptions, such as using data transformations, handling multicollinearity, or considering robust regression techniques when necessary.
- Multiple Linear Regression is used for predicting a continuous dependent variable from multiple predictors.
- Binomial Logistic Regression is used for binary dependent variables, providing probabilities for the outcome.
- Always check for assumptions like normality, multicollinearity, and linearity, and use the appropriate transformations or robust methods when necessary.