Multiple Regression

Sometimes you may want to explain variability in a continuous DV using several different continuous IVs. Multiple regression allows us to build an equation predicting the value of the DV from the values of two or more IVs. The parameters of this equation can be used to relate the variability in our DV to the variability in specific IVs. Sometimes people use the term multivariate regression to refer to multiple regression, but most statisticians do not use "multiple" and "multivariate" as synonyms. Instead, they use the term "multiple" to describe analyses that examine the effect of two or more IVs on a single DV, while they reserve the term "multivariate" to describe analyses that examine the effect of any number of IVs on two or more DVs.
The general form of the multiple regression model is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i.$$

The elements in this equation are the same as those found in simple linear regression, except that we now have k different slope parameters, which are multiplied by the values of the k IVs to get our predicted value. We can again use least squares estimation to determine the estimates of these parameters that best fit our observed data. Once we obtain these estimates we can either use our equation for prediction, or we can test whether our parameters are significantly different from zero to determine whether each of our IVs makes a significant contribution to our model.
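As a rough illustration of least squares estimation with two IVs, the sketch below solves the normal equations in plain Python. The function name `fit_ols` and the six-case data set are made up for this example, and in practice you would let SPSS or a statistics library do this work.

```python
def fit_ols(rows, y):
    """Return least-squares estimates [b0, b1, ..., bk].

    rows: one tuple of IV values per case; y: the DV values.
    Solves the normal equations (X'X) b = X'y by Gaussian elimination.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(xi[a] * xi[b] for xi in X) for b in range(p)]
        + [sum(xi[a] * yi for xi, yi in zip(X, y))]
        for a in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for j in range(col, p + 1):
                M[r][j] -= factor * M[col][j]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        known = sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (M[i][p] - known) / M[i][i]
    return b


# Hypothetical data: two IVs (X1, X2) and one DV for six cases.
rows = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0)]
y = [1, 3, 5, 7, 3, 5]

b0, b1, b2 = fit_ols(rows, y)
predicted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in rows]
print(b0, b1, b2)
```

For this data set the estimates work out to an intercept of 4 and slopes of 2 and 1, so the predicted values come from the equation 4 + 2(X1) + 1(X2).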
Care must be taken when making inferences based on the coefficients obtained in multiple regression. The way that you interpret a multiple regression coefficient is somewhat different from the way that you interpret coefficients obtained using simple linear regression. Specifically, the value of a multiple regression coefficient represents the ability of the part of the corresponding IV that is unrelated to the other IVs to predict the part of the DV that is unrelated to the other IVs. It therefore represents the unique ability of the IV to account for variability in the DV. One implication of the way coefficients are determined is that your parameter estimates become very difficult to interpret if there are large correlations among your IVs. The effect of these relationships on multiple regression coefficients is called multicollinearity. Multicollinearity changes the values of your coefficients and greatly increases their variance. It can cause you to find that none of your coefficients is significantly different from zero, even when the overall model does a good job predicting the value of the DV.

The typical effect of multicollinearity is to reduce the size of your parameter estimates. Since the value of a coefficient is based on the unique ability of an IV to account for variability in the DV, if there is a portion of variability that is accounted for by multiple IVs, all of their coefficients will be reduced. Under certain circumstances multicollinearity can also create a suppression effect. If you have one IV that has a high correlation with another IV but a low correlation with the DV, you can find that the multiple regression coefficient for the second IV from a model including both variables is larger than (or even opposite in direction to!) the coefficient from a model that doesn’t include the first IV. This happens when the part of the second IV that is independent of the first IV has a different relationship with the DV than does the part that is related to the first IV. It is called a suppression effect because the relationship that appears in multiple regression is suppressed when you just look at the variable by itself.
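Both effects can be seen with a little arithmetic. With exactly two IVs, the standardized coefficients can be written directly in terms of the three pairwise correlations, and with a single IV the standardized coefficient is simply its correlation with the DV. The sketch below uses that two-IV formula. The function name and the correlation values are invented for illustration.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for a regression with exactly two IVs.

    r_y1, r_y2: correlations of IV1 and IV2 with the DV.
    r_12: correlation between IV1 and IV2.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Shared variability: each IV correlates .6 with the DV and .5 with the other IV.
# Alone, each IV would have a standardized coefficient of .6.
shared = standardized_betas(0.6, 0.6, 0.5)

# Suppression: IV1 is uncorrelated with the DV but correlates .5 with IV2.
# Alone, IV2 would have a standardized coefficient of .5.
suppression = standardized_betas(0.0, 0.5, 0.5)

print(shared)
print(suppression)
```

In the first case both coefficients drop from .6 to .4. In the second case the coefficient for IV2 rises from .5 to about .67, and IV1 receives a negative coefficient of about -.33 even though it is uncorrelated with the DV.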

To perform a multiple regression in SPSS:
Choose Analyze → Regression → Linear.
Move the DV to the Dependent box.
Move all of the IVs to the Independent(s) box.
Click the Continue button.
Click the OK button.

The SPSS output from a multiple regression analysis contains the following sections.
Variables Entered/Removed. This section is only used in model building and contains no useful information in standard multiple regression.
Model Summary. The value listed below R is the multiple correlation between your IVs and your DV. The value listed below R Square is the proportion of variance in your DV that can be accounted for by your IVs. The value in the Adjusted R Square column is a measure of model fit, adjusting for the number of IVs in the model. The value listed below Std. Error of the Estimate is the standard deviation of the residuals.
ANOVA. This section provides an F test for your statistical model. If this F is significant, it indicates that the model as a whole (that is, all IVs combined) predicts significantly more variability in the DV compared to a null model that only has an intercept parameter. Notice that this test is affected by the number of IVs in the model being tested.
Coefficients. This section contains a table where each row corresponds to a single coefficient in your model. The row labeled Constant refers to the intercept, while the coefficients for each of your IVs appear in the row beginning with the name of the IV. Inside the table, the column labeled B contains the estimates of the parameters and the column labeled Std. Error contains the standard error of those estimates. The column labeled Beta contains the standardized regression coefficient. The column labeled t contains the value of the t-statistic testing whether the value of each parameter is equal to zero. The p-value of this test is found in the column labeled Sig. A significant t-test indicates that the IV is able to account for a significant amount of variability in the DV, independent of the other IVs in your regression model.
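To make the Model Summary and ANOVA quantities concrete, here is a minimal sketch that computes them from observed and fitted values using the standard formulas. It assumes the fitted values come from a least-squares model with an intercept. The function name `model_summary` and the data are hypothetical, and this is not a description of how SPSS computes them internally.

```python
import math


def model_summary(y, fitted, k):
    """Summary statistics for a least-squares fit with k IVs plus an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df_resid = n - k - 1
    r_square = 1 - ss_resid / ss_total
    return {
        "R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_resid,
        "Std. Error of the Estimate": math.sqrt(ss_resid / df_resid),
        "F": ((ss_total - ss_resid) / k) / (ss_resid / df_resid),
    }


# Hypothetical DV values for six cases, and the fitted values from the
# least-squares equation 4 + 2*X1 + 1*X2, where the made-up IV values are
# X1 = -1, -1, 1, 1, 0, 0 and X2 = -1, 1, -1, 1, 0, 0.
y = [1, 3, 5, 7, 3, 5]
fitted = [1, 3, 5, 7, 4, 4]

summary = model_summary(y, fitted, k=2)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Here R Square is about .909 but Adjusted R Square is only about .848, which shows how the adjustment penalizes a model with two IVs fitted to just six cases.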

Multiple regression with interactions
In addition to determining the independent effect of each IV on the DV, multiple regression can also be used to detect interactions between your IVs. An interaction measures the extent to which the relationship between an IV and a DV depends on the level of other IVs in the model. For example, if you have an interaction between two IVs (called a two-way interaction) then you expect that the relationship between the first IV and the DV will be different across different levels of the second IV. Interactions are symmetric, so if you have an interaction such that the effect of IV1 on the DV depends on the level of IV2, then it is also true that the effect of IV2 on the DV depends on the level of IV1. It therefore does not matter whether you say that you have an interaction between IV1 and IV2 or an interaction between IV2 and IV1. You can also have interactions between more than two IVs. For example, you can have a three-way interaction between IV1, IV2, and IV3. This would mean that the two-way interaction between IV1 and IV2 depends on the level of IV3. Just like two-way interactions, three-way interactions are also independent of the order of the variables. So the above three-way interaction would also mean that the two-way interaction between IV1 and IV3 depends on the level of IV2, and that the two-way interaction between IV2 and IV3 depends on the level of IV1.
It is possible to have both main effects and interactions at the same time. For example, you can have a general trend that the value of the DV increases when the value of a particular IV increases along with an interaction such that the relationship is stronger when the value of a second IV is high than when the value of that second IV is low. You can also have lower order interactions in the presence of a higher order interaction. Again, the lower-order interaction would represent a general trend that is modified by the higher-order interaction.

You can use linear regression to determine if there is an interaction between a pair of IVs by adding an interaction term to your statistical model. To detect the interaction effect of two IVs (X1 and X2) on a DV (Y) you would use linear regression to estimate the equation

$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i1} X_{i2}.$$

You construct the variable for the interaction term Xi1Xi2 by literally multiplying the value of X1 by the value of X2 for each case in your data set. If the test of b3 is significant, then the two predictors have an interactive effect on the outcome variable.
In addition to the interaction term itself, your model must contain all of the main effects of the variables involved in the interaction as well as all of the lower-order interaction terms that can be created using those main effects. For example, if you want to test for a three-way interaction you must include the three main effects as well as all of the possible two-way interactions that can be made from those three variables. If you do not include the lower-order terms then the test on the highest order interaction will produce incorrect results.
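One way to make sure no lower-order term is forgotten is to generate the terms mechanically. The sketch below builds every main effect and every interaction up to the highest order by multiplying the relevant columns case by case. The helper name `model_terms` and the tiny data set are invented, and the values are left uncentered only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import prod


def model_terms(columns):
    """Return every main effect and interaction term up to the highest order.

    columns: dict mapping an IV name to its list of values (one per case).
    Each interaction column is the case-by-case product of the IVs involved.
    """
    names = list(columns)
    terms = {}
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            label = "*".join(combo)
            terms[label] = [
                prod(values) for values in zip(*(columns[name] for name in combo))
            ]
    return terms


# Hypothetical data for three IVs measured on three cases.
data = {"X1": [1, 2, 3], "X2": [0, 1, 2], "X3": [2, 2, 1]}

terms = model_terms(data)
print(list(terms))
print(terms["X1*X2*X3"])
```

For three IVs this yields seven predictors: three main effects, three two-way interactions, and the single three-way interaction.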
It is important to center the variables that are involved in an interaction before including them in your model. That is, for each independent variable, the analyst should subtract the mean of the independent variable from each participant's score on that variable. The interaction term should then be constructed from the centered variables by multiplying them together. The model itself should then be tested using the centered main effects and the constructed interaction term. Centering your independent variables will not change their relationship to the dependent variable, but it will reduce the collinearity between the main effects and the interaction term. If the variables are not centered, the coefficients on the lower-order terms involving those IVs are hard to interpret, because each one describes the effect of its term when the other IVs in the interaction are equal to zero, a value that may be meaningless or outside the range of your data. Only the highest-order interaction keeps its usual interpretation. When the variables are centered, however, the coefficients on the IVs can be interpreted as representing the main effect of the IV on the DV, averaging over the other variables in the interaction. The coefficients on lower-order interaction terms can similarly be interpreted as testing the average strength of that lower-order interaction, averaging over the variables that are excluded from the lower-order interaction but included in the highest-order interaction term.
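The effect of centering on collinearity is easy to check numerically. The following sketch, which uses made-up scores on two IVs, compares the correlation between X1 and the product term before and after centering. The helper names are illustrative only.

```python
from math import sqrt


def center(values):
    """Subtract the mean from every score."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    ca, cb = center(a), center(b)
    cross = sum(x * y for x, y in zip(ca, cb))
    return cross / sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))


# Hypothetical scores on two IVs for six participants.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 5, 4, 6, 8, 7]

# Interaction term built from the raw scores.
raw_product = [a * b for a, b in zip(x1, x2)]

# Interaction term built from the centered scores.
c1, c2 = center(x1), center(x2)
centered_product = [a * b for a, b in zip(c1, c2)]

r_raw = correlation(x1, raw_product)
r_centered = correlation(c1, centered_product)
print(round(r_raw, 2), round(r_centered, 2))
```

With these scores the raw main effect correlates about .97 with its product term, while the centered main effect correlates only about -.09 with the centered product.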

You can perform a multiple regression including interaction terms in SPSS just like you would a standard multiple regression if you create your interaction terms ahead of time. However, creating these variables can be tedious when analyzing models that contain a large number of interaction terms. Luckily, if you choose to analyze your data using the General Linear Model procedure, SPSS will create these interaction terms for you (although you still need to center all of your original IVs beforehand). To analyze a regression model this way in SPSS:

Center the IVs involved in the interaction.
Choose Analyze → General Linear Model → Univariate.
Move your DV to the box labeled Dependent Variable.
Move all of the main effect terms for your IVs to the box labeled Covariate(s).
Click the Options button.
Check the box next to Parameter estimates. By default this procedure will only provide you with tests of your IVs and not the actual parameter estimates.
Click the Continue button.
By default SPSS will not include interactions between continuous variables in its statistical models. However, if you build a custom model you can include whatever terms you like. You should therefore next build a model that includes all of the main effects of your IVs as well as any desired interactions. To do this
o Click the Model button.
o Click the radio button next to Custom.
o Select all of your IVs, set the drop-down menu to Main effects, and click the arrow button.
o For each interaction term, select the variables involved in the interaction, set the drop-down menu to Interaction, and click the arrow button.
o If you want all of the possible two-way interactions between a collection of IVs you can just select the IVs, set the drop-down menu to All 2-way, and click the arrow button. This procedure can also be used to get all possible three-way, four-way, or five-way interactions between a collection of IVs by setting the drop-down menu to the appropriate interaction type.
Click the Continue button.
Click the OK button.

The output from this analysis will contain the same sections found in standard multiple regression. When referring to an interaction, SPSS will display the names of the variables involved in the interaction separated by asterisks (*). So the interaction between the variables RACE and GENDER would be displayed as RACE * GENDER.

So what does it mean if you obtain a significant interaction in regression? Remember that in simple linear regression, the slope coefficient (b1) indicates the expected change in Y with a one-unit change in X. In multiple regression, the slope coefficient for X1 indicates the expected change in Y with a one-unit change in X1, holding all other X values constant. Importantly, this change in Y with a one-unit change in X1 is the same no matter what value the other X variables in the model take on. However, if there is a significant interaction, the interpretation of coefficients is slightly different. In this case, the slope coefficient for X1 depends on the level of the other predictor variables in the model.
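For the two-IV interaction model Y = b0 + b1(X1) + b2(X2) + b3(X1)(X2), collecting the terms that multiply X1 shows that the slope for X1 at a given value of X2 is b1 + b3(X2). The sketch below evaluates that slope at a low, average, and high value of a centered X2. The coefficient values and the standard deviation are hypothetical, not output from any real analysis.

```python
def slope_of_x1(b1, b3, x2):
    """Slope of Y on X1 at a given X2 in Y = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b1 + b3 * x2


# Hypothetical estimates from a model with centered IVs.
b1, b3 = 2.0, 0.5
sd_x2 = 1.5  # assumed standard deviation of the centered X2

# Because X2 is centered, its mean is 0.
slopes = {
    "low X2 (1 SD below mean)": slope_of_x1(b1, b3, -sd_x2),
    "average X2": slope_of_x1(b1, b3, 0.0),
    "high X2 (1 SD above mean)": slope_of_x1(b1, b3, sd_x2),
}
for label, slope in slopes.items():
    print(label, slope)
```

With these numbers the slope for X1 is 1.25 when X2 is low, 2.0 at the mean of X2, and 2.75 when X2 is high. The slope at the mean of X2 is just b1, which is why the coefficient on a centered IV can be read as its average effect.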