Regression is a statistical tool that allows you to predict the value of one continuous variable from one or more other variables. When you perform a regression analysis, you create a regression equation that predicts the values of your DV using the values of your IVs. Each IV is associated with a coefficient in the equation that summarizes the relationship between that IV and the DV. Once you estimate the set of coefficients in a regression equation, you can use hypothesis tests and confidence intervals to make inferences about the corresponding parameters in the population. You can also use the regression equation to predict the value of the DV given a specified set of values for your IVs.
Simple Linear Regression
Simple linear regression is used to predict the value of a single continuous DV (which we will call Y) from a single continuous IV (which we will call X). Regression assumes that the relationship between the IV and the DV can be represented by the equation
Yi = β0 + β1Xi + εi,
where Yi is the value of the DV for case i, Xi is the value of the IV for case i, β0 and β1 are constants, and εi is the error in prediction for case i. When you perform a regression, what you are basically doing is determining estimates of β0 and β1 that let you best predict values of Y from values of X. You may remember from geometry that the above equation is the equation of a straight line. This is no accident, since the purpose of simple linear regression is to define the line that represents the relationship between our two variables. β0 is the intercept of the line, indicating the expected value of Y when X = 0. β1 is the slope of the line, indicating how much we expect Y to change when we increase X by a single unit.
The regression equation above is written in terms of population parameters. That indicates that our goal is to determine the relationship between the two variables in the population as a whole. We typically do this by taking a sample and then performing calculations to obtain the estimated regression equation
Yi = b0 + b1Xi.
Once you estimate the values of b0 and b1, you can substitute in those values and use the regression equation to predict the expected values of the DV for specific values of the IV. Predicting the values of Y from the values of X is referred to as regressing Y on X. When analyzing data from a study you will typically want to regress the values of the DV on the values of the IV. This makes sense since you want to use the IV to explain variability in the DV. We typically calculate b0 and b1 using least squares estimation. This chooses estimates that minimize the sum of squared errors between the values of the estimated regression line and the actual observed values.
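As a minimal sketch of least squares estimation, the Python code below computes b0 and b1 from the standard closed-form formulas and then uses the estimated equation to predict Y for a new value of X. The data values and the function name fit_simple_regression are made up for illustration; they are not part of SPSS or of any particular library.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1), the least squares intercept and slope for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of X about their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_regression(x, y)

# Predicted value of Y when X = 6, using the estimated regression equation.
predicted = b0 + b1 * 6
print(b0, b1, predicted)
```

For these made-up values the slope works out to about 0.6 and the intercept to about 2.2, so the predicted value of Y at X = 6 is about 5.8.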
In addition to using the estimated regression equation for prediction, you can also perform hypothesis tests regarding the individual regression parameters. The slope of the regression equation (β1) represents the change in Y with a one-unit change in X. If X predicts Y, then as X increases, Y should change in some systematic way. You can therefore test for a linear relationship between X and Y by determining whether the slope parameter is significantly different from zero.
When performing linear regression, we typically make the following assumptions about the error terms εi.
1. The errors have a normal distribution.
2. The errors have the same variance at each level of X.
3. The errors in the model are all independent.
To perform a simple linear regression in SPSS
Choose Analyze → Regression → Linear.
Move the DV to the Dependent box.
Move the IV to the Independent(s) box.
Click the Continue button.
Click the OK button.
The output from this analysis will contain the following sections.
Variables Entered/Removed. This section is only used in model building and contains no useful information in simple linear regression.
Model Summary. The value listed below R is the absolute value of the correlation between your variables. The value listed below R Square is the proportion of variance in your DV that can be accounted for by your IV. The value in the Adjusted R Square column is a measure of model fit, adjusting for the number of IVs in the model. The value listed below Std. Error of the Estimate is the standard deviation of the residuals.
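The Model Summary quantities can be reproduced by hand, which may help clarify what each one measures. The sketch below uses made-up data and the usual textbook formulas. It assumes that the Std. Error of the Estimate divides the residual sum of squares by n - k - 1 (where k is the number of IVs), which is the conventional definition rather than something stated in the text above.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
k = 1  # number of IVs in the model

# Least squares estimates of the intercept and slope.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual and total sums of squares.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)

r_square = 1 - sse / sst                                    # R Square
r = math.sqrt(r_square)                                     # R
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)   # Adjusted R Square
std_error_estimate = math.sqrt(sse / (n - k - 1))           # Std. Error of the Estimate

print(r, r_square, adj_r_square, std_error_estimate)
```

With these values R Square is about 0.6, so the IV accounts for roughly 60% of the variance in the DV, while Adjusted R Square is noticeably lower because the sample is so small.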
ANOVA. Here you will see an ANOVA table, which provides an F test of the relationship between your IV and your DV. If the F test is significant, it indicates that there is a linear relationship between your IV and your DV.
Coefficients. This section contains a table where each row corresponds to a single coefficient in your model. The row labeled Constant refers to the intercept, while the row containing the name of your IV refers to the slope. Inside the table, the column labeled B contains the estimates of the parameters and the column labeled Std. Error contains the standard errors of those estimates. The column labeled Beta contains the standardized regression coefficient, which is the parameter estimate that you would get if you standardized both the IV and the DV by subtracting their means and dividing by their standard deviations. Standardized regression coefficients are sometimes used in multiple regression (discussed below) to compare the relative importance of different IVs when predicting the DV. In simple linear regression, the standardized regression coefficient will always be equal to the correlation between the IV and the DV. The column labeled t contains the value of the t statistic testing whether each parameter is equal to zero. The p-value of this test is found in the column labeled Sig. If the value for the IV is significant, then there is a relationship between the IV and the DV. Note that the square of the t statistic for the IV is equal to the F statistic in the ANOVA table and that the p-values of the two tests are equal. This is because both of these are testing whether there is a significant linear relationship between your variables.
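Two of the identities mentioned above can be checked numerically: the squared t statistic for the slope equals the ANOVA F statistic, and the standardized coefficient equals the correlation. The sketch below uses made-up data and standard textbook formulas for simple linear regression. It leaves out the p-values in the Sig. column, since computing them needs the t distribution, which is not in the Python standard library.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares estimates (column B).
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual mean square with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

# Standard error and t statistic for the slope (columns Std. Error and t).
se_b1 = math.sqrt(mse / sxx)
t_slope = b1 / se_b1

# ANOVA F statistic: regression mean square (1 df) over residual mean square.
f_stat = (syy - sse) / mse

# Standardized coefficient (column Beta) and the correlation between X and Y.
beta = b1 * math.sqrt(sxx / syy)
r = sxy / math.sqrt(sxx * syy)

print(t_slope ** 2, f_stat)  # these two values agree
print(beta, r)               # and so do these
```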