REGRESSION


Regression is a statistical tool that allows you to predict the value of one continuous variable from one or more other variables. When you perform a regression analysis, you create a regression equation that predicts the values of your DV using the values of your IVs. Each IV is associated with a coefficient in the equation that summarizes the relationship between that IV and the DV. Once we estimate the coefficients in a regression equation, we can use hypothesis tests and confidence intervals to make inferences about the corresponding parameters in the population. You can also use the regression equation to predict the value of the DV given a specified set of values for your IVs.

Simple Linear Regression
Simple linear regression is used to predict the value of a single continuous DV (which we will call Y) from a single continuous IV (which we will call X). Regression assumes that the relationship between the IV and the DV can be represented by the equation

Yi = β0 + β1Xi + εi,

where Yi is the value of the DV for case i, Xi is the value of the IV for case i, β0 and β1 are constants, and εi is the error in prediction for case i. When you perform a regression, what you are basically doing is determining estimates of β0 and β1 that let you best predict values of Y from values of X. You may remember from geometry that the above equation is equivalent to a straight line. This is no accident, since the purpose of simple linear regression is to define the line that represents the relationship between our two variables. β0 is the intercept of the line, indicating the expected value of Y when X = 0. β1 is the slope of the line, indicating how much we expect Y will change when we increase X by a single unit.

The regression equation above is written in terms of population parameters. That indicates that our goal is to determine the relationship between the two variables in the population as a whole. We typically do this by taking a sample and then performing calculations to obtain the estimated regression equation

Ŷi = b0 + b1Xi.

Once you estimate the values of b0 and b1, you can substitute in those values and use the regression equation to predict the expected value of the DV for specific values of the IV. Predicting the values of Y from the values of X is referred to as regressing Y on X. When analyzing data from a study you will typically want to regress the values of the DV on the values of the IV. This makes sense, since you want to use the IV to explain variability in the DV. We typically calculate b0 and b1 using least squares estimation, which chooses the estimates that minimize the sum of squared errors between the values predicted by the estimated regression line and the actual observed values.
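To see what least squares estimation is doing, it can help to carry out the arithmetic outside of SPSS. The following sketch, written in Python with NumPy and using made-up data purely for illustration, computes b0 and b1 directly from the standard least squares formulas.

    import numpy as np

    # Made-up example data: X might be hours studied, Y an exam score
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0])

    # Least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar

    # Predictions and errors; the b0 and b1 above are the values that
    # minimize this sum of squared errors
    y_hat = b0 + b1 * x
    sse = np.sum((y - y_hat) ** 2)
    print(b0, b1, sse)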

In addition to using the estimated regression equation for prediction, you can also perform hypothesis tests regarding the individual regression parameters. The slope of the regression equation (β1) represents the change in Y with a one-unit change in X. If X predicts Y, then as X increases, Y should change in some systematic way. You can therefore test for a linear relationship between X and Y by determining whether the slope parameter is significantly different from zero.
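Assuming SciPy is available, this test of the slope can be reproduced in a few lines of Python; the data below are the same made-up values used in the earlier sketch.

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0])

    # linregress returns the least squares slope and intercept together with
    # a two-sided p-value for the test that the population slope is zero
    fit = stats.linregress(x, y)
    print(fit.slope, fit.intercept)
    print(fit.pvalue)  # a small p-value suggests a linear relationship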

When performing linear regression, we typically make the following assumptions about the error terms εi.

1. The errors have a normal distribution.
2. The errors have the same variance at each level of X (homoscedasticity).
3. The errors in the model are all independent.
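The first two assumptions can be checked informally once you have the residuals. The Python sketch below (SciPy, made-up data; the median-split comparison is only a rough heuristic, not a formal test) illustrates two such checks. Independence usually has to be judged from the study design rather than from the data.

    import numpy as np
    from scipy import stats

    # Made-up data, as in the earlier sketches
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0])

    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)

    # Assumption 1: normality of the errors (Shapiro-Wilk test on the residuals)
    stat, p_normal = stats.shapiro(residuals)
    print(p_normal)  # a large p-value gives no evidence against normality

    # Assumption 2: equal error variance across levels of X. A crude check is
    # to compare the spread of the residuals in the lower and upper halves of X.
    lower = residuals[x <= np.median(x)]
    upper = residuals[x > np.median(x)]
    print(lower.std(ddof=1), upper.std(ddof=1))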

To perform a simple linear regression in SPSS
Choose Analyze → Regression → Linear.
Move the DV to the Dependent box.
Move the IV to the Independent(s) box.
Click the OK button.

The output from this analysis will contain the following sections.
Variables Entered/Removed. This section is only used in model building and contains no useful information in simple linear regression.
Model Summary. The value listed below R is the correlation between your variables. The value listed below R Square is the proportion of variance in your DV that can be accounted for by your IV. The value in the Adjusted R Square column is a measure of model fit, adjusting for the number of IVs in the model. The value listed below Std. Error of the Estimate is the standard deviation of the residuals.
ANOVA. Here you will see an ANOVA table, which provides an F test of the relationship between your IV and your DV. If the F test is significant, it indicates that there is a relationship.
Coefficients. This section contains a table where each row corresponds to a single coefficient in your model. The row labeled Constant refers to the intercept, while the row containing the name of your IV refers to the slope. Inside the table, the column labeled B contains the estimates of the parameters and the column labeled Std. Error contains the standard errors of those estimates. The column labeled Beta contains the standardized regression coefficient, which is the parameter estimate that you would get if you standardized both the IV and the DV by subtracting off their means and dividing by their standard deviations. Standardized regression coefficients are sometimes used in multiple regression (discussed below) to compare the relative importance of different IVs when predicting the DV. In simple linear regression, the standardized regression coefficient will always be equal to the correlation between the IV and the DV. The column labeled t contains the value of the t statistic testing whether the value of each parameter is equal to zero. The p-value of this test is found in the column labeled Sig. If the value for the IV is significant, then there is a relationship between the IV and the DV. Note that the square of the t statistic is equal to the F statistic in the ANOVA table and that the p-values of the two tests are equal; a short numerical check of this equivalence appears below. This is because both tests examine whether there is a significant linear relationship between your variables.
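None of these quantities is unique to SPSS. The Python sketch below reproduces them with SciPy, using the same made-up data as the earlier examples; the Adjusted R Square formula shown is the standard one for a model with a single IV.

    import numpy as np
    from scipy import stats

    # Made-up data, as in the earlier sketches
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0])
    n = len(x)

    fit = stats.linregress(x, y)

    # R is the correlation; R Square is the proportion of variance in Y
    # accounted for by X; Adjusted R Square corrects for the number of IVs
    r_sq = fit.rvalue ** 2
    adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - 2)  # one IV, so n - 2 df

    # Std. Error of the Estimate: standard deviation of the residuals
    residuals = y - (fit.intercept + fit.slope * x)
    std_err_est = np.sqrt(np.sum(residuals ** 2) / (n - 2))

    # The squared t statistic for the slope equals the ANOVA F statistic
    t = fit.slope / fit.stderr
    f_stat = r_sq * (n - 2) / (1 - r_sq)
    print(r_sq, adj_r_sq, std_err_est)
    print(t ** 2, f_stat)  # these two values agree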

CORRELATION


Pearson correlation
A Pearson correlation measures the strength of the linear relationship between two continuous variables. A linear relationship is one that can be captured by drawing a straight line through the points on a scatterplot of the two variables of interest. The value of the correlation provides information about both the direction and the strength of the relationship.

Correlations range between -1.0 and 1.0.
The sign of the correlation describes the direction of the relationship. A positive sign indicates that as one variable gets larger the other also tends to get larger, while a negative sign indicates that as one variable gets larger the other tends to get smaller.
The magnitude of the correlation describes the strength of the relationship. The further a correlation is from zero, the stronger the relationship between the two variables. A correlation of zero would indicate that there is no linear relationship between the two variables at all.

Correlations only measure the strength of the linear relationship between the two variables. Sometimes you have a relationship that would be better measured by a curve of some sort rather than a straight line. In this case the correlation coefficient would not provide a very accurate measure of the strength of the relationship. If a line accurately describes the relationship between your two variables, your ability to predict the value of one variable from the value of the other is directly related to the correlation between them. When the points in your scatterplot are all clustered closely about a line your correlation will be large and the accuracy of the predictions will be high. If the points tend to be widely spread your correlation will be small and the accuracy of your predictions will be low.

The Pearson correlation assumes that both of your variables have normal distributions. If this is not the case then you might consider performing a Spearman rank-order correlation instead (described below).
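If you want to check SPSS's output, or simply see the computation, a Pearson correlation is a one-line call in Python with SciPy. The paired measurements below are made up for illustration.

    from scipy import stats

    # Made-up paired measurements (e.g., heights in cm and weights in kg)
    height = [160, 165, 170, 175, 180, 185]
    weight = [55, 60, 63, 70, 72, 80]

    # pearsonr returns the correlation and a p-value testing whether the
    # population correlation is zero
    r, p = stats.pearsonr(height, weight)
    print(r, p)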

To perform a Pearson correlation in SPSS
Choose Analyze → Correlate → Bivariate.
Move the variables you want to correlate to the Variables box.
Click the OK button.

The output of this analysis will contain the following section.
Correlations. This section contains the correlation matrix of the variables you selected. A variable always has a perfect correlation with itself, so the diagonals of this matrix will always have values of 1. The other cells in the table provide you with the correlation between the variable listed at the top of the column and the variable listed to the left of the row. Below this is a p-value testing whether the correlation differs significantly from zero. Finally, the bottom value in each box is the sample size used to compute the correlation.

Point-biserial correlation
The point-biserial correlation captures the relationship between a dichotomous (two-value) variable and a continuous variable. If the analyst codes the dichotomous variable with values of 0 and 1 and then computes a standard Pearson correlation using this variable, the result is mathematically equivalent to the point-biserial correlation. The interpretation of this correlation is similar to the interpretation of the Pearson correlation. A positive correlation indicates that the group associated with the value of 1 has larger values than the group associated with the value of 0. A negative correlation indicates that the group associated with the value of 1 has smaller values than the group associated with the value of 0. A value near zero indicates no relationship between the two variables.
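This equivalence is easy to verify outside of SPSS. The Python sketch below (SciPy, made-up data) computes the point-biserial correlation directly and then via a Pearson correlation on the 0/1 codes.

    import numpy as np
    from scipy import stats

    # Made-up data: a 0/1 group code and a continuous score
    group = np.array([0, 0, 0, 1, 1, 1])
    score = np.array([4.1, 5.0, 4.6, 6.2, 5.9, 6.8])

    # The point-biserial correlation is mathematically identical to the
    # Pearson correlation computed on the 0/1 codes
    r_pb, p_pb = stats.pointbiserialr(group, score)
    r, p = stats.pearsonr(group, score)
    print(r_pb, r)  # the two values agree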

To perform a point-biserial correlation in SPSS
Make sure your categories are indicated by values of 0 and 1.
Obtain the Pearson correlation between the categorical variable and the continuous variable, as discussed above.

The result of this analysis will include the same sections as discussed in the Pearson correlation section.

Spearman rank correlation
The Spearman rank correlation is a nonparametric equivalent to the Pearson correlation. The Pearson correlation assumes that both of your variables have normal distributions. If this assumption is violated for either of your variables then you may choose to perform a Spearman rank correlation instead. However, the Spearman rank correlation is a less powerful measure of association, so people commonly choose to use the standard Pearson correlation even when the variables are moderately nonnormal. The Spearman rank correlation is typically preferred over Kendall's tau, another nonparametric correlation measure, because its scaling is more consistent with that of the standard Pearson correlation.
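The scaling difference is easy to see numerically. In the Python sketch below (SciPy, made-up data), Spearman's rho and Kendall's tau are computed on the same mildly scrambled values; tau comes out noticeably smaller even though both measures detect the same monotone trend.

    from scipy import stats

    # Made-up data: a mostly increasing relationship with a few local swaps
    x = [1, 2, 3, 4, 5, 6]
    y = [2, 1, 4, 3, 6, 5]

    rho, p_rho = stats.spearmanr(x, y)   # Spearman rho, about 0.83 here
    tau, p_tau = stats.kendalltau(x, y)  # Kendall's tau, about 0.60 here
    r, p_r = stats.pearsonr(x, y)        # Pearson r, for comparison
    print(rho, tau, r)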

To perform a Spearman rank correlation in SPSS
Choose Analyze → Correlate → Bivariate.
Move the variables you want to correlate to the Variables box.
Check the box next to Spearman.
Click the OK button.

The output of this analysis will contain the following section.
Correlations. This section contains the correlation matrix of the variables you selected. The Spearman rank correlations can be interpreted in exactly the same way as you interpret a standard Pearson correlation. Below each correlation SPSS provides a p-value testing whether the correlation is significantly different from zero, and the sample size used to compute the correlation.