
VECTORS AND LOOPS

Vectors and loops are two tools drawn from computer programming that can be very useful when manipulating data. Their primary use is to perform a large number of similar computations using a relatively small program. Some of the more complicated types of data manipulation can only reasonably be done using vectors and loops.

A vector is a set of variables that are linked together because they represent similar things. The purpose of the vector is to provide a single name that can be used to access any variable in the set. A loop tells the computer to perform a set of procedures a specified number of times. Often we need to perform the same transformation on a large number of variables. With a loop, we only need to define the transformation once, and can then tell the computer to apply it to every one of those variables.

If you have computer-programming experience then you have likely come across these ideas before, although what SPSS calls a “vector” is usually called an “array” in other programming languages. If you are familiar with arrays and loops from a computer programming course, you are a step ahead. Vectors and loops are used in data manipulation in more or less the same way that arrays and loops are used in standard computer programming.

Vectors
Vectors can only be defined and used in syntax. Before you can use a vector you first need to define it. You must specify the name of the vector and list what variables are associated with it. Variables referenced by a vector are called “elements” of that vector. You declare a vector using the following syntax.

vector Vname = varX1 to varX2.

If the variables in the vector have not already been declared, you can do so as part of the vector statement. For more information on this, see page 904 of the SPSS Base Syntax Reference Guide. The following are all acceptable vector declarations.

vector V = v1 to v8.
vector Myvector = entry01 to entry64.
vector Income = in1992 to in2000.

In the general form shown earlier, the vector is given the name Vname and is used to reference the set of variables defined by the variable list. The elements of the vector must be declared using the form first variable to last variable; you cannot list them out individually. This means that the variables to be included in a vector must all sit next to each other, in order, in your data set.
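A minimal sketch of the other way to declare a vector, assuming the short form vector name(n) from the Syntax Reference Guide, which creates new numeric variables instead of pointing at existing ones. The vector name Score and the three-case inline dataset are hypothetical; the vector statement creates Score1, Score2 and Score3, each starting as system-missing, and the compute statements then fill them in.

```spss
* Hypothetical three-case dataset with one variable, id.
data list free / id.
begin data
1 2 3
end data.
* Short form: creates new numeric variables Score1 to Score3.
vector Score(3).
compute Score(1) = id.
compute Score(2) = id * 2.
compute Score(3) = id * 3.
list.
```

Because the names are generated from the vector name plus a number, this form is convenient when you do not care what the new variables are called.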

Vectors can be used in transformation statements just like variables. However, the vector itself isn’t able to hold values. Instead, the vector acts as a mediator between your statement and the variables it references. The variables included in a vector are placed in a specific order, determined by the declaration statement. So if you give SPSS a vector and a position number (referred to as the index), it knows which specific element you want to access. You do not need to know the exact name of the variable; you just need to know its position in the vector.
References to items within a vector are typically made using the format

vname (index)

where vname is the name of the vector, and index is the numerical position of the desired element. Using this format, you can use a vector to reference a variable in any place that you would normally insert a variable name. For example, all of the following would be valid SPSS statements, assuming that we had defined the three vectors above.

compute V(4) = 6.
if (Myvector(30)='house') correct = correct + 1.
compute change = Income(9) - Income(1).

Note that the index used by a vector only takes into account the position of elements in the vector, not the names of the variables. To reference the variable in1993 in the Income vector above, you would use Income(2), not Income(1993).
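A small sketch for checking the point about positions, using a hypothetical one-case dataset in which in1992 to in2000 hold made-up incomes. The result variables y1993 and change are also hypothetical names.

```spss
* Hypothetical one-case dataset with incomes for 1992 to 2000.
data list free / in1992 to in2000.
begin data
100 110 120 130 140 150 160 170 180
end data.
vector Income = in1992 to in2000.
* Position 2 is in1993, so y1993 receives 110.
compute y1993 = Income(2).
* Position 9 (in2000) minus position 1 (in1992).
compute change = Income(9) - Income(1).
list variables = y1993 change.
```

With these made-up values, change should come out as 180 - 100 = 80.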

Using vectors this way doesn’t provide much of an advantage; we are not really saving any effort by referring to a particular variable as Myvector(1) instead of entry01. The advantage comes from the fact that the index of the vector can itself be a variable. In this case, the element that the vector references depends on the value of the index variable. For example, if a vector named, say, Grade refers to the variables grade1 to grade8, then the exact variable that is changed by the statement

compute Grade(t) = Grade(t) + 1.

depends on the value of t when this statement is executed. If t has the value 1, then the variable grade1 will be incremented by 1. If t has the value 8, then the variable grade8 will be incremented by 1. This means that the same statement can do many different things, depending on what value you assign to t. Vectors therefore let you write “generic” sections of code, where you control exactly what the code does by assigning different values to the index variables.
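A minimal sketch of an index variable at work, assuming counters grade1 to grade8, a hypothetical vector name Grade, and a made-up dataset in which each case carries its own value of t. NUMERIC creates the counters, and RECODE changes them from system-missing to 0 before the single generic statement runs.

```spss
* Hypothetical data: each case has a value of t between 1 and 8.
data list free / t.
begin data
1 8
end data.
numeric grade1 to grade8.
vector Grade = grade1 to grade8.
* Set all eight counters to 0 (new numeric variables start as system-missing).
recode grade1 to grade8 (sysmis = 0).
* Which counter is incremented depends on this case's value of t.
compute Grade(t) = Grade(t) + 1.
list.
```

LIST should show grade1 equal to 1 for the case where t is 1 and grade8 equal to 1 for the case where t is 8, with every other counter left at 0.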

Loops
Vectors are most useful when they are combined with loops. A loop is a statement that lets you tell the computer to perform a set of commands a specified number of times. In SPSS you can tell the computer to perform a loop by using the following code:

loop loop_variable = lower_limit to upper_limit.
--commands to be repeated appear here--
end loop.

When SPSS encounters a loop statement, it first sets the loop variable equal to the lower limit. It then performs all of the commands inside the loop until it reaches the end loop statement. At that point the computer adds 1 to the loop variable and compares it to the upper limit. If the new value of the loop variable is less than or equal to the upper limit, it goes back to the beginning of the loop and goes through all of the commands again. If the new value is greater than the upper limit, the computer moves on to the statement after end loop. This means that the computer performs the statements inside the loop a total of (upper limit - lower limit + 1) times.
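A quick way to confirm the (upper limit - lower limit + 1) rule is to count the cycles. This sketch assumes a hypothetical one-case dataset and a hypothetical counter variable named cycles.

```spss
* Hypothetical one-case dataset.
data list free / id.
begin data
1
end data.
* Count how many times the loop body runs.
compute cycles = 0.
loop #i = 3 to 7.
+ compute cycles = cycles + 1.
end loop.
* cycles should equal 7 - 3 + 1 = 5.
list.
```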
The following is an example of an SPSS program that uses a loop to calculate a sum:

compute x = 0.
loop #t = 4 to 8.
+ compute x = x + #t.
end loop.

The first line simply initializes the variable x to the value of zero. The second line defines the conditions of the loop. The loop variable is named #t, and starts with a value of 4. The loop cycles until the value of #t is greater than 8, so the program performs a total of 5 cycles. During each cycle the current value of #t is added to x. At the end of this set of statements, the variable x has the value 4 + 5 + 6 + 7 + 8 = 30.

In this example, the loop variable is a scratch variable, because its name begins with a number sign (#). A scratch variable in SPSS is not saved in the final data set. Typically we are not interested in storing the values of our loop variables, so it is common practice to make them scratch variables. For more information on scratch variables see page 32 of the SPSS Base Syntax Reference Guide.

You will also notice the plus sign (+) placed before the compute statement in line 3. SPSS needs you to start all new commands in the first column of each line. Here we wish to indent the command to indicate that it is part of the loop, so we put the plus sign in the first column, which tells SPSS that the actual command starts later on the line.

Just in case you were wondering, the first statement setting x = 0 is actually necessary for the sum to be calculated. SPSS starts a newly created numeric variable with the system-missing value, and adding anything to a missing value produces a missing value, so we must explicitly start x at zero to be able to obtain the sum.
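A sketch contrasting the two situations, assuming a hypothetical one-case dataset: x is initialized to 0 as in the example above, while y is declared with NUMERIC and left at system-missing before the same loop runs on it.

```spss
* Hypothetical one-case dataset.
data list free / id.
begin data
1
end data.
* Initialized to 0: x ends up as 4 + 5 + 6 + 7 + 8 = 30.
compute x = 0.
loop #t = 4 to 8.
+ compute x = x + #t.
end loop.
* Declared but not initialized: y starts as system-missing.
numeric y.
loop #t = 4 to 8.
+ compute y = y + #t.
end loop.
list.
```

LIST should show x as 30.00 and y as a period, which is how SPSS displays the system-missing value.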

The Power of Combining Vectors and Loops
Though you can work with vectors and loops alone, they were truly designed to be used together. A combination of vectors and loops can save you a great deal of time when performing certain types of repetitive transformations. Consider the characteristics of vectors and loops. A vector lets you reference a set of related variables using a single name and an index. The index can be a variable or a mathematical expression involving one or more variables. A loop repeatedly performs a set of commands, incrementing a loop variable after each cycle. What would happen if a statement inside of a loop referenced a vector using the loop variable as the index? During each cycle, the loop variable increases by 1, so during each cycle the vector would refer to a different variable. If you choose the upper and lower limits of your loop correctly, you can use a loop to perform a transformation on every element of a vector.

For an example, let’s say that you conducted a reaction-time study where research participants observed strings of letters on the screen and judged whether they composed a real word or not. In your study, you had a total of 200 trials in several experimental conditions. You want to analyze your data with an ANOVA to see if the reaction time varies by condition, but you find that the data have a right skew (which is common for reaction times). To use ANOVA, you will need to transform the data so that they are closer to a normal distribution, which here involves taking the logarithm of the response time on each trial. In terms of your data set, what you need is a set of 200 new variables whose values are equal to the logarithms of the 200 response time variables. Without using vectors or loops, you would need to write 200 individual transformation statements to create each log variable from the corresponding response time variable. Using vectors and loops, however, we can do the same work with the following simple program. The program assumes that the original response time variables are rt001 to rt200, and the desired log variables will be lrt001 to lrt200.

numeric lrt001 to lrt200.
vector Rtvector = rt001 to rt200.
vector Lvector = lrt001 to lrt200.
loop #item = 1 to 200.
+ compute Lvector(#item) = ln(Rtvector(#item)).
end loop.

The first statement creates the 200 new variables that will hold the log values. This is needed because a vector declared in the form first variable to last variable can only refer to variables that already exist. The next two statements set up a pair of vectors, one to represent the original response time variables and one to represent the transformed variables. The fourth statement creates a loop with 200 cycles, where each cycle corresponds to a trial in the experiment. The fifth line performs the desired transformation: during each cycle it takes one variable from Lvector and sets it equal to the natural logarithm (the SPSS function LN) of the corresponding variable in Rtvector. If you prefer base-10 logarithms, LG10 can be used in place of LN. The last line simply ends the loop. By the time this program completes, the 200 new variables will hold the log values that you desire.
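A miniature, self-contained version of the log-transformation program, assuming two hypothetical participants with only three trials each (rt001 to rt003) so the result can be checked by eye. Because the first-to-last form of vector refers to variables that already exist, the sketch declares lrt001 to lrt003 with NUMERIC before pointing Lvector at them, and it uses LN, the SPSS natural-logarithm function.

```spss
* Hypothetical data: two participants, three trials each.
data list free / rt001 to rt003.
begin data
512 634 701
480 1020 655
end data.
* Create the target variables before the vector refers to them.
numeric lrt001 to lrt003 (f8.4).
vector Rtvector = rt001 to rt003.
vector Lvector = lrt001 to lrt003.
loop #item = 1 to 3.
+ compute Lvector(#item) = ln(Rtvector(#item)).
end loop.
list.
```

Scaling this back up only requires changing the variable ranges to rt001 to rt200 and lrt001 to lrt200 and the loop's upper limit to 200.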

In addition to greatly reducing the number of programming lines, there are other advantages to performing transformations using vectors and loops. If you need to make a change to the transformation you only need to change a single statement. If you write separate transformations for each variable, you must change every single statement anytime you want to change the specifics of the transformation. It is also much easier to read programs that use loops than programs with large numbers of transformation statements. The loops naturally group together transformations that are all of the same type, whereas with a list you must examine each individual transformation to find out what it does.