Econometrics for finance - Types of data

There are broadly three types of data that can be employed in quantitative analysis of financial problems: time series data, cross-sectional data, and panel data.

Time series data
Time series data, as the name suggests, are data that have been collected over a period of time on one or more variables. Time series data have associated with them a particular frequency of observation or collection of data points. The frequency is simply a measure of the interval over, or the regularity with which, the data are collected or recorded. Examples of time series data, with their typical frequency of collection where given, are:

·         Industrial production: monthly or quarterly
·         Government budget deficit
·         Money supply
·         The value of a stock: as transactions occur

A word on ‘as transactions occur’ is necessary. Much financial data does not start its life regularly spaced. For example, the price of a given company's common stock might be recorded as having changed whenever a new trade or quotation is registered by the financial information recorder. Such recordings are very unlikely to be evenly distributed over time: there may be no activity between, say, 5 p.m., when the market closes, and 8.30 a.m. the next day, when it reopens, and activity is also uneven within the trading day (it is often thin around lunchtime, for example). Although there are a number of ways to deal with this issue, a common and simple approach is to select an appropriate frequency and to use, as the observation for each time period, the last prevailing price during that interval.
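The last-prevailing-price approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the function name sample_last_price, the trade timestamps and the prices are all hypothetical; each interval is assumed to run from just after one boundary up to and including the next; and an interval with no trades simply carries forward the previous price (or None if no trade has yet occurred).

```python
from datetime import datetime, timedelta

def sample_last_price(trades, start, end, interval):
    """Sample irregular (timestamp, price) trades at a regular frequency.

    Each observation is the last price recorded at or before the end of
    its interval; intervals without trades carry the previous price
    forward (None if no trade has happened yet).
    """
    trades = sorted(trades)  # chronological order
    sampled = []
    idx = 0
    last_price = None
    t = start + interval
    while t <= end:
        # Absorb every trade up to and including this interval's end.
        while idx < len(trades) and trades[idx][0] <= t:
            last_price = trades[idx][1]
            idx += 1
        sampled.append((t, last_price))
        t += interval
    return sampled

# Hypothetical trades in the first few minutes of a session.
trades = [
    (datetime(2024, 1, 2, 9, 0, 5), 100.0),
    (datetime(2024, 1, 2, 9, 0, 40), 100.5),
    (datetime(2024, 1, 2, 9, 2, 10), 99.8),
    (datetime(2024, 1, 2, 9, 4, 59), 100.2),
]
one_minute = sample_last_price(
    trades,
    start=datetime(2024, 1, 2, 9, 0),
    end=datetime(2024, 1, 2, 9, 5),
    interval=timedelta(minutes=1),
)
for stamp, price in one_minute:
    print(stamp.strftime("%H:%M"), price)
```

With these inputs the one-minute series is 100.5, 100.5, 99.8, 99.8, 100.2. The two trades before 9.01 collapse into a single observation, while the quiet minute ending 9.02 repeats the previous price and so shows a zero price change.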

It is also generally a requirement that all data used in a model be of the same frequency of observation. So, for example, regressions that seek to estimate an arbitrage pricing model using monthly observations on macroeconomic factors must also use monthly observations on stock returns, even if daily or weekly observations on the latter are available. The data may be quantitative (e.g. exchange rates, prices, number of shares outstanding), or qualitative (e.g. the day of the week, a survey of the financial products purchased by private individuals over a period of time, a credit rating, etc.).
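To respect the same-frequency requirement, a daily price series can be converted to monthly returns before being paired with monthly macroeconomic factors. The sketch below assumes one simple convention: it samples the last available price in each calendar month and computes simple returns from those month-end prices. Averaging within the month is another possibility. The helper names month_end_prices and monthly_returns and the prices are hypothetical.

```python
from datetime import date

def month_end_prices(daily):
    """Keep the last available price in each calendar month.

    Returns a chronologically sorted list of ((year, month), price).
    """
    last = {}
    for day, price in sorted(daily):
        last[(day.year, day.month)] = price  # later days overwrite earlier ones
    return sorted(last.items())

def monthly_returns(daily):
    """Simple monthly returns P_m / P_{m-1} - 1 from month-end prices."""
    month_ends = month_end_prices(daily)
    return [
        (month, price / prev_price - 1)
        for (_, prev_price), (month, price) in zip(month_ends, month_ends[1:])
    ]

# Hypothetical daily closing prices spanning three months.
daily = [
    (date(2024, 1, 30), 49.0),
    (date(2024, 1, 31), 50.0),
    (date(2024, 2, 15), 53.0),
    (date(2024, 2, 29), 55.0),
    (date(2024, 3, 28), 44.0),
]
returns = monthly_returns(daily)
print([(m, round(r, 4)) for m, r in returns])  # [((2024, 2), 0.1), ((2024, 3), -0.2)]
```

The first month yields no return because there is no earlier month-end price to compare against, so the monthly return series starts one period after the price series.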

Problems that could be tackled using time series data:
·         How the value of a country’s stock index has varied with that country’s macroeconomic fundamentals
·         How the value of a company’s stock price has varied when it announced the value of its dividend payment
·         The effect on a country’s exchange rate of an increase in its trade deficit.

In all of the above cases, it is clearly the time dimension which is the most important, and the analysis will be conducted using the values of the variables over time.

Cross-sectional data
Cross-sectional data are data on one or more variables collected at a single point in time. For example, the data might be on:
·         A poll of usage of Internet stockbroking services
·         A cross-section of stock returns on the New York Stock Exchange (NYSE)
·         A sample of bond credit ratings for UK banks.

Problems that could be tackled using cross-sectional data:
·         The relationship between company size and the return to investing in its shares
·         The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt.

Panel data
Panel data have the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years. The estimation of panel regressions is an interesting and developing area, and will be examined in detail in chapter 10.
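One simple way to picture the two dimensions of a panel is to key each observation by a (stock i, day t) pair. Fixing the stock gives a time series, and fixing the day gives a cross-section. The tickers, days and prices below are hypothetical, and the dictionary layout is only one of several possible storage choices.

```python
# Hypothetical daily prices for two blue-chip stocks, keyed by (stock i, day t).
panel = {
    ("AAA", 1): 10.0, ("AAA", 2): 10.5, ("AAA", 3): 10.2,
    ("BBB", 1): 20.0, ("BBB", 2): 19.0, ("BBB", 3): 19.5,
}

def time_series(panel, stock):
    """Fix i and let t vary: one stock's prices in day order."""
    # Sorting the (i, t) keys puts each stock's days in chronological order.
    return [panel[(i, t)] for (i, t) in sorted(panel) if i == stock]

def cross_section(panel, day):
    """Fix t and let i vary: every stock's price on one day."""
    return {i: price for (i, t), price in panel.items() if t == day}

print(time_series(panel, "AAA"))  # [10.0, 10.5, 10.2]
print(cross_section(panel, 2))    # {'AAA': 10.5, 'BBB': 19.0}
```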

Fortunately, virtually all of the standard techniques and analysis in econometrics are equally valid for time series and cross-sectional data. For time series data, it is usual to denote the individual observation numbers using the index t, and the total number of observations available for analysis by T. For cross-sectional data, the individual observation numbers are indicated using the index i, and the total number of observations available for analysis by N.

Note that, in contrast to the time series case, there is no natural ordering of the observations in a cross-sectional sample. For example, the observations i might be on the price of bonds of different firms at a particular point in time, ordered alphabetically by company name. In that case, there is unlikely to be any useful information in the fact that Northern Rock follows National Westminster in a sample of UK bank credit ratings, since it is purely by chance that both names begin with the letter ‘N’. In a time series context, on the other hand, the ordering of the data is relevant, since the data are usually ordered chronologically. The convention adopted here is to denote the total number of observations in the sample by T, even in regression equations that could apply to either cross-sectional or time series data.
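The point about ordering can be checked numerically. In the sketch below, which uses hypothetical data throughout, an OLS slope estimated on a cross-section is unchanged when the observations are shuffled (each firm's x and y moving together). By contrast, the first-order autocorrelation of a time series, which compares each value with the previous one, changes once the chronological order is scrambled.

```python
import math
import random

def ols_slope(x, y):
    """Slope coefficient from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def lag1_autocorr(y):
    """Sample first-order autocorrelation: each y_t paired with y_{t-1}."""
    m = sum(y) / len(y)
    num = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    den = sum((v - m) ** 2 for v in y)
    return num / den

# Cross-section (hypothetical): firm size x_i and return y_i for N = 6 firms.
pairs = [(1.2, 0.9), (3.4, 0.5), (0.8, 1.1), (5.1, 0.2), (2.2, 0.7), (4.0, 0.4)]
slope_original = ols_slope([p[0] for p in pairs], [p[1] for p in pairs])

shuffled = pairs[:]
random.Random(0).shuffle(shuffled)  # reorder firms; each (x, y) pair stays intact
slope_shuffled = ols_slope([p[0] for p in shuffled], [p[1] for p in shuffled])

# Time series (hypothetical): T = 6 observations in chronological order.
y_ordered = [1, 2, 3, 4, 5, 6]
y_scrambled = [1, 4, 2, 5, 3, 6]  # same values, different order

print(math.isclose(slope_original, slope_shuffled))      # True
print(lag1_autocorr(y_ordered), lag1_autocorr(y_scrambled))  # 0.5 and roughly -0.357
```

The ordered series has a positive autocorrelation (0.5), while the same six values in scrambled order give a negative one (about -0.357), even though their mean and variance are identical.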

Continuous and discrete data
As well as classifying data as being of the time series or cross-sectional type, we could also distinguish them as being either continuous or discrete, exactly as the labels suggest. Continuous data can take on any value and are not confined to specific numbers; their values are limited only by the precision of measurement. For example, the rental yield on a property could be 6.2%, 6.24% or 6.238%, and so on. Discrete data, on the other hand, can take on only certain values, which are usually integers (whole numbers) and are often defined to be counts: for instance, the number of people in a particular underground carriage or the number of shares traded during a day. In these cases, having 86.3 passengers in the carriage or 5857½ shares traded would not make sense.

Cardinal, ordinal and nominal numbers
Another way in which we could classify numbers is according to whether they are cardinal, ordinal, or nominal. Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values. Thus, for cardinal numbers, a figure of 12 implies a measure that is ‘twice as good’ as a figure of 6. Examples of cardinal numbers would be the price of a share or of a building, and the number of houses in a street. Ordinal numbers, on the other hand, can only be interpreted as providing a position or an ordering. On an ordinal scale, a figure of 12 may be viewed as ‘better’ than a figure of 6, but could not be considered twice as good. Examples of ordinal numbers would be the position of a runner in a race (e.g. second place is better than fourth place, but it would make little sense to say it is ‘twice as good’) or the level reached in a computer game.

The final type of data that could be encountered is where there is no natural ordering of the values at all, so a figure of 12 is simply different from a figure of 6, but could not be considered better or worse in any sense. Such data often arise when numerical values are arbitrarily assigned, as with telephone numbers, or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on, ‘1’ might be used to denote the NYSE, ‘2’ to denote the NASDAQ and ‘3’ to denote the AMEX). Such variables are called nominal variables. Cardinal, ordinal and nominal variables may require different modeling approaches or, at least, different treatments.
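A common treatment for a nominal variable such as the exchange code is to replace the arbitrary numbers with 0/1 dummy variables. This keeps a regression from reading the codes as if NASDAQ (‘2’) were twice NYSE (‘1’). The sketch below assumes the coding from the text (1 = NYSE, 2 = NASDAQ, 3 = AMEX); the function name to_dummies and the sample of codes are hypothetical. One category is omitted as the base so that the dummies are not perfectly collinear with a regression intercept.

```python
EXCHANGES = {1: "NYSE", 2: "NASDAQ", 3: "AMEX"}  # nominal coding from the text

def to_dummies(codes, categories, base):
    """Return one row per observation with a 0/1 column for each
    non-base category; the base category is represented by all zeros."""
    columns = [c for c in categories if c != base]
    rows = [[1 if code == c else 0 for c in columns] for code in codes]
    return columns, rows

codes = [1, 3, 2, 1]  # hypothetical: exchange of four US stocks
columns, rows = to_dummies(codes, sorted(EXCHANGES), base=1)
print([EXCHANGES[c] for c in columns])  # ['NASDAQ', 'AMEX']
print(rows)  # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

By contrast, an average of the raw codes (here 1.75) would not correspond to any exchange, which is one symptom of treating nominal numbers as if they were cardinal.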