Non Linear Models of Econometrics

MODELLING VOLATILITY
An Excursion into Non-linearity Land
l  Motivation: linear structural (and time series) models cannot explain a number of important features common to much financial data:
                - leptokurtosis
                - volatility clustering or volatility pooling
                - leverage effects 
l  Our “traditional” structural model could be something like:
                 $y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$, or more compactly $y = X\beta + u$.
Non-linear Models: A Definition
l  Campbell, Lo and MacKinlay (1997) define a non-linear data generating process as one that can be written
                                $y_t = f(u_t, u_{t-1}, u_{t-2}, \dots)$
                where $u_t$ is an iid error term and $f$ is a non-linear function.
l  They also give a slightly more specific definition as
                                $y_t = g(u_{t-1}, u_{t-2}, \dots) + u_t \sigma^2(u_{t-1}, u_{t-2}, \dots)$
                   where $g$ is a function of past error terms only and $\sigma^2$ is a variance term.
l  Models with non-linear $g(\cdot)$ are “non-linear in mean”, while those with non-linear $\sigma^2(\cdot)$ are “non-linear in variance”.
Types of non-linear models
l  The linear paradigm is a useful one. Many apparently non-linear relationships can be made linear by a suitable transformation. On the other hand, it is likely that many relationships in finance are intrinsically non-linear.        
l  There are many types of non-linear models, e.g.
                - ARCH / GARCH
                - Switching models
                - Bilinear models
Testing for Non-linearity
l  The “traditional” tools of time series analysis (acf’s, spectral analysis) may find no evidence of linear structure in the data, but this does not mean that the data are independent.
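l  Portmanteau tests for non-linear dependence have been developed. The simplest is Ramsey’s RESET test, which takes the form:
                $\hat{u}_t = \beta_0 + \beta_1 \hat{y}_t^2 + \beta_2 \hat{y}_t^3 + \dots + \beta_{p-1} \hat{y}_t^p + v_t$
                where $\hat{u}_t$ and $\hat{y}_t$ are the residuals and fitted values from the original linear regression.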
l  Many other non-linearity tests are available, e.g. the “BDS test” and the bispectrum test.
Heteroscedasticity
l  An example of a structural model is 
                $y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$ with $u_t \sim N(0, \sigma_u^2)$.
l  The assumption that the variance of the errors is constant is known as
                homoscedasticity, i.e. $\mathrm{Var}(u_t) = \sigma_u^2$.
l  What if the variance of the errors is not constant?
                - heteroscedasticity
                - would imply that standard error estimates could be wrong. 
l  Is the variance of the errors likely to be constant over time? Not for financial data.
Modeling Volatility
l  In monetary theory and the theory of finance, financial asset portfolios are functions of the expected means and variances of the rates of return. Increased volatility of security prices or rates of return is often an indicator that the variances are not constant over time.  Engle (1982) introduced a new approach to modeling heteroscedasticity in a time series context.
The ARCH Specification
l  Autoregressive Conditional Heteroskedasticity (ARCH) models are specifically designed to model and forecast conditional variances. The variance of the dependent variable is modeled as a function of past values of the dependent variable and independent or exogenous variables. In developing an ARCH model, you will have to provide two distinct specifications—one for the conditional mean and one for the conditional variance.
ARCH Models
l  So use a model which does not assume that the variance is constant.
l  Recall the definition of the conditional variance of $u_t$:
                $\sigma_t^2 = \mathrm{Var}(u_t \mid u_{t-1}, u_{t-2}, \dots) = E[(u_t - E(u_t))^2 \mid u_{t-1}, u_{t-2}, \dots]$
                We usually assume that $E(u_t) = 0$,
                so $\sigma_t^2 = \mathrm{Var}(u_t \mid u_{t-1}, u_{t-2}, \dots) = E[u_t^2 \mid u_{t-1}, u_{t-2}, \dots]$.

l  What could the current value of the variance of the errors plausibly depend upon?
l  Previous squared error terms.
l  This leads to the autoregressive conditionally heteroscedastic model for the variance of the errors:
                                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$
   This is known as an ARCH(1) model.
l  The full model would be
         $y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$,  $u_t \sim N(0, \sigma_t^2)$
        where $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$
l  We can easily extend this to the general case where the error variance depends on q lags of squared errors:
                                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \dots + \alpha_q u_{t-q}^2$
l  This is an ARCH(q) model. 
l  Instead of calling the variance $\sigma_t^2$, in the literature it is usually called $h_t$, so the model is
                                         $y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$,  $u_t \sim N(0, h_t)$
        where $h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \dots + \alpha_q u_{t-q}^2$
l  For illustration, consider an ARCH(1). Instead of the above, we can write 
                $y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$,     $u_t = v_t \sigma_t$,
                $\sigma_t = \sqrt{\alpha_0 + \alpha_1 u_{t-1}^2}$,     $v_t \sim N(0,1)$
l  The two are different ways of expressing exactly the same model. The first form is easier to understand while the second form is required for simulating from an ARCH model, for example.
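To illustrate why the second form is the one used for simulation, here is a minimal Python sketch that draws errors from an ARCH(1) process. The function name `simulate_arch1`, the starting value $u_0 = 0$ and the parameter values are illustrative assumptions, not part of the original slides. The comparison value $\alpha_0 / (1 - \alpha_1)$ is the long-run variance implied by an ARCH(1) with $\alpha_1 < 1$.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate n errors u_t = v_t * sigma_t with sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2."""
    rng = random.Random(seed)
    u_prev = 0.0                      # assumed starting value u_0 = 0
    errors, variances = [], []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * u_prev ** 2   # conditional variance
        v = rng.gauss(0.0, 1.0)                  # v_t ~ N(0, 1)
        u = v * math.sqrt(sigma2)                # u_t = v_t * sigma_t
        errors.append(u)
        variances.append(sigma2)
        u_prev = u
    return errors, variances


# Hypothetical parameter values, chosen only for illustration
u, sigma2 = simulate_arch1(n=20000, alpha0=0.2, alpha1=0.5, seed=42)
sample_var = sum(x * x for x in u) / len(u)
print("sample variance of u_t:", round(sample_var, 3))
print("alpha0 / (1 - alpha1): ", 0.2 / (1 - 0.5))
```

The conditional variance never falls below `alpha0`, and large shocks raise the variance of the next draw, which is what produces volatility clustering in the simulated series.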
Problems with ARCH(q) Models
l  How do we decide on q?
l  The required value of q might be very large
l  Non-negativity constraints might be violated.
l  When we estimate an ARCH model, we require $\alpha_i \ge 0$ for all $i = 1, 2, \dots, q$ (since variance cannot be negative).
l  A natural extension of an ARCH(q) model which gets around some of these problems is a GARCH model.
Generalised ARCH (GARCH) Models
l  Due to Bollerslev (1986). Allow the conditional variance to be dependent upon previous own lags
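l  The variance equation is now
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$     (1)
l  This is a GARCH(1,1) model, which is like an ARMA(1,1) model for the variance equation.
l  We could also write 
                $\sigma_{t-1}^2 = \alpha_0 + \alpha_1 u_{t-2}^2 + \beta \sigma_{t-2}^2$
                $\sigma_{t-2}^2 = \alpha_0 + \alpha_1 u_{t-3}^2 + \beta \sigma_{t-3}^2$
l  Substituting into (1) for $\sigma_{t-1}^2$: 
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta(\alpha_0 + \alpha_1 u_{t-2}^2 + \beta \sigma_{t-2}^2)$
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_0 \beta + \alpha_1 \beta u_{t-2}^2 + \beta^2 \sigma_{t-2}^2$     (2)
l  Now substituting into (2) for $\sigma_{t-2}^2$: 
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_0 \beta + \alpha_1 \beta u_{t-2}^2 + \beta^2(\alpha_0 + \alpha_1 u_{t-3}^2 + \beta \sigma_{t-3}^2)$
                $\sigma_t^2 = \alpha_0(1 + \beta + \beta^2) + \alpha_1 u_{t-1}^2(1 + \beta L + \beta^2 L^2) + \beta^3 \sigma_{t-3}^2$
                where $L$ is the lag operator.
l  An infinite number of successive substitutions would yield
                $\sigma_t^2 = \alpha_0(1 + \beta + \beta^2 + \dots) + \alpha_1 u_{t-1}^2(1 + \beta L + \beta^2 L^2 + \dots) + \beta^{\infty} \sigma_0^2$
l  So the GARCH(1,1) model can be written as an infinite order ARCH model. 
l  We can again extend the GARCH(1,1) model to a GARCH(p,q): 
                $\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$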
l    But in general a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data. 
l  Why is GARCH Better than ARCH?
                - more parsimonious - avoids overfitting
                - less likely to breach non-negativity constraints
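The claim that a GARCH(1,1) can be written as an infinite order ARCH model can be checked numerically. The sketch below assumes the GARCH(1,1) variance equation $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$. It compares the recursion with the fully substituted form, in which $\sigma_t^2$ is a constant plus geometrically declining weights on all past squared errors. The function names, the squared errors and the parameter values are hypothetical.

```python
def garch_recursion(u2, alpha0, alpha1, beta, sigma2_0):
    """sigma_t^2 from the GARCH(1,1) recursion; u2[t] holds u_t^2."""
    sigma2 = [sigma2_0]
    for t in range(1, len(u2)):
        sigma2.append(alpha0 + alpha1 * u2[t - 1] + beta * sigma2[t - 1])
    return sigma2


def arch_expansion(u2, alpha0, alpha1, beta, sigma2_0, t):
    """sigma_t^2 after t successive substitutions back to the starting variance."""
    const = alpha0 * sum(beta ** j for j in range(t))
    arch = alpha1 * sum(beta ** j * u2[t - 1 - j] for j in range(t))
    return const + arch + beta ** t * sigma2_0


# Hypothetical squared errors and parameters, for illustration only
u2 = [0.5, 2.0, 0.1, 1.2, 0.3, 0.9]
alpha0, alpha1, beta, sigma2_0 = 0.1, 0.2, 0.7, 1.0

by_recursion = garch_recursion(u2, alpha0, alpha1, beta, sigma2_0)
by_expansion = arch_expansion(u2, alpha0, alpha1, beta, sigma2_0, t=5)
print(by_recursion[5], by_expansion)
```

The weight on $u_{t-1-j}^2$ is $\alpha_1 \beta^j$, so three parameters reproduce an ARCH model with a very long lag length, which is the parsimony argument made above.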
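The GARCH(1,1) Model
l  In the standard GARCH(1,1) specification, the conditional variance equation is
                $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
                where $\omega$ is a constant, $\varepsilon_{t-1}^2$ is the lagged squared error from the mean equation (the ARCH term) and $\sigma_{t-1}^2$ is last period's forecast variance (the GARCH term).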
l  The (1,1) in GARCH(1,1) refers to the presence of a first-order GARCH term and a first-order ARCH term.  An ordinary ARCH model is a special case of a GARCH specification in which there are no lagged forecast variances in the conditional variance equation.
l  For example, if the asset return was unexpectedly large in either the upward or the downward direction, then a trader using this model will increase the estimate of the variance for the next period.  This model is consistent with the volatility clustering often seen in financial returns data, where large changes in returns are likely to be followed by further large changes.
The Unconditional Variance under the GARCH Specification
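l  The unconditional variance of $u_t$ is given by 
                $\mathrm{Var}(u_t) = \dfrac{\alpha_0}{1 - (\alpha_1 + \beta)}$
   when $\alpha_1 + \beta < 1$.
l  $\alpha_1 + \beta \ge 1$ is termed “non-stationarity” in variance.
l  $\alpha_1 + \beta = 1$ is termed integrated GARCH (IGARCH).
l  For non-stationarity in variance, the conditional variance forecasts will not converge on their unconditional value as the horizon increases.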
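The convergence statement can be made concrete with a short sketch. It assumes a GARCH(1,1) with variance equation $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$, for which the multi-step forecast obeys $E_t[\sigma_{t+s}^2] = \alpha_0 + (\alpha_1 + \beta) E_t[\sigma_{t+s-1}^2]$. The function names and parameter values are hypothetical.

```python
def unconditional_variance(alpha0, alpha1, beta):
    """alpha0 / (1 - (alpha1 + beta)), defined only when alpha1 + beta < 1."""
    persistence = alpha1 + beta
    if persistence >= 1:
        raise ValueError("alpha1 + beta >= 1: unconditional variance is not defined")
    return alpha0 / (1 - persistence)


def variance_forecasts(sigma2_next, alpha0, alpha1, beta, horizon):
    """Conditional variance forecasts for steps 1..horizon.

    sigma2_next is the one-step-ahead forecast sigma_{t+1}^2.
    """
    forecasts = [sigma2_next]
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + (alpha1 + beta) * forecasts[-1])
    return forecasts


# Stationary case (hypothetical values): alpha1 + beta = 0.95
long_run = unconditional_variance(0.05, 0.10, 0.85)
stationary = variance_forecasts(4.0, 0.05, 0.10, 0.85, horizon=200)

# Integrated case (hypothetical values): alpha1 + beta = 1
integrated = variance_forecasts(4.0, 0.05, 0.25, 0.75, horizon=200)

print("long-run variance:        ", long_run)
print("stationary, 200 steps out:", stationary[-1])
print("integrated, 200 steps out:", integrated[-1])
```

In the stationary case the forecasts decay towards the long-run value at rate $\alpha_1 + \beta$ per step. In the integrated case each step adds $\alpha_0$, so the forecasts never settle.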
Estimation of ARCH / GARCH Models
l  Since the model is no longer of the usual linear form, we cannot use OLS. 
l  We use another technique known as maximum likelihood. 
l  The method works by finding the most likely values of the parameters given the actual data.  
l  More specifically, we form a log-likelihood function and maximise it.

l  The steps involved in actually estimating an ARCH or GARCH model are as follows 
 
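  1. Specify the appropriate equations for the mean and the variance - e.g. an AR(1)-GARCH(1,1) model:
                $y_t = \mu + \phi y_{t-1} + u_t$,  $u_t \sim N(0, \sigma_t^2)$
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$
  2. Specify the log-likelihood function to maximise: 
                $L = -\dfrac{T}{2}\log(2\pi) - \dfrac{1}{2}\sum_{t=1}^{T}\log(\sigma_t^2) - \dfrac{1}{2}\sum_{t=1}^{T}\dfrac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}$
  3. The computer will maximise the function and give parameter values and their standard errors.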
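Steps 1 and 2 can be sketched in a few lines of Python. The function below evaluates the Gaussian log-likelihood of an AR(1)-GARCH(1,1) model, assuming mean equation $y_t = \mu + \phi y_{t-1} + u_t$ and variance equation $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$. Two choices here are assumptions of this sketch rather than anything stated in the slides: the likelihood conditions on the first observation, and the variance recursion is started at the sample variance of the residuals. The data and candidate parameter vectors are hypothetical.

```python
import math


def ar1_garch11_loglik(params, y):
    """Gaussian log-likelihood of an AR(1)-GARCH(1,1), conditioning on y[0]."""
    mu, phi, alpha0, alpha1, beta = params
    # Residuals from the mean equation
    u = [y[t] - mu - phi * y[t - 1] for t in range(1, len(y))]
    # Assumption: initialise the variance at the sample variance of the residuals
    sigma2 = sum(x * x for x in u) / len(u)
    loglik = 0.0
    for t, u_t in enumerate(u):
        if t > 0:
            sigma2 = alpha0 + alpha1 * u[t - 1] ** 2 + beta * sigma2
        if sigma2 <= 0:
            return float("-inf")       # parameters imply a non-positive variance
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + u_t ** 2 / sigma2)
    return loglik


# Hypothetical return series and two candidate parameter vectors
returns = [0.1, -0.2, 0.4, -0.1, 0.3, -0.5, 0.2, 0.0]
candidate_a = (0.0, 0.1, 0.02, 0.10, 0.80)
candidate_b = (0.0, 0.1, 0.50, 0.10, 0.80)
for params in (candidate_a, candidate_b):
    print(params, round(ar1_garch11_loglik(params, returns), 4))
```

Step 3 would pass this function to a numerical optimiser, which searches over the five parameters for the vector with the highest log-likelihood. Standard errors are then usually obtained from the curvature of the function at the maximum.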
Extensions to the Basic GARCH Model
l  Since the GARCH model was developed, a huge number of extensions and variants have been proposed. Three of the most important examples are EGARCH, GJR, and GARCH-M models. 
l  Problems with GARCH(p,q) Models:
                - Non-negativity constraints may still be violated
                - GARCH models cannot account for leverage effects 
l  Possible solutions: the exponential GARCH (EGARCH) model or the GJR model, which are asymmetric GARCH models.
 The EGARCH Model
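l  Suggested by Nelson (1991). The variance equation is given by 
                $\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma \dfrac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha\left[\dfrac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\dfrac{2}{\pi}}\right]$
l  Advantages of the model
- Since we model $\log(\sigma_t^2)$, then even if the parameters are negative, $\sigma_t^2$ will be positive.
- We can account for the leverage effect: if the relationship between volatility and returns is negative, $\gamma$ will be negative.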
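Both advantages can be seen in a one-step calculation. The sketch assumes an EGARCH variance equation of the form $\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma z_{t-1} + \alpha(|z_{t-1}| - \sqrt{2/\pi})$ with $z_{t-1} = u_{t-1} / \sigma_{t-1}$. The function name and all parameter values are hypothetical.

```python
import math


def egarch_variance(u_prev, sigma2_prev, omega, beta, gamma, alpha):
    """One step of the EGARCH log-variance recursion; returns sigma_t^2."""
    z = u_prev / math.sqrt(sigma2_prev)          # standardised lagged shock
    log_sigma2 = (omega
                  + beta * math.log(sigma2_prev)
                  + gamma * z
                  + alpha * (abs(z) - math.sqrt(2 / math.pi)))
    return math.exp(log_sigma2)                  # exponentiating keeps it positive


# Leverage effect: gamma < 0, shocks of equal size and opposite sign
after_bad_news = egarch_variance(-2.0, 1.0, omega=-0.1, beta=0.9, gamma=-0.1, alpha=0.2)
after_good_news = egarch_variance(2.0, 1.0, omega=-0.1, beta=0.9, gamma=-0.1, alpha=0.2)

# Every parameter negative: the implied variance is still positive
all_negative = egarch_variance(1.0, 1.0, omega=-1.0, beta=-0.5, gamma=-0.3, alpha=-0.2)

print(after_bad_news, after_good_news, all_negative)
```

With $\gamma < 0$ the negative shock raises next-period variance by more than the positive shock of the same size, and no sign restrictions on the parameters are needed to keep the variance positive.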
The GJR Model
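l  Due to Glosten, Jagannathan and Runkle (1993). The variance equation is
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma u_{t-1}^2 I_{t-1}$
                where $I_{t-1} = 1$ if $u_{t-1} < 0$, and 0 otherwise.
l  For a leverage effect, we would see $\gamma > 0$. 
l  We require $\alpha_1 + \gamma \ge 0$ and $\alpha_1 \ge 0$ for non-negativity.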
News Impact Curves
The news impact curve plots the next period volatility ($h_t$) that would arise from various positive and negative values of $u_{t-1}$, given an estimated model.
News Impact Curves for Returns using Coefficients from GARCH and GJR Model Estimates:
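A news impact curve can be tabulated directly from the variance equations. The sketch below assumes a GARCH(1,1) of the form $h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta h_{t-1}$ and a GJR model that adds $\gamma u_{t-1}^2 I_{t-1}$, where $I_{t-1} = 1$ when $u_{t-1} < 0$. The lagged variance is held fixed at a reference level. All coefficients are hypothetical and are not the estimates behind the original figure.

```python
def nic_garch(u_prev, alpha0, alpha1, beta, h_bar):
    """Next-period variance under GARCH(1,1), lagged variance held at h_bar."""
    return alpha0 + alpha1 * u_prev ** 2 + beta * h_bar


def nic_gjr(u_prev, alpha0, alpha1, beta, gamma, h_bar):
    """Next-period variance under GJR, lagged variance held at h_bar."""
    indicator = 1.0 if u_prev < 0 else 0.0       # 1 for bad news, 0 otherwise
    return alpha0 + (alpha1 + gamma * indicator) * u_prev ** 2 + beta * h_bar


# Hypothetical coefficients, for illustration only
H_BAR = 0.2
shocks = [-1.0, -0.5, 0.0, 0.5, 1.0]
print("shock   GARCH    GJR")
for u_prev in shocks:
    garch = nic_garch(u_prev, 0.01, 0.10, 0.85, H_BAR)
    gjr = nic_gjr(u_prev, 0.01, 0.05, 0.85, 0.10, H_BAR)
    print(f"{u_prev:5.1f}  {garch:6.4f}  {gjr:6.4f}")
```

The GARCH curve is symmetric about zero, while the GJR curve is steeper to the left of zero whenever $\gamma > 0$.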
GARCH-in Mean
l  We expect a risk to be compensated by a higher return. So why not let the return of a security be partly determined by its risk? 
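l  Engle, Lilien and Robins (1987) suggested the ARCH-M specification. A GARCH-M model would be 
                $y_t = \mu + \delta \sigma_{t-1} + u_t$,  $u_t \sim N(0, \sigma_t^2)$
                $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$
l  $\delta$ can be interpreted as a sort of risk premium.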
l  It is possible to combine all or some of these models together to get more complex “hybrid” models - e.g. an ARMA-EGARCH(1,1)-M model.
Testing Non-linear Restrictions or Testing Hypotheses about Non-linear Models
l  Usual t- and F-tests are still valid in non-linear models, but they are not flexible enough.
l  There are three hypothesis testing procedures based on maximum likelihood principles: Wald, Likelihood Ratio, Lagrange Multiplier.
l  Consider a single parameter, $\theta$, to be estimated. Denote the MLE as $\hat{\theta}$ and a restricted estimate as $\tilde{\theta}$.
Likelihood Ratio Tests
l  Estimate under the null hypothesis and under the alternative.
l  Then compare the maximised values of the LLF.
l  So we estimate the unconstrained model and achieve a given maximised value of the LLF, denoted Lu
l  Then estimate the model imposing the constraint(s) and get a new value of the LLF denoted Lr.
l  Which will be bigger?
l  $L_r \le L_u$, comparable to RRSS $\ge$ URSS 
l  The LR test statistic is given by
                                                                $LR = -2(L_r - L_u) \sim \chi^2(m)$
    where m = number of restrictions
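As a worked example, the statistic $LR = -2(L_r - L_u)$ can be computed and compared with a $\chi^2(m)$ critical value. The log-likelihood values below are hypothetical. The 5% critical values for 1, 2 and 3 restrictions are standard table values.

```python
# 5% critical values of the chi-squared distribution for m = 1, 2, 3
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def lr_test(loglik_restricted, loglik_unrestricted, m):
    """Return the LR statistic and whether the restrictions are rejected at 5%."""
    lr = -2.0 * (loglik_restricted - loglik_unrestricted)
    return lr, lr > CHI2_CRIT_5PCT[m]


# Hypothetical maximised log-likelihoods with one restriction imposed
lr_stat, reject = lr_test(loglik_restricted=64.54, loglik_unrestricted=66.85, m=1)
print("LR statistic:", round(lr_stat, 2), "reject at 5%:", reject)
```

Because the restricted log-likelihood can never exceed the unrestricted one, the statistic is always non-negative, and a large value is evidence against the restrictions.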
Hypothesis Testing under Maximum Likelihood
l  The vertical distance forms the basis of the LR test.
l  The Wald test is based on a comparison of the horizontal distance.
l      The LM test compares the slopes of the curve at A and B.
l  We know at the unrestricted MLE, $L(\hat{\theta})$, the slope of the curve is zero. 
l  But is it “significantly steep” at $L(\tilde{\theta})$?
l      This formulation of the test is usually easiest to estimate.
Estimating ARCH Models
a)            Option:
      Heteroskedasticity Consistent Covariances: You should use this option if you suspect that the residuals are not conditionally normally distributed.
b)            The Mean Equation:
      You can enter the specification in list form by listing the dependent variable followed by the regressors. You should add the C to your specification if you wish to include a constant. If your specification includes an ARCH-M term, you should add an appropriate specification.
c)            The Variance Equation:
1         Under the ARCH specification label, you should choose the number of ARCH and GARCH terms. 
2         In the Variance Regressors, you may optionally list variables you wish to include in the variance specification.
ARCH Estimation Output
l  The output from ARCH estimation is divided into two sections:
1)      The upper part provides the standard output for the mean equation. 
2)      The lower part, labeled “Variance Equation”, contains the coefficients, standard errors, z-statistics and p-values for the coefficients of the variance equation.  The ARCH parameters correspond to $\alpha$ and the GARCH parameters to $\beta$. 
3)      Note that measures such as $R^2$ may not be meaningful if there are no regressors in the mean equation.  Here, for example, the $R^2$ is negative.
4)      The sum of the ARCH and GARCH coefficients ($\alpha + \beta$) is very close to one, indicating that volatility shocks are quite persistent.
Working with ARCH Models
l  The ARCH LM test statistic is computed from an auxiliary test regression.  To test the null hypothesis that there is no ARCH up to order q in the residuals, we run the regression
                $e_t^2 = \beta_0 + \beta_1 e_{t-1}^2 + \beta_2 e_{t-2}^2 + \dots + \beta_q e_{t-q}^2 + v_t$
l  where $e$ is the residual.  This is a regression of the squared residuals on a constant and lagged squared residuals up to order q.  EViews reports two test statistics for this test regression.  The F-statistic is an omitted variable test for the joint significance of all lagged squared residuals.  The Obs*R-squared statistic is Engle’s LM test statistic, computed as the number of observations times the $R^2$ from the test regression.  The exact finite sample distribution of the F-statistic under $H_0$ is not known, but the LM test statistic is asymptotically distributed $\chi^2(q)$ under quite general conditions.
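For the simplest case, q = 1, the Obs*R-squared statistic can be computed by hand. The sketch below regresses $e_t^2$ on a constant and $e_{t-1}^2$ and returns the number of observations times $R^2$. It is a minimal illustration, not a reproduction of the EViews routine, and the residual series is hypothetical.

```python
def arch_lm_test_q1(residuals):
    """Engle's ARCH LM statistic for q = 1: T * R^2 of e_t^2 on a constant and e_{t-1}^2."""
    e2 = [e * e for e in residuals]
    y, x = e2[1:], e2[:-1]            # dependent variable and its first lag
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0                    # no variation in squared residuals, so R^2 is taken as 0
    r_squared = sxy ** 2 / (sxx * syy)   # R^2 of a simple regression
    return n * r_squared


# Hypothetical residuals with a calm spell, a turbulent spell, then calm again
residuals = [0.1, -0.2, 0.1, 1.5, -1.8, 1.2, -0.1, 0.2, -0.1, 0.1, 2.0, -1.7]
lm_stat = arch_lm_test_q1(residuals)
print("ARCH LM statistic (q = 1):", round(lm_stat, 3))
print("5% critical value, chi-squared(1): 3.841")
```

For q > 1 the same idea applies with a multiple regression on q lags, and the statistic is compared with a $\chi^2(q)$ critical value.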
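The TARCH Model: Threshold ARCH
l  The specification for the conditional variance is
                $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 d_{t-1} + \beta \sigma_{t-1}^2$
                where $d_t = 1$ if $\varepsilon_t < 0$, and 0 otherwise.
l  In this model, good news ($\varepsilon_t > 0$) and bad news ($\varepsilon_t < 0$) have differential effects on the conditional variance: good news has an impact of $\alpha$, while bad news has an impact of $\alpha + \gamma$.  If $\gamma > 0$ we say that the leverage effect exists.  If $\gamma \ne 0$, the news impact is asymmetric.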
The EGARCH Model
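l  The specification for the conditional variance is
                $\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \alpha \left|\dfrac{\varepsilon_{t-1}}{\sigma_{t-1}}\right| + \gamma \dfrac{\varepsilon_{t-1}}{\sigma_{t-1}}$
l  Note that the left-hand side is the log of the conditional variance.  This implies that the leverage effect is exponential, and that forecasts of the conditional variance are guaranteed to be nonnegative.  The presence of leverage effects can be tested by the hypothesis that $\gamma < 0$.  The impact is asymmetric if $\gamma \ne 0$.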
Multivariate GARCH Models
l  Multivariate GARCH models are used to estimate and to forecast covariances and correlations. The basic formulation is similar to that of the GARCH model, but where the covariances as well as the variances are permitted to be time-varying.
l  There are 3 main classes of multivariate GARCH formulation that are widely used: VECH, diagonal VECH and BEKK.
                VECH and Diagonal VECH
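l  e.g. suppose that there are two variables used in the model. The conditional covariance matrix is denoted $H_t$, and would be $2 \times 2$. $H_t$ and $VECH(H_t)$ are
                $H_t = \begin{bmatrix} h_{11t} & h_{12t} \\ h_{21t} & h_{22t} \end{bmatrix}$,     $VECH(H_t) = \begin{bmatrix} h_{11t} \\ h_{22t} \\ h_{12t} \end{bmatrix}$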
BEKK and Model Estimation for M-GARCH
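l  Neither the VECH nor the diagonal VECH ensures a positive definite variance-covariance matrix.
l  An alternative approach is the BEKK model (Engle & Kroner, 1995).
l  The BEKK model uses a quadratic form for the parameter matrices to ensure a positive definite variance-covariance matrix $H_t$.
l  In matrix form, the BEKK model is
                $H_t = W'W + A'H_{t-1}A + B'\Xi_{t-1}\Xi_{t-1}'B$
                where $A$ and $B$ are matrices of parameters, $W$ is an upper triangular matrix of parameters, and $\Xi_{t-1}$ is the vector of lagged errors.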
l  Model estimation for all classes of multivariate GARCH model is again performed using maximum likelihood with the following LLF:
                $\ell(\theta) = -\dfrac{TN}{2}\log 2\pi - \dfrac{1}{2}\sum_{t=1}^{T}\left(\log|H_t| + \Xi_t' H_t^{-1} \Xi_t\right)$
            where $N$ is the number of variables in the system (assumed 2 above), $\theta$ is a vector containing all of the parameters to be estimated, and $T$ is the number of observations.
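For N = 2, one BEKK update and one term of the log-likelihood can be written out with plain 2 x 2 arithmetic. The sketch assumes the BEKK form $H_t = W'W + A'H_{t-1}A + B'\Xi_{t-1}\Xi_{t-1}'B$ and a per-observation log-likelihood term $-\frac{N}{2}\log 2\pi - \frac{1}{2}(\log|H_t| + \Xi_t' H_t^{-1} \Xi_t)$. The helper names and all matrix entries are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def bekk_step(h_prev, xi_prev, w, a, b):
    """H_t = W'W + A' H_{t-1} A + B' xi_{t-1} xi_{t-1}' B for N = 2."""
    outer = [[xi_prev[i] * xi_prev[j] for j in range(2)] for i in range(2)]
    term_w = matmul(transpose(w), w)
    term_a = matmul(matmul(transpose(a), h_prev), a)
    term_b = matmul(matmul(transpose(b), outer), b)
    return [[term_w[i][j] + term_a[i][j] + term_b[i][j] for j in range(2)] for i in range(2)]


def loglik_term(h, xi):
    """-(N/2) log(2 pi) - 0.5 * (log|H_t| + xi_t' H_t^{-1} xi_t) for N = 2."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    quad = (h[1][1] * xi[0] ** 2
            - (h[0][1] + h[1][0]) * xi[0] * xi[1]
            + h[0][0] * xi[1] ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * (math.log(det) + quad)


# Hypothetical parameter matrices; some entries of B are deliberately negative
W = [[0.3, 0.1], [0.0, 0.2]]          # upper triangular
A = [[0.9, 0.05], [0.0, 0.85]]
B = [[0.3, -0.1], [0.05, 0.25]]
H_prev = [[1.0, 0.2], [0.2, 0.5]]
xi_prev = [-0.8, 0.4]

H_t = bekk_step(H_prev, xi_prev, W, A, B)
det_H_t = H_t[0][0] * H_t[1][1] - H_t[0][1] * H_t[1][0]
print("H_t =", H_t)
print("determinant of H_t:", det_H_t)
print("log-likelihood term:", loglik_term(H_t, [0.5, -0.3]))
```

Each of the three terms is a quadratic form, so $H_t$ stays symmetric with a positive determinant here even though some parameters are negative. Summing `loglik_term` over all T observations gives the LLF that the optimiser maximises.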

Presented by Dr. Babar Zaheer Butt to the students of MS/Ph.D at Iqra University Islamabad.