
Identification of cointegrated pairs of shares in stock markets

The purpose of this article is to share the results of a study that identifies cointegrated pairs of stocks traded on the Moscow and New York stock exchanges using the Engle-Granger test.



If we take two stocks whose prices have stationary increments and find a linear combination of them (a spread) that is stationary, then the two time series are called cointegrated. Cointegration gives us the opportunity to hedge one stock with the other and build a market-neutral strategy. Why is this possible?



The principle on which profit is built



We all know that the price of a stock, regarded as a time series, can change quite significantly. If we take a position in a single security, in most cases it will be a very risky game, since we bear all the risk associated with its volatility. However, there are stocks which, when paired, can be expected not to move too far from each other. This concept is called long-term dynamic equilibrium.



In terms of stationarity, long-term dynamic equilibrium takes a more precise form. A stationary spread series built from two cointegrated securities is mean-reverting: after any deviation from its equilibrium level, it tends to return. The market-neutral strategy is based on this principle.


How do we find pairs connected by long-term dynamic equilibrium in the stock markets?



Correlation



The first thought that comes to mind is to calculate the correlation between the two securities and trade pairs with a strong correlation. This approach fails for two reasons.



First, if the price series of two stocks were perfectly correlated, that is, if they always changed in the same direction and in the same proportion, the difference between the series would be zero, and we could not earn any money, because neither stock would ever be too expensive or too cheap.



Second, correlation does not give us enough information about the long-term relationship between two stocks. For example, take a large, diversified portfolio of stocks. Let these stocks also be included in a stock index, and let their weights in the portfolio be determined by their weights in the index. Although the portfolio should track the index in the long run, there will be periods when stocks that are in the index but not in the portfolio show unusual price movements. Consequently, the empirical correlation between the portfolio and the index can be rather low for some time. Because of this, the analysis would simply discard such a portfolio, and we would lose the opportunity to earn. It follows that correlation is not a good way to identify pairs.



It is better to use cointegration to identify pairs.



Cointegration



To make an economic series stationary, we often take its differences. This leads to the following definition of integration.



A time series $x_t$ is called integrated of order $k$, denoted $x_t \sim I(k)$, if the series and its differences up to order $k-1$ inclusive are non-stationary, while its difference of order $k$ is stationary.



For practical results we only need the values $k = 0$ and $k = 1$. If $k = 0$, the series itself is stationary; for brevity I will refer to such series as $I(0)$. If $k = 1$, the series is non-stationary with stationary increments (first-order differences); for brevity I will refer to such series as $I(1)$.
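The two cases can be illustrated with a minimal sketch. It assumes simulated Gaussian noise rather than real prices, and the names `e`, `x` and `dx` are purely illustrative: the cumulative sum of white noise behaves as an $I(1)$ series, and its first difference recovers the $I(0)$ noise.

```matlab
% Simulated example: a random walk is I(1), its first difference is I(0).
n  = 1000;
e  = randn(n, 1);      % white noise, an I(0) series
x  = cumsum(e);        % random walk x_t = x_{t-1} + e_t, an I(1) series
dx = diff(x);          % first difference x_t - x_{t-1}

% Differencing the random walk returns the original noise (up to rounding).
maxErr = max(abs(dx - e(2:end)));
fprintf('max |dx - e| = %.2e\n', maxErr);

% The level wanders, so its sample variance is typically far larger
% than the variance of the differences.
fprintf('var(x) = %.2f, var(dx) = %.2f\n', var(x), var(dx));
```

No random seed is set, so the printed variances change from run to run.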



Suppose we have two $I(1)$ series, $x_t$ and $y_t$. Suppose, moreover, that their linear combination $y_t - \beta x_t$ is $I(0)$. In this case, the series $x_t$ and $y_t$ are called cointegrated:

$$\varepsilon_t = y_t - \beta x_t \sim I(0).$$



In essence, cointegration is a regression between non-stationary series. It means that if $\varepsilon_t$ has zero mean, then this series will rarely deviate far from zero and will often cross the zero level. In other words, from time to time an exact equilibrium, or a state close to it, will be reached.
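To make the definition concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration (the coefficients `alpha`, `beta`, `phi` and the AR(1) form of the disturbance are not taken from the study): `x` is a random walk, `y` is built from `x` plus a stationary disturbance, and the spread therefore keeps crossing zero while `x` itself wanders.

```matlab
% Simulated cointegrated pair: x is a random walk, y = alpha + beta*x + u,
% where u is a stationary AR(1) process (all parameters are illustrative).
n     = 5000;
alpha = 2;
beta  = 0.5;
phi   = 0.9;                             % AR(1) coefficient, |phi| < 1

x = cumsum(randn(n, 1));                 % I(1) series
u = filter(1, [1 -phi], randn(n, 1));    % u_t = phi*u_{t-1} + noise, I(0)
y = alpha + beta * x + u;                % I(1) series cointegrated with x

spread = y - beta * x - alpha;           % equals u by construction

% Count how often the spread crosses zero and how far each series roams.
crossings = sum(spread(1:end-1) .* spread(2:end) < 0);
fprintf('zero crossings of the spread: %d\n', crossings);
fprintf('range of x: %.1f, range of spread: %.1f\n', ...
        max(x) - min(x), max(spread) - min(spread));
```

With real prices $\beta$ and $\alpha$ are unknown and have to be estimated; here they are fixed in advance only to show what a stationary spread looks like.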



Cointegration of logarithms of prices



We can consider cointegration not only between prices, but also between their logarithms. Unfortunately, cointegration between the logarithms of the prices of two stocks is less obvious and intuitive than just cointegration between the prices of two stocks. However, why is cointegration possible in the case of logarithms?



This is explained by the “efficient market hypothesis”, the option pricing model and Ito's lemma. The efficient market hypothesis has no strict formalization. It suggests that in a liquid market, where the price of an asset results from freely balanced supply and demand, the current price accurately reflects all the information available to market participants. Future price changes can only be the result of “news”, which by definition is unpredictable, so the best price forecast for any future date is simply today's price. In other words, today's price is yesterday's price plus a random term.



The efficient market hypothesis is connected with the basic option pricing model. The fundamental assumption of this model is that the price of the underlying asset $S$ follows a geometric Brownian motion (GBM):

$$\frac{dS}{S} = \mu\,dt + \sigma\,dW,$$



where $\mu$ and $\sigma$ are constants representing, respectively, the drift of the asset price and the volatility of its returns, and $W$ is a Wiener process, that is, its increments $dW$ are independent and normally distributed with zero mean and variance $dt$.



To see how the GBM equation is related to the efficient market hypothesis, we need to apply Ito's lemma to it. What is it? Assume that a variable $x$ obeys the stochastic differential equation (SDE)

$$dx = \mu\,dt + \sigma\,dW,$$



where $W$ is a Wiener process, and $\mu$ and $\sigma$ are functions of $x$ and $t$. Assume also that a function $f$ depends on $x$ and $t$ and has the derivatives $\frac{\partial f}{\partial t}$, $\frac{\partial f}{\partial x}$, $\frac{\partial^2 f}{\partial x^2}$. Ito's lemma states that this function obeys the equation

$$df = \left(\frac{\partial f}{\partial t} + \mu\frac{\partial f}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial x^2}\right)dt + \sigma\frac{\partial f}{\partial x}\,dW.$$



In essence, Ito's lemma is a change-of-variables formula for SDEs: under certain conditions, a function of a process described by an SDE is itself described by an SDE.



Let us return to the GBM equation and rewrite it as

$$dS = \mu S\,dt + \sigma S\,dW.$$



Setting $f = f(S, t)$, by Ito's lemma we get:

$$df = \left(\frac{\partial f}{\partial t} + \mu S\frac{\partial f}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 f}{\partial S^2}\right)dt + \sigma S\frac{\partial f}{\partial S}\,dW.$$



We introduce the function $f(S) = \ln S$. Since

$$\frac{\partial \ln S}{\partial S} = \frac{1}{S}, \quad \frac{\partial^2 \ln S}{\partial S^2} = -\frac{1}{S^2}, \quad \frac{\partial \ln S}{\partial t} = 0,$$



we get:

$$\begin{aligned}
d\ln S &= \left(\frac{\partial \ln S}{\partial t} + \mu S\frac{\partial \ln S}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 \ln S}{\partial S^2}\right)dt + \sigma S\frac{\partial \ln S}{\partial S}\,dW \\
&= \left(0 + \mu S\frac{1}{S} - \frac{\sigma^2 S^2}{2}\frac{1}{S^2}\right)dt + \sigma S\frac{1}{S}\,dW = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW.
\end{aligned}$$



The equation

$$d\ln S = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW$$



can be rewritten in discrete form

$$\Delta \ln S_t = c + \varepsilon_t,$$



where $c = \mu - \sigma^2/2$ and $\varepsilon_t \sim NID(0, \sigma^2)$, so the process $\varepsilon_t$ is not merely stationary but white noise. The concept of a stationary process is broader than white noise: a stationary process has a constant expectation, but it does not have to be zero, as is the case with white noise.



The discrete version of the equation given above can, in turn, be written as:

$$\ln S_t = c + \ln S_{t-1} + \varepsilon_t.$$



This equation is the random walk (RW) model commonly used to describe price logarithms in efficient financial markets, and it is an example of an $I(1)$ process. Thus, cointegration can also apply to the logarithms of stock prices.
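The random walk form is easy to simulate. The sketch below is only an illustration under assumed parameters (`mu`, `sigma`, `S0` and a daily time step `dt` are invented here, not estimated from data); the time step is written out explicitly, so with `dt = 1` the constant reduces to $c = \mu - \sigma^2/2$ as in the text.

```matlab
% Simulate log-prices from the discrete random-walk form of GBM.
% mu, sigma, S0 and the daily step are illustrative assumptions.
mu    = 0.10;            % annual drift
sigma = 0.20;            % annual volatility
dt    = 1 / 252;         % one trading day in years
n     = 252;             % number of daily steps
S0    = 100;             % starting price

c    = (mu - sigma^2 / 2) * dt;            % constant term per step
epsT = sigma * sqrt(dt) * randn(n, 1);     % Gaussian white-noise increments

logS = log(S0) + cumsum(c + epsT);         % ln S_t = c + ln S_{t-1} + eps_t
S    = exp(logS);                          % implied price path, always positive

fprintf('c per step = %.6f, final price = %.2f\n', c, S(end));
```

Because the log-price is a cumulative sum of stationary increments, it is $I(1)$ by construction, which is exactly the property the cointegration test needs.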



Although some skeptics (myself included) may doubt that the GBM equation adequately describes stock prices and, therefore, that price logarithms can be cointegrated, empirical data dispel this skepticism. I checked: if the prices are cointegrated, then their logarithms are cointegrated too.



Cointegration testing



The first method of testing for cointegration was devised by Robert Engle and Clive Granger. In 2003 they shared the Nobel Prize in Economics, with Granger cited for the cointegration approach to analyzing time series. They described the method 16 years before the prize, in the 1987 article “Co-integration and Error Correction: Representation, Estimation, and Testing”.



Conceptually, to determine from the available observations whether the time series $x_t$ and $y_t$ are cointegrated, we need to test the null hypothesis $H_0: \varepsilon_t \sim I(1)$ of no cointegration between $x_t$ and $y_t$ against the alternative hypothesis $H_1: \varepsilon_t \sim I(0)$. If the null hypothesis is rejected, cointegration is accepted.



The original test for cointegration is named the Engle-Granger test after its authors. It is a two-step procedure preceded by a check that $x_t$ and $y_t$ are integrated of order one, $x_t \sim I(1)$ and $y_t \sim I(1)$. We discussed this in detail in the article on stationary increments, which describes all the preparatory work that needs to be done before proceeding to the Engle-Granger test itself. Let's assume we have done it.



The series $x_t$ and $y_t$ are cointegrated if their spread $y_t - \beta x_t \sim I(0)$, that is, is stationary. The first step of the Engle-Granger test is to obtain a consistent estimate $\hat\beta$. This is done by ordinary least squares (OLS) applied to the linear regression $y_t = \beta x_t + \varepsilon_t$. The second step is to test the stationarity of the residuals $\varepsilon_t$ obtained from the OLS estimation of the cointegrating equation.



Stationarity is usually tested with the Dickey-Fuller test. However, in 1990 Phillips and Ouliaris showed in the article “Asymptotic Properties of Residual Based Tests for Cointegration” that the Dickey-Fuller test cannot be applied to the residual series $\varepsilon_t$ as is.



The reason is that OLS “chooses” the residuals so that they have the smallest possible variance; therefore, even if the variables are not cointegrated, OLS makes the residuals look stationary. Because of this, the Dickey-Fuller test rejects the hypothesis of non-stationarity too often and, accordingly, cointegration is mistakenly accepted.
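This over-rejection can be seen in a small Monte Carlo sketch. It is an illustration under assumptions, not a reproduction of the Phillips-Ouliaris result: the series are simulated independent random walks, the sample sizes are arbitrary, and `dfCrit` is the approximate 5% Dickey-Fuller critical value for a regression without a constant.

```matlab
% Monte Carlo sketch: regress one independent random walk on another and
% apply a Dickey-Fuller t-statistic to the OLS residuals.
nSim   = 500;            % number of simulated pairs (illustrative)
n      = 200;            % observations per series (illustrative)
dfCrit = -1.95;          % approximate 5% Dickey-Fuller value, no constant
tStats = zeros(nSim, 1);

for s = 1:nSim
    x = cumsum(randn(n, 1));         % two unrelated I(1) series,
    y = cumsum(randn(n, 1));         % so there is no cointegration
    X = [ones(n, 1) x];
    res = y - X * (X \ y);           % step 1: OLS residuals

    lagRes = res(1:end-1);
    dRes   = diff(res);
    rho    = (lagRes' * dRes) / (lagRes' * lagRes);   % DF regression slope
    v      = dRes - rho * lagRes;
    se     = sqrt((v' * v) / (length(dRes) - 1) / (lagRes' * lagRes));
    tStats(s) = rho / se;
end

rejectRate = mean(tStats < dfCrit);
fprintf('share of false "cointegration" findings: %.2f\n', rejectRate);
```

If the ordinary critical value were valid for residuals, `rejectRate` would be close to 0.05; in this setup it comes out far higher, which is why special critical values are needed.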



The appendix to the authors' article gives tables of critical values, but these turned out to be rather inaccurate. Later, in 1991, Engle and Granger published the book “Long-Run Economic Relationships”. In its chapter 13, entitled “Critical Values for Cointegration Tests”, MacKinnon gave refined asymptotic critical values of the $t$-statistic, obtained by simulation and suitable for this case.



In 1993, Davidson and MacKinnon published the book “Estimation and Inference in Econometrics”, where they also gave updated critical values. Thus, if $\varepsilon_t \sim I(0)$ (the residuals are stationary), then $y_t - \beta x_t \sim I(0)$ (the spread is also stationary), which means that $x_t$ and $y_t$ are cointegrated.



In summary, the Engle-Granger method reduces to:



  1. estimating $\beta$ by OLS;
  2. computing the spread $\varepsilon_t = y_t - \hat\beta x_t$ and testing $\varepsilon_t$ for stationarity using the critical values mentioned above.
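The two steps can be sketched in base MATLAB without any toolbox. This is a simplified illustration on simulated data, not the implementation used in the study: the residual regression has no intercept and no lagged differences, and `tCrit` is simply the critical value quoted in this article's regression tables for the intercept case, assumed here to correspond to a 5% level.

```matlab
% Two-step Engle-Granger sketch on simulated data (not the article's prices).
n = 215;                                     % same length as the article's sample
x = cumsum(randn(n, 1));                     % I(1) regressor
u = filter(1, [1 -0.3], randn(n, 1));        % stationary AR(1) disturbance
y = 5 + 2 * x + u;                           % cointegrated with x by construction

% Step 1: OLS estimate of alpha and beta in y = alpha + beta*x + eps.
X        = [ones(n, 1) x];
coef     = X \ y;
alphaHat = coef(1);
betaHat  = coef(2);
spread   = y - betaHat * x - alphaHat;       % estimated residuals

% Step 2: Dickey-Fuller regression d(spread) = rho * spread(t-1) + v.
lagS  = spread(1:end-1);
dS    = diff(spread);
rho   = (lagS' * dS) / (lagS' * lagS);
v     = dS - rho * lagS;
se    = sqrt((v' * v) / (length(dS) - 1) / (lagS' * lagS));
tCalc = rho / se;

tCrit = -3.3654;     % critical value quoted in the article's tables (assumed 5%)
isCointegrated = tCalc < tCrit;
fprintf('beta = %.3f, tCalc = %.3f, cointegrated = %d\n', ...
        betaHat, tCalc, isCointegrated);
```

The null hypothesis of no cointegration is rejected when `tCalc` falls below `tCrit`, which is the same comparison reported later as $t_{calc}$ against $t_{crit}$.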


Standard packages such as MATLAB already implement this test, so let's use it.



Cointegration testing in MATLAB



So, we have two series of stock prices, $x_t$ and $y_t$. We want $x_t$ and $y_t$ to be cointegrated, that is, we want the spread $\varepsilon_t = y_t - \beta x_t$ to be stationary. If we want a stationary series with zero mean, we can include a constant in the equation, so the spread becomes $\varepsilon_t = y_t - \beta x_t - \alpha$.



Let's start with the results obtained on the Moscow Exchange, which I described in the article on stationary increments. There I found five $I(1)$ series. We will form all possible pairs from them and test each for cointegration with the Engle-Granger test.



First, we select the prices of the securities we need from the Microsoft SQL Server database, in which I store stock prices from the Moscow Exchange, and import them as an array:



    conn = database.ODBCConnection('uXXXXXX.mssql.masterhost.ru', 'uXXXXXX', 'XXXXXXXXXX');

    % Prices of the five I(1) stocks for 2016
    curs = exec(conn, 'SELECT ALL PriceId, StockId, Date, Price FROM StockPrices WHERE StockId IN (52, 55, 67, 75, 162) AND Date >= ''2016-01-01 00:00:00.000'' AND Date < ''2017-01-01 00:00:00.000''');
    curs = fetch(curs);
    data = curs.Data;

    % Identifiers, short names and tickers of the same stocks
    sqlquery = 'SELECT ALL StockId, ShortName, Code FROM Stocks WHERE StockId IN (52, 55, 67, 75, 162)';
    curs = exec(conn, sqlquery);
    curs = fetch(curs);
    names = curs.Data;

    close(conn);


In this array, four of the five stocks have data from January, covering 252 trading days. However, one of the stocks started trading only in February, so its data covers only 215 trading days. It is critically important that all stocks have price arrays of the same length, so in such situations we have two options.



The first option is to exclude the stock with the short price array from the experiment and use the maximum number of price observations to get more accurate results. The second option is to sacrifice part of the data and include all the stocks for the sake of greater practicality. I ran both experiments, and in this case the results did not differ, so let's simply cut off the January data:



    dates = unique(datetime(data(:,3)));
    % Drop the first 37 dates, for which the stock with StockId=67 has no prices.
    dates(1:37,:) = [];
    prices = zeros(length(dates), length(names));
    for i = 1:length(names)
        % Row indexes of the current stock's data
        indexes = find(cell2mat(data(:,2)) == cell2mat(names(i,1)));
        if length(indexes) == 252
            indexes(1:37,:) = [];
        end
        for j = 1:length(dates)
            % Fill prices according to date
            prices(j,i) = cell2mat(data(indexes(j),4));
        end
    end
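Cutting a fixed number of rows works when the gap is known in advance. A more general alternative, sketched here on toy data, is to align the series on their common trading days with the base `intersect` function. The dates and prices below are invented, and plain day numbers stand in for real `datetime` values.

```matlab
% Align two price series on their common trading days (toy data).
% Dates are plain day numbers here; real data would use datetime values.
datesA  = (1:10)';                    % stock A trades on days 1..10
pricesA = 100 + (1:10)';
datesB  = (4:10)';                    % stock B starts trading on day 4
pricesB = 50 + 2 * (4:10)';

% intersect returns the shared dates and their positions in each input.
[commonDates, idxA, idxB] = intersect(datesA, datesB);

aligned = [pricesA(idxA), pricesB(idxB)];   % one row per common day
disp([commonDates, aligned]);
```

The result has one row per day on which both stocks traded, so the two columns always have the same length regardless of where the gaps are.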


The Engle-Granger test is performed by the egcitest function, which takes as input an array of time series, in this case of size $n \times 2$, where $n$ is the number of trading days. The function returns a logical value of 1 if the null hypothesis is rejected in favor of the alternative, and 0 otherwise.



The next question is which stock to take as $x_t$ and which as $y_t$. Ideally, one should try both and then compare the test statistics. In most cases both the direct and the reverse regression will hold. Let's start with the case $x_t < y_t$.



We form all possible pairs of the five identified $I(1)$ series and perform the Engle-Granger test for the regression both with an intercept (the default) and without one (set by the 'creg' argument with the value 'nc'):



    % Columns: StockId of y, StockId of x, test with intercept, test without intercept
    isCoint = zeros(nchoosek(length(names), 2), 4);
    testPrices = zeros(size(prices, 1), 2);
    k = 1;
    for i = 1:length(names)
        for j = i+1:length(names)
            % The stock with the higher mean price is taken as y (first column)
            if mean(prices(:,i)) < mean(prices(:,j))
                isCoint(k,1) = cell2mat(names(j,1));
                isCoint(k,2) = cell2mat(names(i,1));
                testPrices(:,1) = prices(:,j);
                testPrices(:,2) = prices(:,i);
            else
                isCoint(k,1) = cell2mat(names(i,1));
                isCoint(k,2) = cell2mat(names(j,1));
                testPrices(:,1) = prices(:,i);
                testPrices(:,2) = prices(:,j);
            end
            isCoint(k,3) = egcitest(testPrices);
            isCoint(k,4) = egcitest(testPrices, 'creg', 'nc');
            k = k + 1;
        end
    end


In the regression with an intercept, the program rejects the null hypothesis in favor of the alternative twice, identifying cointegrated pairs of stocks with the tickers (NKHP, VTRS) and (NKHP, ZHIV). In the regression without an intercept, the program rejects the null hypothesis once, identifying a cointegrated pair with the tickers (VSYDP, NKHP).



In the reverse regression ($y_t < x_t$) with an intercept, the program rejects the null hypothesis in favor of the alternative twice, identifying cointegrated pairs of stocks with the tickers (VTRS, NKHP) and (ZHIV, NKHP). In the regression without an intercept, the program rejects the null hypothesis four times, identifying cointegrated pairs with the tickers (GRNT, VTRS), (GRNT, VSYDP), (GRNT, ZHIV) and (GRNT, NKHP).



Let's estimate the values of $\beta$ and $\alpha$, which are available among the return values of the egcitest function, and plot the spread:



    % NKHP and VTRS
    indexY = 5;
    indexX = 1;
    testPrices(:,1) = prices(:,indexY);
    testPrices(:,2) = prices(:,indexX);
    [h,pValue,stat,cValue,reg1,reg2] = egcitest(testPrices);
    alpha = reg1.coeff(1);   % intercept of the cointegrating regression
    beta = reg1.coeff(2);    % slope of the cointegrating regression
    spread = reg1.res;       % residuals, i.e. the spread
    plot(dates,spread)
    legend(strcat(names(indexY,3),'-',names(indexX,3)));


For the stocks with tickers NKHP and VTRS, we obtain a spread with coefficients $\beta = 37.552$ and $\alpha = 197.439$.



For the reverse regression, we get a “mirror” spread with coefficients $\beta = 0.085$ and $\alpha = -3.0064$.



For the stocks with tickers NKHP and ZHIV, we obtain a spread with coefficients $\beta = 3.352$ and $\alpha = 239.347$.



For the reverse regression, we obtain a spread with coefficients $\beta = 0.2194$ and $\alpha = -49.6077$.



For the stocks with tickers VSYDP and NKHP, we obtain a spread with the coefficient $\beta = 35.652$.





Similar experiments were carried out for stocks on the New York Stock Exchange (NYSE). For the direct regression, 158 cointegrated pairs were obtained with an intercept and 130 without one. For the reverse regression, 170 cointegrated pairs were obtained with an intercept and 144 without one.



Regression Statistics



Let's look at the regression statistics of the cointegrating regression for the pair (NKHP, VTRS).

| Statistic | Direct regression | Inverse regression |
| --- | --- | --- |
| Coefficients | $\beta = 37.552$, $\alpha = 197.439$ | $\beta = 0.085$, $\alpha = -3.0064$ |
| Test statistic | $t_{calc} = -3.7562$, $t_{crit} = -3.3654$ | $t_{calc} = -3.5906$, $t_{crit} = -3.3654$ |
| $t$-statistics | $t_\beta = 21.975$, $t_\alpha = 53.3845$ | $t_\beta = 21.975$, $t_\alpha = -12.8953$ |
| $F$-statistic | 482.9196 | 482.9196 |
| Durbin-Watson statistic | 0.2548 | 0.2203 |
| Coefficient of determination | 0.6939 | 0.6939 |
| Adjusted coefficient of determination | 0.6925 | 0.6925 |
| Akaike information criterion | 1726.5 | 88.8336 |
| Schwarz Bayesian information criterion | 1733.2 | 95.5748 |
| Hannan-Quinn information criterion | 1729.2 | 91.5574 |


The test statistics in both the direct and the inverse regression tell us that the variable $\beta$ is insignificant in this case ($t_{calc} < t_{crit}$). This means that the price may be weakly exogenous, even though the variables are cointegrated.



To apply Student's t-test and Fisher's F-test, the statistics must be normally distributed. In our case the statistics follow a distribution similar to the one established by Dickey and Fuller (I also wrote about it in the article on stationary increments), so the computed values of these statistics are quite large and tell us nothing meaningful.



The Durbin-Watson statistic is acceptable (under positive autocorrelation the statistic tends to zero). For the reverse regression it is slightly better than for the direct one.
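For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals: values near 0 indicate positive autocorrelation, values near 2 none, and values near 4 negative autocorrelation. A minimal sketch on toy residuals (the vectors below are invented for illustration):

```matlab
% Durbin-Watson statistic from a residual vector (toy residuals).
dw = @(res) sum(diff(res).^2) / sum(res.^2);

resPositive = [1 1 1 -1 -1 -1]';     % long runs: positive autocorrelation
resNegative = [1 -1 1 -1 1 -1]';     % alternating signs: negative autocorrelation

dwPositive = dw(resPositive);        % (0+0+4+0+0)/6
dwNegative = dw(resNegative);        % (4*5)/6
fprintf('DW positive: %.4f, DW negative: %.4f\n', dwPositive, dwNegative);
```

The same anonymous function can be applied to the residual vector of a cointegrating regression to reproduce the kind of figure reported in the tables.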



The coefficient of determination is acceptable (for an acceptable model, the coefficient of determination is expected to be at least 50%). By this criterion, there is no difference between the direct and the reverse regression.



Judging by the information criteria, the inverse regression is far better than the direct one (the model with the lowest criterion value is considered the best).



Now let's look at the regression statistics of the cointegrating regression for the pair (NKHP, ZHIV).

| Statistic | Direct regression | Inverse regression |
| --- | --- | --- |
| Coefficients | $\beta = 3.352$, $\alpha = 239.347$ | $\beta = 0.2194$, $\alpha = -49.6077$ |
| Test statistic | $t_{calc} = -3.4762$, $t_{crit} = -3.3654$ | $t_{calc} = -3.3878$, $t_{crit} = -3.3654$ |
| $t$-statistics | $t_\beta = 24.344$, $t_\alpha = 137.97$ | $t_\beta = 24.344$, $t_\alpha = -19.8524$ |
| $F$-statistic | 592.652 | 592.652 |
| Durbin-Watson statistic | 0.2614 | 0.2104 |
| Coefficient of determination | 0.7356 | 0.7356 |
| Adjusted coefficient of determination | 0.7344 | 0.7344 |
| Akaike information criterion | 1695 | 1108.8 |
| Schwarz Bayesian information criterion | 1701.7 | 1115.5 |
| Hannan-Quinn information criterion | 1697.7 | 1111.5 |


The test statistics in both the direct and the inverse regression tell us that the variable $\beta$ is insignificant in this case. The Durbin-Watson statistic is acceptable and, for the reverse regression, slightly better than for the direct one. The coefficient of determination is acceptable, with no difference between the direct and the reverse regression. According to the information criteria, the inverse regression is slightly better than the direct one.



Cointegrating regression statistics for the pair (VSYDP, NKHP).

| Statistic | Direct regression |
| --- | --- |
| Coefficients | $\beta = 35.652$ |
| Test statistic | $t_{calc} = -2.8339$, $t_{crit} = -2.7761$ |
| $t$-statistic | 82.5035 |
| $F$-statistic | $\infty$ |
| Durbin-Watson statistic | 0.1305 |
| Coefficient of determination | 0.1928 |
| Adjusted coefficient of determination | 0.1928 |
| Akaike information criterion | 3823.8 |
| Schwarz Bayesian information criterion | 3827.1 |
| Hannan-Quinn information criterion | 3825.1 |


Judging by the test statistic, the variable $\beta$ is again insignificant. The Fisher statistic has flown off to infinity. The Durbin-Watson statistic is acceptable. The coefficient of determination is small, so the model is considered poor.



Conclusions



There are quite a few cointegrated pairs of shares in stock markets, that is, pairs whose spread is a stationary process. The existence of such pairs provides a basis for further research and for stable profit, but we'll talk about specific strategies next time.



What to read on the topic?



Robert F. Engle, C.W.J. Granger. Cointegration and error correction: representation, estimation and testing // Applied Econometrics. - 2015. - 39 (3). - p. 107-135.



This is a translation of the authors' original 1987 article; the definition of cointegration is described in more detail there. You can also continue reading Magnus, whom I recommended in the article on stationary increments; it also has a section on cointegration.



UPD. Analytics on cointegrated pairs for 2017 on the Moscow Exchange.

Source: https://habr.com/ru/post/332558/


