11/21/88 get
5/31/89 (minor revisions, get)
7/26/00 (changed filenames, arg)

References:

Gallant, A. Ronald, Peter E. Rossi, and George E. Tauchen (1992), "Stock
Prices and Volume," The Review of Financial Studies 5, 199-242.

Gallant, A. Ronald, Peter E. Rossi, and George E. Tauchen (1993),
"Nonlinear Dynamic Structures," Econometrica 61, 871-907.


                       DOCUMENTATION FOR nyse.dat

                                  by

                            George Tauchen
                        Department of Economics
                            Duke University
                           Durham, NC 27706


1. Variables in Data Set
   ---------------------

The file nyse.dat contains daily data on the adjusted and unadjusted
trading volume and price changes on the New York Stock Exchange for the
years 1928-1987.  There are 16127 observations arranged as nine columns
of data:

Col  SAS Name   Description

 1   YYMMDD     Date in YYMMDD format (two-digit year, month, and day).
 2   DAY        Day of the week (1=Monday, 2=Tuesday, ..., 6=Saturday).
 3   HOL        Holiday dummy (1=preceded by a holiday, 0 otherwise).
 4   WKEND      Weekend dummy (1=preceded by a weekend, 0 otherwise).
 5   GAP        Number of calendar days since preceding trading day.
 6   VADJ       Adjusted trading volume on the New York Stock Exchange,
                where the adjustment is for systematic location and
                scale effects as described in Section 2 below.
 7   PADJ       Adjusted log first difference of S&P price index,
                100*(log(S&P) - log(S&P_1)), where S&P is the Standard
                and Poor's composite price index, S&P_1 is its value on
                the preceding trading day, and the adjustment is for
                systematic location and scale effects as described in
                Section 2 below.
 8   V          Log of raw volume data.
 9   P          Log first difference of S&P price index,
                100*(log(S&P) - log(S&P_1)), where S&P is the composite
                price index.

The source data sets are the files DGETAU.SP.DATA.RAW.SPWORK20 ...
DGETAU.SP.DATA.RAW.SPWORK80.


2. Adjustment for systematic location and scale effects.
   ----------------------------------------------------

Both the log first difference of the price index and the volume series
are adjusted in a similar manner.  Let w denote the variable to be adjusted.
Initially, the regression (mean equation)

     w = x*beta + u

is fitted, where x consists of calendar variables as described in
Section 3 below.  The model (variance equation)

     log(u_hat^2) = x*gamma + e

is then fitted to the residuals, u_hat.  Next

     u_hat/sqrt(exp(x*gamma_hat))

is formed, leaving a series with mean zero and (approximately) unit
variance given x.  Lastly, the series

     w_adj = a + b*(u_hat/sqrt(exp(x*gamma_hat)))

is taken as the adjusted series, where a and b are chosen so that

     sample average(w_adj) = sample average(w)
     sample std dev(w_adj) = sample std dev(u_hat)

The purpose of the final location and scale transformation is to aid
interpretation.  In particular, the unit of measurement of the adjusted
series is the same as that of the original series.


3. Calendar variables
   ------------------

With the exceptions noted below, the x variables for the mean and
variance equations are

  (i)   Linear and quadratic trends.
  (ii)  Dummies for the years 1941-45.
  (iii) Day of the week dummies.
  (iv)  Five dummies for the value of the calendar day separation
        variable: GAP=1, GAP=2, GAP=3, GAP=4, GAP=5.
  (v)   Dummies for month of the year, excluding January and December,
        which are treated separately.
  (vi)  Four dummies for subperiods within January and December;
        specifically dummies for the following cases of DD

               1 le DD le  7
               8 le DD le 14
              15 le DD le 21
              22 le DD le 31

        where DD is the calendar day of the month.

The first exception is the mean equation for the first difference of the
price index, where the trend and trend squared are omitted.  The other
exceptions are the mean and variance equations for the volume, where the
coefficients of GAP2, GAP3, GAP4, and GAP5 are constrained to lie on a
line, i.e., constraints implying

     GAP5-GAP4 = GAP4-GAP3 = GAP3-GAP2

are imposed.  The reason for imposing the constraints is that
unrestricted estimates of the GAP4 and GAP5 dummies are imprecise and
dominated by a few extreme observations.
Without such restrictions, the adjusted volume series would also contain
extreme values.

One final note concerns the calendar separation variable (GAP).  The
frequency distribution for this variable is

                 Frequency
     GAP= 1:       12686
     GAP= 2:        1339
     GAP= 3:        1873
     GAP= 4:         223
     GAP= 5:           5
     GAP=12:           1
                   -----
                   16127  (days)

The adjustment regressions include five dummies for GAP=1 through GAP=5,
but not a dummy for GAP=12.  The single observation on GAP=12 is the
Bank Holiday in March 1933.

....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8
* SAS data step: read the fixed-width records of nyse.dat;
data tmp;
  infile 'nyse.dat';
  input (yy mm dd day hol wkend gap vadj padj v p)
        (2. 2. 2. 2. 2. 2. 3. 14.10 e18.10 14.10 e18.10);
run;
....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8
      subroutine getdat(data,m,n)
c     Read n records and store the adjusted price change in row 1 and
c     the adjusted volume in row 2 of data.  The unit number u14 and the
c     types of i, m, and n are assumed to come from snpincl7.f.
      implicit none
      integer*4 yy,mm,dd,day,hol,wkend,gap
      real*8 lvadj,lpadj,lv,lp
      real*8 data(m,n)
      include 'snpincl7.f'
      do 10 i=1,n
         read(u14,14001) yy,mm,dd,day,hol,wkend,gap,lvadj,lpadj,lv,lp
         data(1,i)=lpadj
         data(2,i)=lvadj
   10 continue
      return
14001 format(6i2,i3,f14.10,e18.10,f14.10,e18.10)
      end
....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8
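
For readers without SAS or Fortran, the fixed-width layout used by the two
fragments above can also be parsed in a few lines of Python.  This is only
a sketch under stated assumptions: the field widths are copied from the
Fortran format (6i2,i3,f14.10,e18.10,f14.10,e18.10), the names parse_record
and read_nyse are hypothetical helpers introduced here, and the sample
record is synthetic rather than taken from nyse.dat.

```python
# Field names follow the SAS input statement; widths follow the Fortran
# format 6i2,i3,f14.10,e18.10,f14.10,e18.10 (79 characters per record).
NAMES = ["yy", "mm", "dd", "day", "hol", "wkend", "gap",
         "vadj", "padj", "v", "p"]
WIDTHS = [2, 2, 2, 2, 2, 2, 3, 14, 18, 14, 18]
N_INT = 7  # the first seven fields are integers, the remaining four are reals


def parse_record(line):
    """Split one fixed-width record into a dict keyed by the field names."""
    record = {}
    pos = 0
    for k, (name, width) in enumerate(zip(NAMES, WIDTHS)):
        field = line[pos:pos + width]
        pos += width
        # int() and float() both tolerate the blank padding of the fields.
        record[name] = int(field) if k < N_INT else float(field)
    return record


def read_nyse(path="nyse.dat"):
    """Hypothetical helper: parse every non-blank record of the file.

    Assumes the file sits at `path` in the layout described above.
    """
    with open(path) as fh:
        return [parse_record(line.rstrip("\n")) for line in fh if line.strip()]


# Synthetic record used only to exercise the parser; the values are made up
# and are NOT observations from nyse.dat.
sample = ("%2d%2d%2d%2d%2d%2d%3d%14.10f%18.10e%14.10f%18.10e"
          % (28, 1, 4, 3, 0, 0, 1, 0.5, -0.25, 0.75, 0.125))
rec = parse_record(sample)
```

To mirror the Fortran subroutine, which keeps only the adjusted price
change and the adjusted volume, one could collect
[(r["padj"], r["vadj"]) for r in read_nyse()] once the file is available.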