We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or another category such as industry, and state-year differences-in-differences studies with clustering on state. We outline the basic method as well as many complications that can arise in practice. One can use a bias-corrected cluster-robust variance matrix, make t-distribution adjustments, or use bootstrap methods with asymptotic refinements, such as the percentile-t and wild bootstrap, that can lead to improved finite-sample inference.
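The basic one-way method and one bootstrap refinement can be sketched in numpy as follows. This is a minimal sketch under stated assumptions, not the authors' code. It uses the CR1 scaling G/(G-1) · (N-1)/(N-K), a common default that is not the same as the CR2 bias-reduced correction. It also uses a symmetric wild cluster bootstrap-t that imposes the null and draws one Rademacher weight per cluster. The function names `ols`, `cr1_vcov` and `wild_cluster_boot_p`, the simulated data and the choice of B are all hypothetical. With few clusters, the observed t-statistic is often compared with a t distribution with G-1 degrees of freedom rather than the standard normal.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cr1_vcov(X, resid, clusters):
    """Cluster-robust sandwich with the CR1 finite-sample scaling (assumed choice)."""
    n, k = X.shape
    groups = np.unique(clusters)
    G = len(groups)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for g in groups:
        idx = clusters == g
        s = X[idx].T @ resid[idx]  # score summed within cluster g
        meat += np.outer(s, s)
    c = G / (G - 1) * (n - 1) / (n - k)  # CR1 scaling, not CR2/CR3
    return c * bread @ meat @ bread

def wild_cluster_boot_p(X, y, clusters, j, B=999, seed=0):
    """Symmetric wild cluster bootstrap-t p-value for H0: beta_j = 0.
    The null is imposed and Rademacher weights are drawn per cluster."""
    rng = np.random.default_rng(seed)
    beta = ols(X, y)
    se = np.sqrt(cr1_vcov(X, y - X @ beta, clusters)[j, j])
    t_obs = beta[j] / se
    # Restricted fit: drop regressor j so that beta_j = 0 holds exactly.
    Xr = np.delete(X, j, axis=1)
    fitted_r = Xr @ ols(Xr, y)
    u_r = y - fitted_r
    groups, inv = np.unique(clusters, return_inverse=True)
    inv = inv.ravel()
    t_boot = np.empty(B)
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=len(groups))  # one sign per cluster
        y_star = fitted_r + u_r * w[inv]
        beta_s = ols(X, y_star)
        se_s = np.sqrt(cr1_vcov(X, y_star - X @ beta_s, clusters)[j, j])
        t_boot[b] = beta_s[j] / se_s  # centred at the null value 0
    return t_obs, np.mean(np.abs(t_boot) >= abs(t_obs))

# Hypothetical demo: 20 clusters of 30, with cluster shocks in both x and the error.
rng = np.random.default_rng(42)
G, m = 20, 30
clusters = np.repeat(np.arange(G), m)
x = rng.normal(size=G)[clusters] + rng.normal(size=G * m)
u = rng.normal(size=G)[clusters] + rng.normal(size=G * m)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])
beta = ols(X, y)
se = np.sqrt(np.diag(cr1_vcov(X, y - X @ beta, clusters)))
t_obs, p_boot = wild_cluster_boot_p(X, y, clusters, j=1, B=199)
print(beta, se, t_obs, p_boot)
```

Drawing weights at the cluster level, rather than per observation, preserves the within-cluster correlation of the residuals in each bootstrap sample. That is the reason the wild cluster bootstrap suits this setting.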
A. Colin Cameron and Douglas L. Miller
If the number of clusters is large, statistical inference after OLS should be based on cluster-robust standard errors. With two-way clustering, the data demands are greater, because the asymptotic theory for the two-way cluster-robust variance estimator requires both the number of time periods and the number of industries to go to infinity. In experiments, a spillover is an indirect effect on a subject who is not directly treated by the experiment. Labeling time variation as discriminatory is difficult because the cost of an unsold airline seat changes with inventory, days before departure, and aggregate demand expectations.
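The two-way estimator mentioned above can be sketched as the sum of the two one-way cluster-robust variance matrices minus the one clustered on their intersection, following the Cameron, Gelbach and Miller construction. This minimal numpy sketch assumes OLS and applies no finite-sample scaling. The names `cluster_meat`, `twoway_vcov`, `time` and `industry`, and the simulated panel, are hypothetical. In finite samples the resulting matrix is not guaranteed to be positive semidefinite.

```python
import numpy as np

def cluster_meat(X, resid, clusters):
    """Sum over clusters of outer products of within-cluster scores."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        idx = clusters == g
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    return meat

def twoway_vcov(X, resid, time, industry):
    """Two-way cluster-robust variance: V_time + V_industry - V_intersection.
    No finite-sample scaling is applied in this sketch."""
    _, a = np.unique(time, return_inverse=True)
    _, b = np.unique(industry, return_inverse=True)
    a, b = a.ravel(), b.ravel()
    both = a * (b.max() + 1) + b  # one label per (time, industry) cell
    bread = np.linalg.inv(X.T @ X)
    meat = (cluster_meat(X, resid, a) + cluster_meat(X, resid, b)
            - cluster_meat(X, resid, both))
    return bread @ meat @ bread

# Hypothetical demo: 15 periods x 12 industries x 4 firms, shocks on both dimensions.
rng = np.random.default_rng(7)
T, J, F = 15, 12, 4
time = np.repeat(np.arange(T), J * F)
industry = np.tile(np.repeat(np.arange(J), F), T)
n = time.size
x = rng.normal(size=T)[time] + rng.normal(size=J)[industry] + rng.normal(size=n)
u = rng.normal(size=T)[time] + rng.normal(size=J)[industry] + rng.normal(size=n)
y = 2.0 - 1.0 * x + u
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
V = twoway_vcov(X, y - X @ beta, time, industry)
print(beta, np.sqrt(np.diag(V)))
```

Subtracting the intersection term removes the double counting of observation pairs that share both a time period and an industry. If the second dimension has only singleton clusters, the estimator reduces to one-way clustering on the first dimension.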
