Further Topics in Regression Analysis
In this section, we consider important extensions of the multiple linear regression model. The section is divided into five chapters:
- Chapter 17 Regression with Panel Data
- Chapter 18 Regression with a Binary Dependent Variable
- Chapter 19 Instrumental Variables Regression
- Chapter 20 Experiments and Quasi-Experiments
- Chapter 21 Prediction with Many Regressors and Big Data
In Chapter 17, we introduce panel data models, which require datasets containing multiple observations on the same individuals or entities. We consider entity and time fixed-effects models, which control for unobserved factors that are constant over time within an entity or constant across entities at a point in time. We also examine entity and time random-effects models. To illustrate, we use a dataset on the 48 contiguous U.S. states over the period 1969–2010 to study the relationship between state traffic fatalities and alcohol taxes.
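The entity fixed-effects estimator can be computed with the within (demeaning) transformation: subtracting entity means removes the entity-specific intercepts, after which OLS on the demeaned data recovers the slope. A minimal sketch with a small synthetic panel (not the traffic fatality data; the entity effects and slope are made up for illustration):

```python
import numpy as np

# Synthetic panel: 3 entities, 4 periods each, y = alpha_i + 2*x with no noise
entities = np.repeat([0, 1, 2], 4)
x = np.array([1., 2., 3., 4., 2., 4., 6., 8., 1., 3., 5., 7.])
alphas = np.array([10., -5., 3.])           # unobserved entity fixed effects
y = alphas[entities] + 2.0 * x              # true slope is 2.0

def demean_by(group, v):
    """Subtract the group-specific mean from each observation."""
    out = v.copy()
    for g in np.unique(group):
        m = group == g
        out[m] = v[m] - v[m].mean()
    return out

# The within transformation wipes out alpha_i; OLS on demeaned data gives beta
y_w = demean_by(entities, y)
x_w = demean_by(entities, x)
beta_fe = (x_w @ y_w) / (x_w @ x_w)
```

Because the synthetic data contain no noise, `beta_fe` recovers the true slope of 2.0 exactly; with real data the estimate would differ from the truth by sampling error.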
In Chapter 18, we introduce models for binary dependent variables: the linear probability model, the logit model, and the probit model. We explain how to interpret the estimated parameters and how to compute marginal effects in these models. To illustrate, we use a dataset on mortgage applications from the Boston metropolitan area in 1990 and study how applicant characteristics affect the probability of mortgage denial.
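In a logit model, the marginal effect of a regressor is not the coefficient itself but depends on where it is evaluated: dP/dx = Λ(z)(1 − Λ(z))·β₁, where Λ is the logistic CDF. A small sketch; the coefficient values below are purely illustrative, not estimates from the Boston mortgage data:

```python
import math

# Hypothetical logit coefficients for P(deny = 1 | x), illustrative only
b0, b1 = -4.0, 5.9

def logit_prob(x):
    """Predicted probability Lambda(b0 + b1*x), the logistic CDF."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(x):
    """dP/dx = Lambda(z) * (1 - Lambda(z)) * b1, evaluated at x."""
    p = logit_prob(x)
    return p * (1.0 - p) * b1

p = logit_prob(0.3)               # predicted denial probability at x = 0.3
me = logit_marginal_effect(0.3)   # slope of the probability at that point
```

Note that the marginal effect shrinks toward zero when the predicted probability is near 0 or 1, which is why marginal effects are typically reported at representative values of the regressors.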
In Chapter 19, we introduce instrumental variables regression as a method for addressing endogenous regressors. We outline the conditions a valid instrument must satisfy, present the two-stage least squares (2SLS) estimator, and discuss how to assess instrument validity. As an application, we use a dataset on cigarette demand in the United States to estimate the price elasticity of demand.
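The two stages of 2SLS can be sketched directly: regress the endogenous regressor on the instrument, then regress the outcome on the first-stage fitted values. The toy data below are constructed (not the cigarette data) so that the instrument is exactly uncorrelated with the error in sample, making the estimator recover the true slope exactly:

```python
import numpy as np

z = np.array([1., 2., 3., 4.])      # instrument
u = np.array([1., -1., -1., 1.])    # error; sample covariance with z is zero
x = 0.8 * z + u                     # endogenous regressor (correlated with u)
y = 2.0 * x + u                     # structural equation, true slope 2.0

def add_const(v):
    """Prepend a column of ones for the intercept."""
    return np.column_stack([np.ones(len(v)), v])

# Stage 1: regress x on z and keep the fitted values
pi_hat, *_ = np.linalg.lstsq(add_const(z), x, rcond=None)
x_hat = add_const(z) @ pi_hat

# Stage 2: regress y on the fitted values from stage 1
beta_2sls, *_ = np.linalg.lstsq(add_const(x_hat), y, rcond=None)
```

In this single-instrument case, 2SLS coincides with the simple IV estimator cov(z, y)/cov(z, x). OLS of y on x would be inconsistent here because x is correlated with u.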
In Chapter 20, we study experimental and quasi-experimental methods for causal inference. We introduce the potential outcomes framework to define the individual causal effect, the average causal effect, and the average treatment effect on the treated. We discuss randomized controlled trials (RCTs) and outline the main threats to their internal and external validity. We also examine quasi-experimental methods, including difference-in-differences and regression discontinuity designs. These methods are illustrated using two different datasets to study the causal effects of policy interventions on outcomes.
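In its simplest 2×2 form, the difference-in-differences estimator is just the change in the treatment group's mean outcome minus the change in the control group's mean outcome. A sketch with made-up group means (not taken from the chapter's datasets):

```python
# Hypothetical group means before and after a policy change
treat_before, treat_after = 10.0, 16.0
ctrl_before, ctrl_after = 8.0, 11.0

# DiD: (change for treated) minus (change for controls).
# Under the parallel-trends assumption, this is the average treatment effect.
did = (treat_after - treat_before) - (ctrl_after - ctrl_before)
print(did)  # (16 - 10) - (11 - 8) = 3.0
```

The key identifying assumption is parallel trends: absent the policy, the treated group's outcome would have evolved like the control group's.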
In Chapter 21, we consider popular machine learning methods for modeling high-dimensional data. We introduce ridge regression, the lasso, the elastic net, and principal components regression. We discuss why these shrinkage methods can deliver better predictions than the OLS estimator when there are many regressors. For this chapter, we use a disaggregated dataset on elementary schools in California in 2013 to predict student test scores, and we compare the predictive performance of the machine learning estimators with that of OLS.
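Ridge regression has a closed form that makes the shrinkage explicit: the estimator is (X′X + λI)⁻¹X′y, which reduces to OLS at λ = 0 and shrinks the coefficient vector toward zero as λ grows. A minimal sketch with synthetic data (not the California school data):

```python
import numpy as np

# Synthetic regression problem with k = 10 regressors
rng = np.random.default_rng(42)
n, k = 50, 10
X = rng.normal(size=(n, k))
y = X @ np.ones(k) + rng.normal(size=n)   # true coefficients are all 1

def ridge(X, y, lam):
    """Closed-form ridge estimator (X'X + lam*I)^(-1) X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # lam = 0 reproduces OLS
b_ridge = ridge(X, y, 10.0)  # lam > 0 shrinks the coefficients toward zero
```

In practice λ is chosen by cross-validation; the bias introduced by shrinkage is traded against a reduction in variance, which is why ridge can out-predict OLS when k is large relative to n.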