Fundamentals of Regression Analysis

This section contains the following chapters on the fundamentals of regression analysis:

In Chapter 11, we introduce the linear regression model with one regressor and define the least squares estimator for its parameters. We also define two measures of fit for assessing prediction accuracy. Next, we present the least squares assumptions that ensure a causal interpretation of the slope parameter. Under these assumptions, we show that the least squares estimator is consistent and asymptotically normally distributed. In the accompanying Appendix A, we derive the OLS estimator and discuss its algebraic properties for the linear regression model with one regressor.
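Using generic notation (which may differ from the text's), the model with a single regressor is $Y_i = \beta_0 + \beta_1 X_i + u_i$, and minimizing the sum of squared residuals yields the familiar closed-form estimators derived in the appendix:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$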

In Chapter 12, we show how the asymptotic variance of the least squares estimator can be estimated under both homoskedastic and heteroskedastic error terms. We then cover how to test null hypotheses about the slope parameter and how to construct confidence intervals for it, and we discuss the interpretation of the slope parameter when the regressor is a binary variable. We conclude the chapter by presenting the Gauss-Markov theorem, which states that under homoskedasticity the OLS estimator is the best linear conditionally unbiased estimator.
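As a minimal sketch of this inference workflow, the following snippet (using simulated data and illustrative variable names, not the text's application) fits a simple regression with Python's statsmodels and reports heteroskedasticity-robust standard errors, t-tests, and confidence intervals:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a simple linear model with heteroskedastic errors:
# the error spread grows with x, so robust standard errors are appropriate.
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 0.5 + 0.3 * x)      # error scale depends on x
y = 2.0 + 0.8 * x + u

# Fit by OLS and request heteroskedasticity-robust (HC1) standard errors.
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="HC1")

print(fit.summary())      # robust t-statistics for the intercept and slope
print(fit.conf_int())     # 95% confidence intervals for both parameters
```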

In Chapter 13, we begin by defining omitted variable bias and showing how the multiple linear regression model can be used to address it. We then show how the OLS estimator is applied to estimate the parameters of this model. Next, we introduce the assumptions that ensure the OLS estimator is consistent and asymptotically normal. The chapter concludes with a discussion of the conditional mean independence assumption for causal inference. In the closely related Chapter 14, we provide inference methods for the multiple linear regression model; in particular, we discuss how to formulate joint null hypotheses about the parameters and how F-statistics can be used to test them.
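The following sketch (simulated data; names are illustrative, not from the text) demonstrates both points of this chapter pair: omitting a regressor that is correlated with an included one biases the OLS slope, and an F-statistic can test a joint null hypothesis on the full model:

```python
import numpy as np
import statsmodels.api as sm

# Simulate omitted variable bias: x1 and x2 are correlated and both affect y.
rng = np.random.default_rng(7)
n = 2000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)        # x1 is correlated with x2
y = 1.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x1)).fit()                        # omits x2
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(short.params[1])    # biased upward relative to the true value 0.5
print(full.params[1])     # close to the true value 0.5

# F-test of the joint null hypothesis beta_1 = beta_2 = 0
# (statsmodels labels the unnamed regressors x1 and x2 automatically).
print(full.f_test("x1 = 0, x2 = 0"))
```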

In Chapter 15, we consider commonly used methods for modeling nonlinear relationships between the dependent variable and a regressor. We introduce polynomial regression, which accounts for nonlinearities by including higher-order terms of the regressor, and logarithmic transformations, which can be used to formulate linear-log, log-linear, and log-log models. When the effect of one regressor on the dependent variable depends on the value of another regressor, interaction terms can be used to model this dependence. The chapter closes with an empirical application in which various nonlinear specifications are used to study the effect of the student-teacher ratio on average test scores.
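As a sketch of these specifications (simulated data; the variable names stratio, income, and score are stand-ins for the book's application), the formula interface of statsmodels makes polynomial, logarithmic, and interaction terms straightforward to specify:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for a test-score data set.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "stratio": rng.uniform(14, 26, n),     # student-teacher ratio
    "income": rng.uniform(5, 50, n),       # district income
})
df["score"] = (700 - 1.1 * df["stratio"]
               + 20 * np.log(df["income"])
               + rng.normal(0, 10, n))

# Quadratic (polynomial) specification in the student-teacher ratio.
quad = smf.ols("score ~ stratio + I(stratio**2)", data=df).fit()

# Linear-log specification: score regressed on log(income).
linlog = smf.ols("score ~ np.log(income)", data=df).fit()

# Interaction: the effect of stratio is allowed to vary with income.
inter = smf.ols("score ~ stratio * income", data=df).fit()

print(quad.params, linlog.params, inter.params, sep="\n\n")
```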

Finally, in Chapter 16, we introduce a framework for assessing the strengths and limitations of linear regression studies, based on the concepts of internal and external validity. We discuss how to apply this framework by identifying the main threats to internal and external validity. In an empirical application, we use the framework to assess a study that estimates the effect of the student-teacher ratio on test scores using data on public elementary schools in Massachusetts.