## Interpretation of regression results

from **Lars Syll**

When econometric and statistical textbooks present simple (and multiple) regression analysis for cross-sectional data, they often do it with regressions like “regress test score (y) on study hours (x)” and get the result

y = constant + slope coefficient*x + error term.

When these interpretations speak of increases or decreases in x, we have to remember that we are dealing with cross-sectional data, so an ‘increase’ refers to a difference in the value of a variable between *one* unit in the population and *another* unit in the same population. Strictly speaking, it is only admissible to give slope coefficients a *dynamic* interpretation when we are dealing with time-series regression. For cross-sectional data, we should stick to *static* interpretations and regard slope coefficients as telling us how much the expected value of the dependent variable differs when the value of the independent variable differs *from one unit to another*.
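To make the static reading concrete, here is a minimal sketch in Python with NumPy. The five data points are invented purely for illustration (they are not from any real study), and `expected_score` is a hypothetical helper name. The fitted slope is read as the expected gap in test scores between two *different* students whose study hours differ, not as what happens to one student over time.

```python
import numpy as np

# Hypothetical cross-section: five different students, each observed once.
# These numbers are made up for illustration only.
hours = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
scores = np.array([55.0, 60.0, 70.0, 72.0, 83.0])

# Ordinary least squares fit of score on hours.
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(hours, scores, 1)


def expected_score(h):
    """Fitted score for a student who studies h hours (hypothetical helper)."""
    return intercept + slope * h


# Static interpretation: compare two different students, one studying 8 hours
# and another studying 6 hours. The gap equals slope * (8 - 6).
gap = expected_score(8.0) - expected_score(6.0)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, gap = {gap:.2f}")
```

Nothing in this fit follows a single student from 6 to 8 hours; the comparison is across units of the population.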

Although it is tempting to say that a change in the independent variable leads to a change in the dependent variable, we should resist that temptation. Students who put a lot of study hours into their daily routine achieve, on average, higher test scores than *other* students who study for fewer hours. But the regressions do not analyse what happens to individual students as they increase or decrease their study hours.

Why is this important? Most of all because interpreting the regression coefficients incorrectly may give a totally wrong causal view of what is going on in your data. A positive relation between test scores and study hours in a cross-sectional regression does not mean that *you as an individual student* should expect to get higher test scores by increasing study time.
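A small simulation can illustrate how far the two readings may drift apart. The sketch below rests entirely on assumptions chosen for illustration: an unobserved `ability` variable, the constant `WITHIN_EFFECT`, and every numeric parameter are hypothetical and are not estimates from real data. By construction, one extra study hour raises an individual student's score by only 0.5 points, while able students both study more and score higher.

```python
import numpy as np

# All parameter values below are assumptions made for illustration.
WITHIN_EFFECT = 0.5   # points gained by ONE student per extra study hour
N_STUDENTS = 5000

rng = np.random.default_rng(0)

# Unobserved trait that differs between students (hypothetical).
ability = rng.normal(0.0, 1.0, N_STUDENTS)

# Students with more ability also tend to study more.
hours = 10.0 + 2.0 * ability + rng.normal(0.0, 1.0, N_STUDENTS)

# Score depends on the student's own hours (within effect) and on ability.
score = (50.0 + WITHIN_EFFECT * hours + 8.0 * ability
         + rng.normal(0.0, 5.0, N_STUDENTS))

# Cross-sectional regression of score on hours; ability is not observed.
cross_sectional_slope, intercept = np.polyfit(hours, score, 1)

print(f"individual effect of one extra hour: {WITHIN_EFFECT:.2f}")
print(f"cross-sectional slope:               {cross_sectional_slope:.2f}")
```

Under these assumptions the population value of the cross-sectional slope is 0.5 + 8 × 2/5 = 3.7, several times the individual effect of 0.5. The regression describes differences between students correctly, yet a student who read the slope as a personal forecast would badly overestimate the return on an extra hour of study.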