The normality rule for lm: what should be normal?

Let us answer this question:

"In an lm, is it important that the variables are Gaussian, or is it the residuals?"

In a linear model (lm), it is the residuals that matter, not the distribution of the original variables.

Here is the key distinction:

✅  What must be (approximately) Gaussian?

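•          The errors, conditional on the predictors, should be roughly normally distributed if you want the usual t- and F-based confidence intervals and p-values to be reliable in small samples. The residuals are the observable estimates of these errors, so they are what you check.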

❌  What does not need to be Gaussian?

•          The raw variables (predictors or response) do NOT need to follow a normal distribution.

•          Linear regression works fine with skewed variables, non-Gaussian predictors, etc.
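As a minimal sketch of this distinction, the simulation below uses Python's standard library rather than R's lm. The helper names fit_line and skewness, the seed, and all parameter values are illustrative assumptions. It builds a strongly skewed predictor, adds Gaussian errors, and compares the skewness of the raw variables with that of the residuals.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def skewness(values):
    """Sample skewness: near 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(1)  # arbitrary seed, for reproducibility only
n = 5000
x = [rng.expovariate(1.0) for _ in range(n)]            # right-skewed predictor
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # Gaussian errors

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

Under these assumptions, both x and y should come out clearly skewed while the residuals stay close to symmetric, which is the situation in which the normal-theory inference of a linear model is appropriate.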

Why residual normality matters

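Normality of the errors ensures that, whatever the sample size:

•          the t and F statistics built from the standard errors follow their exact reference distributions

•          hypothesis tests (t-tests, F-tests) have their stated significance level

•          confidence intervals have their nominal coverage

Without normality, these properties hold only approximately, and the approximation improves as the sample size grows.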

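The coefficient estimates themselves remain unbiased even if the residuals are not normal, as long as:

•          the model is linear (the mean of the response is correctly specified)

•          errors have mean zero given the predictors

Constant variance and independence of the errors are not required for unbiasedness, but they are required for the usual standard errors to be valid.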
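The unbiasedness claim can be illustrated with a small simulation. This is a sketch in standard-library Python under assumed settings (the function ols_slope, the seed, the sample size and the error distribution are all illustrative choices): the errors are centered exponential, so they have mean zero and constant variance but are strongly skewed, and the slope is re-estimated on many simulated data sets.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(42)  # arbitrary seed, for reproducibility only
true_slope = 2.0
n = 20                   # deliberately small sample
replications = 5000
x = [i / (n - 1) for i in range(n)]  # fixed design on [0, 1]

slopes = []
for _ in range(replications):
    # Centered exponential errors: mean 0, variance 1, far from Gaussian
    errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
    y = [1.0 + true_slope * xi + e for xi, e in zip(x, errors)]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)
print(f"true slope: {true_slope}, mean of estimates: {mean_slope:.3f}")
```

The average of the estimated slopes should sit close to the true value even though no single data set has normal errors. What non-normality affects in a sample this small is the accuracy of the p-values and confidence intervals, not the centering of the estimates.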

Practical rule

•          Check residuals, not raw data.

•          If residuals are clearly non-normal and sample size is small, then consider transformations or robust methods.
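As one hedged illustration of the transformation option, the sketch below (standard-library Python, with hypothetical helper names and arbitrary parameter values) simulates a response with multiplicative errors, fits a straight line to the raw response and to its logarithm, and compares the skewness of the two sets of residuals. A large sample is used only to make the skewness estimates stable. In this setup the log transformation also removes curvature and non-constant variance, so it is not a pure test of normality.

```python
import math
import random
import statistics

def residuals_of_fit(x, y):
    """Residuals from a one-predictor least-squares fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Sample skewness: near 0 for symmetric data."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(7)  # arbitrary seed, for reproducibility only
n = 2000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
# Multiplicative errors: log(y) is linear in x with Gaussian noise
y = [math.exp(0.5 + 1.0 * xi + rng.gauss(0.0, 0.5)) for xi in x]

raw_resid = residuals_of_fit(x, y)
log_resid = residuals_of_fit(x, [math.log(yi) for yi in y])

print(f"residual skewness, raw response: {skewness(raw_resid):.2f}")
print(f"residual skewness, log response: {skewness(log_resid):.2f}")
```

In R, the equivalent check would usually be done visually, for example with a normal Q-Q plot of the residuals of the fitted lm object.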
