The normality rule for lm: what should be normal
Let's answer this question:
"In an lm, is it important that the variables are Gaussian-distributed, or is it the residuals?"
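In a linear model (lm), it is the residuals that matter, not the distribution of the original variables. Strictly speaking, the assumption concerns the unobserved errors. The residuals are their estimates, so they are what you inspect.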
Here is the key distinction:
✅ What must be (approximately) Gaussian?
• The errors, conditional on the predictors, should be roughly normally distributed if you want exact t- and F-based confidence intervals and p-values, especially in small samples. In practice you judge this from the residuals.
❌ What does not need to be Gaussian?
• The raw variables (predictor or response) do NOT need to follow a normal distribution.
• A linear regression can be perfectly valid with skewed, binary, or otherwise non-Gaussian predictors, and with a response whose overall (marginal) distribution is far from normal.
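To illustrate that a skewed predictor does not by itself break the assumption, here is a minimal simulation sketch. The post contains no code, so this uses Python with NumPy: `np.linalg.lstsq` computes the same ordinary least-squares fit that R's `lm` does. The data, coefficients (1 and 3), seed, and the helper `skewness` are hypothetical choices for illustration only. The predictor is strongly right-skewed, so the raw response is skewed too, yet the residuals are symmetric because the simulated errors are normal.

```python
import numpy as np

def skewness(v):
    """Sample skewness: about 0 for symmetric (e.g. normal) data, positive for right skew."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**3))

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: a strongly right-skewed predictor and normal errors.
x = rng.exponential(scale=1.0, size=n)
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares with an intercept column (what lm(y ~ x) fits in R).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("intercept, slope:     ", beta)                  # close to the true values 1 and 3
print("skewness of x:        ", skewness(x))           # strongly positive
print("skewness of y:        ", skewness(y))           # strongly positive
print("skewness of residuals:", skewness(residuals))   # close to 0
```

Looking at a histogram of `x` or `y` here would suggest a problem that does not exist; only the residuals tell you about the error distribution.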
Why residual normality matters
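Normality of the errors ensures that, even in small samples:
• the t- and F-statistics follow exact t and F distributions (the standard-error formulas themselves do not rely on normality)
• hypothesis tests (t-tests, F-tests) have their stated error rates
• confidence intervals have their stated coverage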
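The coefficient estimates themselves remain unbiased even if the errors are not normal, as long as:
• the model is linear (correctly specified)
• the errors have mean zero conditional on the predictors
Constant variance and independent errors are not needed for unbiasedness, but they are needed for the usual standard errors to be valid and for OLS to be the best linear unbiased estimator (Gauss-Markov theorem). With those conditions and a large sample, the central limit theorem makes t- and F-based inference approximately valid even when the errors are not normal.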
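The unbiasedness claim can be checked with a small Monte Carlo sketch, again in Python/NumPy as a stand-in for R's `lm`. The sample size (30), number of replications (5000), true coefficients, and seed are arbitrary assumptions. The errors are exponential draws shifted to mean zero, so they are strongly skewed and clearly non-normal, yet the slope and intercept estimates average out to the true values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_reps = 30, 5000                      # small samples, many replications
true_intercept, true_slope = 1.0, 2.0

x = np.linspace(0.0, 10.0, n)             # fixed design reused in every replication
X = np.column_stack([np.ones(n), x])

# Each column is one simulated data set's error vector: exponential draws
# minus their mean of 1, i.e. right-skewed errors with mean zero.
errors = rng.exponential(scale=1.0, size=(n, n_reps)) - 1.0
Y = (true_intercept + true_slope * x)[:, np.newaxis] + errors

# With a 2-D right-hand side, lstsq fits all n_reps regressions at once;
# coefs has shape (2, n_reps): row 0 = intercepts, row 1 = slopes.
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
slopes = coefs[1]

print("average intercept estimate:", coefs[0].mean())   # close to 1.0
print("average slope estimate:    ", slopes.mean())     # close to 2.0
```

Non-normal errors here affect the shape of the sampling distribution of the estimates (what exact small-sample p-values depend on), not where it is centred.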
Practical rule
• Check residuals, not raw data.
• If the residuals are clearly non-normal and the sample is small, consider transforming the response (e.g. a log transform) or using robust methods.
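"Check residuals, not raw data" can be made concrete with a normal Q-Q check. The sketch below reduces a Q-Q plot to one number: the correlation between the sorted values and standard-normal quantiles (values near 1 mean the points lie close to a straight line). It uses Python's `statistics.NormalDist` for the quantiles. The helpers `qq_correlation` and `ols_residuals`, the simulated models, and the seed are hypothetical illustrations, not a formal test. Model A has a skewed predictor with normal errors; model B has skewed errors.

```python
import numpy as np
from statistics import NormalDist

def qq_correlation(v):
    """Correlation between sorted values and standard-normal quantiles (Q-Q plot straightness)."""
    v = np.sort(np.asarray(v, dtype=float))
    n = len(v)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n, mapped to normal quantiles.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return float(np.corrcoef(theoretical, v)[0, 1])

def ols_residuals(x, y):
    """Residuals from an OLS fit of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(7)
n = 500
x = rng.exponential(scale=1.0, size=n)                                # skewed predictor

y_a = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)                    # A: normal errors
y_b = 1.0 + 3.0 * x + (rng.exponential(scale=1.0, size=n) - 1.0)      # B: skewed errors

print("A, raw y:     ", qq_correlation(y_a))                  # noticeably below 1: raw y is skewed
print("A, residuals: ", qq_correlation(ols_residuals(x, y_a)))  # very close to 1
print("B, residuals: ", qq_correlation(ols_residuals(x, y_b)))  # noticeably below 1
```

Judged on raw data, model A would be wrongly flagged; judged on residuals, only model B is. In R, the usual visual version of this check is `qqnorm(resid(fit))` followed by `qqline(resid(fit))`, or `plot(fit, which = 2)` for a fitted `lm` object `fit`.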