Fitting an exponential model with log(y) = a + b t or y = exp(a + b t)
Data
Eleven yearly observations, from 2010 to 2020 (the full table is shown in section 4️⃣ below).
To simplify interpretation, the year is often centered, for example t = year - 2015 (the midpoint of the sample), so that the intercept refers to the middle of the series rather than to year 0.
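As a minimal setup in R (centering at 2015 is a choice, not a requirement):

```r
year <- 2010:2020
y    <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)  # the 11 observations
t    <- year - 2015   # centered time; any reference year would do
```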
1️⃣ Linear regression on log(y)
Model
log(y_i) = a + b t_i + ε_i, with errors ε_i of mean zero and constant variance on the log scale
Key assumption
- the error is additive on the log scale
- therefore it is multiplicative on the original scale: y_i = exp(a + b t_i) · exp(ε_i)
Fit (order of magnitude)
One typically obtains something like b ≈ 0.41, i.e. a growth factor of exp(0.41) ≈ 1.5 per year (about +50% per year); with t centered at 2015, the intercept is a ≈ 4.0 (these values are computed from the data above, and the intercept depends on the chosen reference year).
Back to the original scale
The fitted curve is y = exp(a + b t) = exp(a) · exp(b t), an exponential passing through exp(4.0) ≈ 55 at the reference year.
👉 regular exponential growth
👉 relative errors are roughly constant
👉 small values have as much weight as large ones
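A quick sketch of this fit in R, reusing the `y` and `t` vectors defined above (the object names are illustrative):

```r
fit_log <- lm(log(y) ~ t)        # linear regression on the log scale
coef(fit_log)                    # intercept = log-level at t = 0, slope = b
exp(coef(fit_log)["t"])          # implied yearly growth factor exp(b)
plot(t, residuals(fit_log))      # residual spread should look roughly constant
```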
2️⃣ Direct nonlinear regression on y
Model
y_i = a · exp(b t_i) + ε_i, where a now plays the role of the level at t = 0 (i.e. the exponential of the intercept from section 1️⃣)
Key assumption
- the error is additive on y
- the variance is assumed constant on the original scale
Typical fit
Nonlinear least squares (e.g. nls in R) returns estimates of a and b that can differ noticeably from those obtained on the log scale, because the criterion now weights absolute rather than relative deviations.
Consequences
- the large values (300, 400) strongly dominate the fit
- the early years are often poorly fitted in relative terms
- the residuals are strongly heteroscedastic
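A corresponding sketch for the direct fit, again reusing the objects above (taking starting values from the log-linear fit is a convenience I am assuming, not something the method requires):

```r
# Starting values derived from the log-scale fit to help nls converge
start   <- list(a = exp(unname(coef(fit_log)[1])),
                b = unname(coef(fit_log)[2]))
fit_nls <- nls(y ~ a * exp(b * t), start = start)

summary(fit_nls)
plot(fitted(fit_nls), residuals(fit_nls))  # spread typically grows with the fitted values
```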
3️⃣ Why the results differ (crucial point)
Even if the systematic part is the same exponential curve exp(a + b t), the error assumptions differ:
| Model | Error type |
|---|---|
| log(y) = a + b t + ε (lm on log(y)) | multiplicative on y |
| y = a · exp(b t) + ε (nls on y) | additive on y |
➡️ these are not the same least-squares problem (see the two criteria below)
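Written out, the two estimators solve different minimisation problems (a sketch in standard least-squares notation; a denotes the log-scale intercept in the first problem and the level at t = 0 in the second):

```latex
% Log-linear fit: squared deviations on the log scale
(\hat a, \hat b) \;=\; \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl( \log y_i - a - b\, t_i \bigr)^2

% Direct nonlinear fit: squared deviations on the original scale
(\tilde a, \tilde b) \;=\; \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl( y_i - a\, e^{b\, t_i} \bigr)^2
```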
4️⃣ Intuitive illustration with data
Look at the successive ratios y_t / y_{t-1}:
| Year | y | Ratio |
|---|---|---|
| 2010 | 10 | – |
| 2011 | 10 | 1.0 |
| 2012 | 15 | 1.5 |
| 2013 | 20 | 1.33 |
| 2014 | 30 | 1.5 |
| 2015 | 60 | 2.0 |
| 2016 | 100 | 1.67 |
| 2017 | 120 | 1.2 |
| 2018 | 200 | 1.67 |
| 2019 | 300 | 1.5 |
| 2020 | 400 | 1.33 |
➡️ the ratios are relatively stable; the absolute differences are not
➡️ a multiplicative assumption is reasonable
➡️ using log(y) is therefore statistically justified
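These ratios (and the growing absolute differences) can be checked in two lines, assuming the `y` vector defined above:

```r
round(exp(diff(log(y))), 2)  # successive ratios y_t / y_{t-1}: 1.0, 1.5, 1.33, ...
diff(y)                      # absolute differences: 0, 5, 5, 10, 30, 40, 20, 80, 100, 100
```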
5️⃣ Beware of retransformation bias
If you compute the naive back-transformed prediction
y_hat = exp(a_hat + b_hat t),
then it targets the conditional median rather than the conditional mean: by Jensen's inequality, E[exp(ε)] > 1 whenever ε has mean zero and positive variance, so exp(a_hat + b_hat t) underestimates E[y | t].
Classical correction (Duan's smearing estimator):
multiply the naive prediction by the average of the exponentiated residuals of the log-scale regression, y_hat = exp(a_hat + b_hat t) · (1/n) Σ exp(e_i); under normal errors this factor is approximately exp(σ²/2).
This becomes important when doing prediction on the original scale.
6️⃣ Key takeaway
👉 Putting log(y) into a linear regression is not equivalent to fitting an exponential model directly to y, because the error structure changes completely.
Here is clean, reproducible R code using these data, showing the two approaches side by side and how to handle retransformation correctly.
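A possible version of that script (a sketch: the object names, the nls starting values, and the centering at 2015 are my choices):

```r
# Data (11 yearly observations)
year <- 2010:2020
y    <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t    <- year - 2015                      # centered time (reference year is a choice)
d    <- data.frame(year, t, y)

## 1. Log-linear regression: multiplicative error --------------------------
fit_log <- lm(log(y) ~ t, data = d)
summary(fit_log)
exp(coef(fit_log))                       # level at t = 0 and yearly growth factor

## 2. Direct nonlinear regression: additive error --------------------------
# Starting values taken from the log-linear fit (a convenience, not a requirement)
start   <- list(a = exp(unname(coef(fit_log)[1])),
                b = unname(coef(fit_log)[2]))
fit_nls <- nls(y ~ a * exp(b * t), data = d, start = start)
summary(fit_nls)

## 3. Predictions on the original scale -------------------------------------
pred_naive <- exp(predict(fit_log))      # biased low for the conditional mean

# Duan (1983) smearing correction: multiply by the mean of exp(residuals)
smear      <- mean(exp(residuals(fit_log)))
pred_smear <- pred_naive * smear

# Normal-theory alternative: exp(fitted values + sigma^2 / 2)
sigma2      <- summary(fit_log)$sigma^2
pred_normal <- exp(predict(fit_log) + sigma2 / 2)

round(cbind(y,
            log_naive  = pred_naive,
            log_smear  = pred_smear,
            log_normal = pred_normal,
            nls        = predict(fit_nls)), 1)
```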
How to interpret the output
- `lm(log(y) ~ t)` assumes multiplicative error:
  - stabilizes the variance
  - gives equal weight to early and late years
- `nls(y ~ a * exp(b * t))` assumes additive error:
  - large values dominate the fit
  - often shows heteroscedastic residuals
The smearing correction is essential if you want unbiased predictions on the original scale from a log-linear model.
Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association, 78(383), 605-610. https://doi.org/10.2307/2288126