Fitting an exponential model with log(y) = a + b t or y = exp(a + b t)

December 21, 2025

Data

x = 2010, 2011, \dots, 2020 (11 points)

y = (10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

To simplify interpretation, the year is shifted so that t = 0 corresponds to 2010:

t = x - 2010 = 0, 1, \dots, 10
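Strictly speaking, subtracting 2010 shifts the origin rather than centering on the mean year. Either choice changes only what the intercept refers to, and the slope stays the same. Here is a minimal R sketch that compares raw years, the shift used here and true centering on the mean year (2015). The object names are illustrative.

```r
## Data from the post
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

t_shift  <- x - 2010     # t = 0 in 2010 (the convention used in this post)
t_center <- x - mean(x)  # t = 0 in 2015 (centering on the mean year)

fit_raw    <- lm(log(y) ~ x)
fit_shift  <- lm(log(y) ~ t_shift)
fit_center <- lm(log(y) ~ t_center)

## Same slope in all three fits; only the intercept's reference year changes
coefs <- rbind(
  raw    = unname(coef(fit_raw)),
  shift  = unname(coef(fit_shift)),
  center = unname(coef(fit_center))
)
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 3))
```

With raw years the intercept describes log(y) in year 0, far outside the data. That is why a shifted or centered time variable is easier to read.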

1️⃣ Linear regression on log(y)

Model

\log (y) = α + β t + ε

Key assumption

the error ε is additive on the log scale
therefore multiplicative on the original scale: y = e^{α + β t} \, e^{ε}
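A small simulation shows what "multiplicative on the original scale" means in practice. This sketch uses arbitrary, made-up parameters (alpha = 2, beta = 0.4, sigma = 0.2, seed 1). They are not the values fitted to the data.

```r
## Hypothetical parameters chosen for illustration only (not fitted values)
set.seed(1)
t <- 0:10
alpha <- 2
beta <- 0.4
sigma <- 0.2

## Additive normal error on the log scale
eps <- rnorm(length(t), mean = 0, sd = sigma)
trend <- exp(alpha + beta * t)
y_sim <- exp(alpha + beta * t + eps)

## On the original scale the same error acts as a multiplicative factor exp(eps),
## so the absolute deviation from the trend scales with the trend itself
mult_factor <- y_sim / trend
abs_dev <- y_sim - trend

print(round(cbind(t, trend, mult_factor, abs_dev), 2))
```

The factor exp(eps) has the same distribution every year. The absolute deviation, trend * (exp(eps) - 1), therefore grows with the level of the series.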

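Fit

Ordinary least squares on these data gives approximately:

\log (y) \approx 1.99 + 0.41 t

Back to the original scale

\hat{y} = \exp (1.99 + 0.41 t)

👉 regular exponential growth: each year multiplies \hat{y} by e^{0.41} \approx 1.5, i.e. about 50% growth per year
👉 the model assumes relative (percentage) errors of similar size across years
👉 on the log scale, small values weigh as much as large ones in the fit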
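Because the log-scale fit is ordinary simple linear regression, its coefficients have a closed form. This minimal R check compares that formula with lm() on the data above.

```r
## Data from the post, with t = 0 in 2010
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- 0:10
ly <- log(y)

## Closed-form least squares for log(y) = alpha + beta * t
beta_hat  <- sum((t - mean(t)) * (ly - mean(ly))) / sum((t - mean(t))^2)
alpha_hat <- mean(ly) - beta_hat * mean(t)

print(round(c(alpha = alpha_hat, beta = beta_hat), 3))
print(round(coef(lm(ly ~ t)), 3))  # same values from lm()

## exp(beta) is the fitted year-over-year growth factor
print(round(exp(beta_hat), 3))
```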

2️⃣ Direct nonlinear regression on y

Model

y = a e^{b t} + ε

Key assumption

the error is additive on y
variance is assumed constant on the original scale

Fit on these data

\hat{y} \approx 9.3 e^{0.38 t}

Consequences

large values (300, 400) strongly dominate the fit
early years are fitted poorly in relative terms
residuals tend to be heteroscedastic: their spread grows with the fitted values
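Here is a sketch of the direct fit with nls(). Starting values are taken from the log-linear fit, which is a common heuristic rather than a requirement. The relative residuals show which years the additive criterion fits loosely.

```r
## Data from the post, with t = 0 in 2010
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

## Starting values from the log-linear fit (a common heuristic)
m_log <- lm(log(y) ~ t)
start <- list(a = exp(unname(coef(m_log)[1])), b = unname(coef(m_log)[2]))

## Direct least squares on y: minimizes sum((y - a * exp(b * t))^2)
m_nls <- nls(y ~ a * exp(b * t), start = start)
print(round(coef(m_nls), 3))

## Relative residuals show that the early years are fitted loosely, while the
## late years (largest values) are fitted closely in relative terms
fit <- as.numeric(fitted(m_nls))
res <- data.frame(
  year      = x,
  y         = y,
  fitted    = round(fit, 1),
  abs_resid = round(y - fit, 1),
  rel_resid = round((y - fit) / y, 2)
)
print(res)
```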

3️⃣ Why the results differ (crucial point)

Even if the mean function is the same:

E [y ∣ t] \propto e^{b t}

the error assumptions differ:

Model	Error on the original scale
$\log (y) = α + β t + ε$	multiplicative
$y = a e^{b t} + ε$	additive

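➡️ these are different least-squares problems: the log-linear fit minimizes \sum_i (\log (y_i) - α - β t_i)^{2}, while the direct fit minimizes \sum_i (y_i - a e^{b t_i})^{2}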
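To see that the two approaches solve different problems, evaluate both criteria on both fitted curves. Each fit should win on its own criterion and lose on the other's. This sketch assumes nls() converges from start = list(a = 10, b = 0.3).

```r
## Data from the post, with t = 0 in 2010
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

m_log <- lm(log(y) ~ t)
m_nls <- nls(y ~ a * exp(b * t), start = list(a = 10, b = 0.3))

## Both models yield an exponential curve a * exp(b * t)
curves <- list(
  log_linear = exp(coef(m_log)[[1]]) * exp(coef(m_log)[[2]] * t),
  nonlinear  = as.numeric(fitted(m_nls))
)

## Criterion minimized by lm(log(y) ~ t): squared errors on the log scale
rss_log <- sapply(curves, function(f) sum((log(y) - log(f))^2))
## Criterion minimized by nls(y ~ a * exp(b * t)): squared errors on y
rss_y <- sapply(curves, function(f) sum((y - f)^2))

print(rbind(rss_log, rss_y))
```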

4️⃣ Intuitive illustration with data

Look at the successive ratios $y_t / y_{t-1}$:

Year	y	Ratio
2010	10	–
2011	10	1.0
2012	15	1.5
2013	20	1.33
2014	30	1.5
2015	60	2.0
2016	100	1.67
2017	120	1.2
2018	200	1.67
2019	300	1.5
2020	400	1.33

➡️ the ratios stay between 1.0 and 2.0, while the absolute differences grow from 0 to 100
➡️ a multiplicative error assumption is reasonable
➡️ working on log(y) is therefore well justified
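The ratio column and the absolute differences can be computed directly. This is a minimal R sketch. The geometric mean of the ratios is added for illustration: the product telescopes, so it reduces to (y_2020 / y_2010)^(1/10).

```r
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

ratios <- y[-1] / y[-length(y)]  # year-over-year ratios y_t / y_(t-1)
diffs  <- diff(y)                # year-over-year absolute differences

print(round(ratios, 2))
print(diffs)

## Average growth factor over the decade (geometric mean of the ratios);
## only the first and last values matter, since the product telescopes
growth <- exp(mean(log(ratios)))
print(round(growth, 3))
```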

5️⃣ Beware of retransformation bias

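If you back-transform naively,

\hat{y} = \exp (\widehat{\log (y)})

you estimate \exp (E [\log (y)]), which is not the mean of y:

E [y] \neq \exp (E [\log (y)])

By Jensen's inequality, \exp (E [\log (y)]) \le E [y], so the naive prediction is biased downward. With normal errors on the log scale it estimates the median of y, not its mean.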
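Assuming normal errors on the log scale, the size of the gap follows from the moment generating function of the normal distribution, E[e^{ε}] = e^{σ^{2}/2}:

```latex
y = e^{\mu} e^{\varepsilon}, \qquad \mu = \alpha + \beta t, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2})

E[y \mid t] = e^{\mu} \, E[e^{\varepsilon}] = e^{\mu + \sigma^{2}/2}

\exp(E[\log (y) \mid t]) = e^{\mu} < e^{\mu + \sigma^{2}/2} \quad \text{whenever } \sigma^{2} > 0
```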

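Classical correction, assuming normal errors on the log scale (lognormal y), where \hat{μ} = \hat{α} + \hat{β} t:

\hat{y} = \exp (\hat{μ}) \times \exp (\hat{σ}^{2} / 2)

Smearing estimator (Duan, 1983), which drops the normality assumption and averages over the residuals e_{i} of the log-scale fit:

\hat{y} = \exp (\hat{μ}) \times \frac{1}{n} \sum_{i=1}^{n} \exp (e_{i})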

Either correction matters whenever you predict the mean of y on the original scale.
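The following minimal R sketch compares the naive prediction with two corrected ones. The first uses the lognormal factor exp(σ²/2), which relies on the normality assumption. The second uses Duan's smearing factor, the mean of exp(residuals). The prediction year 2021 (t = 11) is an extrapolation chosen only for illustration.

```r
## Data from the post, with t = 0 in 2010
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

m_log <- lm(log(y) ~ t)
e <- residuals(m_log)

## Correction factors
lognormal_factor <- exp(summary(m_log)$sigma^2 / 2)  # assumes normal errors
smearing_factor  <- mean(exp(e))                     # Duan (1983), nonparametric

## Prediction for 2021 (t = 11): an extrapolation, for illustration only
mu_hat <- unname(predict(m_log, newdata = data.frame(t = 11)))

pred <- c(
  naive     = exp(mu_hat),
  lognormal = exp(mu_hat) * lognormal_factor,
  smearing  = exp(mu_hat) * smearing_factor
)
print(round(c(lognormal = lognormal_factor, smearing = smearing_factor), 4))
print(round(pred, 1))
```

Both factors exceed 1 because the OLS residuals average to zero and, by Jensen's inequality, the mean of exp(e) is then at least exp(0) = 1.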

6️⃣ Key takeaway

👉 Putting log(y) in a linear regression is not equivalent to fitting an exponential model directly to y: the error structure changes from additive on the log scale (multiplicative on y) to additive on y

Here is reproducible R code that fits both approaches to these data side by side and applies a retransformation correction.


## Data
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

## Time variable shifted so that t = 0 in 2010
t <- x - 2010

## -------------------------------
## 1) Linear regression on log(y)
## -------------------------------
m_log <- lm(log(y) ~ t)
summary(m_log)

## Fitted values on original scale (naive back-transformation)
yhat_log_naive <- exp(predict(m_log))

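## Lognormal correction (assumes normal errors on the log scale): exp(sigma^2 / 2)
sigma2 <- summary(m_log)$sigma^2
lognormal_factor <- exp(sigma2 / 2)
yhat_log_lognormal <- yhat_log_naive * lognormal_factor

## Smearing correction (Duan, 1983): mean of exp(residuals), no normality assumption
smearing <- mean(exp(residuals(m_log)))

yhat_log_corrected <- yhat_log_naive * smearing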

## -------------------------------
## 2) Nonlinear regression on y
## -------------------------------
m_nls <- nls(
  y ~ a * exp(b * t),
  start = list(a = 10, b = 0.3)
)
summary(m_nls)

yhat_nls <- fitted(m_nls)

## -------------------------------
## 3) Compare fits
## -------------------------------
plot(
  x, y,
  log = "y",
  pch = 16,
  xlab = "Year",
  ylab = "y (log scale)",
  main = "Log-linear vs nonlinear exponential fit"
)

lines(x, yhat_log_corrected, col = "blue", lwd = 2)
lines(x, yhat_nls, col = "red", lwd = 2)

legend(
  "topleft",
  legend = c("Observed", "log(y) ~ t (smearing)", "y ~ a exp(bt)"),
  pch = c(16, NA, NA),
  lwd = c(NA, 2, 2),
  col = c("black", "blue", "red")
)

## -------------------------------
## 4) Residual diagnostics
## -------------------------------
par(mfrow = c(1, 2))

plot(
  fitted(m_log),
  residuals(m_log),
  pch = 16,
  main = "Residuals: log-linear model",
  xlab = "Fitted log(y)",
  ylab = "Residuals"
)
abline(h = 0, lty = 2)

plot(
  yhat_nls,
  y - yhat_nls,
  pch = 16,
  main = "Residuals: nonlinear model",
  xlab = "Fitted y",
  ylab = "Residuals"
)
abline(h = 0, lty = 2)
par(mfrow = c(1, 1))  # restore the single-panel layout

How to interpret the output

lm(log(y) ~ t)
- assumes multiplicative error
- stabilizes the variance when errors are multiplicative
- gives early and late years equal weight on the relative (log) scale
nls(y ~ a * exp(b * t))
- assumes additive error
- large values dominate the fit
- often shows heteroscedastic residuals
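As a rough numeric complement to the residual plots, compare the mean absolute residual in the first half (2010 to 2015) with the second half (2016 to 2020) for each model. With only 11 points this is a crude indication, not a formal test. The split at t = 5 is an arbitrary choice.

```r
## Data from the post, with t = 0 in 2010
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

m_log <- lm(log(y) ~ t)
m_nls <- nls(y ~ a * exp(b * t), start = list(a = 10, b = 0.3))

r_log <- as.numeric(residuals(m_log))  # residuals on the log scale
r_nls <- as.numeric(residuals(m_nls))  # residuals on the original scale

early <- t <= 5  # 2010 to 2015 (arbitrary split)
spread <- rbind(
  log_linear = c(early = mean(abs(r_log[early])), late = mean(abs(r_log[!early]))),
  nonlinear  = c(early = mean(abs(r_nls[early])), late = mean(abs(r_nls[!early])))
)
print(round(spread, 3))
```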
A retransformation correction (smearing, or the lognormal factor under normal errors) is needed to obtain approximately unbiased predictions of the mean on the original scale from a log-linear model.
Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association, 78(383), 605-610. https://doi.org/10.2307/2288126

BiostatR Blog