Fitting an exponential model with log(y) = a + b t or y = exp(a + b t)

 

Data

x=2010:2020

(11 points)

y=(10,10,15,20,30,60,100,120,200,300,400)

To simplify interpretation, the year is often centered:

t = x − 2010 = 0, 1, …, 10

1️⃣ Linear regression on log(y)

Model

log(y)=α+βt+ε

Key assumption

  • the error is additive on the log scale

  • therefore multiplicative on the original scale
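The step from "additive on the log scale" to "multiplicative on the original scale" can be written out explicitly. The following is a short derivation sketch; the subscript t on y and ε, and the shorthand a = e^α, b = β, are notation introduced here for illustration.

```latex
\begin{aligned}
\log(y_t) &= \alpha + \beta t + \varepsilon_t \\
y_t &= e^{\alpha + \beta t + \varepsilon_t}
     = \underbrace{e^{\alpha}}_{a}\, e^{\beta t} \cdot e^{\varepsilon_t}
     = a\, e^{b t} \cdot e^{\varepsilon_t}
\end{aligned}
```

The error enters as the factor e^{ε_t}. For small ε_t, e^{ε_t} ≈ 1 + ε_t, so ε_t acts approximately as a relative (percentage) error on y.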

Fit (order of magnitude)

One typically obtains something like:

log(y) ≈ 1.99 + 0.41 t

Back to the original scale

ŷ = exp(1.99 + 0.41 t)

👉 regular exponential growth
👉 relative errors are roughly constant
👉 small values have as much weight as large ones
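As a minimal sketch of this step, the log-linear fit can be obtained with lm() and its slope read as a constant year-over-year growth factor. The variable names (alpha_hat, beta_hat, a_hat, growth_factor) are chosen here for illustration.

```r
## Data from the post
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

## Ordinary least squares on the log scale
m_log <- lm(log(y) ~ t)
alpha_hat <- unname(coef(m_log)["(Intercept)"])
beta_hat  <- unname(coef(m_log)["t"])

## Back on the original scale: y-hat = a * exp(b * t) with a = exp(alpha-hat)
a_hat <- exp(alpha_hat)

## Constant year-over-year growth factor implied by the fitted curve
growth_factor <- exp(beta_hat)

print(c(alpha = alpha_hat, beta = beta_hat, a = a_hat, growth = growth_factor))
```

Because the fitted curve is exp(alpha_hat + beta_hat * t), the ratio of two consecutive fitted values is always exp(beta_hat), which is what "regular exponential growth" means here.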


2️⃣ Direct nonlinear regression on y

Model

y = a·exp(b t) + ε

Key assumption

  • the error is additive on y

  • variance is assumed constant on the original scale

Typical fit

ŷ ≈ 9.5·exp(0.39 t)

Consequences

  • large values (300, 400) strongly dominate the fit

  • early years are often poorly fitted

  • residuals are strongly heteroscedastic
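One way to see these consequences on the data is to look at the relative error of the nonlinear fit year by year. This is a sketch, not a formal diagnostic; the split into "early" (t ≤ 4) and "late" (t ≥ 6) years is an arbitrary choice made here for illustration.

```r
## Data from the post
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

## Nonlinear least squares on the original scale
m_nls <- nls(y ~ a * exp(b * t), start = list(a = 10, b = 0.3))

## Absolute and relative errors of the fitted curve
abs_err <- fitted(m_nls) - y
rel_err <- abs_err / y
print(data.frame(year = x, y = y,
                 fitted  = round(fitted(m_nls), 1),
                 abs_err = round(abs_err, 1),
                 rel_err = round(rel_err, 2)))

## Mean absolute relative error, early vs late years (illustrative split)
rel_early <- mean(abs(rel_err[t <= 4]))
rel_late  <- mean(abs(rel_err[t >= 6]))
print(c(early = rel_early, late = rel_late))
```

Since the criterion penalizes absolute deviations, a miss of a few units on y = 10 costs very little, even though it is a large miss in relative terms.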


3️⃣ Why the results differ (crucial point)

Even if the mean function is the same:

E[y_t] ∝ exp(b t)

the error assumptions differ:

| Model | Error type |
|---|---|
| log(y) = α + β t + ε | multiplicative |
| y = a·exp(b t) + ε | additive |

➡️ these are not the same least-squares problem
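To make "not the same least-squares problem" concrete, both fits can be scored on both criteria. In this sketch the helper functions sse_log and sse_raw and the shared (a, b) parameterization are introduced here for illustration.

```r
## Data from the post
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010

m_log <- lm(log(y) ~ t)
m_nls <- nls(y ~ a * exp(b * t), start = list(a = 10, b = 0.3))

## Put both fits on the same parameterization: y = a * exp(b * t)
par_log <- c(a = exp(unname(coef(m_log)[1])), b = unname(coef(m_log)[2]))
par_nls <- coef(m_nls)

## The two least-squares criteria
sse_log <- function(p) sum((log(y) - (log(p[["a"]]) + p[["b"]] * t))^2)
sse_raw <- function(p) sum((y - p[["a"]] * exp(p[["b"]] * t))^2)

## Each row is a fit, each column a criterion
crit <- rbind(
  log_linear = c(sse_log = sse_log(par_log), sse_raw = sse_raw(par_log)),
  nonlinear  = c(sse_log = sse_log(par_nls), sse_raw = sse_raw(par_nls))
)
print(round(crit, 3))
```

Each fit should come out best on its own criterion and worse on the other, which is why the two sets of estimates differ even though the curve has the same form.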


4️⃣ Intuitive illustration with data

Look at successive ratios of y:

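| Year | y | Ratio to previous year |
|---|---|---|
| 2010 | 10 | |
| 2011 | 10 | 1.0 |
| 2012 | 15 | 1.5 |
| 2013 | 20 | 1.33 |
| 2014 | 30 | 1.5 |
| 2015 | 60 | 2.0 |
| 2016 | 100 | 1.67 |
| 2017 | 120 | 1.2 |
| 2018 | 200 | 1.67 |
| 2019 | 300 | 1.5 |
| 2020 | 400 | 1.33 |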

➡️ successive ratios stay within a narrow band (1.0 to 2.0), whereas absolute differences grow from 0 to 100
➡️ a multiplicative assumption is reasonable
➡️ using log(y) is therefore statistically justified
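The ratios, and the absolute differences they are contrasted with, can be reproduced in a few lines. This is a small sketch; the names ratio and delta are chosen here.

```r
## Data from the post
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

## Year-over-year ratio and absolute difference
ratio <- y[-1] / y[-length(y)]
delta <- diff(y)

print(data.frame(year = x[-1], y = y[-1],
                 ratio = round(ratio, 2), diff = delta))

## Spread of each quantity
range(ratio)
range(delta)
```

A quantity whose ratios are roughly stable while its differences keep growing is the typical signature of a multiplicative process.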


5️⃣ Beware of retransformation bias

If you compute:

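ŷ = exp(μ̂), where μ̂ is the fitted value of log(y)

then ŷ tends to underestimate the mean of y, because exp is convex (Jensen's inequality):

E[y] ≥ exp(E[log(y)])

Classical corrections:

  • normal-theory correction, valid when ε is Gaussian with variance σ²: ŷ = exp(μ̂) × exp(σ̂²/2)

  • smearing estimator (Duan, 1983), which makes no distributional assumption: ŷ = exp(μ̂) × mean(exp(ε̂ᵢ)), where the ε̂ᵢ are the residuals of the log-scale fit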

This becomes important when doing prediction.
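The size of the bias is easier to see on simulated data, where the true mean is known. The sketch below uses made-up parameters (alpha = 2, beta = 0.4, sigma = 0.5, n = 5000), not the post's data. It compares the naive back-transform with two corrections: Duan's smearing factor (the mean of the exponentiated residuals) and the normal-theory factor exp(σ̂²/2).

```r
set.seed(42)

## Simulated log-linear data with known parameters (illustrative values)
n     <- 5000
alpha <- 2
beta  <- 0.4
sigma <- 0.5
t_sim <- runif(n, 0, 10)
y_sim <- exp(alpha + beta * t_sim + rnorm(n, sd = sigma))

fit <- lm(log(y_sim) ~ t_sim)

## Predict at a new time point
t0       <- 5
log_pred <- unname(predict(fit, newdata = data.frame(t_sim = t0)))

## True conditional mean of y at t0 under lognormal errors
true_mean <- exp(alpha + beta * t0 + sigma^2 / 2)

naive      <- exp(log_pred)                            # targets the median
smear      <- naive * mean(exp(residuals(fit)))        # Duan's smearing
normal_cor <- naive * exp(summary(fit)$sigma^2 / 2)    # normal-theory

print(c(true_mean = true_mean, naive = naive,
        smearing = smear, normal = normal_cor))
```

With Gaussian errors the two corrections should land close to each other and to the true mean, while the naive value sits below it. When the log-scale residuals are clearly non-Gaussian, the smearing factor is the safer of the two.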


6️⃣ Key takeaway

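👉 Putting log(y) in a linear regression is not equivalent to fitting an exponential model directly to y, because the two approaches assume different error structures (multiplicative vs. additive) and therefore minimize different least-squares criteria.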


Here is clean, reproducible R code using these data, showing the two approaches side by side and how to handle retransformation correctly.

## Data
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

## Centered time variable
t <- x - 2010

## -------------------------------
## 1) Linear regression on log(y)
## -------------------------------
m_log <- lm(log(y) ~ t)
summary(m_log)

## Fitted values on original scale (naive back-transformation)
yhat_log_naive <- exp(predict(m_log))

## Smearing correction (Duan, 1983): mean of the exponentiated residuals
## (normal-theory alternative: exp(summary(m_log)$sigma^2 / 2))
smearing <- mean(exp(residuals(m_log)))
yhat_log_corrected <- yhat_log_naive * smearing

## -------------------------------
## 2) Nonlinear regression on y
## -------------------------------
m_nls <- nls(
  y ~ a * exp(b * t),
  start = list(a = 10, b = 0.3)
)
summary(m_nls)
yhat_nls <- fitted(m_nls)

## -------------------------------
## 3) Compare fits
## -------------------------------
plot(
  x, y,
  log = "y", pch = 16,
  xlab = "Year", ylab = "y (log scale)",
  main = "Log-linear vs nonlinear exponential fit"
)
lines(x, yhat_log_corrected, col = "blue", lwd = 2)
lines(x, yhat_nls, col = "red", lwd = 2)
legend(
  "topleft",
  legend = c("Observed", "log(y) ~ t (smearing)", "y ~ a exp(bt)"),
  pch = c(16, NA, NA),
  lwd = c(NA, 2, 2),
  col = c("black", "blue", "red")
)

## -------------------------------
## 4) Residual diagnostics
## -------------------------------
par(mfrow = c(1, 2))
plot(
  fitted(m_log), residuals(m_log),
  pch = 16,
  main = "Residuals: log-linear model",
  xlab = "Fitted log(y)", ylab = "Residuals"
)
abline(h = 0, lty = 2)
plot(
  yhat_nls, y - yhat_nls,
  pch = 16,
  main = "Residuals: nonlinear model",
  xlab = "Fitted y", ylab = "Residuals"
)
abline(h = 0, lty = 2)

How to interpret the output

  • lm(log(y) ~ t)

    • assumes multiplicative error

    • stabilizes variance

    • gives equal weight to early and late years

  • nls(y ~ a * exp(b * t))

    • assumes additive error

    • large values dominate the fit

    • often shows heteroscedastic residuals

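  • A retransformation correction such as smearing is needed if you want predictions of the mean of y on the original scale from a log-linear model: the naive back-transform exp(μ̂) targets the median (under symmetric log-scale errors) and underestimates the mean.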

    Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association, 78(383), 605-610. https://doi.org/10.2307/2288126 

