Articles

Update rJava in Ubuntu

First, I had to delete all old installations of OpenJDK:

sudo apt-get purge openjdk-\*
sudo apt autoremove

It can also be necessary to do:

sudo rm -rf /usr/lib/jvm

Install the most recent version for Ubuntu 22.04:

sudo apt install openjdk-21-jdk

Then update the Java configuration variables used by R:

sudo R CMD javareconf

After that, the update of rJava installed without problem.
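To verify that rJava picks up the new JDK, a quick sanity check in R (a minimal sketch; .jinit() starts the JVM and .jcall() queries the Java version):

```r
# Re-install rJava so it is compiled against the newly configured JDK
install.packages("rJava")

# Load the package and start the JVM
library(rJava)
.jinit()

# Report the Java version actually used by rJava
.jcall("java/lang/System", "S", "getProperty", "java.version")
```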

Why the p-value is wrong and produces false positives

Because a p-value is computed conditional on the null hypothesis being true, it does not represent the probability of committing a Type I error in the situation you are actually in. When it is interpreted that way, it systematically over-estimates the "first-order risk". Here is the precise reasoning.

1. What the "first-order risk" really is

The Type I error rate (first-order risk) is:

α = P(reject H₀ | H₀ is true)

It is a long-run property, fixed a priori, of a decision rule (for example: "reject if p < 0.05"). It is not a probability about the current experiment.

2. What a p-value actually is

A p-value is:

p = P(T ≥ t_obs | H₀)

Key points:

It is conditional on H₀ being true
It is not P(H₀ | data)
It is not...
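To make the definition concrete, here is a minimal R sketch of p = P(T ≥ t_obs | H₀) for a one-sided z-test with known σ (the data are invented for the example):

```r
# Invented sample; H0: mu = 0 with known sigma = 1, alternative mu > 0
set.seed(1)
x <- rnorm(20, mean = 0.3, sd = 1)

# Observed z statistic under H0
z_obs <- mean(x) / (1 / sqrt(length(x)))

# p-value = P(Z >= z_obs | H0), with Z ~ N(0, 1) under H0
pnorm(z_obs, lower.tail = FALSE)
```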

Why do p-values over-estimate first-order risk?

The short answer is: 👉 Because a p-value is computed conditional on the null hypothesis being true, it does not represent the probability of making a Type I error in the situation you are actually in. When it is interpreted as such, it systematically overstates (over-estimates) the "first-order risk". Below is the precise reasoning.

1. What "first-order risk" really is

The Type I error rate (first-order risk) is:

α = P(reject H₀ | H₀ is true)

This is a long-run, pre-specified property of a decision rule (e.g. "reject if p < 0.05"). It is not a probability about the current experiment.

2. What a p-value actually is

A p-value is:

p = P(T ≥ t_obs | H₀)

Key points:

It is conditional on H₀ being true
It is not P(H₀ | data)
It is not P(Type I error)

3. Where the over-estimation comes from

The common (incorrect) interpretation "If...
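To see the long-run nature of α in practice, a small R simulation (my own illustration): when H₀ is true, p-values are uniform on [0, 1], and the rule "reject if p < 0.05" rejects in about 5% of repeated experiments.

```r
# Simulate many experiments in which H0 (mu = 0) is true
set.seed(42)
pvals <- replicate(10000, t.test(rnorm(20, mean = 0))$p.value)

# Under H0 the p-value is uniform on [0, 1]
hist(pvals, breaks = 20, main = "p-values under H0")

# Long-run Type I error rate of the rule "reject if p < 0.05"
mean(pvals < 0.05)   # close to 0.05
```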

Hypotheses and ANOVA

👉 For ANOVA, both the homogeneity of variances and the normality assumptions concern the errors of the model, so they should be assessed on the residuals. Below is the precise reasoning, with practical nuances.

1. What ANOVA actually assumes

The classical ANOVA model is:

Y_ij = μ + α_i + ε_ij

with the assumptions:

Normality: ε_ij ~ N(0, σ²)
Homoscedasticity: Var(ε_ij) = σ² for all groups
Independence of the ε_ij

So both assumptions apply to the errors, not to the raw response Y.

2. Consequences for diagnostics

✅ Normality

Should be assessed on residuals, not on original data. Raw data can be non-normal simply because group means differ. Correct tools:

Q–Q plot of residuals
Histogram of residuals
Shapiro–Wilk test on residuals (with caution)

✅ Homogeneity of variances

Also concerns residual variance ...
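A concrete sketch of these diagnostics in R (the data are hypothetical, and car::leveneTest is just one common choice for the variance check, assuming the car package is installed):

```r
# Hypothetical one-way layout: 3 groups of 20 observations each
set.seed(7)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = rnorm(60, mean = rep(c(10, 12, 15), each = 20), sd = 2)
)

fit <- aov(y ~ group, data = d)

# Normality: assess the residuals, not the raw response
qqnorm(residuals(fit)); qqline(residuals(fit))
shapiro.test(residuals(fit))   # with caution

# Homogeneity of variances across groups (requires the car package)
car::leveneTest(y ~ group, data = d)
```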

Install gsl package in Ubuntu 24.04

You must first install the GSL development library:

sudo apt install libgsl-dev

Then you can install the gsl package in R:

install.packages("gsl")
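A quick way to check that the installation works (a minimal sketch; zeta() is one of the GSL special functions wrapped by the package, if I recall correctly):

```r
library(gsl)

# Riemann zeta function: zeta(2) should equal pi^2 / 6
zeta(2)
pi^2 / 6
```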

Fitting an exponential model with log(y) = a + b t or y = exp(a + b t)

Data

x = 2010:2020 (11 points)
y = (10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)

To simplify interpretation, the year is often centered:

t = x − 2010 = 0, 1, …, 10

1️⃣ Linear regression on log(y)

Model

log(y) = α + β t + ε

Key assumption

the error is additive on the log scale
therefore multiplicative on the original scale

Fit (order of magnitude)

One typically obtains something like:

log(y) ≈ 2.2 + 0.36 t

Back to the original scale

ŷ = exp(2.2 + 0.36 t)

👉 regular exponential growth
👉 relative errors are roughly constant
👉 small values have as much weight as large ones

2️⃣ Direct nonlinear regression on y

Model

y = a e^(b t) + ε

Key assumption

the error is additive on y
variance is assumed constant on the original scale

Typical fit

ŷ ≈ 9.5 e^(0.39 t)

Consequences

large values (300, 400) strongly dominate the fit
early years ...
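Both fits can be reproduced in R with the data above; a minimal sketch, where the nls() starting values are my own guesses:

```r
x <- 2010:2020
y <- c(10, 10, 15, 20, 30, 60, 100, 120, 200, 300, 400)
t <- x - 2010   # centered year

# 1) Linear regression on the log scale: log(y) = alpha + beta * t
fit_log <- lm(log(y) ~ t)
coef(fit_log)

# 2) Direct nonlinear regression: y = a * exp(b * t)
fit_nls <- nls(y ~ a * exp(b * t), start = list(a = 10, b = 0.4))
coef(fit_nls)
```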

Confidence interval vs credible interval

1. Confidence interval (frequentist)

Definition

A 95% confidence interval is a procedure that, if repeated many times on new data generated under the same conditions, would contain the true parameter 95% of the time.

Key point

The parameter is fixed but unknown; the interval is random.

Correct interpretation

"If we were to repeat this study infinitely many times and compute a 95% confidence interval each time, 95% of those intervals would contain the true parameter."

Incorrect (but common) interpretation

"There is a 95% probability that the true parameter lies within this interval." ❌ That statement is not valid in frequentist statistics.

Example

You estimate a mean nest temperature and obtain a 95% CI of [28.1, 29.3] °C. You cannot assign a probability to the true mean being inside this specific interval: either it is or it isn't.

2. Credible interval (Bayesian)

Definition

A 95% credible interval is an interval within which the parameter ...
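A small R illustration of the two objects (my own sketch): a frequentist 95% CI from t.test(), and a 95% credible interval from the normal posterior obtained with a flat prior on the mean and a known σ. The numbers can be close, but the interpretations differ as described above.

```r
set.seed(123)
temp <- rnorm(25, mean = 28.7, sd = 1.5)   # hypothetical nest temperatures

# Frequentist 95% confidence interval
t.test(temp)$conf.int

# Bayesian 95% credible interval: flat prior on mu, sigma assumed known
sigma <- 1.5
post_mean <- mean(temp)
post_sd   <- sigma / sqrt(length(temp))
qnorm(c(0.025, 0.975), mean = post_mean, sd = post_sd)
```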