Retrieve data from a data.frame

The subset() command is a very convenient way to retrieve data from a data.frame. For example, let us retrieve the size for a specific stage in this data frame:

> library(embryogrowth)
> s <- TSP.list[[1]]
> s
   stages metric
1       8     NA
2       9     NA
3      10     NA
4      11     NA
5      12     NA
6      13     NA
7      14  0.016
8      15  0.023
9      16  0.035
10     17  0.044
11     18  0.057
12     19  0.072
13     20  0.090
14     21  0.140
15     22  0.240
16     23  0.340
17     24  0.550
18     25  0.750
19     26  1.000

To retrieve the size at stage 20:

> subset(x = s, subset=stages==20, select="metric", drop = TRUE)
[1] 0.09

An alternative solution for this simple case is:
> s$metric[s$stages == 20]
[1] 0.09
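
The two forms agree here, but they can differ when the condition evaluates to NA: subset() drops those rows, while logical indexing returns NA for them. The following is a minimal sketch of that difference. It assumes a data frame copied by hand from the printout above (so it runs without embryogrowth), and the 0.5 threshold is only an illustration.

```r
# Hand-copied from the printout above; assumed to match TSP.list[[1]]
s <- data.frame(
  stages = 8:26,
  metric = c(rep(NA, 6), 0.016, 0.023, 0.035, 0.044, 0.057, 0.072,
             0.090, 0.140, 0.240, 0.340, 0.550, 0.750, 1.000)
)

# Both forms return the same value for the simple stage lookup
by_subset <- subset(x = s, subset = stages == 20, select = "metric", drop = TRUE)
by_index  <- s$metric[s$stages == 20]
identical(by_subset, by_index)  # TRUE

# A condition on metric is NA for stages 8 to 13
with_subset <- subset(x = s, subset = metric > 0.5, select = "stages", drop = TRUE)
with_index  <- s$stages[s$metric > 0.5]
with_which  <- s$stages[which(s$metric > 0.5)]

with_subset  # 24 25 26
with_index   # NA NA NA NA NA NA 24 25 26
with_which   # 24 25 26
```

Wrapping the condition in which() is one way to keep the direct form while discarding the NA cases.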

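Let us measure the relative speed of both solutions. subset() is clearly useful for complex situations, but simple lookups are faster with direct indexing: in the run below, 100,000 repetitions take about 1.4 s elapsed with direct indexing versus about 3.2 s with subset(), roughly 2.3 times faster.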


> system.time(expr = {
+   for (i in 1:100000) {
+     g <- subset(x = s, subset=stages==20, select="metric", drop = TRUE)
+   }
+ }
+ )
   user  system elapsed 
  3.124   0.099   3.247 
> system.time(expr = {
+   for (i in 1:100000) {
+     g <- s$metric[s$stages == 20]
+   }
+ }
+ )
   user  system elapsed 
  1.331   0.045   1.387 
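
As an illustration of the "complex situation" where subset() reads more easily, here is a small sketch that combines two conditions and selects two columns. The data frame is again copied by hand from the printout above, and the thresholds (stage 20 and later, metric below 0.5) are arbitrary choices for the example.

```r
# Hand-copied from the printout above; assumed to match TSP.list[[1]]
s <- data.frame(
  stages = 8:26,
  metric = c(rep(NA, 6), 0.016, 0.023, 0.035, 0.044, 0.057, 0.072,
             0.090, 0.140, 0.240, 0.340, 0.550, 0.750, 1.000)
)

# With subset(), column names are used directly in the condition
# and in select, and rows where the condition is NA are dropped
late_small <- subset(s, subset = stages >= 20 & metric < 0.5,
                     select = c(stages, metric))

# The same selection with direct indexing needs s$ on every column
# and an explicit NA guard
late_small_index <- s[!is.na(s$metric) & s$stages >= 20 & s$metric < 0.5,
                      c("stages", "metric")]

late_small$stages  # 20 21 22 23
```

Both objects hold the same four rows; the trade-off is readability with subset() against speed with direct indexing.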
