[0-9] or \d or [:digit:] in grep

If you want to search for a digit with grep in R, you have four options:
[0123456789], [0-9], \d (written "\\d" in an R string), or [[:digit:]]

Here is a comparison:
library(microbenchmark)

microbenchmark({grep(pattern="[0123456789][0123456789][0123456789][0123456789]", "Ussiosuus8980JJDUD98")}, 
               {grep(pattern="[0-9][0-9][0-9][0-9]", "Ussiosuus8980JJDUD98")}, 
               {grep(pattern="[[:digit:]][[:digit:]][[:digit:]][[:digit:]]", "Ussiosuus8980JJDUD98")}, 
               {grep(pattern="\\d\\d\\d\\d", "Ussiosuus8980JJDUD98")}, times = 1000L)

         expr    min      lq      mean  median      uq    max neval  cld
 [0123456789] 21.729 22.1840 22.887585 22.3090 22.4645 64.542  1000    d
        [0-9]  5.219  5.3740  5.696937  5.4665  5.8495 47.618  1000 a   
  [[:digit:]]  9.778 10.0425 10.482301 10.1470 10.5400 25.602  1000  b  
           \d 10.296 10.5000 11.099359 10.6360 11.0440 39.759  1000   c 

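[0-9] is clearly the fastest: its median time is about half that of [[:digit:]] and \d, and about a quarter of that of the explicit [0123456789].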
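
A speed comparison is only meaningful if the patterns find the same thing. Here is a minimal sketch that checks this on the test string used above. The names x, patterns and matches are mine, and the patterns use the {4} quantifier for brevity instead of repeating the class four times as the benchmark does.

```r
# Check that the four digit patterns extract the same text from the test string.
x <- "Ussiosuus8980JJDUD98"

patterns <- c(
  explicit = "[0123456789]{4}",
  range    = "[0-9]{4}",
  posix    = "[[:digit:]]{4}",
  escape   = "\\d{4}"
)

# regexpr() locates the first match of each pattern; regmatches() extracts it.
matches <- vapply(
  patterns,
  function(p) regmatches(x, regexpr(p, x)),
  character(1)
)

print(matches)
# Every pattern should return the first run of four digits, "8980".
stopifnot(all(matches == "8980"))
```

Since all four return "8980" here, the timing differences come from how each pattern is written, not from what it matches.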

The same holds for uppercase letters:

microbenchmark({grep(pattern="[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]", "Ussiosuus8980JJDUD98")}, 
               {grep(pattern="[A-Z][A-Z][A-Z][A-Z]", "Ussiosuus8980JJDUD98")}, 
               {grep(pattern="[[:upper:]][[:upper:]][[:upper:]][[:upper:]]", "Ussiosuus8980JJDUD98")}, 
               times = 1000L)

                         expr    min      lq      mean  median      uq     max neval cld
 [ABCDEFGHIJKLMNOPQRSTUVWXYZ] 72.475 74.2315 80.536616 79.0655 79.7905 251.430  1000   c
                        [A-Z]  5.150  6.3040  7.304987  6.8855  7.5765  96.529  1000 a  
                  [[:upper:]] 13.637 14.7285 16.466666 16.0080 17.1880  78.308  1000  b 
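
If microbenchmark is not installed, the letter comparison can be roughly re-run with base R only. This is a sketch, not a replacement for the benchmark above: time_pattern is a hypothetical helper of mine, the repetition count is arbitrary, and system.time() is much coarser than microbenchmark, so only the ordering of the results is worth reading.

```r
# Rough base-R timing of the three uppercase-letter patterns.
x <- "Ussiosuus8980JJDUD98"

# Hypothetical helper: total elapsed seconds for `reps` calls to grep().
time_pattern <- function(pattern, x, reps = 20000L) {
  timing <- system.time(
    for (i in seq_len(reps)) grep(pattern, x)
  )
  unname(timing["elapsed"])
}

# strrep() repeats each character class four times, as in the benchmark above.
letter_patterns <- c(
  explicit = strrep("[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", 4),
  range    = strrep("[A-Z]", 4),
  posix    = strrep("[[:upper:]]", 4)
)

elapsed <- vapply(letter_patterns, time_pattern, numeric(1), x = x)
print(elapsed)
```

The absolute numbers depend on the machine and the R version, so they should not be compared directly with the table above.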
