logging - Process logger files in R


I'm using R version 2.15.3 (2013-03-01) and RStudio 0.97.312 on Ubuntu 12.10. I'm trying to create histograms of logger data in R. However, some sensors weren't always working, so I got tables with #n/a and o/c in them. Here's an excerpt of the log:

date        time      type      control.value (v)  light.barrier (v)  t hotplate ('c)  t mesh ('c)  t exhaust ('c)  t camera ('c)  ref. junction 1 ('c)
30.03.2012  13:47:50  interval  0.001              23.556             411.0            o/c          30.5            35.1           23.14
30.03.2012  13:47:51  interval  0.001              23.556             411.1            o/c          30.3            35.2           23.14
30.03.2012  13:47:52  interval  0.001              23.556             411.1            o/c          30.2            35.5           23.14
30.03.2012  13:47:53  interval  0.001              23.556             410.9            o/c          29.8            35.5           23.14
30.03.2012  13:47:54  interval  0.001              23.556             410.9            o/c          30.1            35.3           23.14
30.03.2012  13:47:55  interval  0.001              23.556             411.1            o/c          30.2            35.4           23.14
30.03.2012  13:47:56  interval  0.001              23.556             410.8            o/c          29.8            35.4           23.14
30.03.2012  13:47:57  interval  0.001              23.556             410.2            o/c          29.4            35.3           23.14
30.03.2012  13:47:58  interval  0.001              23.556             409.5            o/c          29.1            35.0           23.14
30.03.2012  13:47:59  interval  0.000              23.556             408.9            o/c          29.3            34.6           23.14
30.03.2012  13:48:00  interval  0.000              23.556             408.7            o/c          #n/a            #n/a           23.14

Output of dput(head(logs), file = "dput.txt"): http://pastebin.de/34176

R refuses to process the columns that contain #n/a and o/c. I can't reformat the file by hand, because it has 185 000 lines.

When I load the log and try to create a histogram:

> logs <- read.delim("../data/logger/logs/logs.txt", header=TRUE)
> hist(logs$mesh)

I get this error message:

Fehler in hist.default(logs$mesh) : 'x' muss nummerisch sein

Rough translation (see: How to change the locale of R in RStudio?):

Error in hist.default(logs$mesh) : 'x' must be numeric

The only columns I can create histograms for are the ones that sapply lists as numeric. So I thought I have to remove these invalid values to get numeric columns.

How can I remove the invalid rows? I'm also open to other ways of processing them than R, e.g. Perl or Python, if that's more suitable for this task.

This is the output of sapply after loading the log:

> sapply(logs, is.numeric)
         date          time          type control.value light.barrier      hotplate          mesh       exhaust
        FALSE         FALSE         FALSE          TRUE         FALSE          TRUE         FALSE         FALSE
       camera     reference
        FALSE          TRUE

After replacing #n/a and o/c with NA (https://stackoverflow.com/a/16350443/2333821):

> logs.clean <- data.frame(check.rows = TRUE, apply(logs, 2, sub, pattern = "o/c|#n/a", replacement = NA))

I get this:

> sapply(logs.clean, is.numeric)
         date          time          type control.value light.barrier      hotplate          mesh       exhaust
        FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE
       camera     reference
        FALSE         FALSE
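
The all-FALSE result is most likely because apply() works on a character matrix, so every column comes back as text once a single column contains strings. The sketch below uses made-up toy values (not the real log) to show that effect, and then one possible per-column replacement that converts explicitly; the column names and numbers are assumptions for illustration only.

```r
# Toy data standing in for the log (hypothetical values).
logs <- data.frame(mesh = c("o/c", "212.3", "#n/a"),
                   exhaust = c("30.5", "30.3", "#n/a"),
                   stringsAsFactors = FALSE)

# apply() coerces the data frame to a character matrix, so the result is text.
cleaned.matrix <- apply(logs, 2, sub, pattern = "o/c|#n/a", replacement = NA)
is.character(cleaned.matrix)

# Replacing column by column and converting explicitly keeps a data frame
# and yields numeric columns.
logs.clean <- logs
logs.clean[] <- lapply(logs, function(col) {
  col <- as.character(col)                 # guards against factor columns
  col[col %in% c("o/c", "#n/a")] <- NA     # mark invalid readings as missing
  as.numeric(col)
})
sapply(logs.clean, is.numeric)
```

In a real log, columns such as the date and time would need to be left out of the conversion, since as.numeric() would turn them into NA.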

Since you've asked about removing rows, here's how I'd do it, with an alternative below.

# makes data
df <- data.frame(a = c("o/c", "#n/a", 1:3), b = c(4:6, "o/c", "#n/a"))
#      a    b
# 1  o/c    4
# 2 #n/a    5
# 3    1    6
# 4    2  o/c
# 5    3 #n/a

# find rows that contain either value
remove <- apply(df, 1, function(row) any(row == "o/c" | row == "#n/a"))
# subset using the negated index
df.rows <- df[!remove, ]
#   a b
# 3 1 6
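
Dropping the rows alone leaves the columns as text (or as factors on older R versions), so hist() would probably still complain. Here is a minimal sketch of the extra conversion step, rebuilding the toy data so it runs on its own. The stringsAsFactors = FALSE argument and the %in% test are choices made for this sketch, not part of the code above.

```r
# Rebuild the toy data; stringsAsFactors = FALSE is an assumption made here
# so the columns are plain character vectors on every R version.
df <- data.frame(a = c("o/c", "#n/a", 1:3),
                 b = c(4:6, "o/c", "#n/a"),
                 stringsAsFactors = FALSE)

# Drop every row that holds one of the two marker strings.
remove <- apply(df, 1, function(row) any(row %in% c("o/c", "#n/a")))
df.rows <- df[!remove, ]

# Convert the surviving columns; as.character() guards against factors.
df.rows[] <- lapply(df.rows, function(col) as.numeric(as.character(col)))

sapply(df.rows, is.numeric)
```

Note that this throws away a whole row whenever a single sensor was down, which may discard valid readings from the other sensors.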

Alternatively, find the values and set them to NA. This won't remove the rows, but it will allow functions to work on the data.

df.clean <- data.frame(apply(df, 2, sub, pattern = "o/c|#n/a", replacement = NA))

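I use data.frame() to turn the result back into a data frame quickly; the columns may still need an explicit conversion to numeric (e.g. as.numeric(as.character(x))), and there might be a more elegant way to do that...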
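
One possibly more elegant route is to declare the marker strings as missing values when the file is read, using the na.strings argument of read.delim(). The sketch below writes a small made-up excerpt to a temporary file so it is self-contained; the three column names and the values are hypothetical, and the real call would point at the actual log file instead.

```r
# Hypothetical three-column excerpt, written to a temporary file so the
# sketch runs on its own; the real log has more columns.
sample.lines <- c("hotplate\tmesh\texhaust",
                  "411.0\to/c\t30.5",
                  "411.1\t212.3\t30.3",
                  "408.7\to/c\t#n/a")
path <- tempfile(fileext = ".txt")
writeLines(sample.lines, path)

# Treat both marker strings as NA while reading.
logs <- read.delim(path, header = TRUE, na.strings = c("o/c", "#n/a"))

sapply(logs, is.numeric)

# hist() ignores NA values; plot = FALSE only computes the bins,
# drop it to draw the histogram.
h <- hist(logs$exhaust, plot = FALSE)
```

One caveat: a column that contains nothing but marker strings is read as an all-NA logical column, so it still cannot be plotted.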

