logging - Process logger files in R
I'm using R version 2.15.3 (2013-03-01) and RStudio 0.97.312 on Ubuntu 12.10. I'm trying to create histograms of logger data in R. However, some sensors weren't always working, so I got tables with #n/a and o/c in them. Here's an excerpt of the log:
    date        time      type      control.value (v)  light.barrier (v)  t hotplate ('c)  t mesh ('c)  t exhaust ('c)  t camera ('c)  ref. junction 1 ('c)
    30.03.2012  13:47:50  interval  0.001              23.556             411.0            o/c          30.5            35.1           23.14
    30.03.2012  13:47:51  interval  0.001              23.556             411.1            o/c          30.3            35.2           23.14
    30.03.2012  13:47:52  interval  0.001              23.556             411.1            o/c          30.2            35.5           23.14
    30.03.2012  13:47:53  interval  0.001              23.556             410.9            o/c          29.8            35.5           23.14
    30.03.2012  13:47:54  interval  0.001              23.556             410.9            o/c          30.1            35.3           23.14
    30.03.2012  13:47:55  interval  0.001              23.556             411.1            o/c          30.2            35.4           23.14
    30.03.2012  13:47:56  interval  0.001              23.556             410.8            o/c          29.8            35.4           23.14
    30.03.2012  13:47:57  interval  0.001              23.556             410.2            o/c          29.4            35.3           23.14
    30.03.2012  13:47:58  interval  0.001              23.556             409.5            o/c          29.1            35.0           23.14
    30.03.2012  13:47:59  interval  0.000              23.556             408.9            o/c          29.3            34.6           23.14
    30.03.2012  13:48:00  interval  0.000              23.556             408.7            o/c          #n/a            #n/a           23.14
Output of dput(head(logs), file = "dput.txt"): http://pastebin.de/34176
R refuses to process the columns that contain #n/a or o/c. I can't reformat the file by hand, as it has 185 000 lines.
When I load the log and try to create a histogram:

    > logs <- read.delim("../data/logger/logs/logs.txt", header = TRUE)
    > hist(logs$mesh)

I get this error message:

    Fehler in hist.default(logs$mesh) : 'x' muss nummerisch sein
Rough translation (see: How to change the locale of R in RStudio?):

    Error in hist.default(logs$mesh) : 'x' must be numeric
The only columns I can create histograms for are the numeric ones listed by sapply. So I thought I have to remove these invalid values to get numeric columns.

How can I remove the invalid rows? I'm open to ways of processing them other than R, e.g. Perl or Python, if that's more suitable for this task.
This is the output of sapply after loading the log:

    > sapply(logs, is.numeric)
             date          time          type control.value light.barrier      hotplate          mesh       exhaust
            FALSE         FALSE         FALSE          TRUE         FALSE          TRUE         FALSE         FALSE
           camera     reference
            FALSE          TRUE
After replacing #n/a and o/c with NA (https://stackoverflow.com/a/16350443/2333821):

    logs.clean <- data.frame(check.rows = TRUE, apply(logs, 2, sub, pattern = "o/c|#n/a", replacement = NA))

I get this:

    > sapply(logs.clean, is.numeric)
             date          time          type control.value light.barrier      hotplate          mesh       exhaust
            FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE
           camera     reference
            FALSE         FALSE
Since you've asked about removing rows, here's how I'd do it, with an alternative below.

    # make some example data
    df <- data.frame(a = c("o/c", "#n/a", 1:3), b = c(4:6, "o/c", "#n/a"))
    #      a    b
    # 1  o/c    4
    # 2 #n/a    5
    # 3    1    6
    # 4    2  o/c
    # 5    3 #n/a

    # find the rows that contain either value
    remove <- apply(df, 1, function(row) any(row == "o/c" | row == "#n/a"))

    # subset using the negated index
    df.rows <- df[!remove, ]
    #   a b
    # 3 1 6
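If the markers can be declared as missing values when the file is read, the row removal above could probably be reduced to na.omit(). The following is only a sketch: the sample text, its shortened column names and its values are made up for illustration, and it assumes the markers are spelled exactly "#n/a" and "o/c" in the file (na.strings is case-sensitive).

```r
# Hypothetical tab-separated sample standing in for the real log file
txt <- paste(
  "date\ttime\ttype\tmesh\texhaust",
  "30.03.2012\t13:47:59\tinterval\to/c\t29.3",
  "30.03.2012\t13:48:00\tinterval\t30.2\t#n/a",
  "30.03.2012\t13:48:01\tinterval\t30.1\t29.1",
  sep = "\n"
)

# Treat both markers as NA while parsing, so the affected
# columns are read as numeric instead of factor/character
logs <- read.delim(text = txt, header = TRUE,
                   na.strings = c("#n/a", "o/c"))

# Which columns are numeric now?
numeric.cols <- sapply(logs, is.numeric)

# Drop every row that has at least one NA
logs.rows <- na.omit(logs)
```

With the real file, text = txt would be replaced by the file path. hist(logs$mesh) should then work even without dropping rows, since hist() ignores NA values.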
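Alternatively, to change the values instead, set them to NA. This won't remove the rows, but it will allow functions to work with the data.

    df.clean <- data.frame(apply(df, 2, sub, pattern = "o/c|#n/a", replacement = NA))

I use data.frame() to get back to a data frame quickly. Note that apply() returns a character matrix, so the columns still need an explicit conversion (e.g. as.numeric()) before is.numeric reports TRUE; there might be a more elegant way to do that...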
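One way the numeric conversion might be made explicit is to work column by column with lapply() instead of apply(). This is a sketch on the toy data from above, not a tested solution for the full log; it assumes every column is meant to be numeric, so for the real log it would have to be restricted to the measurement columns.

```r
# Same toy data as above, kept as character rather than factor
df <- data.frame(a = c("o/c", "#n/a", 1:3),
                 b = c(4:6, "o/c", "#n/a"),
                 stringsAsFactors = FALSE)

# Replace the markers with NA and convert each column to numeric;
# assigning into df.clean[] keeps the data frame structure
df.clean <- df
df.clean[] <- lapply(df, function(col) {
  col[col %in% c("o/c", "#n/a")] <- NA
  as.numeric(col)
})

# Check the column types
all.numeric <- all(sapply(df.clean, is.numeric))

# Optionally keep only the rows without any NA
df.rows <- df.clean[complete.cases(df.clean), ]
```

Here only row 3 has no NA in either column, so df.rows contains just that row.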