Native method in R to test if a file is ASCII
Is there a native method in R to test whether a file on disk is an ASCII text file or a binary file, similar to the file command in Linux? Would the method work cross-platform?
The file.info() function can distinguish a file from a directory, but it doesn't seem to go beyond that.
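To make that limitation concrete, here is a small sketch that creates a temporary file and compares it with the session's temporary directory. The isdir column is the closest file.info() gets to a "type"; the other columns (size, mode, timestamps and so on) describe the file, not what its bytes mean.

```r
# Create one ordinary file and compare it with the session's temp directory.
path <- tempfile(fileext = ".txt")
writeLines("hello", path)

info <- file.info(c(tempdir(), path))
info$isdir   # TRUE for the directory, FALSE for the file
info$size    # size in bytes; says nothing about text vs binary
```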
If you care whether a file is ASCII or binary...
Well, first some definitions. All files are binary at some level:
    is.binary <- function(file) {
      if (system.type() != "quantum computer") {
        return(TRUE)
      } else {
        return(cat = alive & dead)
      }
    }
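A text file really is just bytes on disk. The sketch below writes the bytes for "Hi" plus a newline with writeBin(), so no platform line-ending conversion happens, and reads them back with readBin(what = "raw"), which applies no interpretation at all. Only rawToChar() decides to treat them as characters.

```r
# Write three bytes that happen to spell "Hi" plus a newline in ASCII.
path <- tempfile()
writeBin(charToRaw("Hi\n"), path)

# Read them back with no interpretation at all: just bytes.
bytes <- readBin(path, what = "raw", n = 16)
bytes                    # 48 69 0a
rawToChar(bytes[1:2])    # "Hi", once we choose to decode them as ASCII
```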
ASCII is an encoding system for characters. Therefore it is impossible to tell whether a file is ASCII or binary, because ASCII-ness is a matter of interpretation. If I save a file and decide that the binary number 01001101 is a Q and 01001110 is a Z, you might decode it as ASCII (where those bytes are M and N), but you'll get the wrong message. Luckily the Americans muscled in and said "Hey, let's all use ASCII code for text! 128 characters and a parity bit! Woo! Go USA!". IBM tried to tell people to use EBCDIC, but nobody listened. But that's another thing.
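The bit patterns above can be decoded under the standard ASCII table in a couple of lines of base R. decode_bits is a hypothetical helper written only for this illustration; it relies on the fact that Unicode code points 0 to 127 coincide with ASCII, so intToUtf8() gives the ASCII reading.

```r
# Hypothetical helper: turn bit strings into integers, then decode them as ASCII.
decode_bits <- function(bits) {
  codes <- strtoi(bits, base = 2)   # "01001101" -> 77
  intToUtf8(codes)                  # code points 0-127 coincide with ASCII
}

decode_bits(c("01001101", "01001110"))   # "MN", not the "QZ" the writer meant
```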
So we were packing ASCII-coded text into 8-bit bytes and using the eighth bit for parity checking. Then people stopped doing parity checking because TCP/IP handled that, which is another thing, and the eighth bit was expected to be zero. If it wasn't, there was trouble.
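As a rough illustration of how the eighth bit carried parity, here is a sketch of one common convention, even parity: bit 8 is set whenever the 7-bit code has an odd number of 1-bits, so every byte ends up with an even count. Odd parity was also used; add_even_parity is a hypothetical name chosen for this example.

```r
# Assumption: even parity. Set bit 8 (value 128) so the byte's total number of
# 1-bits is even.
add_even_parity <- function(code) {
  code <- as.integer(code)
  ones <- sum(as.integer(intToBits(code))[1:7])  # count 1-bits in the 7-bit code
  if (ones %% 2 == 1) bitwOr(code, 128L) else code
}

add_even_parity(utf8ToInt("M"))  # 77: 1001101 already has four 1-bits
add_even_parity(utf8ToInt("C"))  # 195: 1000011 has three, so bit 8 is set
```

A byte produced this way can legitimately have its high bit set, which is exactly why the "eighth bit must be zero" rule only makes sense once parity moved elsewhere.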
That's because people (read "Microsoft") started abusing the eighth bit and making their own encoding schemes, and unless you knew which encoding scheme a file was using, you were stuffed. And the file didn't tell you what its encoding scheme was. And now we have Unicode and even more encoding schemes, and that's a third thing. But I digress.
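The "you were stuffed" problem is easy to reproduce: the same two bytes give different text depending on which encoding you assume. This sketch uses base R's iconv() with UTF-8 and latin1 (ISO 8859-1) as the two example schemes; decode_as is a hypothetical helper, and nothing in the bytes themselves says which reading is right.

```r
# Decode the same bytes under an assumed encoding, returning UTF-8 text.
decode_as <- function(bytes, encoding) {
  iconv(rawToChar(bytes), from = encoding, to = "UTF-8")
}

bytes <- as.raw(c(0xC3, 0xA9))
utf8ToInt(decode_as(bytes, "UTF-8"))   # 233: one character, "é"
utf8ToInt(decode_as(bytes, "latin1"))  # 195 169: two characters, "Ã©"
```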
Nowadays, when people ask whether a file is binary, they are really asking "does any byte in the file have its highest bit set?". You can test that in R by reading the file through a raw connection as unsigned integers and checking whether any byte is 128 or more. Something like:
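    is.binary <- function(filepath, n = 1000) {
      f <- file(filepath, "rb", raw = TRUE)
      on.exit(close(f))  # make sure the connection is closed
      b <- readBin(f, "int", n, size = 1, signed = FALSE)
      any(b > 127)  # TRUE if any byte has its high bit set (>= 128)
    }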
By default this tests at most the first 1000 bytes. I think the file command does something similar.
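If checking only the start of the file is not enough, a variant can scan the whole file in fixed-size chunks and stop at the first byte above 127. This is a sketch, not part of the answer above: has_high_bit_bytes and its chunk parameter are hypothetical names, and the 64 KiB default is an arbitrary choice.

```r
# Hypothetical variant: scan the whole file in chunks instead of the first n bytes,
# stopping as soon as a byte with the high bit set (value > 127) is found.
has_high_bit_bytes <- function(filepath, chunk = 65536L) {
  con <- file(filepath, "rb", raw = TRUE)
  on.exit(close(con))
  repeat {
    b <- readBin(con, "raw", chunk)
    if (length(b) == 0L) return(FALSE)           # reached end of file
    if (any(as.integer(b) > 127L)) return(TRUE)  # high bit set somewhere
  }
}

txt <- tempfile(); writeBin(charToRaw("plain ASCII\n"), txt)
bin <- tempfile(); writeBin(as.raw(c(0x00, 0xFF, 0x10)), bin)
has_high_bit_bytes(txt)  # FALSE
has_high_bit_bytes(bin)  # TRUE
```

Reading in chunks keeps memory use bounded for large files while still returning early on files that are obviously not 7-bit ASCII.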
You may want to change the test to check for printable character codes, whitespace, line feeds, carriage returns, and any other codes you consider plausible in non-binary files...
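One way to make that stricter test concrete is to accept only printable ASCII (0x20 to 0x7E) plus a few whitespace control codes. The allowed set here (TAB, LF, FF, CR) is an assumption you would tune for your own files, and looks_like_ascii_text is a hypothetical name. Note that under this rule an empty file counts as text, and UTF-8 text with non-ASCII characters counts as binary.

```r
# Hypothetical stricter test: treat the file as text only if every byte examined
# is printable ASCII (0x20-0x7E) or one of TAB, LF, FF, CR.
looks_like_ascii_text <- function(filepath, n = 1000) {
  con <- file(filepath, "rb", raw = TRUE)
  on.exit(close(con))
  b <- as.integer(readBin(con, "raw", n))
  allowed_controls <- c(9L, 10L, 12L, 13L)  # TAB, LF, FF, CR
  all(b %in% allowed_controls | (b >= 32L & b <= 126L))
}

txt <- tempfile(); writeBin(charToRaw("col1\tcol2\r\n1\t2\r\n"), txt)
nul <- tempfile(); writeBin(as.raw(c(0x41, 0x00, 0x42)), nul)
looks_like_ascii_text(txt)  # TRUE
looks_like_ascii_text(nul)  # FALSE: a NUL byte, even though no byte is above 127
```

The second file shows why the high-bit test alone can be fooled: a file full of low control codes has no byte of 128 or more but is clearly not text.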