Encoding: iconv-equivalent code in Java doesn't return the same results
I have a requirement to encode a file from UTF-8 to Shift_JIS. Previously, this was done using the iconv command below:
iconv -f utf8 -t sjis $input_file
For the input file I supply, it returns an error saying:
illegal input sequence at position 2551
I have written this Java code:
FileInputStream fis = new FileInputStream("input.txt");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
FileOutputStream fos = new FileOutputStream("output.txt");
OutputStreamWriter out = new OutputStreamWriter(fos, "Shift_JIS");
int val = 0;
StringBuilder sb = new StringBuilder();
while ((val = in.read()) != -1) {
    System.out.println(Integer.toHexString(val));
    sb.append((char) val);
}
out.write(sb.toString());
out.flush();
fis.close();
out.close();
The code executes fine on the same input file and doesn't return any error.
What am I missing here?
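One likely reason for the difference: an InputStreamReader created from a charset name does not report bad input. It silently substitutes the replacement character U+FFFD, whereas a CharsetDecoder configured with CodingErrorAction.REPORT throws. The sketch below illustrates this on an in-memory byte array. The class and method names are hypothetical, and the 0xFF byte is only an example of a byte that is never valid in UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LenientVsStrictDecoding {

    // Lenient: a reader built from a charset name replaces malformed
    // bytes with U+FFFD instead of failing.
    static String decodeLenient(byte[] bytes) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8")) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Strict: a decoder set to REPORT throws on malformed input,
    // so this returns false for anything that is not clean UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 'a', (byte) 0xFF, 'b' }; // 0xFF is never valid in UTF-8
        System.out.println(decodeLenient(bad).indexOf('\uFFFD')); // position of the replacement char
        System.out.println(isValidUtf8(bad));
    }
}
```

If the lenient path is what the first program effectively does, it would explain why it finishes without an error where iconv stops.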
Joachim, that looks like the answer. I have added the code to the question. However, I am now getting an unmappable character error. It fails to encode even normal characters such as the text "hello". Am I doing something wrong anywhere?
private static CharsetDecoder decoder(String encoding) {
    return Charset.forName(encoding).newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
}

private static CharsetEncoder encoder(String encoding) {
    return Charset.forName(encoding).newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
}

public static void main(String[] args) throws IOException {
    FileInputStream fis = new FileInputStream("d:\\input.txt");
    InputStreamReader in = new InputStreamReader(fis, decoder("UTF-8"));
    FileOutputStream fos = new FileOutputStream("d:\\output.txt");
    OutputStreamWriter out = new OutputStreamWriter(fos, encoder("Shift_JIS"));
    char[] buffer = new char[4096];
    int length;
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
    out.flush();
}
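When the strict encoder reports an unmappable character, it can help to list exactly which code points the target charset rejects, since the culprit may be invisible (a byte order mark, for instance). The following is a minimal sketch, not a definitive diagnosis: UnmappableFinder and findUnmappable are hypothetical names, and the sample strings are made up. It relies only on CharsetEncoder.canEncode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class UnmappableFinder {

    // Returns one entry "index: U+XXXX" for every code point in text
    // that the named charset cannot encode.
    static List<String> findUnmappable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // Test the whole code point so surrogate pairs are handled together.
            String unit = new String(Character.toChars(cp));
            if (!encoder.canEncode(unit)) {
                hits.add(String.format("%d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumes the runtime ships the Shift_JIS charset, as standard JDKs do.
        System.out.println(findUnmappable("hello \u20AC", "Shift_JIS"));
    }
}
```

Running this over the decoded file contents, rather than a literal, would show whether the failure comes from the visible text or from something else in the input.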
That should merely be a problem concerning UTF-8. Open the file as an InputStream and start hex dumping at position 2551, or a bit earlier to see the preceding text.
Especially interesting is what iconv delivers there.
A dump:
So you can see which data caused the problem.
public static void main(String[] args) {
    try (BufferedInputStream in = new BufferedInputStream(
            new FileInputStream("d:\\input.txt"))) {
        // Dump 20 bytes, starting 10 bytes before the reported position.
        dumpBytes(in, 2551 - 10, 20);
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

private static void dumpBytes(InputStream in, long offset, int length)
        throws IOException {
    // skip() returns the number of bytes actually skipped.
    long pos = in.skip(offset);
    while (length > 0) {
        int b = in.read();
        if (b == -1) {
            break;
        }
        b &= 0xff;
        // Offset, hex value, bit pattern, and the character if printable ASCII.
        System.out.printf("%6d: 0x%02x %s '%c'%n", pos, b,
            toBinaryString(b), (32 <= b && b < 127 ? (char) b : '?'));
        --length;
        ++pos;
    }
}

// Formats a byte as eight bits split into two nibbles, e.g. 1110_0011.
private static String toBinaryString(int b) {
    String s = Integer.toBinaryString(b);
    s = "00000000" + s;
    s = s.substring(s.length() - 8);
    s = s.substring(0, 4) + "_" + s.substring(4);
    return s;
}
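To complement the dump, the byte offset of the first malformed UTF-8 sequence can also be found programmatically. This is a sketch under the assumption that the input really contains malformed UTF-8; MalformedOffsetFinder and firstMalformedOffset are hypothetical names. If it returns -1, the bytes are well-formed UTF-8 and the problem more likely lies in the mapping to Shift_JIS.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class MalformedOffsetFinder {

    // Returns the byte offset of the first malformed UTF-8 sequence,
    // or -1 if the whole array decodes cleanly.
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error the input position points at the offending bytes.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and continue
        }
    }

    public static void main(String[] args) {
        byte[] sample = { 'a', 'b', (byte) 0xFF, 'c' }; // made-up test data
        System.out.println(firstMalformedOffset(sample));
    }
}
```

For a real file, the array could come from Files.readAllBytes, and the returned offset can then be passed to a byte dump such as the one above.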