encoding - iconv-equivalent code in Java doesn't return the same results


I have a requirement to encode a file from UTF-8 to Shift_JIS. Previously, this was done using the iconv command below:

    iconv -f utf8 -t sjis $input_file

For the input file I supply, iconv returns an error saying:

    illegal input sequence at position 2551

I have written this Java code:

    // decode the input file as UTF-8
    FileInputStream fis = new FileInputStream("input.txt");
    InputStreamReader in = new InputStreamReader(fis, "UTF-8");
    // encode the output file as Shift_JIS
    FileOutputStream fos = new FileOutputStream("output.txt");
    OutputStreamWriter out = new OutputStreamWriter(fos, "Shift_JIS");

    int val = 0;
    StringBuilder sb = new StringBuilder();

    // read one UTF-16 char at a time, printing its value in hex
    while ((val = in.read()) != -1) {
        System.out.println(Integer.toHexString(val));
        sb.append((char) val);
    }
    out.write(sb.toString());
    out.flush();
    fis.close();
    out.close();

The code executes fine on the same input file and doesn't report any error.

What am I missing here?
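A likely source of the difference: when `InputStreamReader` and `OutputStreamWriter` are constructed from a charset name, Java substitutes bad data instead of failing (U+FFFD for malformed input when decoding, a replacement byte, typically `?`, for unmappable characters when encoding), whereas iconv stops with an error. The minimal sketch below shows this for decoding only, on a made-up truncated UTF-8 byte sequence; `ReplaceVsReport` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ReplaceVsReport {

    // "hi" followed by 0xC3, the lead byte of a 2-byte sequence with no continuation byte.
    static final byte[] BAD_UTF8 = {0x68, 0x69, (byte) 0xC3};

    // Lenient decoding, the same policy InputStreamReader(in, "UTF-8") uses:
    // the malformed byte becomes U+FFFD and no error is raised.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: REPORT makes the decoder throw on bad input, closer to iconv.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeLenient(BAD_UTF8).equals("hi\uFFFD")); // true
        try {
            decodeStrict(BAD_UTF8);
            System.out.println("no error");
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder failed: " + e);
        }
    }
}
```

For streams, passing a `CharsetDecoder` or `CharsetEncoder` configured with `CodingErrorAction.REPORT` to the `InputStreamReader`/`OutputStreamWriter` constructors gives the same fail-fast behaviour.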

Joachim, that looks like the answer. I have added the code to the question. Now I am getting an unmappable character error; it fails to encode even normal characters, like the text "hello". Am I doing something wrong?

    import java.io.*;
    import java.nio.charset.*;

    public class SjisConvert { // class name is arbitrary

        // Decoder that throws on malformed or unmappable input instead of substituting U+FFFD.
        private static CharsetDecoder decoder(String encoding) {
            return Charset.forName(encoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        }

        // Encoder that throws on unmappable characters instead of writing a replacement byte.
        private static CharsetEncoder encoder(String encoding) {
            return Charset.forName(encoding).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        }

        public static void main(String[] args) throws IOException {
            // try-with-resources closes both streams, even when a coding error is thrown.
            try (InputStreamReader in = new InputStreamReader(
                         new FileInputStream("d:\\input.txt"), decoder("UTF-8"));
                 OutputStreamWriter out = new OutputStreamWriter(
                         new FileOutputStream("d:\\output.txt"), encoder("Shift_JIS"))) {
                char[] buffer = new char[4096];
                int length;
                while ((length = in.read(buffer)) != -1) {
                    out.write(buffer, 0, length);
                }
                out.flush();
            }
        }
    }
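When the strict encoder reports an unmappable character, it helps to know exactly which character it rejects. A minimal sketch, assuming the input has already been decoded into a `String`: it uses `CharsetEncoder.canEncode` to find the first code point Shift_JIS cannot represent. `FindUnmappable` and `firstUnmappable` are hypothetical names, and the sample text is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class FindUnmappable {

    // Returns the char index of the first code point the target charset cannot
    // encode, or -1 if the whole text is encodable.
    static int firstUnmappable(CharSequence text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            // Test the whole code point so surrogate pairs are not split.
            String one = new String(Character.toChars(cp));
            if (!encoder.canEncode(one)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // ASCII, two kanji, then an emoji that Shift_JIS has no mapping for.
        String sample = "hello \u65e5\u672c \uD83D\uDE00";
        int idx = firstUnmappable(sample, sjis);
        if (idx < 0) {
            System.out.println("everything is encodable");
        } else {
            System.out.printf("first unmappable at index %d (U+%04X)%n", idx, sample.codePointAt(idx));
        }
    }
}
```

Checking one code point at a time means a supplementary character such as an emoji is tested as a whole rather than as two lone surrogates, which would be reported for the wrong reason.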

That should merely be a problem concerning UTF-8. Take the InputStream and start hex-dumping at position 2551, or a bit earlier to see the preceding text.

Especially interesting is what iconv delivers up to that point.
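Instead of guessing where to start the dump, the decoder can report the byte offset itself: when `CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean)` returns a malformed-input result, the malformed bytes begin at the input buffer's current position. A minimal sketch, assuming the whole file fits in memory and that the failure is malformed UTF-8 rather than a character Shift_JIS cannot encode; that the offset lines up with iconv's reported position is also an assumption. `MalformedOffset` and `firstMalformedOffset` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class MalformedOffset {

    // Returns the byte offset of the first malformed UTF-8 sequence in data,
    // or -1 if the whole array decodes cleanly.
    static long firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            // endOfInput = true: a sequence truncated at the end also counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The offending bytes begin at the input buffer's current position.
                return in.position();
            }
            if (result.isOverflow()) {
                out.clear(); // decoded text is not needed, only the position
                continue;
            }
            return -1; // underflow: all input consumed without error
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample; for a real file, read it with java.nio.file.Files.readAllBytes.
        byte[] data = {'a', 'b', 'c', (byte) 0xFF, 'd'};
        System.out.println("first malformed byte at offset " + firstMalformedOffset(data));
    }
}
```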


A dump, so you can see which data caused the problem:

    import java.io.*;

    public class DumpBytes { // class name is arbitrary

        public static void main(String[] args) {
            try (BufferedInputStream in = new BufferedInputStream(
                    new FileInputStream("d:\\input.txt"))) {
                // start 10 bytes before the position iconv reported
                dumpBytes(in, 2551 - 10, 20);
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

        // Prints `length` bytes starting at byte `offset`: position, hex, binary, printable char.
        private static void dumpBytes(InputStream in, long offset, int length)
                throws IOException {
            long pos = in.skip(offset); // skip() may skip fewer bytes; pos tracks the real offset
            while (length > 0) {
                int b = in.read();
                if (b == -1) {
                    break;
                }
                b &= 0xff;
                System.out.printf("%6d: 0x%02x %s '%c'%n", pos, b,
                    toBinaryString(b), (32 <= b && b < 127 ? (char) b : '?'));
                --length;
                ++pos;
            }
        }

        // Formats a byte as 8 binary digits split into nibbles, e.g. 1100_0011.
        private static String toBinaryString(int b) {
            String s = Integer.toBinaryString(b);
            s = "00000000" + s;
            s = s.substring(s.length() - 8);
            s = s.substring(0, 4) + "_" + s.substring(4);
            return s;
        }
    }
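The binary column printed by `dumpBytes` (split into nibbles, as in `1100_0011`) makes UTF-8 structure visible: a byte's high-order bits say whether it is ASCII, a lead byte, or a continuation byte. A minimal helper to label bytes by those patterns, following the UTF-8 rules of RFC 3629; `Utf8ByteKind` and `classify` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ByteKind {

    // Labels one byte by its UTF-8 role, judged only from its high-order bits.
    static String classify(int b) {
        b &= 0xFF;
        if (b < 0x80) return "ASCII (0xxx_xxxx)";
        if (b < 0xC0) return "continuation (10xx_xxxx)";
        if (b < 0xC2) return "invalid (0xC0/0xC1 only start overlong forms)";
        if (b < 0xE0) return "lead of 2-byte sequence (110x_xxxx)";
        if (b < 0xF0) return "lead of 3-byte sequence (1110_xxxx)";
        if (b < 0xF5) return "lead of 4-byte sequence (1111_0xxx)";
        return "invalid (0xF5..0xFF never appear in UTF-8)";
    }

    public static void main(String[] args) {
        // "A" followed by U+65E5, which takes 3 bytes in UTF-8.
        byte[] sample = "A\u65e5".getBytes(StandardCharsets.UTF_8);
        for (byte raw : sample) {
            int b = raw & 0xFF;
            System.out.printf("0x%02x %s%n", b, classify(b));
        }
    }
}
```

In the dump, a well-formed sequence is a lead byte followed by exactly the matching number of continuation bytes (one, two or three); a stray continuation byte, or a lead byte cut short, marks the spot a strict decoder rejects.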
