c++ - Why can a Windows console with a Chinese code page set show a UTF-16 encoded character?


Per MSDN:

"For the Microsoft C/C++ compiler, the source and execution character sets are both ASCII."

Per C++03:

2.1 Phases of translation

"...Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)"

2.13.2 Character literals

"A universal-character-name is translated to the encoding, in the execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding."

To test which execution character set is used by MSVC++, I wrote the following code:

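// dump the bytes of the wide string literal, including its terminating null
const wchar_t *str = L"中";
const unsigned char *p = reinterpret_cast<const unsigned char*>(str);
for (size_t i = 0; i < sizeof(L"中"); ++i) {
    printf("%x ", *(p + i));
}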

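The output shows 2d 4e 0 0, and 0x4E2D is the UTF-16 encoding of this Chinese character (the bytes appear in little-endian order, followed by the two bytes of the terminating null). So I conclude: UTF-16 is used as the execution character set by MSVC (my version: 2012 4.5.50709).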
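The byte order in that output can be reproduced independently of the compiler's `wchar_t` layout. The following is only a minimal sketch: the helper name `utf16le_bytes` is hypothetical, and `char16_t` (C++11) is an assumption that goes beyond the C++03 text quoted above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one UTF-16 code unit as its two bytes in
// little-endian order, using the same unpadded "%x" style as the loop above.
std::string utf16le_bytes(char16_t unit) {
    unsigned value = static_cast<unsigned>(unit);
    char buf[16];
    // Low byte first, then high byte: this is the in-memory order on a
    // little-endian machine such as x86 Windows.
    std::snprintf(buf, sizeof buf, "%x %x", value & 0xFFu, (value >> 8) & 0xFFu);
    return std::string(buf);
}
```

Calling `utf16le_bytes(u'\u4e2d')` gives `2d 4e`, which matches the first two bytes printed by the test program.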

After that, I tried to print the character to the Windows console. Since the default locale used by the program is "C", I set the locale to code page 936, which represents Simplified Chinese characters.

// use the execution environment's locale setting, which is code page 936 here
const wchar_t *str = L"中";
char *locale = setlocale(LC_ALL, "");
wprintf(L"%ls\n", str);

which outputs:

中

What I'm curious about is: how can a character encoded in UTF-16 be decoded by the Windows console when its locale (the decoder) is set to something other than UTF-16 (MS code page 936)? How can this happen?

I think I figured it out.

In Microsoft C++ 2008 (probably 2005+), CRT facilities such as wprintf and wcout are implemented so that, under the hood, they convert a wide string literal such as L"中", which is encoded in UTF-16, to match the current locale/code page setting. What happens here is that L"中" is converted to the bytes D6 D0, its encoding in code page 936 (Simplified Chinese).

I was wrong to think that setlocale sets the console code page. It only sets the current program's code page, which the CRT functions use during this "conversion". Changing the console code page is what the chcp command or the Win API SetConsoleOutputCP() does.
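The dependence of that conversion on the program's locale can be seen with the standard `wcstombs` function alone. This is a minimal sketch rather than what the CRT does internally: the helper names are hypothetical, and the failure in the "C" locale is an assumption about common C runtimes, not something the standard guarantees for every implementation.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts a wide string to multibyte form under the
// program's current locale. Returns the number of bytes written, or
// (size_t)-1 if a character has no representation in that locale's encoding.
std::size_t narrow_with_current_locale(const wchar_t* wide, char* out, std::size_t cap) {
    return std::wcstombs(out, wide, cap);
}

// Assumption: in the default "C" locale U+4E2D has no multibyte encoding,
// so the conversion is expected to report failure.
bool fails_in_c_locale() {
    std::setlocale(LC_ALL, "C");
    char buf[8] = {0};
    return narrow_with_current_locale(L"\u4e2d", buf, sizeof buf)
           == static_cast<std::size_t>(-1);
}
```

After `setlocale(LC_ALL, "")` on a system whose locale uses code page 936, the same call would be expected to succeed and produce the two bytes D6 D0 described above. The console code page plays no part in this step.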

Since my console's default code page is 936, the character is shown correctly without any problem.

