Unicode

OK. I’m an idiot.

So here’s today’s BIG Unicode lesson; understand this and, maybe, half your troubles will evaporate.
Unicode is NOT a “code”.
No. Unicode is a kind of Platonic ideal: an abstract catalogue that gives every character a number (its “code point”). Everything else is an “encoding” of that ideal into bytes.
ASCII is an encoding. UTF-8 is an encoding. That weird character set you got with Portuguese accented letters is an encoding.
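To make that concrete, here is a small sketch in Python 3, where `str` holds Unicode text and `bytes` holds encoded data. It shows one string, its abstract code points, and the different bytes it turns into under different encodings. The post doesn't name the “weird character set”, so Latin-1 (ISO-8859-1) stands in for it here; that choice is an assumption.

```python
# One abstract Unicode string, several concrete byte encodings.
text = "São Paulo"  # 'ã' is code point U+00E3

# The "ideal" level: each character is just a numbered code point.
print([f"U+{ord(ch):04X}" for ch in "São"])  # ['U+0053', 'U+00E3', 'U+006F']

# The encoding level: the same string becomes different bytes.
utf8_bytes = text.encode("utf-8")      # 'ã' takes two bytes: c3 a3
latin1_bytes = text.encode("latin-1")  # stand-in for "that weird character set"; 'ã' is one byte: e3
print(utf8_bytes)    # b'S\xc3\xa3o Paulo'
print(latin1_bytes)  # b'S\xe3o Paulo'

# ASCII has no byte for 'ã' at all, so encoding fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII can't do it:", err.reason)  # ordinal not in range(128)
```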
Hence the verb “encode” means to turn a Unicode string into a byte string.
And “decode” means to turn a byte string (say, one imported from another application) back into pure Unicode.
I repeat. You DO NOT encode byte strings into Unicode strings. You decode them into Unicode. And then you re-encode them when you want to export them (as, say, XML or JSON).
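Python 3 bakes this direction into its types, which makes a handy sanity check. The sketch below assumes Python 3 and some UTF-8 bytes arriving from elsewhere; the variable names are just for illustration.

```python
# Python 3 makes the direction explicit in the types:
#   str   --encode-->  bytes
#   bytes --decode-->  str
raw = b"caf\xc3\xa9"               # bytes as they might arrive from another program (UTF-8)

text = raw.decode("utf-8")         # decode: bytes -> Unicode string
print(text, type(text).__name__)   # café str

out = text.encode("utf-8")         # encode: Unicode string -> bytes
print(out, type(out).__name__)     # b'caf\xc3\xa9' bytes

# Going the "wrong way" isn't even offered: bytes have no .encode(),
# and str has no .decode().
print(hasattr(raw, "encode"), hasattr(text, "decode"))  # False False

# Decoding with the wrong encoding doesn't raise here; it silently gives mojibake.
print(raw.decode("latin-1"))       # cafÃ©
```

The last line is the classic symptom of guessing the wrong encoding at the decode step: nothing crashes, the text is just wrong.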
read → decode → do stuff in your app → encode → write
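That pipeline can be sketched end to end in Python 3. Everything specific here is an illustrative assumption: the file names are hypothetical, the input is pretended to come from another application in Latin-1, “do stuff” is just `upper()`, and the output is UTF-8 JSON.

```python
import json
import tempfile
from pathlib import Path

def pipeline(src: Path, dst: Path, src_encoding: str) -> str:
    """read -> decode -> do stuff -> encode -> write (a hypothetical helper)."""
    raw = src.read_bytes()                    # read: raw bytes, no meaning yet
    text = raw.decode(src_encoding)           # decode: bytes -> Unicode, at the boundary
    result = text.upper()                     # do stuff: work only with Unicode strings
    payload = json.dumps({"text": result}, ensure_ascii=False)
    dst.write_bytes(payload.encode("utf-8"))  # encode: Unicode -> bytes, at the boundary
    return result

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"             # hypothetical input file
    dst = Path(tmp) / "output.json"           # hypothetical output file
    # Simulate a file exported by another application in Latin-1 (an assumption).
    src.write_bytes("ação".encode("latin-1"))
    result = pipeline(src, dst, "latin-1")
    written = dst.read_bytes()
    print(written.decode("utf-8"))            # {"text": "AÇÃO"}
```

In everyday code, `open(path, encoding=...)` performs the decode and encode steps for you; the sketch spells them out with raw bytes so that each arrow in the pipeline is visible.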
Thanks … that’s all.
