The problem is quite nasty: processing the Unicode text may cause the encoding step to fail when the output is written. While processing the characters, you may add some which are not in the character set used to write the output file. The output file may therefore need a different character set from the one the environment variables specify.
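
The failure described above can be illustrated with a minimal Python sketch. It is not taken from any particular library: the helper names `can_encode` and `pick_output_charset`, the choice of ISO-8859-1 as the environment's character set, and UTF-8 as the fallback are all assumptions made for the example.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def pick_output_charset(text, env_charset, fallback="utf-8"):
    """Hypothetical helper: keep the environment's charset when it can
    represent the text, otherwise switch to the fallback (assumed UTF-8)."""
    return env_charset if can_encode(text, env_charset) else fallback


# Assumption: the environment asks for ISO-8859-1 output.
env_charset = "iso-8859-1"

original = "caf\u00e9..."                       # fits in ISO-8859-1
processed = original.replace("...", "\u2026")   # processing adds U+2026

print(can_encode(original, env_charset))        # True
print(can_encode(processed, env_charset))       # False
print(pick_output_charset(processed, env_charset))  # utf-8
```

Checking the processed text, rather than the input, before opening the output file is the point of the sketch: the input may fit the environment's character set while the result of the processing does not.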