[Scons-dev] Merge PR #235 before release

Gary Oberbrunner garyo at oberbrunner.com
Wed May 27 08:42:42 EDT 2015


On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik <techtonik at gmail.com>
wrote:

> What I need is a bulletproof way to convert from anything to unicode. This
> requires some kind of escaping to go forward and back. Some helper
> methods like u2b() (unicode to binary) and b2u(). I am quite surprised that
> so far I found nothing for this "simple" case.
>

That's because, in general, the encoding of the "binary" string is unknown.
Is it ASCII, UTF-8, Windows CP-1252, Shift-JIS, or something else?  You
can't decode such a string to Unicode without knowing the encoding.  Check
out the python-3 branch, where we've been working through some of those
issues.  Your u2b is "easy" if you assume you want the binary to be UTF-8
encoded, which is normally safe; this conversion is guaranteed to work.
Your b2u is not so easy.  You can't just assume UTF-8, as you might think:
if the string contains invalid UTF-8 bytes, decoding will either raise an
error or generate dummy chars, depending on the args you pass to
str.decode().  At the very least, the result will be mangled if the string
is in a different encoding than you expect.
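
As a minimal sketch of the asymmetry described above, assuming Python and
treating u2b() and b2u() as hypothetical helpers named after the ones in the
quoted message (they are not part of SCons or the standard library):

```python
# Hypothetical helpers; the names come from the quoted message only.

def u2b(text, encoding="utf-8"):
    """Unicode text -> bytes. With UTF-8 this is the 'easy' direction."""
    return text.encode(encoding)


def b2u(data, encoding="utf-8", errors="strict"):
    """Bytes -> unicode text. Only correct if `encoding` is the real one."""
    return data.decode(encoding, errors)


cafe = u"caf\u00e9"                # the word "cafe" with an acute accent

# Round trip works when both sides agree on the encoding.
assert b2u(u2b(cafe)) == cafe

# Bytes produced in CP-1252 are not valid UTF-8.
raw = cafe.encode("cp1252")        # b'caf\xe9'

# errors="strict" (the default) raises.
try:
    b2u(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte.
replaced = b2u(raw, errors="replace")

# Decoding valid UTF-8 bytes with the wrong encoding "succeeds" but
# silently produces the wrong text.
mangled = b2u(u2b(cafe), encoding="cp1252")

# Knowing the real encoding is the only way to recover the text.
recovered = b2u(raw, encoding="cp1252")
```

The point of the sketch is that b2u() cannot pick a safe default: the
caller has to supply the encoding, or accept errors or mangled text.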

-- 
Gary