[Scons-dev] Merge PR #235 before release

anatoly techtonik techtonik at gmail.com
Thu May 28 03:28:43 EDT 2015


I found a way to convert any binary string to Unicode without crashing:
http://stackoverflow.com/a/27527728/239247
It correctly converts all `ascii` characters (and will probably make it
possible to use ANSI graphics if the unicode font supports that), but it
will not work for other utf-8 characters.
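
To make the trade-off concrete, here is a minimal sketch of one approach with exactly these properties. It is not necessarily what the linked answer does: the `latin-1` codec is an assumed choice and the helper name `b2u_latin1` is made up for illustration. Every byte maps to one code point, so decoding never fails and ASCII is preserved, but multi-byte utf-8 sequences come out as mojibake.

```python
# Hypothetical helper; latin-1 is an assumption, not taken from the linked answer.
def b2u_latin1(data):
    """Decode bytes to text without ever raising: one byte -> one code point."""
    return data.decode("latin-1")


raw = u"caf\u00e9".encode("utf-8")   # the bytes 'caf' + 0xC3 0xA9

text = b2u_latin1(raw)

# The ASCII part survives unchanged.
assert text[:3] == u"caf"
# The two utf-8 bytes for e-acute become two separate characters.
assert text == u"caf\u00c3\u00a9"
# The mapping is reversible, so no data is lost even though it displays wrong.
assert text.encode("latin-1") == raw
```

This runs on both Python 2 and Python 3. The result is only safe to display; to get the real text back you still have to re-encode and decode with the right encoding.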

Python 3 adds a `surrogateescape` error handler, but that is not present in Python 2.
http://stackoverflow.com/questions/19649463/how-to-do-surrogateescape-in-python2
I don't know why they called it "surrogate" - it is a freaky word.
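
For reference, a minimal Python 3 sketch of the round trip that `surrogateescape` gives. The names `b2u` and `u2b` are the helpers wished for in this thread, and utf-8 as the default encoding is an assumption.

```python
# Python 3 only. utf-8 as the default encoding is an assumption.
def b2u(data, encoding="utf-8"):
    """bytes -> str; each undecodable byte becomes a lone surrogate code point."""
    return data.decode(encoding, errors="surrogateescape")


def u2b(text, encoding="utf-8"):
    """str -> bytes; escaped surrogates are turned back into the original bytes."""
    return text.encode(encoding, errors="surrogateescape")


raw = b"ok \xe2\x82\xac bad \xff"   # a valid utf-8 euro sign, then an invalid byte

text = b2u(raw)
assert text == "ok \u20ac bad \udcff"   # 0xFF was escaped as U+DCFF
assert u2b(text) == raw                 # round trip restores the exact bytes
```

Undecodable bytes 0x80..0xFF are mapped to U+DC80..U+DCFF, which sit in the low-surrogate range, hence the name. The resulting string cannot be encoded with the default strict handler, so it has to go back through `u2b()` before being written out.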

On Wed, May 27, 2015 at 4:33 PM, Kenny, Jason L <jason.l.kenny at intel.com> wrote:
> I would agree with this.
>
>
>
> In general, OSes today store file system data (i.e. not the data in the
> file) in Unicode (be it utf-16 or utf-8). On Linux this is not always the
> case; it could be big5 or some other locale encoding. On Linux there are
> ways to find out what the “native” encoding is and use it.
>
>
>
> I should note that the idea of converting binary to Unicode does not really
> exist. The point of a binary string is to hold arbitrary data (e.g. a
> double in its raw 64-bit form vs. the decimal value 1.2385). One can assume
> that it is in a certain code page encoding and convert from that. And as I
> stated above, there are APIs to see what the locale code page encoding is,
> and that can be used to convert the code to the local ANSI/OEM encoding.
> This is different from a binary string.
>
>
>
> Jason
>
> From: Scons-dev [mailto:scons-dev-bounces at scons.org] On Behalf Of Gary
> Oberbrunner
> Sent: Wednesday, May 27, 2015 7:43 AM
> To: SCons developer list
> Subject: Re: [Scons-dev] Merge PR #235 before release
>
>
>
>
>
> On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik <techtonik at gmail.com>
> wrote:
>
> What I need is a bulletproof way to convert from anything to unicode. This
> requires some kind of escaping to go forward and back. Some helper
> methods like u2b() (unicode to binary) and b2u(). I am quite surprised that
> so far I found nothing for this "simple" case.
>
>
> That's because in general the encoding of the "binary" string is unknown.
> Is it ascii, utf-8, Windows CP-1252, shift-JIS, or something else?  You
> can't decode such a string to Unicode without knowing the encoding.  Check
> out the python-3 branch where we've been working through some of those
> issues.  Your u2b is "easy" if you assume you want the binary to be utf-8
> encoded, which is normally safe; this conversion is guaranteed to work.
> Your b2u is not so easy.  You can't just assume utf-8 as you might think; if
> the string has invalid utf-8 bytes it'll raise an error or generate dummy
> chars depending on the args you pass to str.decode().  At least it'll get
> mangled if it's in a different encoding than you expect.
>
>
>
> --
>
> Gary
>
>
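
The two points quoted above (what `decode()` does with invalid utf-8, and asking the platform for its encoding) can be sketched as follows. The sample bytes are made up for illustration, and the encodings reported depend on the OS and locale settings.

```python
import locale
import sys

raw = b"caf\xe9"   # e-acute in latin-1 / CP-1252; not valid utf-8

# Strict decoding (the default) raises on the invalid byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for whatever cannot be decoded.
replaced = raw.decode("utf-8", "replace")

# Decoding with the wrong but permissive encoding "works" and mangles nothing
# here only because the bytes really were latin-1.
as_latin1 = raw.decode("latin-1")

# Encodings the platform reports; the values vary between systems.
preferred = locale.getpreferredencoding()
fs_encoding = sys.getfilesystemencoding()
```

None of this tells you what encoding the bytes were actually written in; it only shows the failure modes and the platform defaults.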



-- 
anatoly t.

