[reportlab-users] speeding up parse_utf8?

Marius Gedminas reportlab-users@reportlab.com
Tue, 14 Oct 2003 12:56:49 +0300


On Tue, Oct 14, 2003 at 09:01:38AM +0100, Robin Becker wrote:
> >My tests with Python 2.3 show parse_utf8 to be about 5x slower, both for
> >short and for long strings
[...]
> weird I tried the code at home with 2.3 and still see this
>
> C:\>\tmp\ttt.py
> <function parse_utf8 at 0x007F6DF0> 7.49100005627
> <function <lambda> at 0x007F6070> 10.3350000381

Same here:

  mg: /tmp$ python2.3 ttt.py
  <function parse_utf8 at 0x402fa064> 11.9326989651
  <function <lambda> at 0x40215cdc> 17.1550990343

> C:\>cat \tmp\ttt.py
> from time import time
> import codecs
> from reportlab.pdfbase.ttfonts import parse_utf8
> nparse_utf8=lambda x, decode=codecs.lookup('utf8')[1]:
> map(ord,decode(x)[0])
> assert nparse_utf8('abcdefghi')==parse_utf8('abcdefghi')
>
> for fn in (parse_utf8,nparse_utf8):
>     t0 = time()
>     for i in xrange(500):
>         map(fn,i*'abcdefghi')

map(fn, s) iterates over the string s, so you're calling the function
many times with single-character arguments.  After changing this line to

          fn(i*'abcdefghi')

>     print str(fn), time()-t0

I get

  mg: /tmp$ python2.3 ttt2.py
  <function parse_utf8 at 0x402fa064> 4.49495100975
  <function <lambda> at 0x40215cdc> 0.863283991814
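
To make the difference between the two loop bodies concrete, here is a
small Python 3 sketch that only counts calls.  It does not use reportlab:
decode_codepoints and count_calls are names made up for this
illustration, and decode_codepoints merely mimics the codecs-based lambda
from the quoted script.

```python
# Python 3 sketch; decode_codepoints and count_calls are illustrative
# names, not part of reportlab.

def decode_codepoints(data: bytes) -> list:
    """Decode UTF-8 bytes and return the code points as integers."""
    return [ord(ch) for ch in data.decode("utf-8")]


def count_calls(fn):
    """Wrap fn so the number of invocations can be inspected afterwards."""
    def wrapper(arg):
        wrapper.calls += 1
        return fn(arg)
    wrapper.calls = 0
    return wrapper


text = 3 * "abcdefghi"  # 27 characters

# Original benchmark shape: map() iterates over the string, so the
# wrapped function is invoked once per character.
per_char = count_calls(lambda s: decode_codepoints(s.encode("utf-8")))
per_char_result = list(map(per_char, text))

# Corrected shape: a single call with the whole string.
whole = count_calls(lambda s: decode_codepoints(s.encode("utf-8")))
whole_result = whole(text)

print(per_char.calls, whole.calls)
```

With map() the per-call overhead is paid once per character, so that
benchmark mostly measures call overhead rather than decoding speed on
realistic input.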

> Also
>
> C:\>\python\lib\timeit.py  -s "import codecs;nparse_utf8=lambda x,
> decode=codecs.lookup('utf8')[1]:map(ord,decode(x)[0])
> " map(nparse_utf8,5000*'abcdefghi')
> 10 loops, best of 3: 4.72e+005 usec per loop
>
> C:\>\python\lib\timeit.py -s "from reportlab.pdfbase.ttfonts import
> parse_utf8" map(parse_utf8,5000*'abcdefghi')
> 10 loops, best of 3: 3.58e+005 usec per loop

Same circumstances here: map() calls the function once per character,
and parse_utf8 is faster for single-character strings.
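
For anyone who wants to repeat the comparison with timeit from inside a
script, a rough Python 3 sketch follows.  It only times a codecs-style
decoder in both call patterns; reportlab's parse_utf8 is not measured
here, and decode_codepoints, whole_string and per_character are
hypothetical helper names.  Timings depend on the machine, so no expected
numbers are given.

```python
import timeit


def decode_codepoints(data: bytes) -> list:
    """Decode UTF-8 bytes and return the code points as integers."""
    return [ord(ch) for ch in data.decode("utf-8")]


payload = 500 * b"abcdefghi"
# Pre-split the payload into one-byte strings so that the per-character
# timing covers the decoder calls only, not the slicing.
single_bytes = [payload[i:i + 1] for i in range(len(payload))]


def whole_string():
    """One decoder call for the whole payload."""
    return decode_codepoints(payload)


def per_character():
    """One decoder call per character, as map(fn, string) did."""
    return [decode_codepoints(b) for b in single_bytes]


# Take the best of three repeats, as the timeit command line does.
t_whole = min(timeit.repeat(whole_string, number=10, repeat=3))
t_per_char = min(timeit.repeat(per_character, number=10, repeat=3))

print("whole string:  %.6f s" % t_whole)
print("per character: %.6f s" % t_per_char)
```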

Marius Gedminas
-- 
Premature optimization is the root of all evil.
                -- D.E. Knuth
