[reportlab-users] encoding errors (was: Small errors in reportlab code)

Henning von Bargen henning.vonbargen at arcor.de
Wed Jan 17 14:32:28 EST 2007


Dirk Holtwick wrote:

> BTW, wouldn't it be better to ignore useless Unicode and UTF-8 decoding
> errors by using e.g.
> s.decode('utf8','ignore')
> instead of
> s.decode('utf8')
> ?
> Bye,
> Yours Dirk
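
To make the trade-off in the question concrete, here is a minimal sketch
of how the error handlers differ. It is written for present-day Python 3,
where a bytes object plays the role of the 8-bit string s; the sample
bytes and the helper name decode_strict are made up for illustration and
do not come from the ReportLab code.

```python
# Sample input: valid UTF-8 for "Straße" and "café", with one stray 0xFF byte
# in between. 0xFF can never occur in well-formed UTF-8.
raw = b"Stra\xc3\x9fe \xff caf\xc3\xa9"


def decode_strict(data):
    """Hypothetical helper: decode as UTF-8, return None on invalid input."""
    try:
        return data.decode("utf8")  # default error handler is 'strict'
    except UnicodeDecodeError:
        return None


strict_result = decode_strict(raw)        # fails loudly (here: None)
ignored = raw.decode("utf8", "ignore")    # silently drops the bad byte
replaced = raw.decode("utf8", "replace")  # marks the bad byte with U+FFFD
```

With 'ignore' the damaged input passes through without any trace, which
can hide a wrong-encoding bug further upstream; 'replace' at least leaves
a visible marker in the output.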


Last autumn, I began porting the ReportLab integration of my deco-cow
hyphenation library (http://deco-cow.sourceforge.net) to ReportLab 2.0.
Although the documentation on the web site still mentions RL 1.19,
you can download the RL 2 port from
http://sourceforge.net/project/showfiles.php?group_id=105867

During the port, I struggled with the RL 2 code, mainly because of
various encoding issues.
I think that the RL 2 code is a little bit "unclean" concerning unicode.
There are various places in the code where either unicode or string
variables can be used, and they are encoded/decoded on the fly. It could
probably be improved, but I don't understand it in depth and I'm not
aware of possible side effects.
Perhaps more "public" documentation (for the internal helper functions,
too) and assert statements throughout the code could help.
I'm not 100% sure about it, but from what I remember, most of these
"encoding/decoding on the fly" routines use utf8, while other encodings
are used in some places.
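
As a rough sketch of what "documented helpers plus assert statements"
could look like, the following Python 3 example decodes once on entry,
works on text only, and encodes once on exit. The names as_text and
join_fragments are hypothetical and are not part of ReportLab; utf8 as
the default encoding is an assumption based on what most of the code
seemed to use.

```python
def as_text(value, encoding="utf8"):
    """Hypothetical helper: accept bytes or str, always return str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Anything else must already be text; fail early instead of guessing.
    assert isinstance(value, str), "expected bytes or str, got %r" % type(value)
    return value


def join_fragments(fragments, encoding="utf8"):
    """Hypothetical example: normalize inputs, then encode exactly once."""
    text = "".join(as_text(f, encoding) for f in fragments)
    assert isinstance(text, str)
    return text.encode(encoding)


# Mixed bytes and text input, as it may arrive from different callers.
result = join_fragments([b"Gr\xc3\xbc\xc3\x9fe", ", ", "Henning"])
```

The point of the design is that mixed input is tolerated only at the
boundary, so the code in between never has to wonder which type it holds.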

After developing the hyphenation library to a level that worked OK for
me, I tried running the whole RL test suite against it, and I found some
issues (in my modified RL code with hyphenation) of the
"encoding/decoding" kind.
One problem was in genreference.py, and another one in graphdocpy.py
('ascii' encoding is used here!).
I also remember having trouble with rl_codecs.py, for example regarding
the "shy" (soft hyphen) character. I use it for hyphenation because
that's what it is intended for, and it integrates nicely with the Adobe
Reader text selection feature. There was also a problem with the bullets
in the pythonpoint sample.
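
The following Python 3 sketch shows why the "shy" character (U+00AD, SOFT
HYPHEN) and an 'ascii' codec do not mix. It only uses Python's built-in
codecs and does not reproduce the actual code in graphdocpy.py or
rl_codecs.py; the sample word and the helper name try_encode are made up
for illustration.

```python
SHY = "\u00ad"  # SOFT HYPHEN, marks a permitted hyphenation point

# A German sample word with one soft hyphen inserted.
word = "Silben" + SHY + "trennung"


def try_encode(text, encoding):
    """Hypothetical helper: encode text, return None if the codec cannot."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


as_ascii = try_encode(word, "ascii")     # U+00AD is outside ASCII: fails
as_latin1 = try_encode(word, "latin-1")  # single byte 0xAD
as_utf8 = try_encode(word, "utf8")       # two bytes 0xC2 0xAD
```

Any code path that pushes hyphenated text through 'ascii' will therefore
raise as soon as a soft hyphen appears, even if the text was plain ASCII
before hyphenation.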

I'd like to donate the hyphenation library to the ReportLab open-source
project (and see it fully integrated), but these unicode issues prevent
me from calling the library production-quality code (the SiSiSi
implementation for German hyphenation is so-called spaghetti code
anyway, but it works quite well).
I'd be happy if someone from the RL development team could take a look
(or two) at the library and perhaps fix these unicode bugs; I don't have
the time to do it myself now or in the near future.

Henning


