[reportlab-users] Status of Python 3 port?

Henning von Bargen H.vonBargen at t-p.com
Fri Feb 5 03:28:57 EST 2010



> From: Andy Robinson <andy at reportlab.com>
> Subject: Re: [reportlab-users] Status of Python 3 port?
>
> On 4 February 2010 22:00, Neil Schemenauer <nas-python at arctrix.com> wrote:
> > Generally you want to handle text inside your application as unicode
> > strings and only encode when writing data. However, for Reportlab,
> > since a PDF really is a sequence of bytes (as far as I know) it may
> > make sense to manipulate byte strings in many places.
>
> This is one major reason why I want to do fairly deep work which may
> change APIs. We could be clear about what's a byte array and what's
> natural-language text. Right now the code flips back and forth
> internally in many places and forgives people who muddle them.


I think this is actually the main problem.
As I said a year or more ago (when I was adapting the Paragraph class
for wordaxe), the user can currently feed the ReportLab APIs either
unicode or UTF-8 encoded strings (in Python 2.x terms).
That caused me a lot of trouble back then.

What a new ReportLab really needs is a clear distinction:
which APIs expect "text" (Python 2: unicode; Python 3: str) and
which APIs expect "encoded text" (Python 2: str; Python 3: bytes)
and what encoding must be used.
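To illustrate why the distinction matters: Python 3 enforces it at runtime.
The short sketch below (plain Python 3, no ReportLab involved; the sample
word and the helper name mixes_cleanly are just for illustration) shows that
"text" and "encoded text" are different types, that the encoding has to be
chosen explicitly, and that the two cannot be silently mixed:

```python
# Python 3: "text" is str, "encoded text" is bytes.
title = "Größe"                   # text (str): 5 characters
encoded = title.encode("utf-8")   # encoded text (bytes): encoding chosen explicitly

def mixes_cleanly():
    """Return True if str and bytes can be concatenated (they cannot in Python 3)."""
    try:
        "Title: " + encoded
    except TypeError:
        return False
    return True

print(type(title).__name__, type(encoded).__name__)  # str bytes
print(len(title), len(encoded))                      # 5 7  (ö and ß take 2 bytes each)
print(mixes_cleanly())                               # False
```

In Python 2 the equivalent concatenation would often "just work" through an
implicit ASCII conversion, which is exactly the kind of muddling that only
breaks once non-ASCII text shows up.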

There should be a clearly defined border somewhere:
on one side of the border, the APIs use "text" (2.x: unicode, 3.x: str);
on the other side, they expect "encoded text" (2.x: str, 3.x: bytes).

As Neil mentioned, a natural place for this border would be the code
that actually writes data to the PDF - which usually won't be called
directly by the user.
So, for everyday usage, all the APIs would expect "text".
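A minimal Python 3 sketch of such a border might look like this. Both
classes (LowLevelWriter, SketchCanvas) and the method draw_text are
hypothetical and are NOT ReportLab's API; the fixed UTF-8 encoding is also
just an assumption, since a real PDF writer would have to use whatever
encoding the font requires:

```python
import io


class LowLevelWriter:
    """Hypothetical low-level writer: the only place that handles encoded text (bytes)."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> None:
        if not isinstance(data, bytes):
            raise TypeError("write() expects bytes, got %s" % type(data).__name__)
        self._buf.write(data)

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


class SketchCanvas:
    """Hypothetical user-facing API: accepts only text (str)."""

    def __init__(self, encoding: str = "utf-8"):
        # Assumption: one fixed encoding for the whole document.
        self.encoding = encoding
        self._out = LowLevelWriter()

    def draw_text(self, text: str) -> None:
        if not isinstance(text, str):
            raise TypeError("draw_text() expects text (str), got %s" % type(text).__name__)
        # The border: text becomes encoded text here and nowhere else.
        self._out.write(text.encode(self.encoding) + b"\n")

    def getvalue(self) -> bytes:
        return self._out.getvalue()


c = SketchCanvas()
c.draw_text("Grüße")
print(c.getvalue())  # b'Gr\xc3\xbc\xc3\x9fe\n'
```

The user only ever sees draw_text(), which takes text; passing already
encoded bytes fails loudly instead of being "forgiven".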

Apart from allowing a Python 3 compatible ReportLab (*),
this would make the USAGE of the library cleaner and easier
(whatever Python version is used).

It WILL break backward compatibility, though.
But I don't see this as a big problem; it is quite comparable to
the situation a few years ago when moving from ReportLab 1.x to 2.0.

Another concern is memory usage.
Again, I think this is not much of an issue.

(*)
Basically:
    import sys
    if sys.version_info[0] < 3:
        text, encoded_text = (unicode, str)
    else:
        text, encoded_text = (str, bytes)

- Then THINK about every function definition,
- document the expected type (text or encoded_text)
  of parameters and return values,
- create a debug version of the code that uses
  assert statements to verify the correct type
  of parameters and return values,
- run the test suite with non-English text (e.g. containing umlauts).
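For the "debug version with assert statements" step, one possible Python 3
sketch is a decorator that checks arguments and return values against the
function's annotations. The names check_types and encode_for_pdf are
hypothetical, and this relies on Python 3 annotations, so a Python 2
variant would have to declare the expected types some other way:

```python
import functools
import inspect

text, encoded_text = str, bytes  # Python 3 aliases, as in the (*) note


def check_types(func):
    """Hypothetical debug helper: assert that arguments and the return value
    match the annotations. Asserts are skipped when running `python -O`."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if expected is not inspect.Parameter.empty:
                assert isinstance(value, expected), (
                    "%s: parameter %r should be %s, got %s"
                    % (func.__name__, name, expected.__name__, type(value).__name__))
        result = func(*args, **kwargs)
        expected = sig.return_annotation
        if expected is not inspect.Signature.empty:
            assert isinstance(result, expected), (
                "%s: return value should be %s, got %s"
                % (func.__name__, expected.__name__, type(result).__name__))
        return result

    return wrapper


@check_types
def encode_for_pdf(s: text, encoding: str = "utf-8") -> encoded_text:
    # Hypothetical helper sitting exactly at the text / encoded-text border.
    return s.encode(encoding)


print(encode_for_pdf("äöü"))  # b'\xc3\xa4\xc3\xb6\xc3\xbc'
```

Calling encode_for_pdf(b"...") then raises an AssertionError in the debug
run, which is precisely the kind of mix-up a test suite with umlauts should
flush out.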

Once that's done,
creating a Python 3 version of reportlab could be a snap.

Note:
Once Python 2 compatibility can be dropped (in ten years
or so?), the parameter types can be expressed directly
as function annotations, like:
    class Paragraph:
        def __init__(self, text: text, ...)
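A runnable Python 3 version of that annotation idea is sketched below. The
Paragraph here is a hypothetical stand-in, not ReportLab's class, and the
style parameter is made up just to fill the elided argument list. Note that
annotations are only stored as metadata; Python itself does not enforce
them, which is why an explicit debug check would still be useful:

```python
text, encoded_text = str, bytes  # Python 3 aliases


class Paragraph:
    """Hypothetical stand-in for a paragraph class, not ReportLab's."""

    def __init__(self, text: text, style=None):
        self.text = text
        self.style = style


print(Paragraph.__init__.__annotations__)  # {'text': <class 'str'>}
p = Paragraph(b"not checked")               # accepted: annotations are not enforced
```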

Henning


