[reportlab-users] MediaWiki's "Download as PDF" feature uses ReportLab but has a problem
Yao Ziyuan
yaoziyuan at gmail.com
Tue Jan 10 11:59:16 EST 2012
On Wed, Jan 11, 2012 at 12:40 AM, Andy Robinson <andy at reportlab.com> wrote:
> On 10 January 2012 16:19, Ziyuan Yao <yaoziyuan at gmail.com> wrote:
>>
>>
>> Actually, ReportLab doesn't need a wordwrap=CJK option. Instead, ReportLab
>> can wrap text (Western or CJK or mixed) in a unified manner:
>>
>> IF there is a whitespace near the page's right margin THEN
>> wrap after that whitespace;
>> ELSE IF there is a CJK character near the page's right margin THEN
>> wrap after that CJK character;
>> ELSE
>> wrap forcibly at the page's right margin.
>>
>
>
> I understand the principles fully, but I am sorry that we haven't yet
> found the time to implement this. When we launched it was
> pre-Unicode. ReportLab has one major Asian-language commercial
> customer, who is quite happy with their output now as they don't mix
> languages or have long English technical expressions. When we first
> wrote the package it was before Python's Unicode support. We would
> probably also need some support from C code for speed.
>
> If some contributors (e.g. you?) have time to work on this and supply
> a better wrapping algorithm, we would be very happy to review code and
> migrate everything onto it.
>
> The ideal wrapping algorithm must support
> (a) CJK wrapping when detected
> (b) hyphenation (for long German words etc), and some sane rules for
> breaking long URLs
> (c) inline non-text objects, such as equation images used heavily by Wikipedia
> (d) support for varying fonts, and maybe even kerning or horizontal
> compression, and
> (e) right-to-left text for Arabic.
>
> This is not a trivial problem. We "cheated" badly by having an
> English and then a CJK wrapping algorithm which is how we got to the
> present position.
>
> I would love to have some more people working on it but sadly it's not
> a requirement for current customers and our team is pretty busy these
> days...
I'm not familiar with Python, but I have a simple way for ReportLab to
handle CJK line-wrapping transparently:

As a preprocessing step, insert a U+200B (ZERO WIDTH SPACE) after every
CJK character found in the text. This makes every CJK character a
possible line-wrapping point.

Then have ReportLab's non-CJK line-wrapping code recognize U+200B as a
kind of whitespace.

This way, ReportLab won't need a separate wordwrap=CJK wrapping
algorithm; the same algorithm used for Western text will handle CJK and
mixed text too.
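
The two steps above could be sketched roughly as follows. This is only an
illustration, not ReportLab's actual code: the function names
(mark_cjk_breaks, wrap), the width_of parameter, and the choice of
code-point ranges treated as "CJK" are all assumptions made for the
example. Width is measured in characters here; a real implementation
would measure with font metrics, and would also need rules such as not
starting a line with closing punctuation.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Assumption: a rough, incomplete notion of "CJK character" for illustration.
_CJK = re.compile(
    "["
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\uac00-\ud7af"  # Hangul Syllables
    "\uff00-\uffef"  # Halfwidth and Fullwidth Forms
    "]"
)

# Ordinary whitespace and U+200B are both treated as break opportunities.
_BREAKS = re.compile("([ \t\n\u200b]+)")


def mark_cjk_breaks(text):
    """Step 1: insert U+200B after every CJK character."""
    return _CJK.sub(lambda m: m.group(0) + ZWSP, text)


def wrap(text, max_width, width_of=len):
    """Step 2: greedy wrap that breaks at whitespace or U+200B.

    width_of is a hypothetical measuring function; len() counts characters.
    A single piece wider than max_width is left on its own line (no forced
    break at the margin in this sketch).
    """
    # re.split with a capturing group yields: piece, separator, piece, ...
    pieces = _BREAKS.split(mark_cjk_breaks(text))
    lines, current = [], ""
    for i in range(0, len(pieces), 2):
        piece = pieces[i]
        if not piece:
            continue  # empty piece at the very start or end of the text
        sep = pieces[i - 1] if i else ""
        # Keep a visible space if the separator had real whitespace;
        # a separator made only of U+200B joins with nothing.
        glue = " " if sep.strip(ZWSP) else ""
        candidate = current + glue + piece if current else piece
        if current and width_of(candidate) > max_width:
            lines.append(current)
            current = piece
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap("hello 世界你好 world", 8):
        print(line)
```

The zero-width spaces are dropped again when the lines are reassembled,
so they never reach the output; they only serve as break markers for the
single shared wrapping loop.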
>
>
> Best Regards,
> --
> Andy Robinson
> Managing Director
> ReportLab Europe Ltd.
> Thornton House, Thornton Road, Wimbledon, London SW19 4NG, UK
> Tel +44-20-8405-6420