[reportlab-users] Text with multiple languages.

Mon Oct 27 15:44:14 EDT 2014

On 10/26/2014 12:48 PM, Tim Butram wrote:
>
> Unfortunately, I found that generating pdfs with multiple languages 
> was too difficult, and moved to using HTML and utf-8.
>
> A web/browser based solution worked really well for me.
>
> On Oct 26, 2014 12:15 AM, "Steve Young" <wereapwhatwesow at gmail.com>
> wrote:
>
>     Hi Tim, I am working on a project with needs similar to those you
>     mentioned.  If you found answers, would you mind sharing?
>
>     Thanks.
>
>     On Thursday, July 31, 2014 2:37:08 PM UTC-5, Tim Butram wrote:
>
>         I'm generating a document that has a wide variety of languages,
>         from Arabic to English to Chinese. Unfortunately, I'm unable to
>         find a single font that will allow me to print such a variety
>         of characters. Additionally, I don't know in advance what
>         language the string I'm trying to print is in, which
>         makes it difficult to switch fonts based on the contents of
>         the text.
>
>         What are some solutions to this problem?
>
>         Thanks,
>         Tim
>

I had earlier questions on this list regarding multi-font PDF 
generation. It seems that reportlab doesn't attempt to do font 
substitution, but the browsers do.

I basically gave up on my attempt to generate combined Chinese and 
English text, because at the time it would have had to be coded in 
Python 2, and I was a beginning user of Python 3.

The general technique that seems to be necessary is to do your own font 
selection based on incoming codepoints, and explicitly tell reportlab to 
switch fonts when needed.
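
As a rough illustration of the "explicitly tell reportlab to switch 
fonts" part: as far as I know, ReportLab's Paragraph markup accepts 
<font name="..."> tags, so once the text has been split into runs, each 
run can be wrapped in one. The sketch below only builds the markup 
string. The function name runs_to_markup and the font names LatinFont 
and CJKFont are made up for the example, and real fonts would have to be 
registered with ReportLab before rendering.

```python
from xml.sax.saxutils import escape


def runs_to_markup(runs):
    """Turn (font_name, text) pairs into paragraph markup with font tags."""
    parts = []
    for font_name, text in runs:
        # escape() replaces &, < and > so the text is not read as markup.
        parts.append('<font name="%s">%s</font>' % (font_name, escape(text)))
    return "".join(parts)


# Hypothetical font names. Each would need to be registered first, for
# example with pdfmetrics.registerFont(TTFont(name, path)), and the
# resulting string would then be handed to reportlab.platypus.Paragraph.
markup = runs_to_markup([("LatinFont", "Hello & "), ("CJKFont", "你好")])
print(markup)
```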

One could investigate the ways that browsers do font substitution, since 
several are open source.

Alternately, one could do a Google search on the topic of font 
substitution, and perhaps find academic writings or practical solutions 
to the matter.

My best speculation without resorting to the above is as follows:

Given a collection of scripts that are expected to arrive in UTF-8, 
find a suitable font for use with each script. Determine the "coverage" 
of each font with respect to codepoints. Choose a current font, and 
select it for use. For each codepoint processed, if it is not covered by 
the current font, choose as current a font that does include that 
codepoint, and select it for use.
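
The steps above might be sketched in Python 3 roughly as follows. This 
is untested against ReportLab itself. The font names and the codepoint 
ranges in FONT_RANGES are assumptions made up for the example, not the 
measured coverage of any real font, and the function names are mine.

```python
def build_coverage(font_ranges):
    """Map each font name to the set of codepoints it is assumed to cover."""
    return {
        name: {cp for lo, hi in ranges for cp in range(lo, hi + 1)}
        for name, ranges in font_ranges.items()
    }


def split_by_font(text, coverage, default_font):
    """Return (font, run) pairs, switching font only when the current
    font does not cover the next codepoint."""
    runs = []
    current = default_font
    buf = []
    for ch in text:
        cp = ord(ch)
        if cp not in coverage.get(current, ()):
            # Take the first font (in table order) that covers the codepoint.
            found = next((n for n, cps in coverage.items() if cp in cps), None)
            if found is not None:
                if buf:
                    runs.append((current, "".join(buf)))
                    buf = []
                current = found
            # If no font covers it, stay on the current font.
        buf.append(ch)
    if buf:
        runs.append((current, "".join(buf)))
    return runs


# Hypothetical coverage tables: inclusive (first, last) codepoint ranges.
FONT_RANGES = {
    "LatinFont": [(0x0020, 0x007E)],
    "ArabicFont": [(0x0600, 0x06FF)],
    "CJKFont": [(0x4E00, 0x9FFF)],
}
coverage = build_coverage(FONT_RANGES)
runs = split_by_font("Hi 你好", coverage, "LatinFont")
print(runs)
```

Each (font, run) pair can then be drawn with the named font selected. 
Note that a space following the Chinese text would switch back to 
LatinFont here, because the made-up CJKFont table does not include it.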

The fonts may overlap in their coverage, so you need to decide whether 
to refine the above algorithm. One refinement is to prefer particular 
fonts for particular codepoints and adjust your "coverage" tables 
appropriately, rather than using everything that a font covers. Another 
is an algorithm for choosing one font among several that might cover the 
current codepoint, possibly by looking back or ahead to determine the 
length of the run of characters that each font could cover.