[reportlab-users] Incorrect character composition

Thu Apr 23 01:03:29 EDT 2015

On 4/21/2015 5:45 AM, Robin Becker wrote:
> On 21/04/2015 11:50, Glenn Linderman wrote:
>> On 4/21/2015 2:51 AM, Robin Becker wrote:
>>> Glenn,
>>>
>>> my reading of the control sequence(s) is that these glyphs are being
>>> individually positioned in PDF; I see 12 separate Tm operators.
>>
>> I agree.
>>
>>> Ideally we should see a single BT with a string containing 14 bytes,
>>> which would imply that acrobat handles all the glyph positioning.
>>
>> I think we are on the same wavelength here, but I think you meant to say
>> "Adobe Reader (or other PDF display tool)" where you said "Acrobat".  I
>> think it is the case that "Acrobat" (or other PDF generation tool) is
>> doing all the positioning, and encoding it into the PDF file.
>
> yes the positioning is not being done by the renderer (acrobat 
> reader/evince etc etc).
>
> If that is the case then positioning has to be done by the software 
> that produces the PDF ie illustrator/acrobat reader pro/reportlab. If 
> this is true then there's no point in including the GPOS information 
> into the embedded fonts.
>
> If reportlab has to do the positioning of glyphs it should not affect 
> the existing standard mechanisms. Probably we'll need a cumbersome, 
> slow and fairly complicated text output mechanism.

Sounds like it. Having read only small snippets of reportlab code, I 
wonder if there is a reasonable way to say up front... I want "typeset 
quality" (or some number of variant levels thereof†) versus I want "speed".

>>
>> The below seems to be referring to the Nuance generated file; the
>> Acrobat file used HEX codes.
>>
>> "Ideally", of course, refers to the way it should work if the PDF
>> viewer's renderer was responsible for combined glyph positioning. Of
>> course, if it was, it should also be responsible for rendering the
>> kerning too, and then you wouldn't be able to do right justification
>> very well... it would have to be predicted in one place and matched in
>> the other... so I think the PDF technique, which is to have the viewer
>> only convert curves to pixels, following instructions by the PDF creator
>> as to where those curves should be placed, actually produces more
>> consistent results across platforms and devices... as much as it hurts
>> to have to do the calculations for the Td or Tm parameters when
>> generating the PDF.
> .........
>
> well I think kerning is a separate issue. Here we are talking about a 
> standard unicode approach to composite glyph construction. 
> Pairs/groups of glyphs are supposed to be treated in a specific way; 
> kerning is optional.

Kerning and composite glyph construction are, indeed, two separate 
items. Both, however, are needed for quality typesetting. I suppose a 
case could be made that composite glyph construction is needed for 
accuracy, not just quality typesetting. Kerning is needed only for quality 
typesetting, and in that sense it could be considered optional... except 
to folks that want to produce quality typesetting.... and really, I 
guess there are only two reasons to use PDF files... ubiquitous format 
(display everywhere), and quality typesetting (printers/publishers 
accept the PDF format, and often accept _only_ PDF format).
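
To illustrate the "accuracy" point, here is a minimal sketch (my own, not 
reportlab code) using only Python's standard unicodedata module. The helper 
name needs_mark_positioning is hypothetical. It checks whether a string can 
be reduced entirely to precomposed characters; when it cannot, whoever 
generates the PDF would have to position the combining mark itself.

```python
import unicodedata

def needs_mark_positioning(text):
    """Return True if combining marks remain after NFC normalization.

    Hypothetical helper for illustration only: a remaining mark means
    there is no precomposed character to fall back on, so the mark
    would need explicit placement over its base glyph.
    """
    composed = unicodedata.normalize("NFC", text)
    # Unicode general categories Mn, Mc and Me are the combining marks.
    return any(unicodedata.category(ch).startswith("M") for ch in composed)

# "e" + COMBINING ACUTE ACCENT composes to the single character U+00E9.
print(needs_mark_positioning("e\u0301"))
# "q" + COMBINING ACUTE ACCENT has no precomposed form in Unicode.
print(needs_mark_positioning("q\u0301"))
```

Normalizing first would be a cheap way to handle the common cases, but it 
would not remove the need for real mark positioning in the remaining ones.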

†I could see the following "levels" of quality / complexity. Maybe 
individual features should be turned on/off separately, or maybe that 
would overly complicate the code, and some ordering among the following 
items would provide increasing capability without a combinatorial explosion.
1. Speed
2. Kerning
3. Combining diacritical character composition
4. Other combining characters (ligatures, alternate glyphs in localized 
context)
5. RTL language support
"Speed" is what you seem to have now, and there is certainly a benefit 
to it when the other needs are not present; any additional work is 
going to sacrifice at least some speed. The order I list is not based on 
knowledge of the code, but mostly on "bang for the coding 
buck"... Kerning is useful for all Latin & Cyrillic languages (I'm not 
sure about others; it is not needed for Chinese, I guess), and a significant 
implementation exists, if it can be reasonably integrated; diacriticals 
allow support of many more Latin-based languages; the last two are 
critical to supporting some languages, but the incremental market is no 
doubt smaller.
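
A rough sketch of how ordered levels like these might be expressed, so that 
each level implies the ones before it and there is no combinatorial 
explosion of flags. All the names here (TextQuality, enabled_features) are 
my own invention for illustration; nothing like this exists in reportlab as 
far as I know.

```python
from enum import IntEnum

class TextQuality(IntEnum):
    """Hypothetical ordered quality levels; a higher level includes the lower ones."""
    SPEED = 1        # fast path, no extra text processing
    KERNING = 2      # pair kerning
    DIACRITICS = 3   # combining diacritical character composition
    COMBINING = 4    # ligatures, alternate glyphs in localized context
    RTL = 5          # right-to-left language support

def enabled_features(level):
    """List the extra processing steps implied by a given level."""
    return [q.name for q in TextQuality if TextQuality.SPEED < q <= level]

print(enabled_features(TextQuality.SPEED))
print(enabled_features(TextQuality.DIACRITICS))
```

A single ordered setting keeps the fast path trivially detectable (level == 
SPEED), at the cost of not being able to ask for, say, RTL without kerning.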