[reportlab-users] Incorrect character composition

Glenn Linderman v+python at g.nevcal.com
Tue Apr 21 06:50:26 EDT 2015


On 4/21/2015 2:51 AM, Robin Becker wrote:
> Glenn,
>
> my reading of the control sequence(s) is that these glyphs are being 
> individually positioned in PDF; I see 12 separate Tm operators.

I agree.

> Ideally we should see a single BT with a string containing 14 bytes 
> which would imply that acrobat handles all the glyph positioning.

I think we are on the same wavelength here, but you probably meant to say 
"Adobe Reader (or other PDF display tool)" where you said "Acrobat".  As 
things stand, "Acrobat" (or another PDF generation tool) is doing all the 
positioning and encoding it into the PDF file.

The text below seems to refer to the Nuance-generated file; the 
Acrobat file used HEX codes.

"Ideally", of course, refers to the way it would work if the PDF 
viewer's renderer were responsible for positioning combined glyphs. If 
it were, it would also have to be responsible for kerning, and then you 
wouldn't be able to do right justification very well: the line width 
would have to be predicted in one place and matched in the other. So I 
think the PDF technique, in which the viewer only converts curves to 
pixels and follows the PDF creator's instructions as to where those 
curves should be placed, actually produces more consistent results 
across platforms and devices, as much as it hurts to have to do the 
calculations for the Td or Tm parameters when generating the PDF.
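
A minimal sketch of the kind of calculation meant here, assuming glyph 
advance widths are given in 1/1000 text-space units and that each glyph 
gets its own Tm with identity scaling. The function names and the sample 
widths are made up for illustration and are not ReportLab API:

```python
def right_justified_positions(advances, font_size, right_edge):
    """Return the x origin of each glyph so the run ends at right_edge.

    advances: per-glyph advance widths in 1/1000 text-space units
              (hypothetical values, normally read from the font).
    """
    widths = [a * font_size / 1000.0 for a in advances]
    x = right_edge - sum(widths)      # left edge of the whole run
    positions = []
    for w in widths:
        positions.append(x)
        x += w                        # next glyph starts where this one ends
    return positions


def tm_operators(positions, y):
    """Format one 'a b c d e f Tm' operator per glyph (no scaling or skew)."""
    return ["1 0 0 1 %.2f %.2f Tm" % (x, y) for x in positions]


if __name__ == "__main__":
    xs = right_justified_positions([500, 250, 250], font_size=10, right_edge=100)
    for op in tm_operators(xs, 700):
        print(op)
```

The generator does the width prediction once and writes the result into 
the file, so the viewer has nothing left to guess.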

>
> I believe that the text strings are actually using two bytes per 
> glyph; the map looks like
>
> 6 beginbfchar
> <006d> <00e3>
> <047a> <0303>
> <0690> <0186>
> <0699> <0190>
> <0727> <0254>
> <072d> <025b>
> endbfchar

Ah, yes, I missed looking at the map, so I was unaware that it was 
legal to use the character codes themselves in the <>. I thought <> was 
only for HEX codes, but that was just from observing various PDF files, 
not from the spec, and I've not tried to understand very many.
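
For reference, a small sketch that reads the quoted map, assuming both 
columns are hex strings and that the right-hand column is UTF-16BE, as 
is usual for a ToUnicode CMap. The parser is deliberately simplistic 
(one pair per line) and parse_bfchar is a hypothetical helper name:

```python
import re

BFCHAR_BLOCK = """\
6 beginbfchar
<006d> <00e3>
<047a> <0303>
<0690> <0186>
<0699> <0190>
<0727> <0254>
<072d> <025b>
endbfchar
"""

PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(text):
    """Map each source character code (int) to its Unicode string."""
    mapping = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("beginbfchar"):
            inside = True
        elif line == "endbfchar":
            inside = False
        elif inside:
            m = PAIR.fullmatch(line)
            if m:
                code = int(m.group(1), 16)
                mapping[code] = bytes.fromhex(m.group(2)).decode("utf-16-be")
    return mapping


if __name__ == "__main__":
    for code, uni in parse_bfchar(BFCHAR_BLOCK).items():
        print("%04x -> U+%04X" % (code, ord(uni)))
```

Read this way, code 006d maps to U+00E3 (a with tilde) and code 047a to 
U+0303 (the combining tilde).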

>
> so the byte strings required correspond to the first of each pair.
>
> 006d = 00 m = \000m
> 047a = 04 z = ^Dz   the tilde
> 06?? = 06 ?? = ^f?
> 0727 = 07 ' = ^G'
> 072d = 07 - = ^G-
>
> etc etc. My mailer can't actually cope with the odd characters in the 
> 06 lines.

Understood... my mailer seemed to drop those control characters, also.
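
The byte pairs can be checked without relying on a mailer. A short 
sketch, assuming two bytes per glyph as described above; code_to_bytes 
and split_codes are illustrative names only:

```python
def code_to_bytes(hex_code):
    """Turn a four-digit hex code such as '047a' into its two raw bytes."""
    return bytes.fromhex(hex_code)


def split_codes(data):
    """Split a raw PDF text string into two-byte character codes."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for code in ["006d", "047a", "0690", "0699", "0727", "072d"]:
        print(code, repr(code_to_bytes(code)))
```

The second byte of the 06 codes is 0x90 or 0x99, which is outside 
printable ASCII and would explain why mailers mangle those lines.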
