[reportlab-users] Spreadsheet Table
    Tomasz Świderski 
    contact at tomaszswiderski.com
       
    Tue Feb 23 08:48:03 EST 2010
    
    
  
You are right. Copying rowHeights in __init__ seems unnecessary. I'm not 
sure why I did that.
This LongTable stuff seems very strange to me. I understand how it works, 
but I have no idea why it is used. In the old implementation a Table 
instance recalculates rowHeights, spanRanges, nosplitRanges etc. each time 
the Table splits, because the Table's data changes. The LongTable code 
prevents unnecessary rowHeights calculations: it stops calculating as soon 
as it detects that no more rows will fit on the current page (so there is 
no point in calculating more row heights). That makes sense given that the 
calculated rowHeights are not reused. But why are they not reused in the 
first place? It is easier to calculate all rowHeights once and pass them to 
the split Tables in the _splitRows method. This way the Table has to 
calculate row heights only once and the LongTable optimization is 
unnecessary.
I can guess the intention behind this repeated row height calculation. If 
a table has variable-width elements like paragraphs, it is possible to 
shrink the column widths if the next frame is not as wide as the previous 
one. If column widths shrink, heights must be recalculated. BUT this 
situation will never happen, because on split the Table implementation 
passes _colWidths instead of _argW! Just look at this code snippet from 
the Table _splitRows method:
R1 = self.__class__(data[:repeatRows]+data[n:],colWidths=self._colWidths,
    rowHeights=self._argH[:repeatRows]+self._argH[n:],
    repeatRows=repeatRows, repeatCols=repeatCols,
    splitByRow=splitByRow)
On split, the Table creates the new parts by passing _colWidths and _argH. 
Since _colWidths contains fixed column widths (calculated by _calc_widths), 
recalculating row heights makes no sense to me, even for variable-size 
elements like paragraphs. Or maybe I just don't understand something.
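The argument above can be illustrated with a toy model. This is only a 
sketch, not reportlab code: row_height is a hypothetical helper, and 
character counts stand in for column widths in points. The wrapped height 
of a cell depends only on its text and the column width, so if the same 
fixed widths are handed to every split part, a recalculation can only 
reproduce the heights that were already known.

```python
import textwrap

def row_height(text, col_width_chars, line_height=12):
    """Toy stand-in for paragraph wrapping: height grows as the column narrows."""
    # An empty cell still occupies one line.
    lines = textwrap.wrap(text, width=col_width_chars) or [""]
    return len(lines) * line_height

text = " ".join(["word"] * 10)

# Same fixed width before and after a split: the height cannot change.
before_split = row_height(text, 24)
after_split = row_height(text, 24)

# Only a narrower column (which the split never passes on) changes the height.
narrower = row_height(text, 14)
```

Recalculation would only pay off if the split parts received the original 
width arguments and were laid out again in a frame of a different width.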
I removed the LongTable stuff in my implementation. I calculate row heights 
once and reuse them after a split. My implementation can reuse most of the 
Table's internal state (everything except rowpositions, colpositions, 
_spanRects, _vBlocks and _hBlocks). This should provide some performance 
boost when dealing with spans or nosplits.
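A minimal sketch of the "calculate once, reuse after split" idea follows. 
It is a toy model and not the actual spreadsheet implementation: 
split_rows_once and its parameters are hypothetical names, and spans and 
nosplit ranges are ignored. The row heights are measured a single time, 
and every page only reads from that list.

```python
def split_rows_once(row_heights, avail_height, repeat_rows=0):
    """Return (start, end) row ranges, one per page, from heights measured once.

    row_heights  -- heights of all rows, calculated a single time up front
    avail_height -- usable height of each page (assumed equal for all pages)
    repeat_rows  -- number of leading header rows repeated on every page
    """
    header = sum(row_heights[:repeat_rows])
    pages = []
    start = repeat_rows
    n = len(row_heights)
    while start < n:
        used = header
        end = start
        # Take rows while they still fit; no height is ever recalculated.
        while end < n and used + row_heights[end] <= avail_height:
            used += row_heights[end]
            end += 1
        if end == start:
            raise ValueError("row %d does not fit on an empty page" % start)
        pages.append((start, end))
        start = end
    return pages

# Two header rows of height 10, five body rows of height 20, pages of height 70.
pages = split_rows_once([10, 10, 20, 20, 20, 20, 20], 70, repeat_rows=2)
```

Each split part would then receive the matching slice of the cached 
heights, in the same way _splitRows slices self._argH.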
It is hard to compare the performance of the implementations, since 
reportlab 2.4 does not contain this patch: 
http://two.pairlist.net/pipermail/reportlab-users/2010-February/009275.html 
but my spreadsheet implementation does. So I decided to compare 3 
versions: spreadsheet, reportlab 2.4 with the patch (optimizedlongtable), 
and reportlab 2.4 without the patch. Results below:
SpreadsheetTable generation time (1000 rows): 2.19117712975.
OptimizedLongTable generation time (1000 rows): 1.98596405983.
LongTable generation time (1000 rows): 2.94729304314.
SpreadsheetTable generation time (2000 rows): 5.23209905624.
OptimizedLongTable generation time (2000 rows): 4.06802105904.
LongTable generation time (2000 rows): 7.60760498047.
SpreadsheetTable generation time (3000 rows): 9.08862996101.
OptimizedLongTable generation time (3000 rows): 6.21168804169.
LongTable generation time (3000 rows): 14.6669168472.
SpreadsheetTable generation time (4000 rows): 13.6766881943.
OptimizedLongTable generation time (4000 rows): 8.18623518944.
LongTable generation time (4000 rows): 23.6615509987.
SpreadsheetTable generation time (5000 rows): 19.132267952.
OptimizedLongTable generation time (5000 rows): 10.3774158955.
LongTable generation time (5000 rows): 35.2170841694.
SpreadsheetTable generation time (6000 rows): 26.460157156.
OptimizedLongTable generation time (6000 rows): 13.2904510498.
LongTable generation time (6000 rows): 55.368956089.
SpreadsheetTable generation time (7000 rows): 38.1424150467.
OptimizedLongTable generation time (7000 rows): 17.2318229675.
LongTable generation time (7000 rows): 76.6997680664.
SpreadsheetTable generation time (8000 rows): 47.3637280464.
OptimizedLongTable generation time (8000 rows): 19.8427381516.
LongTable generation time (8000 rows): 100.836438894.
SpreadsheetTable generation time (9000 rows): 60.7862567902.
OptimizedLongTable generation time (9000 rows): 23.2700841427.
LongTable generation time (9000 rows): 134.000416994.
As you can see, reportlab 2.4 with the patch is faster. This is because the 
spreadsheet implementation rewrites line and background commands. I 
believe it can be improved; I used the simplest possible way just to get it 
working.
For example, here is a snippet from the drawbackground method:
visible = []
for row_num in xrange(sr, er + 1):
    if not self._is_visible_row(row_num):
        continue
    visible.append(row_num)
So it calls _is_visible_row many times if a background command spans all 
the data :P It can easily be improved; it's just a matter of time and effort.
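One possible improvement, sketched here under assumptions: build the list 
of visible rows once and answer each command's row range with a binary 
search. VisibleRowIndex and visible_in are hypothetical names, not part of 
the spreadsheet implementation, and the sketch assumes visibility does not 
change while the commands of one table part are being processed. It is 
written for Python 3 (range instead of xrange).

```python
from bisect import bisect_left, bisect_right

class VisibleRowIndex(object):
    """Caches the sorted visible rows so range queries need no per-row checks."""

    def __init__(self, n_rows, is_visible_row):
        # One pass over all rows: is_visible_row is called exactly n_rows times.
        self._visible = [r for r in range(n_rows) if is_visible_row(r)]

    def visible_in(self, sr, er):
        """Return the visible row numbers in the inclusive range sr..er."""
        lo = bisect_left(self._visible, sr)
        hi = bisect_right(self._visible, er)
        return self._visible[lo:hi]

calls = []          # records every visibility check, to show the cost
hidden = {2, 5}     # example data: rows 2 and 5 are not visible

def is_visible_row(row_num):
    calls.append(row_num)
    return row_num not in hidden

index = VisibleRowIndex(8, is_visible_row)
first = index.visible_in(1, 6)    # rows for one background command
second = index.visible_in(0, 7)   # rows for a command spanning all data
```

With this layout the visibility check runs once per row, no matter how 
many background commands cover the same rows.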
I'm currently between jobs, so I will probably find time to improve 
performance. There is still a lot of room for improvement :) I believe 
I can also improve the span commands; the current implementation 
is very slow. A comparison of the spreadsheet implementation with patched 
reportlab 2.4 is below:
SpreadsheetTable generation time with span (1000 rows): 4.28560996056.
OptimizedTable generation time with span (1000 rows): 9.91152501106.
SpreadsheetTable generation time with span (2000 rows): 14.6250700951.
OptimizedTable generation time with span (2000 rows): 48.9020631313.
SpreadsheetTable generation time with span (3000 rows): 32.16908288.
OptimizedTable generation time with span (3000 rows): 144.50296998.
Best regards,
Tomasz Świderski
P.S. Reportlab 2.4 LongTable breaks span commands on split :(
    
    