[reportlab-users] BUGFIX: Re: &nbsp; in paragraph
    Robin Becker 
    robin at reportlab.com
       
    Fri Dec  5 06:07:31 EST 2008
    
    
  
Dirk Holtwick wrote:
...........
> 
> I tested it and it works fine. Another suggestion is to stop testing for 
> "\xa0", so that the usual cases also profit from the more elaborate 
> whitespace table. Here is my modification:
> 
> -----------------8<---------------[cut here]
> def split(text, delim=None):
>     if type(text) is str:
>         text = text.decode('utf8')
>     if type(delim) is str:
>         delim = delim.decode('utf8')
>     elif delim is None:
>         return [uword.encode('utf8') for uword in _wsc_re_split(text)]
>     return [uword.encode('utf8') for uword in text.split(delim)]
> -----------------8<---------------[cut here]
> 
> Dirk
.........
Unfortunately, that version is slower in the common case, when no \xa0 is 
present. Below are my timings for my split and for Dirk's (which I renamed _plit 
so that both names have the same length, in case that affected the timing somehow).
Common case, no nbsp:
> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from reportlab.platypus.paragraph import split" "split(u'The difference in default timer function is because on Windows, clock() has microsecond granularity but time()\'s granularity is 1/60th of a second; on Unix, clock() has 1/100th of a second granularity and time() is much more precise.  On either platform, the default timer functions measure wall clock time, not the CPU time.  This means that other processes running on the same computer may interfere with the timing.  The best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time.  The -r option is good for this; the default of 3 repetitions is probably enough in most cases.  On Unix, you can use clock() to measure CPU time.')"
> 10000 loops, best of 3: 173 usec per loop
> 
> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from reportlab.platypus.paragraph import _plit" "_plit(u'The difference in default timer function is because on Windows, clock() has microsecond granularity but time()\'s granularity is 1/60th of a second; on Unix, clock() has 1/100th of a second granularity and time() is much more precise.  On either platform, the default timer functions measure wall clock time, not the CPU time.  This means that other processes running on the same computer may interfere with the timing.  The best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time.  The -r option is good for this; the default of 3 repetitions is probably enough in most cases.  On Unix, you can use clock() to measure CPU time.')"
> 1000 loops, best of 3: 233 usec per loop
> 
Less common case, one nbsp: both take about the same time. Dirk's version is 
slightly faster here than in the common case, presumably because the single nbsp 
reduces the number of matches.
> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from reportlab.platypus.paragraph import split" "split(u'The difference in default timer function is because on Windows, clock() has microsecond granularity but\xa0time()\'s granularity is 1/60th of a second; on Unix, clock() has 1/100th of a second granularity and time() is much more precise.  On either platform, the default timer functions measure wall clock time, not the CPU time.  This means that other processes running on the same computer may interfere with the timing.  The best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time.  The -r option is good for this; the default of 3 repetitions is probably enough in most cases.  On Unix, you can use clock() to measure CPU time.')"
> 1000 loops, best of 3: 230 usec per loop
> 
> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from reportlab.platypus.paragraph import _plit" "_plit(u'The difference in default timer function is because on Windows, clock() has microsecond granularity but\xa0time()\'s granularity is 1/60th of a second; on Unix, clock() has 1/100th of a second granularity and time() is much more precise.  On either platform, the default timer functions measure wall clock time, not the CPU time.  This means that other processes running on the same computer may interfere with the timing.  The best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time.  The -r option is good for this; the default of 3 repetitions is probably enough in most cases.  On Unix, you can use clock() to measure CPU time.')"
> 1000 loops, best of 3: 230 usec per loop
So I guess we should keep the \xa0 test unless there's a compelling reason to 
remove it.
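The same comparison can be scripted with the `timeit` module instead of the command line, following the advice quoted in the sample text: repeat the measurement and keep the best time. This is a Python 3 sketch under stated assumptions: the whitespace table is hypothetical (every `str.isspace()` character except U+00A0), `guarded_split` and `regex_split` are illustrative names approximating the two versions, and the sample string is shortened. No timings are shown because they depend on the machine and Python version.

```python
import re
import sys
import timeit

# Assumed whitespace table: everything str.isspace() accepts except U+00A0.
_WSC = ''.join(chr(cp) for cp in range(sys.maxunicode + 1)
               if chr(cp).isspace() and cp != 0xA0)
_ws_split = re.compile('[%s]+' % re.escape(_WSC)).split


def guarded_split(text):
    """Use the regex only when a no-break space is actually present."""
    if '\xa0' in text:
        return [w for w in _ws_split(text) if w]
    return text.split()


def regex_split(text):
    """Always use the regex (the 'drop the test' variant)."""
    return [w for w in _ws_split(text) if w]


def best_usec(func, text, number=10000, repeat=3):
    """Best-of-`repeat` time per call in microseconds, as timeit's CLI reports."""
    times = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(times) / number * 1e6


SAMPLE = ('The difference in default timer function is because on Windows, '
          'clock() has microsecond granularity but time()\'s granularity is '
          '1/60th of a second.')

if __name__ == '__main__':
    cases = [('no nbsp', SAMPLE),
             ('one nbsp', SAMPLE.replace('but ', 'but\xa0', 1))]
    for label, text in cases:
        for func in (guarded_split, regex_split):
            print('%-8s %-13s %.1f usec'
                  % (label, func.__name__, best_usec(func, text)))
```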
Are you suggesting that our table is more comprehensive than the default 
whitespace set that unicode.split() uses when called without an argument? I took 
it from the C code that implements unicodectype, so I hope it is the same set 
that unicode.split uses.
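Whether a table matches the default set of `unicode.split()` can also be checked empirically rather than by reading the C source. This Python 3 sketch (the thread is Python 2, so results there may differ) probes `str.split()` one code point at a time and diffs the result against a candidate table. `ASCII_TABLE` is a deliberately small hypothetical example to be replaced with the real table; note that the reference set contains U+00A0, because the built-in split does treat a no-break space as a separator.

```python
import sys


def splits_on(ch):
    """True if str.split() with no argument treats ch as a separator."""
    return ('a' + ch + 'b').split() == ['a', 'b']


def default_split_whitespace():
    """Every code point that str.split() (no argument) splits on."""
    return {chr(cp) for cp in range(sys.maxunicode + 1) if splits_on(chr(cp))}


def compare(table, reference):
    """Return (chars only in table, chars only in reference)."""
    table = set(table)
    return table - reference, reference - table


# Hypothetical candidate table: ASCII whitespace only.
ASCII_TABLE = ' \t\n\r\x0b\x0c'

if __name__ == '__main__':
    ref = default_split_whitespace()
    extra, missing = compare(ASCII_TABLE, ref)
    # sorted() on characters orders them by code point.
    print('only in table:      ', [hex(ord(c)) for c in sorted(extra)])
    print('only in str.split():', [hex(ord(c)) for c in sorted(missing)])
```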
-- 
Robin Becker
    
    