[reportlab-users] BUGFIX: Re: &nbsp; in paragraph
Dirk Holtwick dirk.holtwick at gmail.com
Fri Dec  5 06:17:45 EST 2008

Hi Robin,
I did not think about speed ;) You are absolutely right that we should 
just handle the special case regarding u"\xa0" with our own whitespace 
table.
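
To make the special-case idea concrete, here is a minimal sketch, not the actual ReportLab code. It assumes Python 3 text strings (the code in this thread is Python 2 and also handles UTF-8 byte strings), and the names custom_split, NBSP and _ws_split are hypothetical. The whitespace table is simply derived from str.isspace() rather than taken from ReportLab's own table.

```python
import re

NBSP = "\xa0"  # the no-break space that must not separate words

# Hypothetical whitespace table: every BMP character that str.isspace()
# accepts, except the no-break space. Written as \uXXXX escapes so the
# character class contains no raw control characters.
_WS_CLASS = "".join(
    "\\u%04x" % code
    for code in range(0x10000)
    if chr(code).isspace() and chr(code) != NBSP
)
_ws_split = re.compile("[%s]+" % _WS_CLASS).split


def custom_split(text, delim=None):
    """Split like str.split, but never break on a no-break space."""
    if delim is not None:
        return text.split(delim)
    if NBSP not in text:
        # Fast path: without an nbsp the built-in split gives the right answer.
        return text.split()
    # Slow path: split on the custom table and drop empty leading/trailing pieces.
    return [word for word in _ws_split(text) if word]
```

With this sketch, custom_split("a b\xa0c  d") keeps "b\xa0c" together as one word, while text without an nbsp never touches the regular expression.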
 > Are you suggesting that our table is more comprehensive than the unicode
 > default argument set? I got it from the C code that implements the
 > unicodectype so I hope it is the same for unicode.split.
I don't know, I just trusted your data ;)
Another thing I would suggest is to rename the functions "split" and 
"strip" to something like "split_" or "customSplit" to avoid confusion 
with the functions from the "string" module.
Cheers
Dirk
Robin Becker schrieb:
> Dirk Holtwick wrote:
> ...........
>>
>> I tested it and it works fine. Another suggestion is to stop testing 
>> for "\xa0" so that the more elaborate whitespace table is also used 
>> in the usual cases. Here is my modification:
>>
>> -----------------8<---------------[cut here]
>> def split(text, delim=None):
>>     if type(text) is str:
>>         text = text.decode('utf8')
>>     if type(delim) is str:
>>         delim = delim.decode('utf8')
>>     elif delim is None:
>>         return [uword.encode('utf8') for uword in _wsc_re_split(text)]
>>     return [uword.encode('utf8') for uword in text.split(delim)]
>> -----------------8<---------------[cut here]
>>
>> Dirk
> .........
> 
> unfortunately that version suffers in speed for the common case when no 
> \xa0 is present. Below are my timings for my split and Dirk's (which I 
> called _plit so the names are the same length in case that altered the 
> timing somehow).
> 
> common case no nbsp
>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from 
>> reportlab.platypus.paragraph import split" "split(u'The 
>> difference in default timer function is because on Windows, clock() 
>> has microsecond granularity but time()\'s granularity is 1/60th of a 
>> second; on Unix, clock() has 1/100th of a second granularity and 
>> time() is much more precise.  On either platform, the default timer 
>> functions measure wall clock time, not the CPU time.  This means that 
>> other processes running on the same computer may interfere with the 
>> timing.  The best thing to do when accurate timing is necessary is to 
>> repeat the timing a few times and use the best time.  The -r option 
>> is good for this; the default of 3 repetitions is probably enough in 
>> most cases.  On Unix, you can use clock() to measure CPU time.')"
>> 10000 loops, best of 3: 173 usec per loop
>>
>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from 
>> reportlab.platypus.paragraph import _plit" "_plit(u'The 
>> difference in default timer function is because on Windows, clock() 
>> has microsecond granularity but time()\'s granularity is 1/60th of a 
>> second; on Unix, clock() has 1/100th of a second granularity and 
>> time() is much more precise.  On either platform, the default timer 
>> functions measure wall clock time, not the CPU time.  This means that 
>> other processes running on the same computer may interfere with the 
>> timing.  The best thing to do when accurate timing is necessary is to 
>> repeat the timing a few times and use the best time.  The -r option 
>> is good for this; the default of 3 repetitions is probably enough in 
>> most cases.  On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 233 usec per loop
>>
> 
> Less common case, one nbsp: both take about the same time. Dirk's 
> version is slightly faster here than in the common case (230 vs 233 
> usec), presumably because the one nbsp reduces the number of matches.
> 
>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from 
>> reportlab.platypus.paragraph import split" "split(u'The 
>> difference in default timer function is because on Windows, clock() 
>> has microsecond granularity but\xa0time()\'s granularity is 1/60th of 
>> a second; on Unix, clock() has 1/100th of a second granularity and 
>> time() is much more precise.  On either platform, the default timer 
>> functions measure wall clock time, not the CPU time.  This means that 
>> other processes running on the same computer may interfere with the 
>> timing.  The best thing to do when accurate timing is necessary is to 
>> repeat the timing a few times and use the best time.  The -r option 
>> is good for this; the default of 3 repetitions is probably enough in 
>> most cases.  On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 230 usec per loop
>>
>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from 
>> reportlab.platypus.paragraph import _plit" "_plit(u'The 
>> difference in default timer function is because on Windows, clock() 
>> has microsecond granularity but\xa0time()\'s granularity is 1/60th of 
>> a second; on Unix, clock() has 1/100th of a second granularity and 
>> time() is much more precise.  On either platform, the default timer 
>> functions measure wall clock time, not the CPU time.  This means that 
>> other processes running on the same computer may interfere with the 
>> timing.  The best thing to do when accurate timing is necessary is to 
>> repeat the timing a few times and use the best time.  The -r option 
>> is good for this; the default of 3 repetitions is probably enough in 
>> most cases.  On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 230 usec per loop
> 
> so I guess we should stick with the test unless there's a compelling 
> reason for removing it.
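
The command-line comparison above can also be scripted. The following is a rough sketch assuming Python 3; builtin_split and regex_split are hypothetical stand-ins for the two variants being compared, not the real functions from reportlab.platypus.paragraph, and the sample text is only a shortened version of the one used in the timings. As in the quoted runs, the best of several repetitions is reported.

```python
import re
import timeit

# Shortened stand-in for the long sample sentence used in the timings above.
SAMPLE = ("The best thing to do when accurate timing is necessary is to "
          "repeat the timing a few times and use the best time.  ") * 6

_ws_split = re.compile(r"\s+").split


def builtin_split(text):
    # Stand-in for the fast path: plain built-in whitespace splitting.
    return text.split()


def regex_split(text):
    # Stand-in for the always-use-the-regex variant.
    return [word for word in _ws_split(text) if word]


def best_usec(func, text, number=1000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    for func in (builtin_split, regex_split):
        print("%-14s %.1f usec per loop" % (func.__name__, best_usec(func, SAMPLE)))
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two variants is meaningful.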
> 
> Are you suggesting that our table is more comprehensive than the unicode 
> default argument set? I got it from the C code that implements the 
> unicodectype so I hope it is the same for unicode.split.
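
One way to approach this question empirically is sketched below. It assumes CPython 3, where str.split() without arguments and str.isspace() rely on the same whitespace definition. The thread itself concerns Python 2's unicode.split, so this is an illustration only, and it does not inspect ReportLab's own table. The function name default_split_whitespace is hypothetical.

```python
import sys


def default_split_whitespace():
    """List every code point that str.isspace() treats as whitespace."""
    return [
        chr(code)
        for code in range(sys.maxunicode + 1)
        if chr(code).isspace()
    ]


WHITESPACE = default_split_whitespace()

# Characters from this list that the default split does NOT treat as a
# separator; an empty result means the two definitions agree.
NOT_SPLIT = [ch for ch in WHITESPACE if ("a" + ch + "b").split() != ["a", "b"]]

if __name__ == "__main__":
    print(", ".join("U+%04X" % ord(ch) for ch in WHITESPACE))
    print("not used as separators:", NOT_SPLIT)
```

The resulting list can then be compared by hand against a custom table; note that it includes u"\xa0", which is why the default split breaks words at an nbsp.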
    
    