Python - numpy float: 10x slower than builtin in arithmetic operations?
I am getting weird timings for the following code:
    import numpy as np

    s = 0
    for i in range(10000000):
        s += np.float64(1)  # replaced with np.float32 and with the built-in float

- built-in float: 4.9 s
- float64: 10.5 s
- float32: 45.0 s
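A minimal sketch for re-running this comparison with the standard-library `timeit` module instead of hand-rolled timing. It assumes a reduced iteration count of 1,000,000 (the original uses 10,000,000), and the helper names `accumulate` and `time_accumulate` are hypothetical. Absolute numbers will depend on the machine and on the Python and NumPy versions.

```python
import timeit

import numpy as np


def accumulate(scalar_type, n):
    """Mirror the question's first loop: build scalar_type(1) and add it n times."""
    s = 0
    for _ in range(n):
        s += scalar_type(1)
    return s


def time_accumulate(scalar_type, n=1_000_000):
    """Seconds taken by a single call of accumulate(scalar_type, n)."""
    return timeit.timeit(lambda: accumulate(scalar_type, n), number=1)


if __name__ == "__main__":
    for scalar_type in (float, np.float64, np.float32):
        print(f"{scalar_type.__name__:>8}: {time_accumulate(scalar_type):.3f} s")
```

Because `scalar_type(1)` is evaluated on every iteration, this times construction plus addition together, just as the original loop does.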
Why is float64 twice as slow as float? And why is float32 5 times slower than float64?
Is there a way to avoid the penalty of using np.float64, and have numpy functions return a built-in float instead of float64?
I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine).
I want to use numpy.float32 on my 32-bit machine. Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations are performed at 32-bit precision).
Is there any way to set a single variable somewhere in the program or on the command line, and make all numpy functions return float32 instead of float64?
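A sketch of the per-call alternatives, assuming the newer `numpy.random.default_rng` Generator API (which postdates NumPy 1.5.1). As far as I know, NumPy has no single global setting that changes its default float type, so the dtype has to be requested per call or converted explicitly. `Generator.random` accepts `dtype=np.float32`, while `.tolist()` and `.item()` hand back built-in Python floats; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Generator.random can draw float32 values directly, so no conversion is needed.
samples32 = rng.random(5, dtype=np.float32)

# uniform() returns float64; astype() converts the whole array in one step.
samples64 = rng.uniform(0.0, 1.0, size=5)
converted = samples64.astype(np.float32)

# tolist() and item() return built-in Python floats, avoiding NumPy scalar
# objects inside pure-Python loops.
as_builtin = samples64.tolist()
one_builtin = np.float64(1.5).item()

print(samples32.dtype, converted.dtype, type(as_builtin[0]), type(one_builtin))
```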
Edit #1:
numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?
I want to emphasize that the timings are not due to any of the following:
- the function calls
- the conversion between numpy and Python float
- the creation of objects
I updated the code to make it clearer where the problem lies. With the new code, I seem to see a ten-fold performance hit from using numpy data types:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:
- float64: 34.56s
- float32: 35.11s
- float: 3.53s
Just for the hell of it, I also tried:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    s = np.float64(1)
    for i in range(10000000):
        s = float(s)
        s = (s + 8) * s % 2399232
        s = np.float64(s)

    print(s)
    print('runtime:', datetime.now() - start_time)

The execution time is 13.28 s; it's actually 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it's more than 3 times slower compared to pure-Python float.
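The conversion trick above still pays for two conversions on every iteration. A minimal sketch, assuming the value only needs to be a float64 before and after the hot loop, converts once before the loop and once after it. The names `iterate` and `iterate_hoisted` are hypothetical helpers, and the iteration count is reduced to 1,000,000.

```python
import time

import numpy as np


def iterate(s, n):
    """Apply the question's update rule n times to a scalar of any type."""
    for _ in range(n):
        s = (s + 8) * s % 2399232
    return s


def iterate_hoisted(s64, n):
    """Convert to a built-in float once, run the loop, then convert back once."""
    return np.float64(iterate(float(s64), n))


if __name__ == "__main__":
    n = 1_000_000
    for label, fn in (("float64 as is", iterate), ("hoisted float", iterate_hoisted)):
        t0 = time.perf_counter()
        result = fn(np.float64(1), n)
        print(f"{label}: {result} in {time.perf_counter() - t0:.3f} s")
```

Starting from 1, every intermediate value stays an integer well below 2**53, so both variants return exactly the same number here.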
My machine is:
- Intel Core 2 Duo T9300 (2.5 GHz)
- WinXP Professional (32-bit)
- ActiveState Python 3.1.3.5
- numpy 1.5.1
Edit #2:
Thank you for the answers; they help me understand how to deal with this problem.
But I would still like to know the precise reason (based on the source code, perhaps) why the code below runs 10 times slower with float64 than with float.
Edit #3:
I reran the code under Windows 7 x64 (Intel Core i7 930 @ 3.8 GHz).
Again, the code is:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:
- float64: 16.1s
- float32: 16.1s
- float: 3.2s
Now both np floats (either 64 or 32) are 5 times slower than the built-in float. Still, that's a significant difference. I'm trying to figure out where it comes from.
End of edits
CPython floats are allocated in chunks
The key problem with comparing numpy scalar allocations to the float type is that CPython allocates the memory for float and int objects in blocks of size N.
Internally, CPython maintains a linked list of blocks, each large enough to hold N float objects. When you call float(1), CPython checks whether there is space available in the current block; if not, it allocates a new block. Once it has space in the current block, it initializes that space and returns a pointer to it.
On my machine each block can hold 41 float objects, so there is some overhead for the first float(1) call, but the next 40 run much faster because the memory is already allocated and ready.
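To make the idea concrete, here is a toy Python model of block allocation, not CPython's actual C code: the class `BlockAllocator` and its fields are hypothetical. It hands out slots from fixed-size blocks kept in a linked list and counts how often the expensive "new block" path runs; the value of 41 objects per block is taken from the figure quoted above. Real CPython also recycles freed slots, which this sketch leaves out.

```python
class BlockAllocator:
    """Toy model (hypothetical, not CPython code) of handing out object slots
    from pre-allocated fixed-size blocks kept in a singly linked list."""

    def __init__(self, objects_per_block=41):
        self.objects_per_block = objects_per_block
        self.blocks = None          # head of the linked list of blocks
        self.free_in_block = 0      # unused slots left in the current block
        self.block_allocations = 0  # how many times the slow path ran

    def allocate(self, value):
        """Store value in a slot and return (block number, slot index)."""
        if self.free_in_block == 0:
            # Slow path: create a new block and link it in front of the list.
            self.blocks = {"slots": [None] * self.objects_per_block, "next": self.blocks}
            self.free_in_block = self.objects_per_block
            self.block_allocations += 1
        # Fast path: use the next free slot in the current block.
        index = self.objects_per_block - self.free_in_block
        self.blocks["slots"][index] = value
        self.free_in_block -= 1
        return (self.block_allocations - 1, index)


allocator = BlockAllocator(objects_per_block=41)
handles = [allocator.allocate(float(i)) for i in range(100)]
print(allocator.block_allocations)  # 3
```

With 100 allocations and 41 slots per block, only 3 calls take the slow path; the other 97 just fill the next free slot.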
Slow numpy.float32 vs. numpy.float64
It appears that numpy has two paths it can take when creating a scalar type: fast and slow. Which one it takes depends on whether the scalar type has a Python base class to which it can defer argument conversion.
For some reason numpy.float32 is hard-coded to take the slower path (defined by the _WORK0 macro), while numpy.float64 gets a chance to take the faster path (defined by the _WORK1 macro). Note that scalartypes.c.src is a template which generates scalartypes.c at build time.
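One part of this can be checked from Python itself. In NumPy, `np.float64` subclasses the built-in `float`, while `np.float32` has no Python numeric base class, which is consistent with the base-class condition described above. The sketch prints that relationship and times bare construction of each type with `timeit`; the `constructor_cost` helper and the 200,000-call count are assumptions for illustration, and on a current NumPy the relative timings may not match the ones reported here.

```python
import timeit

import numpy as np

# np.float64 derives from the built-in float; np.float32 does not.
print(issubclass(np.float64, float))  # True
print(issubclass(np.float32, float))  # False


def constructor_cost(scalar_type, number=200_000):
    """Seconds spent evaluating scalar_type(1) `number` times."""
    return timeit.timeit(lambda: scalar_type(1), number=number)


if __name__ == "__main__":
    for scalar_type in (float, np.float64, np.float32):
        print(f"{scalar_type.__name__:>8}: {constructor_cost(scalar_type):.4f} s")
```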
You can visualize this in Cachegrind. I've included screen captures showing how many more calls are made to construct a float32 vs. a float64:
[Screenshot: float64 takes the fast path]

[Screenshot: float32 takes the slow path]
Updated: which type takes the slow or fast path may depend on whether the OS is 32-bit or 64-bit. On my test system, Ubuntu Lucid 64-bit, the float64 type is 10 times faster than float32.