Python - numpy float: 10x slower than builtin in arithmetic operations?
I am getting really weird timings for the following code:

    import numpy as np

    s = 0
    for i in range(10000000):
        s += np.float64(1)  # replace with np.float32 and built-in float

- built-in float: 4.9 s
- float64: 10.5 s
- float32: 45.0 s
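
For anyone who wants to reproduce this comparison, here is a minimal sketch using the standard `timeit` module. The helper name `time_accumulate`, the `candidates` mapping, and the reduced iteration count are assumptions made for illustration, not part of the original test. Each constructor is wrapped in a lambda, which adds the same call overhead to all three cases.

```python
import timeit

import numpy as np


def time_accumulate(make_one, n=100_000):
    """Return the seconds taken to add make_one() to a running total n times."""
    def loop():
        s = 0
        for _ in range(n):
            s += make_one()
        return s

    return timeit.timeit(loop, number=1)


# Hypothetical label -> constructor mapping, for illustration only.
candidates = {
    "float": lambda: float(1),
    "np.float64": lambda: np.float64(1),
    "np.float32": lambda: np.float32(1),
}

results = {name: time_accumulate(make) for name, make in candidates.items()}
for name, seconds in results.items():
    print(f"{name:>10}: {seconds:.4f} s")
```

The absolute numbers will differ from the ones above, since they depend on the machine and on the Python and numpy versions.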
Why is float64 twice slower than float? And why is float32 5 times slower than float64?
Is there any way to avoid the penalty of using np.float64, and have numpy functions return the built-in float instead of float64?
I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine).
I work with numpy.float32 on my 32-bit machine. Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations are performed at 32-bit precision).
Is there any way to set a single variable somewhere in the program or on the command line, and make all numpy functions return float32 instead of float64?
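
The per-call conversion described above can be sketched as follows. The wrapper name `uniform32` is hypothetical; it only illustrates casting the result of `numpy.random.uniform` to float32 in one place instead of at every call site. It is not a global switch, and as far as I know numpy does not offer one.

```python
import numpy as np


def uniform32(low=0.0, high=1.0, size=None):
    """Hypothetical helper: draw uniform samples and cast them to float32."""
    samples = np.random.uniform(low, high, size)
    # np.asarray with an explicit dtype performs the float64 -> float32 cast.
    return np.asarray(samples, dtype=np.float32)


x = uniform32(size=5)
print(x.dtype)  # float32
```

Note that this still generates float64 values first and then converts them, so it saves memory afterwards but not the generation cost.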
Edit #1:
numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?
I want to emphasize that my timings are not due to any of the following:
- the function calls
- the conversion between numpy and Python float
- the creation of objects
I updated my code to make it clearer where the problem lies. With the new code, it would seem I see a ten-fold performance hit from using numpy data types:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:
- float64: 34.56s
- float32: 35.11s
- float: 3.53s
Just for the hell of it, I also tried:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    s = np.float64(1)
    for i in range(10000000):
        s = float(s)               # convert to built-in float
        s = (s + 8) * s % 2399232  # arithmetic on built-in float
        s = np.float64(s)          # convert back to float64

    print(s)
    print('runtime:', datetime.now() - start_time)

The execution time is 13.28 s; it's actually 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it's more than 3 times slower compared to the pure-Python float.
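
If the conversion itself is the remaining cost, one option is to convert once before the loop and once after it, rather than on every iteration. This is only a sketch of that idea: the function name `iterate` and the smaller iteration count are assumptions for illustration.

```python
import numpy as np


def iterate(s, n=100_000):
    """Apply the same update as the benchmark above n times and return s."""
    for _ in range(n):
        s = (s + 8) * s % 2399232
    return s


start = np.float64(1)

# Convert to a built-in float once, run the whole loop on it,
# then convert the final value back to float64 once.
result_py = iterate(float(start))
result_np = np.float64(result_py)

print(type(result_py).__name__, type(result_np).__name__)  # float float64
```

Both types are IEEE double precision, so the loop works on the same values either way; only the per-operation overhead differs.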
My machine is:
- Intel Core 2 Duo T9300 (2.5GHz)
- WinXP Professional (32-bit)
- ActiveState Python 3.1.3.5
- Numpy 1.5.1
Edit #2:
Thank you for the answers, they help me understand how to deal with this problem.
But I would still like to know the precise reason (based on the source code perhaps) why the code below runs 10 times slower with float64 than with float.
Edit #3:
I reran the code under Windows 7 x64 (Intel Core i7 930 @ 3.8GHz).
Again, the code is:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:
- float64: 16.1s
- float32: 16.1s
- float: 3.2s
Now both np floats (either 64 or 32) are 5 times slower than the built-in float. That is still a significant difference, and I'm trying to figure out where it comes from.
End of edits.
CPython floats are allocated in chunks
The key problem with comparing numpy scalar allocations to the float type is that CPython always allocates the memory for float and int objects in blocks of size N.
Internally, CPython maintains a linked list of blocks, each large enough to hold N float objects. When you call float(1), CPython checks whether there is space available in the current block; if not, it allocates a new block. Once it has space in the current block, it simply initializes that space and returns a pointer to it.
On my machine each block can hold 41 float objects, so there is some overhead for the first float(1) call, but the next 40 run much faster because the memory is already allocated and ready.
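
The block-wise scheme described above can be modelled with a small toy in Python. This is not CPython's actual code: the class name `BlockPool` and its methods are hypothetical, slots are plain integers instead of memory, and the block size of 41 is simply the figure quoted above. The point is only that the expensive step (growing by a whole block) happens once per 41 allocations.

```python
class BlockPool:
    """Toy model of block-wise allocation; not CPython's real allocator."""

    def __init__(self, block_size=41):
        self.block_size = block_size
        self.blocks_allocated = 0   # how many times a new block was needed
        self.free_slots = []        # slot numbers ready to be handed out

    def _grow(self):
        """Allocate one more block and add all of its slots to the free list."""
        start = self.blocks_allocated * self.block_size
        self.blocks_allocated += 1
        # Reversed so that pop() hands out the lowest slot number first.
        self.free_slots.extend(reversed(range(start, start + self.block_size)))

    def allocate(self):
        """Return a free slot, growing by one block only when none is left."""
        if not self.free_slots:
            self._grow()
        return self.free_slots.pop()

    def release(self, slot):
        """Return a slot to the free list so it can be reused."""
        self.free_slots.append(slot)


pool = BlockPool(block_size=41)
slots = [pool.allocate() for _ in range(100)]
print(pool.blocks_allocated)  # 3 blocks cover 100 allocations at 41 per block
```

Released slots go back on the free list, so a loop that creates and discards one float per iteration keeps reusing the same slot without growing the pool.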
Slow numpy.float32 vs. numpy.float64
It appears that numpy has 2 paths it can take when creating a scalar type: fast and slow. Which one it takes depends on whether the scalar type has a Python base class to which it can defer for argument conversion.
For some reason numpy.float32 is hard-coded to take the slower path (defined by the _WORK0 macro), while numpy.float64 gets a chance to take the faster path (defined by the _WORK1 macro). Note that scalartypes.c.src is a template which generates scalartypes.c at build time.
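
The base-class difference between the two types is visible from Python. The short check below shows that numpy.float64 subclasses the built-in float while numpy.float32 does not; that this is the base class the macros are keyed on is my reading of the explanation above, not something verified against the numpy source here. The helper name `has_python_float_base` is hypothetical.

```python
import numpy as np


def has_python_float_base(scalar_type):
    """Return True if the given scalar type inherits from the built-in float."""
    return issubclass(scalar_type, float)


for scalar_type in (np.float64, np.float32):
    print(scalar_type.__name__, has_python_float_base(scalar_type))
# float64 True
# float32 False
```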
You can visualize this in cachegrind. I've included screen captures showing how many more calls are made to construct a float32 vs. a float64:
[Screen capture: float64 takes the fast path]
[Screen capture: float32 takes the slow path]
Updated - Which type takes the slow/fast path may depend on whether the OS is 32-bit or 64-bit. On my test system, Ubuntu Lucid 64-bit, the float64 type is 10 times faster than float32.
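
To see how construction cost compares on a given system, a rough sketch with `timeit` is shown below. The function name `construction_time` and the call count are assumptions for illustration; the ratio it prints will vary with the platform and the numpy version and need not match the factor of 10 reported above.

```python
import timeit

import numpy as np


def construction_time(scalar_type, number=200_000):
    """Return the seconds taken to call scalar_type(1) `number` times."""
    return timeit.timeit(lambda: scalar_type(1), number=number)


timings = {
    scalar_type.__name__: construction_time(scalar_type)
    for scalar_type in (float, np.float64, np.float32)
}

for name, seconds in timings.items():
    print(f"{name:>8}: {seconds:.4f} s")

# Ratio of float32 to float64 construction time on this machine.
print("float32 / float64:", timings["float32"] / timings["float64"])
```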