Python - numpy float: 10x slower than builtin in arithmetic operations?

I am getting really weird timings for the following code:

    import numpy as np

    s = 0
    for i in range(10000000):
        s += np.float64(1)  # replace with np.float32 and built-in float

built-in float: 4.9 s
float64: 10.5 s
float32: 45.0 s
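Timings like these are easier to compare when taken with `timeit` and a smaller loop. The following is only a sketch: the helper name `time_accumulate` and the iteration count are illustrative assumptions, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def time_accumulate(ctor, n=100_000):
    """Return the seconds taken to run `s += ctor(1)` n times (hypothetical helper)."""
    def loop():
        s = 0
        for _ in range(n):
            s += ctor(1)
        return s
    # Taking the minimum of a few repeats reduces noise from other processes.
    return min(timeit.repeat(loop, number=1, repeat=3))


timings = {
    "float": time_accumulate(float),
    "np.float64": time_accumulate(np.float64),
    "np.float32": time_accumulate(np.float32),
}

for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f} s")
```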

Why is float64 twice as slow as float? And why is float32 5 times slower than float64?

Is there any way to avoid the penalty of using np.float64, and have numpy functions return the built-in float instead of float64?
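For reference, a numpy scalar can be turned into a built-in float explicitly. This is a minimal sketch of the conversion only; it does not change what numpy functions return by default, and the variable names are illustrative.

```python
import numpy as np

x = np.float64(1.5)

# Both calls return a plain Python float.
as_builtin = float(x)
via_item = x.item()

# For a whole array, tolist() yields built-in floats in one pass.
values = np.array([0.25, 0.5, 0.75]).tolist()

print(type(as_builtin).__name__, type(via_item).__name__, type(values[0]).__name__)
```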

I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine).

I use numpy.float32 on my 32-bit machine. Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations are performed at 32-bit precision).

Is there any way to set a single variable somewhere in the program or on the command line, and make all numpy functions return float32 instead of float64?
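Absent such a switch, one workaround is to wrap the functions you use. The sketch below assumes a hypothetical helper named `uniform32`; it is not part of numpy, and it simply casts whatever numpy.random.uniform returns.

```python
import numpy as np


def uniform32(low=0.0, high=1.0, size=None):
    """Hypothetical wrapper: draw with numpy.random.uniform, return float32."""
    sample = np.random.uniform(low, high, size)
    if size is None:
        # Scalar draw: wrap the returned float in a float32 scalar.
        return np.float32(sample)
    # Array draw: cast the float64 array down to float32.
    return sample.astype(np.float32)


scalar = uniform32()
array = uniform32(size=4)
print(type(scalar).__name__, array.dtype)
```

The cast still costs one conversion per call, so it addresses precision rather than speed.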

Edit #1:

numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?

I want to emphasize that my timings are not due to any of the following:

the function calls
the conversion between numpy and Python float
the creation of objects

I updated my code to make it clearer where the problem lies. With the new code, I seem to see a ten-fold performance hit from using numpy data types:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:

float64: 34.56s
float32: 35.11s
float: 3.53s

Just for the hell of it, I also tried:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    s = np.float64(1)
    for i in range(10000000):
        s = float(s)
        s = (s + 8) * s % 2399232
        s = np.float64(s)

    print(s)
    print('runtime:', datetime.now() - start_time)

The execution time is 13.28 s; it's actually 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it's more than 3 times slower compared to the pure-Python float.
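If the numpy type is only needed at the boundaries, the two conversions can be moved out of the loop so they happen once each. This is a sketch under that assumption; the function name `iterate` and the reduced iteration count are illustrative.

```python
import numpy as np


def iterate(s, n=10_000):
    """Run the recurrence above on built-in floats only (hypothetical helper)."""
    x = float(s)              # convert once, before the loop
    for _ in range(n):
        x = (x + 8) * x % 2399232
    return np.float64(x)      # convert back once, after the loop


result = iterate(np.float64(1))
print(type(result).__name__, result)
```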

My machine is:

Intel Core 2 Duo T9300 (2.5GHz)
WinXP Professional (32-bit)
ActiveState Python 3.1.3.5
NumPy 1.5.1

Edit #2:

Thank you for the answers, they help me understand how to deal with this problem.

But I would still like to know the precise reason (based on the source code perhaps) why the code below runs 10 times slower with float64 than with float.

Edit #3:

I reran my code under Windows 7 x64 (Intel Core i7 930 @ 3.8GHz).

Again, the code is:

    from datetime import datetime
    import numpy as np

    start_time = datetime.now()

    # one of the following lines is uncommented before execution
    #s = np.float64(1)
    #s = np.float32(1)
    #s = 1.0

    for i in range(10000000):
        s = (s + 8) * s % 2399232

    print(s)
    print('runtime:', datetime.now() - start_time)

The timings are:

float64: 16.1s
float32: 16.1s
float: 3.2s

Now both np floats (either 64 or 32) are 5 times slower than the built-in float. Still, that is a significant difference. I'm trying to figure out where it comes from.

End of edits

CPython floats are allocated in chunks

The key problem with comparing numpy scalar allocations to the float type is that CPython always allocates the memory for float and int objects in blocks of size N.

Internally, CPython maintains a linked list of blocks, each large enough to hold N float objects. When you call float(1), CPython checks whether there is space available in the current block; if not, it allocates a new block. Once it has space in the current block, it simply initializes that space and returns a pointer to it.

On my machine each block can hold 41 float objects, so there is some overhead for the first float(1) call, but the next 40 run much faster because the memory is already allocated and ready.
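The block scheme described above can be modelled in a few lines of Python. This is a toy illustration only, not CPython's actual C implementation; the class name `ToyBlockAllocator` is hypothetical, and the block size of 41 is taken from the figure quoted above.

```python
class ToyBlockAllocator:
    """Toy model: hand out slots from fixed-size blocks, growing on demand."""

    def __init__(self, block_size=41):
        self.block_size = block_size
        self.blocks = []          # each block is a list of preallocated slots
        self.used_in_current = 0  # slots already handed out from the last block

    def allocate(self, value):
        # A new block is needed only when none exists or the current one is full.
        if not self.blocks or self.used_in_current == self.block_size:
            self.blocks.append([None] * self.block_size)
            self.used_in_current = 0
        block = self.blocks[-1]
        block[self.used_in_current] = value
        self.used_in_current += 1
        # Return the (block index, slot index) that now holds the value.
        return len(self.blocks) - 1, self.used_in_current - 1


allocator = ToyBlockAllocator(block_size=41)
for i in range(42):
    allocator.allocate(float(i))

# 41 values fit in the first block; the 42nd forces a second block.
print(len(allocator.blocks))  # prints 2
```

Only allocations 1 and 42 pay for a new block here; the other 40 just fill a slot that already exists, which is the cheap path the text describes.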

Slow numpy.float32 vs. numpy.float64

It appears that numpy has 2 paths it can take when creating a scalar type: fast and slow. Which one is taken depends on whether the scalar type has a Python base class to which it can defer for argument conversion.

For some reason numpy.float32 is hard-coded to take the slower path (defined by the _WORK0 macro), while numpy.float64 gets a chance to take the faster path (defined by the _WORK1 macro). Note that scalartypes.c.src is a template which generates scalartypes.c at build time.

You can visualize this in cachegrind. I've included screen captures showing how many more calls are made to construct a float32 vs. a float64:

Screen capture: float64 takes the fast path

Screen capture: float32 takes the slow path

Updated - which type takes the slow/fast path may depend on whether the OS is 32-bit or 64-bit. On my test system, Ubuntu Lucid 64-bit, the float64 type is 10 times faster than float32.
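The "Python base class" distinction mentioned earlier can be observed from the interpreter. The check below only shows the class relationship (numpy.float64 subclasses the built-in float, numpy.float32 does not); that this is what selects the fast or slow path internally is taken from the explanation above and is not verified by the snippet.

```python
import numpy as np

# Report whether each numpy scalar type inherits from the built-in float.
for scalar_type in (np.float64, np.float32):
    has_float_base = issubclass(scalar_type, float)
    print(f"{scalar_type.__name__}: subclass of float = {has_float_base}")
```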
