cuda - Is any optimization done if one runs the same kernel with the same input again and again?


If I run the same kernel with the same input several times, like this:

#define n 2000
for (int i = 0; i < n; i++) {
    mykernel<<<1,120>>>(...);
}

what happens? I timed it and played around with n: halving n (to 1000) halved the time it took.

Yet I'm a bit cautious to believe it really runs the kernel 2000 times, because the speed-up over the non-CUDA code is so dramatic (~900 sec vs ~0.9 sec). Does CUDA do some kind of optimization in this case, such as caching the results?

Setting CUDA_LAUNCH_BLOCKING=1 didn't change anything.

mykernel replaces the inner loop of the non-CUDA code.

Hardware: GeForce GTX 260

CUDA doesn't do any optimization of this kind, nor does it cache results. If you launch 2000 kernels, it runs 2000 kernels.

However, kernel launches are asynchronous, so measuring the time it takes to launch 2000 kernel instances in a loop is not the same as measuring the total execution time of those 2000 kernel instances. What you are seeing is an artifact of incorrect time measurement, not a true speed-up.
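For what it's worth, here is a minimal sketch of how the loop could be timed so that the measurement covers the kernels' actual execution rather than just the launch overhead, using CUDA events and an explicit synchronization. The dummy mykernel and its float* argument are assumptions for illustration; the question does not show the real kernel.

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real mykernel (not shown in the question).
__global__ void mykernel(float *data)
{
    int i = threadIdx.x;
    data[i] *= 2.0f;
}

int main()
{
    const int n = 2000;
    float *d_data;
    cudaMalloc(&d_data, 120 * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < n; i++) {
        mykernel<<<1,120>>>(d_data);   // launches are only queued here, asynchronously
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);        // wait until all queued kernels have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Total GPU time for %d launches: %f ms\n", n, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}

Without the cudaEventSynchronize (or a cudaDeviceSynchronize before stopping a CPU timer), the loop only measures how long it takes to enqueue the launches, which is why halving n appears to halve the time while the overall number still looks implausibly small.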

