c - Cost of context switch between threads of the same process, on Linux -


Is there any empirical data on the cost of context switching between threads of the same process on Linux (x86 and x86_64 are mainly of interest)? I'm talking about the number of cycles or nanoseconds between the last instruction one thread executes in userspace before being put to sleep, voluntarily or involuntarily, and the first instruction a different thread of the same process executes after waking up on the same CPU/core.

I wrote a quick test program that performs rdtsc in two threads assigned to the same CPU/core, stores the result in a volatile variable, and compares it against the sister thread's corresponding volatile variable. The first time it detects a change in the sister thread's value, it prints the difference, then goes back to looping. The minimum/median counts I'm getting this way are about 8900/9600 cycles on an Atom D510 CPU. Does the procedure seem reasonable, and do the numbers seem believable?
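For reference, a minimal sketch of the kind of test described above might look like the following (the original program isn't shown, so the names, the pinning via pthread_setaffinity_np, and the exact print logic here are my assumptions):

    /* Sketch of the rdtsc ping-pong test described above: two threads
     * pinned to the same core, each publishing its TSC reading to a
     * volatile slot and watching the sister thread's slot.
     * Compile with: gcc -O2 -pthread tsc_pingpong.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdint.h>
    #include <stdio.h>

    static volatile uint64_t stamp[2];      /* one slot per thread */

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    static void pin_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }

    static void *worker(void *arg)
    {
        int self = (int)(intptr_t)arg, other = 1 - self;
        uint64_t last = 0;

        pin_to_cpu0();
        for (;;) {
            uint64_t now = rdtsc();
            stamp[self] = now;
            uint64_t sis = stamp[other];
            if (sis != last) {
                if (last != 0)
                    /* The sister's last write happened just before it was
                     * descheduled, so now - sis approximates the switch. */
                    printf("%llu\n", (unsigned long long)(now - sis));
                last = sis;
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        pthread_create(&t[0], NULL, worker, (void *)0);
        pthread_create(&t[1], NULL, worker, (void *)1);
        pthread_join(t[0], NULL);   /* threads loop forever */
        return 0;
    }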

My goal is to estimate whether, on modern systems, the thread-per-connection server model could be competitive with, or even outperform, select-type multiplexing. This seems plausible in theory, since the transition from performing IO on fd X to fd Y involves merely going to sleep in one thread and waking up in another, rather than multiple syscalls, but it depends on the overhead of context switching.
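For concreteness, the model in question is essentially this (a sketch; the echo-style handling and the handle_connection name are mine):

    /* Thread-per-connection: each connection gets a thread that simply
     * blocks in read(). Moving work from fd X to fd Y is then just one
     * thread sleeping and another waking, with no readiness syscalls. */
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    static void *handle_connection(void *arg)   /* hypothetical name */
    {
        int fd = (int)(intptr_t)arg;
        char buf[4096];
        ssize_t n;

        /* The voluntary context switch happens inside read(): the
         * thread sleeps until the kernel has data for this fd. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(fd, buf, n);                  /* echo back */
        close(fd);
        return NULL;
    }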

(Disclaimer: this isn't a direct answer to the question, just some suggestions that I hope will be helpful.)

Firstly, the numbers you're getting certainly sound like they're within the ballpark. Note, however, that interrupt/trap latency can vary a lot among different CPU models implementing the same ISA. It's also a different story if your threads have used floating point or vector operations, because if they haven't, the kernel avoids saving and restoring the floating point and vector unit state.

You should be able to get more accurate numbers using the kernel tracing infrastructure; perf sched in particular is designed to measure and analyse scheduler latency.
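For example, something along these lines should work on a reasonably recent kernel (the exact subcommands and output format differ across perf versions):

    # record scheduler events system-wide for 10 seconds,
    # then print per-task wakeup-to-run latency statistics
    perf sched record -- sleep 10
    perf sched latency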

If your goal is to model thread-per-connection servers, then you probably shouldn't be measuring involuntary context switch latency: in such a server, the majority of context switches will be voluntary, as a thread blocks in read() waiting for more data from the network. Therefore, a better testbed might involve measuring the latency from one thread blocking in read() to another being woken up from the same call, as in the sketch below.
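A minimal sketch of such a testbed, assuming a pipe ping-pong between two threads pinned to one core (dividing by two for a per-switch figure is a simplification, since each round trip also includes pipe overhead):

    /* Measure voluntary context-switch latency by bouncing a byte
     * between two threads through a pair of pipes: every hop is a
     * blocking read(), i.e. a sleep in one thread and a wakeup in
     * the other. Compile with: gcc -O2 -pthread pipe_pingpong.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERS 100000

    static int ab[2], ba[2];   /* pipe a->b and pipe b->a */

    static void pin_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }

    static void *echoer(void *arg)
    {
        char c;
        pin_to_cpu0();
        for (int i = 0; i < ITERS; i++) {
            read(ab[0], &c, 1);    /* sleep until pinged */
            write(ba[1], &c, 1);   /* wake the other thread */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        struct timespec t0, t1;
        char c = 'x';

        pipe(ab);
        pipe(ba);
        pthread_create(&t, NULL, echoer, NULL);
        pin_to_cpu0();

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            write(ab[1], &c, 1);
            read(ba[0], &c, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        pthread_join(t, NULL);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        /* Each iteration is two switches (there and back). */
        printf("%.0f ns per switch\n", ns / ITERS / 2);
        return 0;
    }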

Note that in a well-written multiplexing server under heavy load, the transition from fd X to fd Y will often involve the same single system call (as the server iterates over a list of active file descriptors returned by a single epoll()). One thread also ought to have a smaller cache footprint than multiple threads, through having only one stack. I suspect the only way to settle the matter (for some definition of "settle") might be to have a benchmark shootout...
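For comparison with the thread-per-connection sketch above, that single-syscall transition looks roughly like this (a sketch; epoll_wait() is the underlying syscall, and the echo handling is illustrative):

    /* Under load, one epoll_wait() returns many ready fds, and the
     * server walks the list with no further syscalls in between. */
    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    void event_loop(int epfd)   /* hypothetical helper */
    {
        struct epoll_event ev[MAX_EVENTS];
        char buf[4096];

        for (;;) {
            int n = epoll_wait(epfd, ev, MAX_EVENTS, -1);
            /* Moving from fd X to fd Y is just the next loop
             * iteration: no sleep, no wakeup, no extra syscall. */
            for (int i = 0; i < n; i++) {
                int fd = ev[i].data.fd;
                ssize_t r = read(fd, buf, sizeof buf);
                if (r > 0)
                    write(fd, buf, r);   /* echo-style handling */
                else if (r == 0)
                    close(fd);
            }
        }
    }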

