contributed by <RZHuangJeff>
linux2021
threadtask_t
This structure holds a task assigned to the thread pool; pending tasks are collected in a singly linked list.
- func: Pointer to the function that is asked to be executed.
- arg: Address of the argument that will be passed to func.
- future: Pointer to the future that holds the result of the given task.
- next: Points to the next task in the job queue.
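Putting these fields together, the structure is likely declared along the lines of the sketch below (the exact types are assumed from how the fields are used later in the walkthrough):

```c
#include <pthread.h>

typedef struct __threadtask {
    void *(*func)(void *);          /* job function to run */
    void *arg;                      /* argument handed to func */
    struct __tpool_future *future;  /* where the result will be published */
    struct __threadtask *next;      /* next task in the singly linked list */
} threadtask_t;
```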
jobqueue_t
This structure represents the job queue that holds tasks waiting for a worker to process them.
- head, tail: Pointers to the head and the tail of the list of assigned tasks, respectively.
- cond_nonempty: A condition variable used to indicate that the queue is not empty.
- rwlock: A mutex lock that must be acquired before entering a critical section that accesses the queue.
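A corresponding sketch of the queue structure, assuming the fields listed above:

```c
typedef struct __jobqueue {
    threadtask_t *head, *tail;      /* singly linked list of pending tasks */
    pthread_cond_t cond_nonempty;   /* signalled when the queue becomes non-empty */
    pthread_mutex_t rwlock;         /* protects head and tail */
} jobqueue_t;
```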
tpool_future_t
This structure represents the future of an assigned task.
- flag: Holds the state flags set on this future.
- result: Where the return value of the corresponding task will be placed.
- mutex: A mutex lock that, like rwlock above, prevents errors when different threads access this structure at the same time.
- cond_finished: A condition variable on which the "task finished" signal can be broadcast and waited for.
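Again as a sketch, with the flag values (__FUTURE_RUNNING, __FUTURE_FINISHED, and so on) assumed to be bit flags as they are used later:

```c
enum __future_flags {
    __FUTURE_RUNNING = 01,
    __FUTURE_FINISHED = 02,
    __FUTURE_TIMEOUT = 04,
    __FUTURE_CANCELLED = 010,
    __FUTURE_DESTROYED = 020,
};

struct __tpool_future {
    int flag;                       /* combination of __FUTURE_* bits */
    void *result;                   /* return value of the task */
    pthread_mutex_t mutex;          /* protects flag and result */
    pthread_cond_t cond_finished;   /* broadcast when the task finishes */
};
```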
tpool_t
This is the main structure representing the pool of threads.
- count: The number of active workers running in the pool.
- workers: A list that holds the worker threads.
- jobqueue: A pointer to the job queue.
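A sketch of the pool structure itself (whether workers is a flexible array member or a separately allocated array is an implementation detail; a plain pointer is assumed here):

```c
struct __threadpool {          /* referred to as tpool_t in this write-up */
    size_t count;              /* number of worker threads */
    pthread_t *workers;        /* worker thread handles */
    jobqueue_t *jobqueue;      /* shared queue of pending tasks */
};
```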
tpool_create
This function creates a thread pool with count workers processing the assigned work. At the very beginning, a jobqueue is created by invoking jobqueue_create; assigned tasks line up there waiting to be processed (line 3). Then a struct __threadpool is created (line 4). After these allocations, error checking is performed, and NULL is returned if anything went wrong while creating the queue or the pool (line 5 - 10). Next, the worker routines (function jobqueue_fetch) are created by invoking pthread_create; if creating any worker fails, the workers that were already created are cancelled and joined, and the queue and the pool are destroyed as well (line 15 - 25). If everything works and all workers are ready to handle arriving jobs, the thread pool created above is returned (line 27). Finally, if allocating space for the workers fails, the queue and the pool are destroyed and NULL is returned (line 30 - 32).
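The worker-creation loop with its rollback path probably follows the common create-then-cancel pattern; a sketch (pool, jobqueue, and count are the locals the walkthrough refers to, and their allocation is omitted here):

```c
for (size_t i = 0; i < count; i++) {
    if (pthread_create(&pool->workers[i], NULL, jobqueue_fetch,
                       (void *) jobqueue)) {
        /* roll back: cancel and join the workers created so far */
        for (size_t j = 0; j < i; j++) {
            pthread_cancel(pool->workers[j]);
            pthread_join(pool->workers[j], NULL);
        }
        jobqueue_destroy(jobqueue);
        free(pool);
        return NULL;
    }
}
return pool;
```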
jobqueue_create
This function allocates the space for a new jobqueue_t entity (line 3). head and tail are initialized to NULL at the very beginning, and cond_nonempty and rwlock are initialized by their corresponding initialization functions (line 4 - 8).
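Based on that description, the function is probably close to the following sketch (assuming <stdlib.h> and <pthread.h>):

```c
static jobqueue_t *jobqueue_create(void)
{
    jobqueue_t *jobqueue = malloc(sizeof(jobqueue_t));
    if (jobqueue) {
        jobqueue->head = jobqueue->tail = NULL;
        pthread_cond_init(&jobqueue->cond_nonempty, NULL);
        pthread_mutex_init(&jobqueue->rwlock, NULL);
    }
    return jobqueue;
}
```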
jobqueue_destroy
This function destroys a given job queue. All tasks remaining in the queue whose futures have not been destroyed are cancelled by setting the cancel flag on their futures, and each task itself is freed (line 3 - 18); after that, the mutex lock and the condition variable of the queue are destroyed (line 20 - 21), and the queue itself is freed (line 22).
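A condensed sketch of that cleanup loop, following the description above (the actual code may differ in details):

```c
static void jobqueue_destroy(jobqueue_t *jobqueue)
{
    threadtask_t *tmp = jobqueue->head;
    while (tmp) {
        jobqueue->head = jobqueue->head->next;
        pthread_mutex_lock(&tmp->future->mutex);
        if (tmp->future->flag & __FUTURE_DESTROYED) {
            /* nobody will ever wait on this future again: free it here */
            pthread_mutex_unlock(&tmp->future->mutex);
            pthread_mutex_destroy(&tmp->future->mutex);
            pthread_cond_destroy(&tmp->future->cond_finished);
            free(tmp->future);
        } else {
            /* mark it cancelled so the owner of the future can detect it */
            tmp->future->flag |= __FUTURE_CANCELLED;
            pthread_mutex_unlock(&tmp->future->mutex);
        }
        free(tmp);
        tmp = jobqueue->head;
    }
    pthread_mutex_destroy(&jobqueue->rwlock);
    pthread_cond_destroy(&jobqueue->cond_nonempty);
    free(jobqueue);
}
```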
tpool_apply
This function attaches the given function to the job queue. At the beginning, a threadtask_t entity and the future corresponding to the task are created (line 6 - 7). The job function and the argument are assigned to the newly created task (line 9). Finally, the new task is attached to the head of the job queue (line 11 - 17); when the queue was empty beforehand, a signal is broadcast through cond_nonempty to tell the workers that there is a job waiting to be processed (line 16). It is important to acquire rwlock before accessing the job queue (line 10), since the queue is shared between the workers and the caller, and the lock must be released once the modification is done (line 18). The code from line 19 to line 25 handles the exceptional situations, namely failing to allocate space for the new task or the future.
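A sketch of that enqueue path, assuming the structures introduced above (the error-handling branches at the end mirror line 19 - 25):

```c
struct __tpool_future *tpool_apply(struct __threadpool *pool,
                                   void *(*func)(void *), void *arg)
{
    jobqueue_t *jobqueue = pool->jobqueue;
    threadtask_t *new_head = malloc(sizeof(threadtask_t));
    struct __tpool_future *future = tpool_future_create();

    if (new_head && future) {
        new_head->func = func, new_head->arg = arg, new_head->future = future;
        pthread_mutex_lock(&jobqueue->rwlock);
        if (jobqueue->head) {
            /* queue not empty: just prepend to the head */
            new_head->next = jobqueue->head;
            jobqueue->head = new_head;
        } else {
            /* queue was empty: wake up workers waiting on cond_nonempty */
            new_head->next = NULL;
            jobqueue->head = jobqueue->tail = new_head;
            pthread_cond_broadcast(&jobqueue->cond_nonempty);
        }
        pthread_mutex_unlock(&jobqueue->rwlock);
        return future;
    }

    /* allocation failed: undo whichever allocation succeeded */
    free(new_head);
    if (future)
        tpool_future_destroy(future);
    return NULL;
}
```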
jobqueue_fetch
This is the most important function in this implementation of the thread pool, since it is the worker function that keeps fetching tasks from the given job queue and executing them. The worker blocks while the job queue is empty, waiting for a task to arrive (line 14 - 15), which consumes no computing power compared with a polling wait.
As mentioned before, since the job queue is shared by many workers, a worker needs to acquire rwlock before it accesses the queue (line 10). Notice that the lock is released while a thread waits for a condition inside pthread_cond_wait, and it is re-acquired when that function returns.
The worker can only be cancelled while it is waiting for a job, which is arranged by calling pthread_setcancelstate (line 11, 17).
After a job has been fetched from the job queue (line 18 - 29), the worker checks whether the pointer func points to a valid function (line 32). If so, the state of the task is checked before it actually runs: when the task turns out to be cancelled, it is simply discarded (line 34 - 37); otherwise, the state of the task is set to __FUTURE_RUNNING and func is invoked with the argument assigned earlier (line 39, 43).
After returning from the job function, the state of the task is checked again to see whether it was set to __FUTURE_DESTROYED; if so, the task is freed (line 45 - 49). Otherwise, the state of the task is set to __FUTURE_FINISHED, the return value ret_value is stored into the future's result, and finally a signal is sent via the condition variable cond_finished.
When a worker receives a task whose func points to NULL, it breaks out of its working loop and exits the thread it is running in (line 58 - 67).
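Putting the pieces together, here is a condensed sketch of the worker loop (the cancelled/destroyed-flag handling and some error paths discussed above are omitted for brevity; names follow the write-up):

```c
static void *jobqueue_fetch(void *queue)
{
    jobqueue_t *jobqueue = (jobqueue_t *) queue;
    threadtask_t *task;
    int old_state;

    pthread_cleanup_push(__jobqueue_fetch_cleanup, (void *) &jobqueue->rwlock);

    while (1) {
        pthread_mutex_lock(&jobqueue->rwlock);
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_state);
        while (!jobqueue->tail)  /* blocking wait until a task arrives */
            pthread_cond_wait(&jobqueue->cond_nonempty, &jobqueue->rwlock);
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);

        /* detach the task at the tail; the traversal cost is discussed in
         * the "Time complexity" section below */
        task = jobqueue->tail;
        if (task == jobqueue->head) {
            jobqueue->head = jobqueue->tail = NULL;
        } else {
            threadtask_t *tmp = jobqueue->head;
            while (tmp->next != task)
                tmp = tmp->next;
            tmp->next = NULL;
            jobqueue->tail = tmp;
        }
        pthread_mutex_unlock(&jobqueue->rwlock);

        if (!task->func) {  /* NULL job: the pool is shutting down */
            pthread_mutex_destroy(&task->future->mutex);
            pthread_cond_destroy(&task->future->cond_finished);
            free(task->future);
            free(task);
            break;
        }

        void *ret_value = task->func(task->arg);  /* run the job */

        pthread_mutex_lock(&task->future->mutex);
        task->future->flag |= __FUTURE_FINISHED;
        task->future->result = ret_value;
        pthread_cond_broadcast(&task->future->cond_finished);
        pthread_mutex_unlock(&task->future->mutex);
        free(task);
    }

    pthread_cleanup_pop(0);
    pthread_exit(NULL);
}
```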
Finally, we can see that the entire working loop is wrapped by two macros: pthread_cleanup_push and pthread_cleanup_pop, which expand roughly to the code shown below.
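The following is a lightly reformatted excerpt of how glibc's <pthread.h> defines these two macros; the exact layout and helper names vary between glibc versions:

```c
#define pthread_cleanup_push(routine, arg)                                  \
  do {                                                                      \
    __pthread_unwind_buf_t __cancel_buf;                                    \
    void (*__cancel_routine) (void *) = (routine);                          \
    void *__cancel_arg = (arg);                                             \
    int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)   \
                                        __cancel_buf.__cancel_jmp_buf, 0);  \
    if (__glibc_unlikely (__not_first_call))                                \
      {                                                                     \
        __cancel_routine (__cancel_arg);                                    \
        __pthread_unwind_next (&__cancel_buf);                              \
        /* NOTREACHED */                                                    \
      }                                                                     \
                                                                            \
    __pthread_register_cancel (&__cancel_buf);                              \
    do {

#define pthread_cleanup_pop(execute)                                        \
      do { } while (0); /* empty to allow a label before the pop */         \
    } while (0);                                                            \
    __pthread_unregister_cancel (&__cancel_buf);                            \
    if (execute)                                                            \
      __cancel_routine (__cancel_arg);                                      \
  } while (0)
```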
We can see that these macros actually form a do ... while block. In this expansion, an entity of __pthread_unwind_buf_t is created (line 3), and the given function and argument are assigned to the local variables __cancel_routine and __cancel_arg, respectively. Next, an important statement is __sigsetjmp, which is similar to sigsetjmp: both set a return point for a non-local jump. We can also see that the cleanup function we registered is invoked in the if statement around line 7 - 10. The macro __glibc_unlikely(__not_first_call), which expands to __builtin_expect((__not_first_call), 0), provides branch prediction information so the compiler can optimize that if statement. The condition indicates whether the return from __sigsetjmp came from a long jump (triggered by a call to pthread_exit or pthread_cancel) or not; if so, the cleanup functions registered earlier are invoked to perform the cleanup.
TODO: clarify the use of cancellation of POSIX threads. You should emphasize its high-level motivation and purpose at first glance.
tpool_future_create
This function allocates space for a new future and initializes its fields to their default values.
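A sketch under the assumptions above (default-initialized mutex and condition variable; the real code may configure attributes such as the clock used for the timed wait):

```c
static struct __tpool_future *tpool_future_create(void)
{
    struct __tpool_future *future = malloc(sizeof(struct __tpool_future));
    if (future) {
        future->flag = 0;
        future->result = NULL;
        pthread_mutex_init(&future->mutex, NULL);
        pthread_cond_init(&future->cond_finished, NULL);
    }
    return future;
}
```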
tpool_future_get
By invoking this function, a caller can perform two kinds of waiting on the given future. When seconds is set to 0, this function blocks until the desired future receives its cond_finished signal (line 19). Otherwise, this function waits on the given future for up to seconds seconds. To perform the timed wait, it waits on cond_finished with pthread_cond_timedwait, which returns not only when the desired condition is signalled but also when the time runs out (line 7 - 17). In the situation where pthread_cond_timedwait returns because time ran out, its return value equals ETIMEDOUT; by checking for that error code, the future can be marked as __FUTURE_TIMEOUT properly (line 13 - 17).
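A sketch of both waiting modes (using CLOCK_REALTIME, which is what a default-initialized condition variable measures pthread_cond_timedwait against; the actual code may use a different clock via a condattr):

```c
#include <errno.h>
#include <time.h>

void *tpool_future_get(struct __tpool_future *future, unsigned int seconds)
{
    pthread_mutex_lock(&future->mutex);
    while (!(future->flag & __FUTURE_FINISHED)) {
        if (seconds) {
            struct timespec expire;
            clock_gettime(CLOCK_REALTIME, &expire);
            expire.tv_sec += seconds;
            int status = pthread_cond_timedwait(&future->cond_finished,
                                                &future->mutex, &expire);
            if (status == ETIMEDOUT) {
                future->flag |= __FUTURE_TIMEOUT;
                pthread_mutex_unlock(&future->mutex);
                return NULL;
            }
        } else {
            /* blocking wait until the worker broadcasts cond_finished */
            pthread_cond_wait(&future->cond_finished, &future->mutex);
        }
    }
    pthread_mutex_unlock(&future->mutex);
    return future->result;
}
```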
tpool_future_destroy
This function destroys the given future. Futures that have already been marked as __FUTURE_FINISHED or __FUTURE_CANCELLED are destroyed directly (line 5 - 10). Otherwise, the future is marked as __FUTURE_DESTROYED (line 11 - 14) and will be freed when a worker detects this flag.
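A sketch of that logic, assuming the flag bits introduced earlier:

```c
int tpool_future_destroy(struct __tpool_future *future)
{
    if (future) {
        pthread_mutex_lock(&future->mutex);
        if (future->flag & (__FUTURE_FINISHED | __FUTURE_CANCELLED)) {
            /* nobody else will touch it again: release it right away */
            pthread_mutex_unlock(&future->mutex);
            pthread_mutex_destroy(&future->mutex);
            pthread_cond_destroy(&future->cond_finished);
            free(future);
        } else {
            /* still referenced by a queued task: let the worker free it */
            future->flag |= __FUTURE_DESTROYED;
            pthread_mutex_unlock(&future->mutex);
        }
    }
    return 0;
}
```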
tpool_join
This function sends pool->count null tasks to the job queue (line 3 - 5), which makes every worker stop its working loop; after that it waits for all workers to join (line 6 - 7), and finally the job queue and the pool itself are freed (line 8 - 10).
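A sketch of the shutdown sequence (details such as the return type are assumptions):

```c
int tpool_join(struct __threadpool *pool)
{
    size_t num_threads = pool->count;

    /* one NULL task per worker: each worker exits after consuming one */
    for (size_t i = 0; i < num_threads; i++)
        tpool_apply(pool, NULL, NULL);
    for (size_t i = 0; i < num_threads; i++)
        pthread_join(pool->workers[i], NULL);

    jobqueue_destroy(pool->jobqueue);
    free(pool);
    return 0;
}
```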
__jobqueue_fetch_cleanup
This is the cleanup function that is invoked when a worker is cancelled or exits normally; it unlocks the mutex lock that was locked by the cancelled worker. With this cleanup function we can ensure that cancelling a worker at any time will not cause a deadlock.
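The handler itself is likely a thin wrapper around pthread_mutex_unlock:

```c
static void __jobqueue_fetch_cleanup(void *arg)
{
    pthread_mutex_t *mutex = (pthread_mutex_t *) arg;
    pthread_mutex_unlock(mutex);
}
```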
Time complexity of queue popping
In the original version of the thread pool, a worker runs through all tasks in the waiting queue before it actually targets a task and pops it from the queue; the corresponding part of the code (the for loop around line 6 - 8) is shown below:
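(Reconstructed from the description; the exact code may differ slightly. This branch only runs when the queue holds more than one task.)

```c
threadtask_t *task, *tmp;
/* walk the whole list to find the node just before the tail */
for (tmp = jobqueue->head; tmp->next != jobqueue->tail; tmp = tmp->next)
    ;
task = jobqueue->tail;   /* this is the task that gets popped */
tmp->next = NULL;
jobqueue->tail = tmp;
```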
How seriously the queue traversal affects performance depends on the tasks assigned to the thread pool. When the assigned tasks are simple, so that workers complete tasks faster than the producer assigns them, the queue can hardly grow, and traversing it takes little time. The situation changes when the producer assigns heavy tasks that take workers a long time to complete: the queue easily becomes long, and traversing a long queue on every pop consumes a lot of time.
Commit dc80584 fixes this problem; the following figures show the performance improvement:
heavy work load
trivial work load
As shown in the figures, there is a tremendous improvement for complex tasks, while there is only a slight difference for trivial tasks.
Improve scalability
The following figure shows the scalability of the various queue implementations.
Throughput of executing on different number of cores
As shown in the figure, the original version has poor scalability: there is only a small increase in throughput as it executes on more and more cores.
On the other hand, both the atomic and the O(1) versions scale much better than the original one, since their throughput increases massively as the number of executing cores grows.
We can also see that the scalability of the O(1) version is almost the same as that of the atomic version. The details of the O(1) queue are described here; from that description, we can conclude that the scalability of a thread pool implementation can be improved by shrinking the size of its critical sections.