# Performance Tuning: naivepool

* The source code is at [`unknowntpo/naivepool`](https://github.com/unknowntpo/naivepool)

## Performance comparison between different implementations of naivepool

<img src="https://i.imgur.com/obpiC4l.png"></img>

## 1. Only restrict the number of goroutines

First implementation: just limit the number of workers.

In this implementation, we use a `jobChan` to receive jobs from the user, and then start a new goroutine for every job.

* Trace of the program running a simple CPU-bound task:

<img src="https://i.imgur.com/8Q6D3tB.png"></img>

* The program spends more time scheduling tasks than doing real work.

## 2. Reuse goroutines

### 2.1 Pool: for-select loop, workers: for-select loop

This version of naivepool uses a for-select loop to receive a job from `Pool.jobChan` and send it to the workers through `workerChan`.

Each worker uses the for-select pattern to receive jobs from the pool until the channel is closed.

#### The cost of the select statement

A `select` statement needs to:

* Step 1: Shuffle all the channels it waits on.
* Step 2: Check each channel one by one (doesn't need a lock) to see if it is ready for communication. If one is, perform the communication and return.
* Step 3: Prepare for blocking on all channels.
    * 1. Iterate over every channel.
    * 2. Lock that channel and add the goroutine running the `select` statement to that channel's wait queue.
    * 3. Check if the channel is ready. If it is, unlock that channel and prepare to send / receive a value on it.
        * If it is not ready, unlock the channel and go to the next channel.
* Step 4: Block.

```c
Scase *select(Select *sel) {
    randomize channel order;
    for(;;) {
        // Phase 1.
        foreach(Scase *cas in sel) {
            if(chansend/recv_nonblock(cas->c, ...))
                return cas;
        }
        // Phase 2.
        selectstate = nil;
        foreach(Scase *cas in sel) {
            lock(cas->c);
            cas->sg->g = g;
            cas->sg->selectstatep = &selectstate;
            addwaiter(&cas->c->sendq/recvq, cas->sg);
            if(isready(cas->c)) {
                unlock(cas->c);
                goto ready;
            }
            unlock(cas->c);
        }
        // Phase 3.
        block();
ready:
        CAS(&selectstate, nil, 1);
        foreach(Scase *cas in sel) {
            lock(cas->c);
            removewaiter(&cas->c->sendq/recvq, cas->sg);
            unlock(cas->c);
        }
        // If we were unblocked by a sync chan operation,
        // the communication has completed.
        if(selectstate > 1)
            return selectstate; // denotes the completed case
    }
}
```

See [Go channels on steroids](https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub) for more information.

### 2.2 Use for-range loop on both pool and workers

* Less lock contention!
    * A worker / dispatcher only needs to lock the channel once to add itself to the channel's recvq / sendq.
    * See [chanrecv](https://cs.opensource.google/go/go/+/master:src/runtime/chan.go;l=455?q=chanrecv&sq=&ss=go%2Fgo), [chansend](https://cs.opensource.google/go/go/+/master:src/runtime/chan.go;l=159?q=chansend&sq=&ss=go%2Fgo)

### 2.3 Remove jobChan

This version of naivepool uses `pool.Schedule` to send jobs directly to `pool.workerChan`.

Each worker uses the for-range pattern to receive jobs from the pool until the channel is closed.

### 2.4 Auto-Scaling

* Growing
    * If `len(workerChan) >= 2/3 cap(workerChan)` and `p.NumIdleWorkers < some ratio`
        * dispatch more workers
* Shrinking
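
The growing condition can be sketched in Go as follows. This is a minimal illustration, not naivepool's actual code: `shouldGrow`, `idleWorkers`, and `idleLimit` are hypothetical names, and the idle-worker threshold is left as a parameter because the text does not fix its value. The check compares `3*len` against `2*cap` because writing `2/3*cap(workerChan)` directly in Go truncates the constant `2/3` to 0.

```go
package main

import "fmt"

// shouldGrow reports whether more workers should be started.
// queued and capacity stand for len(workerChan) and cap(workerChan).
// idleWorkers and idleLimit are assumptions for illustration: the text
// only says the idle-worker count must be below "some ratio".
func shouldGrow(queued, capacity, idleWorkers, idleLimit int) bool {
	// 3*queued >= 2*capacity is queued >= 2/3*capacity in integer math.
	// Writing 2/3*capacity directly would truncate 2/3 to 0.
	backlogHigh := 3*queued >= 2*capacity
	return backlogHigh && idleWorkers < idleLimit
}

func main() {
	workerChan := make(chan func(), 9)
	for i := 0; i < 6; i++ {
		workerChan <- func() {} // fill the queue to 6 of 9 slots
	}
	fmt.Println(shouldGrow(len(workerChan), cap(workerChan), 0, 1)) // true
	fmt.Println(shouldGrow(5, 9, 0, 1))                             // false: backlog below 2/3
	fmt.Println(shouldGrow(6, 9, 2, 1))                             // false: enough idle workers
}
```

A check like this could run inside the scheduling path or on a timer; which one fits depends on how often the pool is willing to pay for reading `len(workerChan)`.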