
# Timing tools (clock_gettime vs RDTSC)
###### tags: `Software Optimization`
#### Compiled by: BY.Y, SleepyCat1108 2021/09/28
#### src:
1. http://www.jonathanbeard.io/tutorials/Timing.html
2. https://www.itread01.com/p/149683.html
3. http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/#1-latency-measurement-in-user-space

### clock_gettime()
clock_gettime is a vsyscall (served through the vDSO on modern Linux kernels). The point of this mechanism is to reduce the overhead of switching from user space to kernel space.

![](https://i.imgur.com/fmHFsOY.png)

Original text:
>This function takes about **250ns** on average to return. What this means is that if the thing you are **attempting to time something anywhere close to 250ns in execution time** then you will be measuring mostly the variation in time to call clock_gettime and not your code.

In short:
>If the interval being measured is on the order of nanoseconds, the cost of calling clock_gettime() is too large relative to the measurement, so clock_gettime() is not a good fit.
>Use RDTSC in that case!

clock_gettime() has some advantages:
>1) it is **relatively portable** across Linux machines (it follows the POSIX standard)
>2) it is constant across multiple cores (we do not yet understand why)

```cpp=
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <time.h>

int main( int argc, char **argv )
{
   struct timespec curr_time;
   std::memset( &curr_time, 0, sizeof( struct timespec ) );
   // CLOCK_REALTIME is the wall-clock time; clock_gettime returns 0 on success
   if( clock_gettime( CLOCK_REALTIME, &curr_time ) != 0 )
   {
      perror( "Failed to get time" );
      return( EXIT_FAILURE );
   }
   std::cout << "Seconds: " << curr_time.tv_sec << "\n";
   std::cout << "Nanoseconds: " << curr_time.tv_nsec << "\n";
   // Combine both fields into fractional seconds
   std::cout << "Combined: " <<
      (double) curr_time.tv_sec + ((double) curr_time.tv_nsec * 1.0e-9 ) << "\n";
   return( EXIT_SUCCESS );
}
```

### Read Time-Stamp Counter (RDTSC)
First, what is the TSC? According to Wikipedia, it is a register on x86 processors that counts CPU cycles.

![](https://i.imgur.com/uJfoHeY.png)

We identified two preconditions for using it:
1. Frequency scaling must be turned off so that the CPU frequency stays constant.
2. Both readings must be taken on the same processor (set CPU affinity).

>1. This one is intuitive. Suppose we compute TSC2 - TSC1 = 2G cycles.
>On a 2 GHz processor, that means 1 second has elapsed.
>If the processor frequency is not constant, we cannot tell how much time 2G cycles actually took.
>
>2. The two reads, RDTSC`1` and RDTSC`2`, may execute on different processors, and each processor has its own TSC register.

![](https://i.imgur.com/1Jqe0vL.png)

### Conclusion
>In the common case (**measured time >> 250ns**), clock_gettime() is enough, and it works across processors (the reason is still unclear to us).
>If more precise timing is required, RDTSC can be used, but the measurement must stay pinned to one core (set CPU affinity).