
Linux Kernel Design: Timers and Their Management

Copyright (慣C) 2019 宅色夫

Live stream recording

Introduction

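Every operating system relies on some timekeeping mechanism to function, typically a hardware oscillator that generates a periodic signal, which software then counts. At boot, the Linux kernel uses the value of HZ to map the number of timer interrupts onto the jiffies counter; the kernel converts that count into time intervals, which also serve as the time base for scheduling. Modern Linux kernels go further, though: since the introduction of the tickless kernel (the Dynamic Tick Timer, dyn-tick for short), the newer NO_HZ mechanism differs substantially from the classic periodic tick.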

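The introduction of HRT (high-resolution timers) not only brought clock precision at the microsecond level, it also raised the kernel's time management to a new level: it greatly improves the precision of system analysis and is a key feature behind Linux's real-time capabilities. This session starts from timer interrupt handling and covers jiffies, POSIX clocks, timekeeping, timer resolution delay, deferred scheduling, and related topics.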

How the kernel views time

  • The kernel knows the preprogrammed tick rate, so it knows the time between any two successive timer interrupts. This period is called a tick and is equal to 1/(tick rate) seconds.
  • This is how the kernel keeps track of both wall time and system uptime.
  • Wall time—the actual time of day—is important to user-space applications.
  • The system uptime—the relative time since the system booted—is useful to both kernel-space and user-space.
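The relation between tick rate and tick length can be written down directly. The following is a minimal user-space sketch, not kernel code; the helper names `tick_period_ns` and `ticks_to_seconds` are made up for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one tick in nanoseconds for a given tick rate (HZ). */
static uint64_t tick_period_ns(unsigned int hz)
{
    return NSEC_PER_SEC / hz;
}

/* Whole seconds of uptime represented by `ticks` timer interrupts. */
static uint64_t ticks_to_seconds(uint64_t ticks, unsigned int hz)
{
    return ticks / hz;
}
```

With a tick rate of 100 the period is 10 ms; with 1000 it is 1 ms, so 5000 ticks at HZ = 1000 correspond to 5 seconds of uptime.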

Tickless kernel

  • The Linux kernel supports an option known as tickless operation. When a kernel is built with the CONFIG_NO_HZ configuration option set, the system dynamically schedules the timer interrupt in accordance with pending timers. Instead of firing the timer interrupt every, say, 1ms, the interrupt is dynamically scheduled and rescheduled as needed.

  • With a tickless system, moments of idleness are not interrupted by unnecessary timer interrupts, which reduces both power consumption and overhead.

  • Tickless kernel practical experience

  • Status of Linux dynticks
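The core idea of tickless operation, programming the next interrupt for the earliest pending timer instead of for every tick, can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions (timers held in a plain array, time measured in ticks); `next_timer_event` and `interrupts_saved` are hypothetical names, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_PENDING_TIMER UINT64_MAX

/*
 * Return the earliest expiry (in ticks) that lies after `now`,
 * or NO_PENDING_TIMER if no timer is waiting.
 */
static uint64_t next_timer_event(const uint64_t *expires, size_t n,
                                 uint64_t now)
{
    uint64_t next = NO_PENDING_TIMER;

    for (size_t i = 0; i < n; i++) {
        if (expires[i] > now && expires[i] < next)
            next = expires[i];
    }
    return next;
}

/*
 * Interrupts avoided compared with a periodic tick, which would fire at
 * now+1, now+2, ..., next. Requires a real expiry with next > now.
 */
static uint64_t interrupts_saved(uint64_t now, uint64_t next)
{
    return next - now - 1;
}
```

For pending expiries at ticks 1500, 1040 and 990 and a current time of 1000, the next event is tick 1040, so an idle CPU skips 39 interrupts that a periodic tick would have taken.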

jiffies

The origin of the word "jiffy" is uncertain; phrases such as "in a jiffy" (in an instant) are said to date back to 18th-century England.
In computer engineering, a jiffy is often the time between two successive clock cycles. In electrical engineering, a jiffy is the time to complete one AC (alternating current) cycle.

Usage in code:

unsigned long time_stamp = jiffies;
unsigned long next_tick = jiffies + 1;
unsigned long later = jiffies + 5 * HZ;
unsigned long fraction = jiffies + HZ / 10;
  • The jiffies variable has always been an unsigned long, 32 bits in size on 32-bit architectures and 64-bits on 64-bit architectures.
  • With a tick rate of 100, a 32-bit jiffies variable would overflow in about 497 days. With HZ increased to 1000, however, that overflow now occurs in just 49.7 days!
  • If jiffies were stored in a 64-bit variable on all architectures, then for any reasonable HZ value the jiffies variable would never overflow in anyone’s lifetime
  • A second variable is also defined in <linux/jiffies.h>: extern u64 jiffies_64;
    • On 32-bit architectures, code that accesses jiffies simply reads the lower 32 bits of jiffies_64.
    • The function get_jiffies_64() can be used to read the full 64-bit value.
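The overflow figures above follow from dividing the counter range by the tick rate. A small sketch of that arithmetic (user-space C; `wrap_days_32` and `wrap_years_64` are illustrative names, and a 365.25-day year is assumed):

```c
/* Days until a 32-bit tick counter that starts at zero wraps around. */
static double wrap_days_32(unsigned int hz)
{
    const double ticks = 4294967296.0;            /* 2^32 */

    return ticks / hz / 86400.0;                  /* 86400 seconds per day */
}

/* Years until a 64-bit tick counter wraps (365.25-day years assumed). */
static double wrap_years_64(unsigned int hz)
{
    const double ticks = 18446744073709551616.0;  /* 2^64 */

    return ticks / hz / 86400.0 / 365.25;
}
```

At HZ = 100 the 32-bit counter lasts roughly 497 days and at HZ = 1000 roughly 49.7 days, while the 64-bit counter at HZ = 1000 lasts on the order of hundreds of millions of years.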

Consider the following example, whose plain comparison does not survive a wraparound of jiffies:

unsigned long timeout = jiffies + HZ/2; /* timeout in 0.5s */
/* do some work ... */
/* then see whether we took too long */
if (timeout > jiffies) {
    /* we did not time out, good ... */
} else {
    /* we timed out, error ... */
}

ELC-E 2018 Edinburgh: The End of Time: 19 years to go

The kernel provides four macros for comparing tick counts that correctly handle wraparound in the tick count. They are in <linux/jiffies.h>. Listed here are simplified versions of the macros:

#define time_after(unknown, known) \
    ((long)(known) - (long)(unknown) < 0)
#define time_before(unknown, known) \
    ((long)(unknown) - (long)(known) < 0)
#define time_after_eq(unknown, known) \
    ((long)(unknown) - (long)(known) >= 0)
#define time_before_eq(unknown, known) \
    ((long)(known) - (long)(unknown) >= 0)
  • time_after(a,b) returns true if the time a is after time b.
  • get_jiffies_64() exists as a special function because 32-bit architectures cannot atomically access both 32-bit words of a 64-bit value.
  • It therefore takes the xtime_lock lock before reading the jiffies count.
  • Suppose, for illustration, that the tick counter is only 8 bits wide, b is 253, and five ticks later jiffies has wrapped around to 2. We would therefore expect
    • time_after(2,253) to return true. And it does (using int8_t to denote a signed 8-bit value):
    • (int8_t) 253 - (int8_t) 2 == -3 - 2 == -5 < 0
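To see the difference between the plain comparison and the wraparound-safe macros, here is a user-space sketch with a 32-bit counter. It is modelled on the macros above but is not the kernel's code: `naive_after`, `tick_after` and `timed_out` are illustrative names, and the unsigned subtraction followed by a cast to `int32_t` assumes the usual two's-complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: breaks as soon as one value has wrapped past zero. */
static bool naive_after(uint32_t a, uint32_t b)
{
    return a > b;
}

/*
 * Wraparound-safe check: true if a is after b, provided the two values
 * are less than 2^31 ticks apart.
 */
static bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Has `deadline` been reached at time `now`? */
static bool timed_out(uint32_t now, uint32_t deadline)
{
    return !tick_after(deadline, now);
}
```

If the counter reads 0xFFFFFFF0 when a 50-tick deadline is set, the deadline wraps to 0x22. Ten ticks later the naive comparison already sees the deadline as being in the past, whereas `timed_out()` only reports a timeout once 50 ticks have really elapsed.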

clock and timer

Embedded Systems - Timer/Counter

  • A timer is a specialized type of clock used to measure time intervals. A timer that counts upwards from zero to measure elapsed time is often called a stopwatch. A timer that counts down from a specified interval is used to generate a time delay; an hourglass is an example.

  • Architectures provide two hardware devices to help with time keeping

    • Real-Time Clock (RTC)
      • provides a nonvolatile device for storing the system time
      • On boot, the kernel reads the RTC and uses it to initialize the wall time, which is stored in the xtime variable
    • System Timer
      • The idea behind the system timer is to provide a mechanism for driving an interrupt at a periodic rate
  • Two ways to implement the system timer

    • Electronic clock: oscillates at a programmable frequency
    • Counter
      • Set to some initial value, it decrements at a fixed rate; when the counter reaches zero, an interrupt is triggered.
      • On x86, the primary system timer is the programmable interval timer (PIT)
      • Others include the local APIC timer and the processor’s time stamp counter (TSC).
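The counter-based approach can be modelled in a few lines. The sketch below is a user-space illustration, not driver code: it assumes the x86 PIT input clock of 1193182 Hz, and `pit_reload_value` and `count_interrupts` are made-up helper names.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u   /* x86 PIT input clock, about 1.193182 MHz */

/* Reload value that makes the down-counter reach zero about hz times/s. */
static uint32_t pit_reload_value(unsigned int hz)
{
    return (PIT_INPUT_HZ + hz / 2) / hz;    /* rounded to nearest */
}

/*
 * Model of the counter: feed it `pulses` input clock pulses and return
 * how many interrupts it raised. `reload` must be at least 1.
 */
static uint32_t count_interrupts(uint32_t reload, uint32_t pulses)
{
    uint32_t counter = reload;
    uint32_t irqs = 0;

    while (pulses--) {
        if (--counter == 0) {
            irqs++;             /* counter hit zero: raise an interrupt */
            counter = reload;   /* and start the next period */
        }
    }
    return irqs;
}
```

For HZ = 1000 the reload value is 1193, and one second's worth of input pulses (1193182) yields 1000 interrupts in this model.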

  • Topics related to Clocks and Timers

  • The simplest data structure is time_t, defined in the header <time.h>

  • On most Unix systems, Linux included, the type is a simple typedef to the C long type:

typedef long time_t;

The timeval structure, declared in <sys/time.h>, extends time_t with microsecond precision:

#include <sys/time.h>
struct timeval {
    time_t tv_sec; /* seconds */
    suseconds_t tv_usec; /* microseconds */
};

The timespec data structure is defined in <linux/time.h> as:

struct timespec {
    time_t tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
};
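Because a timespec splits a point in time into seconds and nanoseconds, arithmetic on it has to carry between the two fields. A minimal user-space sketch using the `struct timespec` from `<time.h>`; the helpers `timespec_diff_ns` and `timespec_add_ns` are illustrative and assume non-negative intervals.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Elapsed nanoseconds from start to end (end must not precede start). */
static int64_t timespec_diff_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * NSEC_PER_SEC
           + (end.tv_nsec - start.tv_nsec);
}

/* Add ns >= 0 nanoseconds and renormalise so 0 <= tv_nsec < NSEC_PER_SEC. */
static struct timespec timespec_add_ns(struct timespec t, int64_t ns)
{
    t.tv_sec += ns / NSEC_PER_SEC;
    t.tv_nsec += ns % NSEC_PER_SEC;
    if (t.tv_nsec >= NSEC_PER_SEC) {    /* carry into the seconds field */
        t.tv_sec++;
        t.tv_nsec -= NSEC_PER_SEC;
    }
    return t;
}
```

Adding 250 ms to 10.9 s gives tv_sec = 11 and tv_nsec = 150,000,000, and the difference between the two values is again 250,000,000 ns.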
  • The xtime variable stores the current time and date; it is a structure of type timespec having two fields

    • tv_sec - Stores the number of seconds that have elapsed since midnight of January 1, 1970 (UTC). This date is called the epoch (reference date). Most Unix systems base their notion of the current wall time as relative to this epoch.
    • tv_nsec - Stores the number of nanoseconds that have elapsed within the last second (its value ranges between 0 and 999,999,999).
  • User programs get the current time and date from the xtime variable.

  • The kernel also often refers to it, for instance, when updating i-node timestamps.

  • The xtime variable is usually updated once per tick, that is, roughly 1000 times per second when HZ is 1000.

  • The xtime_lock seqlock avoids the race conditions that could occur due to concurrent accesses to the xtime variable. Remember that xtime_lock also protects the jiffies_64 variable; in general, this seqlock is used to define several critical regions of the timekeeping architecture.

  • The timer interrupt is broken into two pieces

    • interrupt handler: architecture dependent
    • tick_periodic(): Architecture independent routine
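The xtime_lock seqlock mentioned above lets readers run without ever blocking the writer: the writer bumps a sequence counter before and after each update, and a reader retries if the counter was odd or changed while it was reading. The following is a simplified user-space sketch of that idea built on C11 atomics. It is not the kernel's seqlock implementation, it assumes a single writer, and all names (`struct seq_time`, `seq_time_write`, `seq_time_read`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

struct seq_time {
    atomic_uint seq;        /* odd while a write is in progress */
    _Atomic int64_t sec;
    _Atomic int64_t nsec;
};

/* Writer side; a single writer is assumed, like the timer interrupt. */
static void seq_time_write(struct seq_time *t, int64_t sec, int64_t nsec)
{
    atomic_fetch_add(&t->seq, 1);   /* becomes odd: readers will retry */
    atomic_store(&t->sec, sec);
    atomic_store(&t->nsec, nsec);
    atomic_fetch_add(&t->seq, 1);   /* even again: snapshot is stable */
}

/* Reader side: never blocks the writer, retries if it raced with one. */
static void seq_time_read(struct seq_time *t, int64_t *sec, int64_t *nsec)
{
    unsigned int start;

    do {
        start = atomic_load(&t->seq);
        *sec = atomic_load(&t->sec);
        *nsec = atomic_load(&t->nsec);
    } while ((start & 1u) || atomic_load(&t->seq) != start);
}
```

The design favours the writer: an update never waits for readers, which suits data that is written rarely (once per tick) but read very often.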

Timer interrupt

static void tick_periodic(int cpu) {
    if (tick_do_timer_cpu == cpu) {
        write_seqlock(&xtime_lock);
        /* Keep track of the next tick event */
        tick_next_period = ktime_add(tick_next_period, tick_period);
        do_timer(1);
        write_sequnlock(&xtime_lock);
    }
    update_process_times(user_mode(get_irq_regs()));
    profile_tick(CPU_PROFILING);
}

do_timer() is responsible for actually performing the increment to jiffies_64:

void do_timer(unsigned long ticks) {
    jiffies_64 += ticks;
    update_wall_time();
    calc_global_load();
}
  • update_wall_time(): updates the wall time in accordance with the elapsed ticks
  • calc_global_load(): updates the system’s load average statistics.
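What the bookkeeping in do_timer() amounts to can be mimicked in user space. The model below is only a sketch under stated assumptions: a fixed tick rate `MODEL_HZ` of 1000, wall time advanced by exactly one tick length per tick, and no load-average accounting. `struct time_model`, `model_do_timer` and `model_jiffies32` are invented names.

```c
#include <stdint.h>

#define MODEL_HZ 1000u                          /* assumed tick rate */
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC (NSEC_PER_SEC / MODEL_HZ)

struct time_model {
    uint64_t jiffies_64;    /* ticks since boot */
    uint64_t wall_sec;      /* wall time: seconds since the epoch */
    uint64_t wall_nsec;     /* nanoseconds within the current second */
};

/* Model of do_timer(): account for `ticks` elapsed timer interrupts. */
static void model_do_timer(struct time_model *m, uint64_t ticks)
{
    m->jiffies_64 += ticks;

    /* Stand-in for update_wall_time(): advance wall time to match. */
    uint64_t nsec = m->wall_nsec + ticks * TICK_NSEC;
    m->wall_sec += nsec / NSEC_PER_SEC;
    m->wall_nsec = nsec % NSEC_PER_SEC;
}

/* What 32-bit code sees as jiffies: the low 32 bits of jiffies_64. */
static uint32_t model_jiffies32(const struct time_model *m)
{
    return (uint32_t)m->jiffies_64;
}
```

Starting from jiffies_64 = 0xFFFFFFFF, two more ticks leave the 64-bit counter at 0x100000001 while the 32-bit view has wrapped around to 1.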

Timer

  • Timers, sometimes called dynamic timers or kernel timers, are essential for managing the flow of time in kernel code. Kernel code often needs to delay execution of some function until a later time.
  • The given function runs after the timer expires.
    • Timers are not cyclic: a timer is destroyed after it expires.
    • Timers are represented by struct timer_list, which is defined in <linux/timer.h> (the layout shown is from older kernels; the handler signature has since changed):
struct timer_list {
    struct list_head entry; /* entry in linked list of timers */
    unsigned long expires; /* expiration value, in jiffies */
    void (*function)(unsigned long); /* the timer handler function */
    unsigned long data; /* lone argument to the handler */
    struct tvec_t_base_s *base; /* internal timer field, do not touch */
};
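The one-shot behaviour of such timers can be shown with a user-space model. This is a sketch, not the kernel's timer wheel: timers sit on a simple singly linked list, and `struct model_timer`, `model_add_timer`, `model_run_timers` and the example handler `model_record` are all hypothetical.

```c
#include <stddef.h>

struct model_timer {
    struct model_timer *next;           /* pending list link */
    unsigned long expires;              /* expiry, in ticks */
    void (*function)(unsigned long);    /* handler */
    unsigned long data;                 /* argument passed to the handler */
};

/* Example handler: accumulates the data of every timer that fired. */
static unsigned long model_sum;
static void model_record(unsigned long data)
{
    model_sum += data;
}

/* Put a timer on the pending list. */
static void model_add_timer(struct model_timer **head, struct model_timer *t)
{
    t->next = *head;
    *head = t;
}

/* Run and unlink every timer that has expired at `now`; return the count. */
static int model_run_timers(struct model_timer **head, unsigned long now)
{
    int fired = 0;
    struct model_timer **pp = head;

    while (*pp) {
        struct model_timer *t = *pp;

        if ((long)(now - t->expires) >= 0) {    /* wraparound-safe */
            *pp = t->next;      /* one-shot: unlink before calling */
            t->next = NULL;
            t->function(t->data);
            fired++;
        } else {
            pp = &t->next;
        }
    }
    return fired;
}
```

A timer that has fired is gone from the list, so running the list again at the same tick does nothing; a handler that wants periodic behaviour has to add its timer again.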


Delaying Execution

  • Small delay

  • With HZ = 100, a jiffies-based delay cannot be shorter than 10 ms.

  • Even with HZ = 1000, it cannot be shorter than 1 ms.

  • For shorter delays, the kernel provides three busy-waiting functions for microsecond, nanosecond, and millisecond delays, defined in <linux/delay.h> and <asm/delay.h>, which do not use jiffies:

    • void udelay(unsigned long usecs)
    • void ndelay(unsigned long nsecs)
    • void mdelay(unsigned long msecs)
  • BogoMIPS

    • Its name is a contraction of bogus (that is, fake) and MIPS (millions of instructions per second).
    • The kernel prints the value at boot, for example (on a 2.4GHz 7300-series Intel Xeon):
    Detected 2400.131 MHz processor.
    Calibrating delay loop... 4799.56 BogoMIPS

    • The calibration result is stored in the loops_per_jiffy variable, and the BogoMIPS figure is readable from /proc/cpuinfo.
  • schedule_timeout()

    • This call puts your task to sleep until at least the specified time has elapsed. When the specified time has elapsed, the kernel wakes the task up and places it back on the runqueue.
/* set task's state to interruptible sleep */
set_current_state(TASK_INTERRUPTIBLE);
/* take a nap and wake up in "s" seconds */
schedule_timeout(s * HZ);
  • Sleeping on a Wait Queue, with a Timeout
    • Sometimes it is desirable to wait for a specific event or wait for a specified time to elapse—whichever comes first.
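Returning to the busy-wait delays and the BogoMIPS figure above: both derive from loops_per_jiffy. The sketch below shows that arithmetic in user-space C. The value 2399780 used in the note is an assumption, back-computed from the 4799.56 BogoMIPS boot line on the premise that HZ was 1000, and `bogomips` and `delay_loops_for_usecs` are illustrative names.

```c
#include <stdint.h>

/* BogoMIPS as reported at boot: loops_per_jiffy * HZ / 500000. */
static double bogomips(uint64_t loops_per_jiffy, unsigned int hz)
{
    return (double)loops_per_jiffy * hz / 500000.0;
}

/* Iterations of the calibrated loop needed to burn `usecs` microseconds. */
static uint64_t delay_loops_for_usecs(uint64_t usecs,
                                      uint64_t loops_per_jiffy,
                                      unsigned int hz)
{
    return usecs * loops_per_jiffy * hz / 1000000u;
}
```

With loops_per_jiffy = 2399780 and HZ = 1000 this reproduces 4799.56 BogoMIPS, and a 1000-microsecond delay needs exactly one jiffy's worth of loops.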

schedule_timeout
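schedule_timeout() takes its argument in jiffies and returns 0 if the full timeout elapsed, or the remaining jiffies if the task was woken earlier. The user-space sketch below models just that arithmetic; it does not sleep, and `ms_to_ticks` and `timeout_remaining` are hypothetical helpers.

```c
#include <stdint.h>

/* Ticks needed to cover at least `ms` milliseconds (rounds up). */
static uint64_t ms_to_ticks(uint64_t ms, unsigned int hz)
{
    return (ms * hz + 999) / 1000;
}

/*
 * Model of the return value of a timed sleep that began at tick `start`
 * with `timeout` ticks and ended at tick `woken_at`: the ticks left if
 * the wake-up came early, 0 if the timeout expired.
 */
static uint64_t timeout_remaining(uint64_t start, uint64_t timeout,
                                  uint64_t woken_at)
{
    uint64_t deadline = start + timeout;

    return woken_at < deadline ? deadline - woken_at : 0;
}
```

Rounding up matters because the sleep must last at least the requested time: at HZ = 100, a 15 ms request becomes 2 ticks (20 ms), not 1. A non-zero result from `timeout_remaining()` corresponds to the "event came first" case of waiting with a timeout.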


See: Time, Delays, and Deferred Work

The Kernel Latency

To be organized