HackMD
  • API
    API  HackMD API beta testing
    HackMD API is now in beta, join us for a test drive.
    Getting started Got it
    • API  HackMD API beta testing
      API  HackMD API beta testing
      HackMD API is now in beta, join us for a test drive.
      Getting started Got it
      • Options
      • Versions and GitHub Sync
      • Transfer ownership
      • Delete this note
      • Template
      • Insert from template
      • Export
      • Dropbox
      • Google Drive
      • Gist
      • Import
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
      • Download
      • Markdown
      • HTML
      • Raw HTML
      • ODF (Beta)
      • Sharing Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Note Permission
      • Read
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Write
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • More (Comment, Invitee)
      • Publishing
        Everyone on the web can find and read all notes of this public team.
        After the note is published, everyone on the web can find and read this note.
        See all published notes on profile page.
      • Commenting Enable
        Disabled Forbidden Owners Signed-in users Everyone
      • Permission
        • Forbidden
        • Owners
        • Signed-in users
        • Everyone
      • Invitee
      • No invitee
    Menu Sharing Help
    Menu
    Options
    Versions and GitHub Sync Transfer ownership Delete this note
    Export
    Dropbox Google Drive Gist
    Import
    Dropbox Google Drive Gist Clipboard
    Download
    Markdown HTML Raw HTML ODF (Beta)
    Back
    Sharing
    Sharing Link copied
    /edit
    View mode
    • Edit mode
    • View mode
    • Book mode
    • Slide mode
    Edit mode View mode Book mode Slide mode
    Note Permission
    Read
    Owners
    • Owners
    • Signed-in users
    • Everyone
    Owners Signed-in users Everyone
    Write
    Owners
    • Owners
    • Signed-in users
    • Everyone
    Owners Signed-in users Everyone
    More (Comment, Invitee)
    Publishing
    Everyone on the web can find and read all notes of this public team.
    After the note is published, everyone on the web can find and read this note.
    See all published notes on profile page.
    More (Comment, Invitee)
    Commenting Enable
    Disabled Forbidden Owners Signed-in users Everyone
    Permission
    Owners
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Invitee
    No invitee
       owned this note    owned this note      
    Published Linked with GitHub
    Like1 BookmarkBookmarked
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # lua: system-wide profiler * **Status**: In progress * **Start date**: 02-07-2021 * **Authors**: Mikhail Shishatskiy @Shishqa m.shishatskiy@tarantool.org, Maxim Kokryashkin @fckxorg m.kokryashkin@tarantool.org * **Issues**: [#781](https://github.com/tarantool/tarantool/issues/781), [#4001](https://github.com/tarantool/tarantool/issues/4001) ## Summary The document describes the platform profiler for LuaJIT. It is needed to obtain a complete view of platform performance. Existing LuaJIT profiler only able to give you information about virtual machine states and guest stack. Hence, the document proposes to provide a new sampling profiler, so it will be able to gather stack traces from both C and Lua. ## Background and motivation Currently, available options for profiling LuaJIT are not fine enough to get an understanding of performance. For example, perf only able to show host stack, so all the Lua calls are seen as single pcall. Oppositely, jit.p module provided with LuaJIT is not able to give any information about the host stack. To get a detailed perspective of the platform performance, a more advanced profiler is needed. The desired profiler must be able to capture both guest and host stacks simultaneously, along with virtual machine states. Assume we have the following chunks of code: ```c= // file c_payload.c // fibonnacci number double c_payload(double n) { if (n <= 1) { return n; } return c_payload(n - 1) + c_payload(n - 2); } ``` ```lua= -- file lua_payload.lua local jit = require("jit") local ffi = require("ffi") local os = require("os") ------------------------------------------------ -- fibonacci number function lua_payload(n) if n <= 1 then return n end return lua_payload(n - 1) + lua_payload(n - 2) end -- mind project-specific package location -- local c = ffi.load("c_payload") ffi.cdef[[ double c_payload(double n); ]] ------------------------------------------------ local start_time = os.clock() local benchmark_time = 10 jit.off() jit.flush() local i = 0 while os.clock() - start_time < benchmark_time do if i % 2 == 0 then c.c_payload(20) else lua_payload(20) end i = i + 1 end ``` The `c_payload.c` code compiled into a shared library and loaded by LuaJIT via the FFI. To get the difference, you can take a look at the flamegraphs (learn more about [flamegraphs](https://www.brendangregg.com/flamegraphs.html)) for this code generated by perf, jit.p, and PoC for the proposed profiler below. ### perf ![perf](https://i.imgur.com/DlKbFpo.png) ### jit.p ![jit.p](https://i.imgur.com/sDZZDZx.png) ### sysprof ![sysprof](https://i.imgur.com/Yf80MDE.png) As you can see, the sysprof's output looks like merged flamegraphs produced by perf and jit.p. ## Detailed design The proposed approach is to reuse the backend implemented in the existing `jit.p` profiler and add host stack capturing on top of it. It is important, that we create a different API for the new profiler, so the user can choose, which one is preferable for a particular case. But only one profiler can be used at a time. ### Prerequisites Firstly, it is necessary to make timer machinery in `lj_profile.c` self-reliant by introducing `lj_profile_timer` structure and start/stop interface for it. This makes the timer useful for not only for jit.p but for sysprof too. Another thing to do is to fix an invalid vmstate during stack resize. The issue itself is similar to the `tarantool/tarantool#5842`. ### Host stack The default sampling profiler implementation in LuaJIT, which can be seen [here](https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c), follows this flow: ``` luaJIT_profile_start --> profile_timer_start ... |lock VM state [signal emmited] --> profile_signal_trigger: __|prepare args for a callback |schedule callback execution |unlock VM state ... luaJIT_profile_stop --> profile_timer_stop ``` Callback, which is scheduled by `profile_signal_trigger` can be used to dump needed information, including VM stack. However, the mechanism of VM hooks is used for a callback, so it executes later than the signal was emitted. That is why even though the guest stack is still the same by the time when a callback is executed, the host stack has been changed already, so the final stack dump can not be considered valid. Hence, to get a valid final snapshot of both stacks, a dump should be done right at the signal, like [there](https://github.com/Shishqa/luajit/blob/c0da971640512696f5c166e8f2dc1ed982a8f451/src/profile/sysprof.c#L63). The host stack can be dumped with `backtrace(3)`. However, for complex stack layouts (e.g. fiber stacks in Tarantool), it may not be enough, so one can change default backtracing function to another. ### VM stack Depends on the VM state when signal occured, we consider guest stack valid or not. For example, we can not dump Lua stack, when VM is in the INTERP state because stack pointers can be inconsistent. We are using an implementation similar to the one, which is used in [lj_debug_dumpstack](https://github.com/tarantool/luajit/blob/af889e4608e6eca495dd85e6161d8bcd7d3628e6/src/lj_debug.c#L580) to dump guest stack. But there is a problem with that because sometimes the VM stack can be invalid, thanks to this [bug](https://github.com/tarantool/luajit/blob/af889e4608e6eca495dd85e6161d8bcd7d3628e6/src/vm_x64.dasc#L4594). As you can see down the link, VM state changes to LFUNC, and after that stack reallocation takes place. So if our signal takes place in between, we will get a segmentation fault. Anyway, that issue is easy to fix, so this approach is suitable. ### Traces If JIT is enabled while the profiler is running, traces are recorded as well. For now, we report only a traceno and info about a proto, where trace recording started. ### Event streaming The event stream has the following format: ``` Event stream format: stream := symtab sysprof symtab := see symtab description sysprof := prologue sample* epilogue prologue := 'l' 'j' 'p' version reserved version := <BYTE> reserved := <BYTE> <BYTE> <BYTE> sample := sample-guest | sample-host | sample-trace sample-guest := sample-header stack-lua stack-host sample-host := sample-header stack-host sample-trace := sample-header traceno sym-addr line-no sample-header := <BYTE> stack-lua := frame-lua* frame-lua-last stack-host := frame-host* frame-host-last frame-lua := frame-lfunc | frame-cfunc | frame-ffunc frame-lfunc := frame-header sym-addr line-no frame-cfunc := frame-header exec-addr frame-ffunc := frame-header ffid frame-lua-last := frame-header frame-header := <BYTE> frame-host := exec-addr frame-host-last := <ULEB128> line-no := <ULEB128> traceno := <ULEB128> ffid := <ULEB128> sym-addr := <ULEB128> exec-addr := <ULEB128> epilogue := sample-header <BYTE> : A single byte (no surprises here) <ULEB128>: Unsigned integer represented in ULEB128 encoding (Order of bits below is hi -> lo) version: [VVVVVVVV] * VVVVVVVV: Byte interpreted as a plain integer version number sample-header: [FUUUUEEE] * EEE : 3 bits for representing vmstate (LJ_VMST_*) * UUUU : 4 unused bits * F : 0 for regular samples, 1 for epilogue's Final header (if F is set to 1, all other bits are currently ignored) frame-header: [FUUUUUEE] * EE : 2 bits for representing frame type (FRAME_*) * UUUUU : 5 unused bits * F : 0 for regular frames, 1 for final frame (if F is set to 1, all other bits are currently ignored) frame-host-last = NULL ``` ### Symtab It is a heavy task to dump names of functions every time, so instead, we will dump a symbol table in the beginning. Later on, it will be sufficient to dump only a function's address. However, some functions can be loaded and unloaded several times, and their addresses will be different each time. Hence, we will update the symbol table accordingly. To carry out the symtab update, we will drop in new symtab record into the main event stream. A symbol table looks like this (the same format as symtab in memprof): <!-- ``` 1 byte 8 bytes 8 bytes _______________________________________________________________ | type | address of function | function name | first line number| --------------------------------------------------------------- ``` --> ``` symtab := prologue sym* prologue := 'l' 'j' 's' version reserved version := <BYTE> reserved := <BYTE> <BYTE> <BYTE> sym := sym-lua | sym-final sym-lua := sym-header sym-addr sym-chunk sym-line sym-header := <BYTE> sym-addr := <ULEB128> sym-chunk := string sym-line := <ULEB128> sym-final := sym-header string := string-len string-payload string-len := <ULEB128> string-payload := <BYTE> {string-len} <BYTE> : A single byte (no surprises here) <ULEB128>: Unsigned integer represented in ULEB128 encoding (Order of bits below is hi -> lo) version: [VVVVVVVV] * VVVVVVVV: Byte interpreted as a plain numeric version number sym-header: [FUUUUUTT] * TT : 2 bits for representing symbol type * UUUUU : 5 unused bits * F : 1 bit marking the end of the symtab (final symbol) ``` ## API For the system profiler both C API and Lua API are implemented. The C API for the sysprof is described in the <lmisclib.h> header. The C API looks like this: ```c= /* Configure the proflier. */ LUAMISC_API int luaM_sysprof_configure(const struct luam_sysprof_config *config); /* Start profiling. */ LUAMISC_API int luaM_sysprof_start(lua_State *L, const struct luam_sysprof_options *opt); /* Stop profiling. */ LUAMISC_API int luaM_sysprof_stop(lua_State *L); /* Get counters for each VM state. */ LUAMISC_API int luaM_sysprof_report(struct luam_sysprof_counters *counters); ``` The profiler can work in three modes: ```c= enum luam_sysprof_mode { /* ** DEFAULT mode collects only data for luam_sysprof_counters, which is stored ** in memory and can be collected with luaM_sysprof_report after profiler ** stops. */ LUAM_SYSPROF_DEFAULT, /* ** LEAF mode = DEFAULT + streams samples with only top frames of host and ** guests stacks in format described above. */ LUAM_SYSPROF_LEAF, /* ** CALLGRAPH mode = DEFAULT + streams samples with full callchains of host ** and guest stacks in format described above. */ LUAM_SYSPROF_CALLGRAPH }; ``` Available profiler options are stored in the `luam_sysprof_options` structure. ```c= /* Profiler options */ struct luam_sysprof_options { /* Profiling mode */ enum luam_sysprof_mode mode; /* Sampling interval in msec */ uint64_t interval; /* Custom buffer to write data. */ uint8_t *buf; /* The buffer's size. */ size_t len; /* Context for the profile writer and final callback. */ void *ctx; }; ``` As you can see, writer backend is customizable, and our implementation does not lock one at the C-API level. Backtracing function, frame writer and on-stop callback is a one more degree of freedom: ```c= struct luam_sysprof_config { /* ** Writer function for profile events. ** Should return amount of written bytes on success or zero in case of error. ** Setting *data to NULL means end of profiling. ** For details see <lj_wbuf.h>. */ size_t (*writer)(const void **data, size_t len, void *ctx); /* ** Callback on profiler stopping. Required for correctly cleaning ** at VM finalization when profiler is still running. ** Returns zero on success. */ int (*on_stop)(void *ctx, uint8_t *buf); /* ** Backtracing function for the host stack. Should call `frame_writer` on ** each frame in the stack in the order from the stack top to the stack ** bottom. The `frame_writer` function is implemented inside the sysprof ** and will be passed to the `backtracer` function. If `frame_writer` returns ** NULL, backtracing should be stopped. If `frame_writer` returns not NULL, ** the backtracing should be continued if there are frames left. */ void (*backtracer)(void *(*frame_writer)(int frame_no, void *addr)); }; ``` One can replace backtracing function by passing properly filled `luam_sysprof_config` structure to `luaM_sysprof_configure` function. To start the profiler one needs to fill the `luam_sysprof_options` with desired values and pass it to the `luaM_sysprof_start` function. In case of successful start, it returns `SYSPROF_SUCCESS`. To stop the profiler, one should call `luaM_sysprof_stop` function, which will return `SYSPROF_SUCCESS` in case of successful profiling finalization. Starting profiler from Lua is simple: ```lua local started, err, errno = misc.sysprof.start{ mode = profiling_mode, interval = sampling_interval, path = fname, } ``` where `profiling_mode` is one of the following: ``` "D" - DEFAULT mode "L" - LEAF mode "C" - CALLGRAPH mode ``` In the `DEFAULT` mode, only the counters for each state will be collected. In the `LEAF` mode the sysprof provides you with information about a top stack frame for each sample. And in the `CALLCHAIN` mode, the profiler gives you a complete call stack for each sample. Take a look at the examples below to understand the difference. | ![](https://i.imgur.com/4J9ItbN.png) | |:---:| |`DEFAULT` mode | | ![Leaf mode](https://i.imgur.com/ALIVNn1.png) | |:--:| |`LEAF` mode| | ![](https://i.imgur.com/X0nL83r.png) | | :--: | | `CALLCHAIN` mode | `sampling_interval` is the sampling period in milliseconds. `fname` is name of the file where profile events are written. Writer for this function perform `fwrite()` for each call retrying in case of EINTR. When the profiling is stopped `fclose()` is called. The profiler's function's contract is similar to standard io.* interfaces. If it is impossible to open a file for writing or profiler fails to start, nil is returned on failure (plus an error message as a second result and a system-dependent error code as a third result). Otherwise, returns true value. Stopping profiler from Lua is simple too: ``` local stopped, err, errno = misc.sysprof.stop() ``` If there is any error occurred at profiling stopping (an error when file descriptor was closed) `sysprof.stop()` returns `nil` (plus an error message as a second result and a system-dependent error code as a third result). Returns true otherwise. If you want to build LuaJIT without the Lua and platform profiler, you should build it with `-DLUAJIT_DISABLE_SYSPROF=ON`. If it is disabled `misc.sysprof.start()` and `misc.sysprof.stop()` always return false. ### Retrieving the results Profiler's binary output can be parsed with the `luajit-parse-sysprof` utility: ``` luajit-parse-sysprof - parser of the profile collected with LuaJIT's sysprof. SYNOPSIS luajit-parse-sysprof [options] sysprof.bin Supported options are: --help Show this help and exit --split Split callchains by vmstate ``` Or, alternatively, you can do it like this: ``` tarantool -e 'require("sysprof")(arg)' - sysprof_output.bin ``` It produces another file, which can be read by `flamegraph.pl` utility to generate a flamegraph image like this ![sysprof](https://i.imgur.com/Yf80MDE.png) ## Rationale and alternatives Another way to implement such a thing is to make perf to see guest stack. To do so, we need to map virtual machine symbols (and that functionality is present in LuaJIT ([link](https://github.com/tarantool/luajit/blob/d4e12d7ac28e3bc857d30971dd77deec66a67297/src/lj_trace.c#L96))) and do something so perf could unwind the virtual machine stack. Stack unwinding from outside of the LuaJIT is the problem we didn’t manage to solve for today. There are different approaches to do this: - *Save rsp register value to rbp and preserve rbp.* However, LuaJIT uses rbp as a general-purpose register, and it is hard not to break everything trying to use it only for stack frames. - *Coordinated work of `jit.p` and perf.* This approach requires modifying perf the way it will send LuaJIT suspension signal, and after getting info about the host stack, it will receive information about the guest stack and join them. This solution is quite possible, but modified perf doesn't seem like a production-ready solution. - *Dwarf unwinding*

    Import from clipboard

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lost their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template is not available.


    Upgrade

    All
    • All
    • Team
    No template found.

    Create custom template


    Upgrade

    Delete template

    Do you really want to delete this template?

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in via Google

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Tutorials

    Book Mode Tutorial

    Slide Mode Tutorial

    YAML Metadata

    Contacts

    Facebook

    Twitter

    Feedback

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions

    Versions and GitHub Sync

    Sign in to link this note to GitHub Learn more
    This note is not linked with GitHub Learn more
     
    Add badge Pull Push GitHub Link Settings
    Upgrade now

    Version named by    

    More Less
    • Edit
    • Delete

    Note content is identical to the latest version.
    Compare with
      Choose a version
      No search result
      Version not found

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub

        Please sign in to GitHub and install the HackMD app on your GitHub repo. Learn more

         Sign in to GitHub

        HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Available push count

        Upgrade

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Upgrade

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully