HackMD
    • Sharing Link copied
    • /edit
    • View mode
      • Edit mode
      • View mode
      • Book mode
      • Slide mode
      Edit mode View mode Book mode Slide mode
    • Customize slides
    • Note Permission
    • Read
      • Owners
      • Signed-in users
      • Everyone
      Owners Signed-in users Everyone
    • Write
      • Owners
      • Signed-in users
      • Everyone
      Owners Signed-in users Everyone
    • Commenting & Invitee
    • Publishing
      Please check the box to agree to the Community Guidelines.
      Everyone on the web can find and read all notes of this public team.
      After the note is published, everyone on the web can find and read this note.
      See all published notes on profile page.
    • Commenting Enable
      Disabled Forbidden Owners Signed-in users Everyone
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Invitee
    • No invitee
    • Options
    • Versions and GitHub Sync
    • Transfer ownership
    • Delete this note
    • Note settings
    • Template
    • Insert from template
    • Export
    • Dropbox
    • Google Drive Export to Google Drive
    • Gist
    • Import
    • Dropbox
    • Google Drive Import from Google Drive
    • Gist
    • Clipboard
    • Download
    • Markdown
    • HTML
    • Raw HTML
Menu Note settings Sharing Help
Menu
Options
Versions and GitHub Sync Transfer ownership Delete this note
Export
Dropbox Google Drive Export to Google Drive Gist
Import
Dropbox Google Drive Import from Google Drive Gist Clipboard
Download
Markdown HTML Raw HTML
Back
Sharing
Sharing Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Write
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Comment & Invitee
Publishing
Please check the box to agree to the Community Guidelines.
Everyone on the web can find and read all notes of this public team.
After the note is published, everyone on the web can find and read this note.
See all published notes on profile page.
More (Comment, Invitee)
Commenting Enable
Disabled Forbidden Owners Signed-in users Everyone
Permission
Owners
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Invitee
No invitee
   owned this note    owned this note      
Published Linked with GitHub
Like1 BookmarkBookmarked
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
# lua: system-wide profiler * **Status**: In progress * **Start date**: 02-07-2021 * **Authors**: Mikhail Shishatskiy @Shishqa m.shishatskiy@tarantool.org, Maxim Kokryashkin @fckxorg m.kokryashkin@tarantool.org * **Issues**: [#781](https://github.com/tarantool/tarantool/issues/781), [#4001](https://github.com/tarantool/tarantool/issues/4001) ## Summary The document describes the platform profiler for LuaJIT. It is needed to obtain a complete view of platform performance. Existing LuaJIT profiler only able to give you information about virtual machine states and guest stack. Hence, the document proposes to provide a new sampling profiler, so it will be able to gather stack traces from both C and Lua. ## Background and motivation Currently, available options for profiling LuaJIT are not fine enough to get an understanding of performance. For example, perf only able to show host stack, so all the Lua calls are seen as single pcall. Oppositely, jit.p module provided with LuaJIT is not able to give any information about the host stack. To get a detailed perspective of the platform performance, a more advanced profiler is needed. The desired profiler must be able to capture both guest and host stacks simultaneously, along with virtual machine states. Assume we have the following chunks of code: ```c= // file c_payload.c // fibonnacci number double c_payload(double n) { if (n <= 1) { return n; } return c_payload(n - 1) + c_payload(n - 2); } ``` ```lua= -- file lua_payload.lua local jit = require("jit") local ffi = require("ffi") local os = require("os") ------------------------------------------------ -- fibonacci number function lua_payload(n) if n <= 1 then return n end return lua_payload(n - 1) + lua_payload(n - 2) end -- mind project-specific package location -- local c = ffi.load("c_payload") ffi.cdef[[ double c_payload(double n); ]] ------------------------------------------------ local start_time = os.clock() local benchmark_time = 10 jit.off() jit.flush() local i = 0 while os.clock() - start_time < benchmark_time do if i % 2 == 0 then c.c_payload(20) else lua_payload(20) end i = i + 1 end ``` The `c_payload.c` code compiled into a shared library and loaded by LuaJIT via the FFI. To get the difference, you can take a look at the flamegraphs (learn more about [flamegraphs](https://www.brendangregg.com/flamegraphs.html)) for this code generated by perf, jit.p, and PoC for the proposed profiler below. ### perf ![perf](https://i.imgur.com/DlKbFpo.png) ### jit.p ![jit.p](https://i.imgur.com/sDZZDZx.png) ### sysprof ![sysprof](https://i.imgur.com/Yf80MDE.png) As you can see, the sysprof's output looks like merged flamegraphs produced by perf and jit.p. ## Detailed design The proposed approach is to reuse the backend implemented in the existing `jit.p` profiler and add host stack capturing on top of it. It is important, that we create a different API for the new profiler, so the user can choose, which one is preferable for a particular case. But only one profiler can be used at a time. ### Prerequisites Firstly, it is necessary to make timer machinery in `lj_profile.c` self-reliant by introducing `lj_profile_timer` structure and start/stop interface for it. This makes the timer useful for not only for jit.p but for sysprof too. Another thing to do is to fix an invalid vmstate during stack resize. The issue itself is similar to the `tarantool/tarantool#5842`. ### Host stack The default sampling profiler implementation in LuaJIT, which can be seen [here](https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c), follows this flow: ``` luaJIT_profile_start --> profile_timer_start ... |lock VM state [signal emmited] --> profile_signal_trigger: __|prepare args for a callback |schedule callback execution |unlock VM state ... luaJIT_profile_stop --> profile_timer_stop ``` Callback, which is scheduled by `profile_signal_trigger` can be used to dump needed information, including VM stack. However, the mechanism of VM hooks is used for a callback, so it executes later than the signal was emitted. That is why even though the guest stack is still the same by the time when a callback is executed, the host stack has been changed already, so the final stack dump can not be considered valid. Hence, to get a valid final snapshot of both stacks, a dump should be done right at the signal, like [there](https://github.com/Shishqa/luajit/blob/c0da971640512696f5c166e8f2dc1ed982a8f451/src/profile/sysprof.c#L63). The host stack can be dumped with `backtrace(3)`. However, for complex stack layouts (e.g. fiber stacks in Tarantool), it may not be enough, so one can change default backtracing function to another. ### VM stack Depends on the VM state when signal occured, we consider guest stack valid or not. For example, we can not dump Lua stack, when VM is in the INTERP state because stack pointers can be inconsistent. We are using an implementation similar to the one, which is used in [lj_debug_dumpstack](https://github.com/tarantool/luajit/blob/af889e4608e6eca495dd85e6161d8bcd7d3628e6/src/lj_debug.c#L580) to dump guest stack. But there is a problem with that because sometimes the VM stack can be invalid, thanks to this [bug](https://github.com/tarantool/luajit/blob/af889e4608e6eca495dd85e6161d8bcd7d3628e6/src/vm_x64.dasc#L4594). As you can see down the link, VM state changes to LFUNC, and after that stack reallocation takes place. So if our signal takes place in between, we will get a segmentation fault. Anyway, that issue is easy to fix, so this approach is suitable. ### Traces If JIT is enabled while the profiler is running, traces are recorded as well. For now, we report only a traceno and info about a proto, where trace recording started. ### Event streaming The event stream has the following format: ``` Event stream format: stream := symtab sysprof symtab := see symtab description sysprof := prologue sample* epilogue prologue := 'l' 'j' 'p' version reserved version := <BYTE> reserved := <BYTE> <BYTE> <BYTE> sample := sample-guest | sample-host | sample-trace sample-guest := sample-header stack-lua stack-host sample-host := sample-header stack-host sample-trace := sample-header traceno sym-addr line-no sample-header := <BYTE> stack-lua := frame-lua* frame-lua-last stack-host := frame-host* frame-host-last frame-lua := frame-lfunc | frame-cfunc | frame-ffunc frame-lfunc := frame-header sym-addr line-no frame-cfunc := frame-header exec-addr frame-ffunc := frame-header ffid frame-lua-last := frame-header frame-header := <BYTE> frame-host := exec-addr frame-host-last := <ULEB128> line-no := <ULEB128> traceno := <ULEB128> ffid := <ULEB128> sym-addr := <ULEB128> exec-addr := <ULEB128> epilogue := sample-header <BYTE> : A single byte (no surprises here) <ULEB128>: Unsigned integer represented in ULEB128 encoding (Order of bits below is hi -> lo) version: [VVVVVVVV] * VVVVVVVV: Byte interpreted as a plain integer version number sample-header: [FUUUUEEE] * EEE : 3 bits for representing vmstate (LJ_VMST_*) * UUUU : 4 unused bits * F : 0 for regular samples, 1 for epilogue's Final header (if F is set to 1, all other bits are currently ignored) frame-header: [FUUUUUEE] * EE : 2 bits for representing frame type (FRAME_*) * UUUUU : 5 unused bits * F : 0 for regular frames, 1 for final frame (if F is set to 1, all other bits are currently ignored) frame-host-last = NULL ``` ### Symtab It is a heavy task to dump names of functions every time, so instead, we will dump a symbol table in the beginning. Later on, it will be sufficient to dump only a function's address. However, some functions can be loaded and unloaded several times, and their addresses will be different each time. Hence, we will update the symbol table accordingly. To carry out the symtab update, we will drop in new symtab record into the main event stream. A symbol table looks like this (the same format as symtab in memprof): <!-- ``` 1 byte 8 bytes 8 bytes _______________________________________________________________ | type | address of function | function name | first line number| --------------------------------------------------------------- ``` --> ``` symtab := prologue sym* prologue := 'l' 'j' 's' version reserved version := <BYTE> reserved := <BYTE> <BYTE> <BYTE> sym := sym-lua | sym-final sym-lua := sym-header sym-addr sym-chunk sym-line sym-header := <BYTE> sym-addr := <ULEB128> sym-chunk := string sym-line := <ULEB128> sym-final := sym-header string := string-len string-payload string-len := <ULEB128> string-payload := <BYTE> {string-len} <BYTE> : A single byte (no surprises here) <ULEB128>: Unsigned integer represented in ULEB128 encoding (Order of bits below is hi -> lo) version: [VVVVVVVV] * VVVVVVVV: Byte interpreted as a plain numeric version number sym-header: [FUUUUUTT] * TT : 2 bits for representing symbol type * UUUUU : 5 unused bits * F : 1 bit marking the end of the symtab (final symbol) ``` ## API For the system profiler both C API and Lua API are implemented. The C API for the sysprof is described in the <lmisclib.h> header. The C API looks like this: ```c= /* Configure the proflier. */ LUAMISC_API int luaM_sysprof_configure(const struct luam_sysprof_config *config); /* Start profiling. */ LUAMISC_API int luaM_sysprof_start(lua_State *L, const struct luam_sysprof_options *opt); /* Stop profiling. */ LUAMISC_API int luaM_sysprof_stop(lua_State *L); /* Get counters for each VM state. */ LUAMISC_API int luaM_sysprof_report(struct luam_sysprof_counters *counters); ``` The profiler can work in three modes: ```c= enum luam_sysprof_mode { /* ** DEFAULT mode collects only data for luam_sysprof_counters, which is stored ** in memory and can be collected with luaM_sysprof_report after profiler ** stops. */ LUAM_SYSPROF_DEFAULT, /* ** LEAF mode = DEFAULT + streams samples with only top frames of host and ** guests stacks in format described above. */ LUAM_SYSPROF_LEAF, /* ** CALLGRAPH mode = DEFAULT + streams samples with full callchains of host ** and guest stacks in format described above. */ LUAM_SYSPROF_CALLGRAPH }; ``` Available profiler options are stored in the `luam_sysprof_options` structure. ```c= /* Profiler options */ struct luam_sysprof_options { /* Profiling mode */ enum luam_sysprof_mode mode; /* Sampling interval in msec */ uint64_t interval; /* Custom buffer to write data. */ uint8_t *buf; /* The buffer's size. */ size_t len; /* Context for the profile writer and final callback. */ void *ctx; }; ``` As you can see, writer backend is customizable, and our implementation does not lock one at the C-API level. Backtracing function, frame writer and on-stop callback is a one more degree of freedom: ```c= struct luam_sysprof_config { /* ** Writer function for profile events. ** Should return amount of written bytes on success or zero in case of error. ** Setting *data to NULL means end of profiling. ** For details see <lj_wbuf.h>. */ size_t (*writer)(const void **data, size_t len, void *ctx); /* ** Callback on profiler stopping. Required for correctly cleaning ** at VM finalization when profiler is still running. ** Returns zero on success. */ int (*on_stop)(void *ctx, uint8_t *buf); /* ** Backtracing function for the host stack. Should call `frame_writer` on ** each frame in the stack in the order from the stack top to the stack ** bottom. The `frame_writer` function is implemented inside the sysprof ** and will be passed to the `backtracer` function. If `frame_writer` returns ** NULL, backtracing should be stopped. If `frame_writer` returns not NULL, ** the backtracing should be continued if there are frames left. */ void (*backtracer)(void *(*frame_writer)(int frame_no, void *addr)); }; ``` One can replace backtracing function by passing properly filled `luam_sysprof_config` structure to `luaM_sysprof_configure` function. To start the profiler one needs to fill the `luam_sysprof_options` with desired values and pass it to the `luaM_sysprof_start` function. In case of successful start, it returns `SYSPROF_SUCCESS`. To stop the profiler, one should call `luaM_sysprof_stop` function, which will return `SYSPROF_SUCCESS` in case of successful profiling finalization. Starting profiler from Lua is simple: ```lua local started, err, errno = misc.sysprof.start{ mode = profiling_mode, interval = sampling_interval, path = fname, } ``` where `profiling_mode` is one of the following: ``` "D" - DEFAULT mode "L" - LEAF mode "C" - CALLGRAPH mode ``` In the `DEFAULT` mode, only the counters for each state will be collected. In the `LEAF` mode the sysprof provides you with information about a top stack frame for each sample. And in the `CALLCHAIN` mode, the profiler gives you a complete call stack for each sample. Take a look at the examples below to understand the difference. | ![](https://i.imgur.com/4J9ItbN.png) | |:---:| |`DEFAULT` mode | | ![Leaf mode](https://i.imgur.com/ALIVNn1.png) | |:--:| |`LEAF` mode| | ![](https://i.imgur.com/X0nL83r.png) | | :--: | | `CALLCHAIN` mode | `sampling_interval` is the sampling period in milliseconds. `fname` is name of the file where profile events are written. Writer for this function perform `fwrite()` for each call retrying in case of EINTR. When the profiling is stopped `fclose()` is called. The profiler's function's contract is similar to standard io.* interfaces. If it is impossible to open a file for writing or profiler fails to start, nil is returned on failure (plus an error message as a second result and a system-dependent error code as a third result). Otherwise, returns true value. Stopping profiler from Lua is simple too: ``` local stopped, err, errno = misc.sysprof.stop() ``` If there is any error occurred at profiling stopping (an error when file descriptor was closed) `sysprof.stop()` returns `nil` (plus an error message as a second result and a system-dependent error code as a third result). Returns true otherwise. If you want to build LuaJIT without the Lua and platform profiler, you should build it with `-DLUAJIT_DISABLE_SYSPROF=ON`. If it is disabled `misc.sysprof.start()` and `misc.sysprof.stop()` always return false. ### Retrieving the results Profiler's binary output can be parsed with the `luajit-parse-sysprof` utility: ``` luajit-parse-sysprof - parser of the profile collected with LuaJIT's sysprof. SYNOPSIS luajit-parse-sysprof [options] sysprof.bin Supported options are: --help Show this help and exit --split Split callchains by vmstate ``` Or, alternatively, you can do it like this: ``` tarantool -e 'require("sysprof")(arg)' - sysprof_output.bin ``` It produces another file, which can be read by `flamegraph.pl` utility to generate a flamegraph image like this ![sysprof](https://i.imgur.com/Yf80MDE.png) ## Rationale and alternatives Another way to implement such a thing is to make perf to see guest stack. To do so, we need to map virtual machine symbols (and that functionality is present in LuaJIT ([link](https://github.com/tarantool/luajit/blob/d4e12d7ac28e3bc857d30971dd77deec66a67297/src/lj_trace.c#L96))) and do something so perf could unwind the virtual machine stack. Stack unwinding from outside of the LuaJIT is the problem we didn’t manage to solve for today. There are different approaches to do this: - *Save rsp register value to rbp and preserve rbp.* However, LuaJIT uses rbp as a general-purpose register, and it is hard not to break everything trying to use it only for stack frames. - *Coordinated work of `jit.p` and perf.* This approach requires modifying perf the way it will send LuaJIT suspension signal, and after getting info about the host stack, it will receive information about the guest stack and join them. This solution is quite possible, but modified perf doesn't seem like a production-ready solution. - *Dwarf unwinding*

Import from clipboard

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lost their connection.

Create a note from template

Create a note from template

Oops...
This template is not available.


Upgrade

All
  • All
  • Team
No template found.

Create custom template


Upgrade

Delete template

Do you really want to delete this template?

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Forgot password

or

By clicking below, you agree to our terms of service.

Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox

New to HackMD? Sign up

Help

  • English
  • 中文
  • Français
  • Deutsch
  • 日本語
  • Español
  • Català
  • Ελληνικά
  • Português
  • italiano
  • Türkçe
  • Русский
  • Nederlands
  • hrvatski jezik
  • język polski
  • Українська
  • हिन्दी
  • svenska
  • Esperanto
  • dansk

Documents

Tutorials

Book Mode Tutorial

Slide Mode Tutorial

YAML Metadata

Contacts

Facebook

Twitter

Discord

Feedback

Send us email

Resources

Releases

Pricing

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions

Versions and GitHub Sync

Sign in to link this note to GitHub Learn more
This note is not linked with GitHub Learn more
 
Add badge Pull Push GitHub Link Settings
Upgrade now

Version named by    

More Less
  • Edit
  • Delete

Note content is identical to the latest version.
Compare with
    Choose a version
    No search result
    Version not found

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

      Link with GitHub

      Please authorize HackMD on GitHub

      Please sign in to GitHub and install the HackMD app on your GitHub repo. Learn more

       Sign in to GitHub

      HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.

      Push the note to GitHub Push to GitHub Pull a file from GitHub

        Authorize again
       

      Choose which file to push to

      Select repo
      Refresh Authorize more repos
      Select branch
      Select file
      Select branch
      Choose version(s) to push
      • Save a new version and push
      • Choose from existing versions
      Available push count

      Upgrade

      Pull from GitHub

       
      File from GitHub
      File from HackMD

      GitHub Link Settings

      File linked

      Linked by
      File path
      Last synced branch
      Available push count

      Upgrade

      Danger Zone

      Unlink
      You will no longer receive notification when GitHub file changes after unlink.

      Syncing

      Push failed

      Push successfully