AWS Rust Platform Team
---
tags: turbowish
---

# TurboWish Development Plan (old; April 1st)

## What this covers

A description of the TurboWish software architecture, where the pieces have been selected to take best advantage of the different skill sets across our team.

## What this does *not* currently cover

There's no schedule outlined here for when the different components will be delivered. I want to discuss the architecture with the team first, before trying to spell out how we get to a finished product using this architecture.

## The TurboWish vision

TurboWish gives our team's customers (namely, Rust developers) insight into what their programs are doing, in order to identify performance pitfalls.

The most immediate insight we seek to provide: tools to answer the question "Why aren't my asynchronous tasks being scheduled in the manner I expect?"

In order to answer this question, we want to provide the developer with an easy-to-digest view of the program's tasks, associated resources, the relationships between those tasks and resources, and the state of each.

* For example, as described in a [user story][], TurboWish can show the developer a directed graph depicting both 1.) what resources a task is waiting on, and 2.) what tasks are responsible for making those resources available for use.

[user story]: https://quip-amazon.com/CMYUAq1zIWQm/TurboWish-User-Stories

## The TurboWish Architecture, at a high level

TurboWish is broken into three parts:

1. Client Service Instrumentation, which is responsible for emitting events describing the current client status and relevant state transitions made by the client.
2. The Event Collector, which receives the emitted events. The Event Collector builds an internal model of the service from these events, and is able to answer queries about this model. The Event Collector is also able to send introspection requests to the client, which can result in more elaborate descriptions of the client state.
3. A User Interface, which presents the Event Collector's model to the developer.

Each of these parts is described in more detail below.

## The TurboWish Architecture, diagrammed

```mermaid
%% Note: `%%` at line start is a comment.
flowchart TD
    Browser([Web Browser])
    TwCollect -.- reqs -.-> TwIntrospect
    reqs((introspection<br/>requests))
    TracingEventBus -.- e1 .-> TwCollect
    %% e1 is really a dummy node to make the rendering a bit nicer.
    e1((event<br/>stream))
    subgraph Developer [Developer Workstation ____]
        TwTui <--> TwCollect
        TwTui([TurboWish Console UI])
        Browser <--> TwCollect
        TwCollect[TurboWish Event Collector]
        TwCollect --- EventLog
        EventLog[(EventLog)]
    end
    subgraph Client [Client Service ____]
        TwIntrospect[TurboWish Request Handler]
        ClientCode[Client App Code]
        Tokio
        Rayon
        TwIntrospect --- Tokio
        TwIntrospect --- ClientCode
        ClientCode -.-> TracingEventBus
        Tokio -.-> TracingEventBus
        Rayon -.-> TracingEventBus
        TracingEventBus[Tracing Event Bus]
    end
```

## Principles

* (Added post meeting): Client instrumentation should add minimal noise to timing measurement
* (Added post meeting)
* Event emission should not block application progress.
  - If the user *wants* to see internal state that requires Θ(n) space to represent, then we should deliver it incrementally over a series of Θ(n) events (with each one requiring O(1) delivery effort).
* TurboWish is most useful when the full series of events is available to the collector. But: TurboWish should be somewhat useful even if the collector misses a prefix of the event stream.
  - (In other words, one should be able to connect to a running service and still get utility out of TurboWish, potentially by making use of the introspection functionality to query the current state after the initial connection is made.)
* Do not clog the event stream with events that the client does not need.
  - Some events will be necessary for the Event Collector to maintain an accurate model of the executor state, and will be emitted unconditionally.
  - Other events that track more fine-grained details of service operation are off at the outset and enabled via an opt-in introspection request from the client.

## Architectural Overview

### Client Service Instrumentation

For TurboWish to be useful on an async program, one must use an async executor that has TurboWish instrumentation added to it. (Tokio is the executor used for current prototyping, so when you see "the executor", you can just think "Tokio" if you prefer.)

Client application code will benefit from adding extra instrumentation of its own, but developers should be able to use and benefit from TurboWish without going to extremes adding new instrumentation beyond what the executor has out-of-the-box.

Likewise, any linked crate that encapsulates state of interest (such as thread pools in the Rayon crate) may benefit from providing its own TurboWish instrumentation.

The added instrumentation takes the form of logs of events. (For examples of the kind of instrumentation I expect us to add to client code and to Tokio itself, see "Details: Client Service Instrumentation" below.)

These events may include program-internal details such as (partial) stack traces that will include memory addresses of program code.

* (We cannot change existing APIs to thread through details like file/line-number info in the style of `#[track_caller]`, so in general this is the only way I expect to be able to infer calling context without putting undue burden on client code. See more discussion in appendix.)

#### Who should own developing Client Service Instrumentation

The fundamental piece of client service instrumentation is instrumenting the async executor itself. Tokio developers are a natural fit for this effort.
However, Rust compiler engineers may be able to provide expert assistance on some of the details like stack backtracing (or finding other solutions to the problem of inferring calling context).

After we have experience with the instrumented executor, we will be in a better position to evaluate what kinds of instrumentation we might request the client code to add in order to make the developer experience delightful.

## Event Collector

The Event Collector is responsible for receiving the events emitted from the Client Service Instrumentation, and using them to construct a model of the program's executor, its tasks, and the resources with which those tasks are interacting.

The Event Collector needs to be robust: It must efficiently process the stream of events coming from the client service, construct its internal model, and respond to user interface requests (which will usually take the form of queries on its constructed model).

The Event Collector may need to run for a long time, processing many events, in order to monitor live services. This means it needs to avoid memory leaks or other resource consumption issues. (The Event Collector should consider either discarding past events or storing them onto disk, which is why there's a picture of a database in the diagram.)

Thus, the Event Collector is designed as an independent program that will run in its own process space and be easily monitored on its own, so that we isolate resource usage issues and can address them (potentially after delivery of Minimum Viable Product, but hopefully *before* such delivery).

The Event Collector will also need to access the text of the program and its associated debuginfo (to provide the user with a view of its machine code or source code, or to map any program memory addresses in events to the original calling functions, which are likely to be a much more customer-intelligible label for a calling context).
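
To make the model-building and bounded-memory requirements above a little more concrete, here is a minimal sketch of how a collector might fold events into a model while retaining only a fixed number of past events. Everything in it is an assumption for illustration: the `Event` variants, the `Collector` type, the use of plain `u64` identifiers, and the "discard the oldest event" retention policy are all hypothetical, and nothing here is an existing TurboWish or Tokio API.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical event type; the real event schema is not defined in this document.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    TaskSpawned { task: u64 },
    TaskDropped { task: u64 },
    WakerCreated { waker: u64, task: u64 },
    WakerDropped { waker: u64 },
}

/// Hypothetical collector: a model of live tasks and wakers plus a bounded event log.
struct Collector {
    /// Live tasks, each mapped to the number of live wakers that refer to it.
    tasks: HashMap<u64, usize>,
    /// Live wakers, each mapped to the task it would wake.
    wakers: HashMap<u64, u64>,
    /// Only the most recent events; older ones are discarded (assumed policy).
    log: VecDeque<Event>,
    max_log_len: usize,
}

impl Collector {
    fn new(max_log_len: usize) -> Self {
        Collector {
            tasks: HashMap::new(),
            wakers: HashMap::new(),
            log: VecDeque::new(),
            max_log_len,
        }
    }

    /// Update the model for one event, then append the event to the bounded log.
    fn record(&mut self, event: Event) {
        match &event {
            Event::TaskSpawned { task } => {
                self.tasks.insert(*task, 0);
            }
            Event::TaskDropped { task } => {
                self.tasks.remove(task);
            }
            Event::WakerCreated { waker, task } => {
                self.wakers.insert(*waker, *task);
                // If we attached mid-stream we may never have seen the spawn
                // event, so the task is added on first sight.
                *self.tasks.entry(*task).or_insert(0) += 1;
            }
            Event::WakerDropped { waker } => {
                if let Some(task) = self.wakers.remove(waker) {
                    if let Some(count) = self.tasks.get_mut(&task) {
                        *count = count.saturating_sub(1);
                    }
                }
            }
        }
        self.log.push_back(event);
        while self.log.len() > self.max_log_len {
            self.log.pop_front();
        }
    }

    /// Example query: how many live wakers refer to this task, if it is live?
    fn waker_count(&self, task: u64) -> Option<usize> {
        self.tasks.get(&task).copied()
    }
}

fn main() {
    let mut collector = Collector::new(3);
    collector.record(Event::TaskSpawned { task: 1 });
    collector.record(Event::WakerCreated { waker: 10, task: 1 });
    collector.record(Event::WakerCreated { waker: 11, task: 1 });
    collector.record(Event::WakerDropped { waker: 10 });
    collector.record(Event::TaskSpawned { task: 2 });
    collector.record(Event::TaskDropped { task: 2 });
    println!("live wakers for task 1: {:?}", collector.waker_count(1));
    println!("task 2 still live: {}", collector.waker_count(2).is_some());
    println!("retained events: {}", collector.log.len());
}
```

The model here stays proportional to the number of live tasks and wakers, while the log is capped at `max_log_len`; a real collector would presumably spill older events to disk instead of dropping them, and would need a richer model than two hash maps.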

Finally, under the TurboWish architecture as currently envisioned, the Event Collector is also responsible for sending introspection requests (if any) to the client service.

* The main motivation for this, rather than having the User Interface send such requests directly, is that if the Event Collector is initially attached to an already running service (and thus only sees a suffix of the event stream), the Event Collector will *already want* to make introspection requests as part of the construction of its internal model of the executor.
* However, it is possible that having all such requests go through the Event Collector adds a potential risk (point of failure) that is unwarranted. *Feedback welcome!*

#### Who should own developing Event Collector

I expect at least some of the development effort on the Event Collector to come from Rust compiler engineers, since they are the people who are in the best position to work with the emitted program text and associated debuginfo.

## User Interface

The TurboWish User Interface presents the Event Collector's model to the developer.

We will provide two user interfaces at launch time: a terminal console view, which will be optimized around providing a "bare-metal" interaction with the Event Collector, and a web browser interface, which provides a superset of the features offered by the console (such as the rendered graph described in the user story).

The main idea behind separating this out is that I want the Event Collector to be robust, while the User Interface can be developed in a more haphazard fashion.

* For example: it's okay if the User Interface leaks memory; just restart it, and let it reconnect to the Event Collector.

#### Who should own developing User Interface

Anyone who wants to.

The console view development effort should probably be driven by the same people who own the Event Collector itself, since I expect it to be the quickest way for us to dogfood the Event Collector ourselves.
We should probably enlist 3rd party expertise on the web browser interface. Or at least, I don't know who on the team is a web2.0/dhtml/ajax/whatever expert; I just know that it's not my area of expertise *at all*.

* I would personally prefer to *not* build a web server into the Event Collector itself, unless we can do so in a manner where almost no interesting logic is associated with that web server.

## Appendix: Architectural Details

### Details: Client Service Instrumentation

The most basic functionality for the task/resource graph user story requires the executor to emit events whenever:

* a task is spawned,
* a task is dropped,
* a waker is created,
* a waker is cloned,
* a waker is dropped, or
* a waker is transferred from one task to another.

Supporting other user stories will likely require tracking other information as well (such as how many pending futures have had `wake` called and are awaiting a call to `poll`). This partly motivated the introspection request channel: The Event Collector can send a request for more detailed information, and that will toggle new event logging paths.

The emitted events should include unique identifiers (UIDs) for any referenced tasks/wakers.

* For values that are themselves boxed or own a heap-allocated value, we should be able to use a memory address as a UID, as long as we also include some sort of timestamp with the events (and the Event Collector will infer when memory is being reused and update its internal model accordingly).
* (If we need to track values that do not have a uniquely associated heap value, then we may need to add some sort of unique-id generation for them. So far I haven't seen a need in Tokio's types.)

The emitted events should also include some notion of the calling context for the event. This calling context should be meaningful from the viewpoint of the Client App Code.
* For example, when `<TimerFuture as Future>::poll` calls `cx.waker().clone()`, we want the waker construction event to include (at minimum) that a waker was created from `TimerFuture`, so that one can properly tell what *kind of resource* that `waker` is associated with.
* (It would be even better to include enough information in the event for the Event Collector to know *which specific resource* is associated with the waker, rather than just its type.)

These events may include program-internal details such as (partial) stack traces that will include memory addresses of program code.

* (We cannot change existing APIs to thread through details like file/line-number info in the style of `#[track_caller]`, so in general this is the only way I expect to be able to infer calling context without putting undue burden on client code.)
* More specifically: Based on my understanding of the APIs available, providing contextual info about the calling context of `cx.waker().clone()` will require either
  1. client instrumentation that sets some thread- or task-local state, or
  2. backtracing through the stack to find instruction addresses that the Event Collector can, via debuginfo, map back to the calling context.

# Meeting Notes

## Big Picture Feedback

Question: Example of Introspect Requests?

Answer: e.g. Get list of current thread pool. Or turn on more fine-grained info.

Question: What is view from customer? How does customer interact with it?

Answer: Have to enable TW in service as a feature. (Maybe it's an option, maybe not.) When I have a problem, I launch a separate program that starts collecting the events off the stream.

Example Question / User Story: Why am I getting 40% lower throughput than I expect?

Question: Don't know how to evaluate the architecture, because I don't know what TurboWish does.

Can believe the high-level architecture:

1. Need to instrument the service, and
2. Need to interpret that instrumentation.

Example: Starting up service, HTTP response hangs

Question: Two different ways of interacting:

* Interactive REPL, try to determine what current state is
* Post-processing the log events that have already been gathered

Observation: There are two problems this is trying to solve

* Huge obvious problem, trying to understand behavior. (This architecture seems to resolve that.)
* Nuanced performance issue, where the instrumentation itself adds expense, and thus disrupts the observed performance (and makes the customer mistrust the tool)

Need to separate dev vs production?

* If we assume dev-only as an upfront target, then we can e.g. leverage tools like `perf`.

Focus on problems separately

* Deadlock debugging, vs
* Better perf integration

How much do we try to integrate with (and/or improve) existing tools?

* Zipkin, X-ray, perf, ftrace

Need more explicit user stories of the experience someone has using this

Our value-add can be expressing things in terms of the nouns that people are already using (like "tasks", "channels"), rather than system-wide vocabulary.

Aim: Shiny Future Stories by 8 April 2021

Aim: Design Documents for dedicated development efforts by 15 April 2021
