Better generated Rust code - Folkert and Gary
Folkert: Pitching a language feature that gets better code generation via LLVM, related to branches, getting jump tables rather than direct jumps
Gary: Better code generation based on CPU features. glibc uses IFUNC for this, resolving to different function at load time based on CPU features. Simplifies runtime detection, effectively patches up all the call sites via the global offset table (GOT). Kernel does this via patching the call sites so it's more efficient
Also see external functions/etc by Mara. Could use those at the linking level. IFUNC is load-time level. Might also want things at runtime for dynamic switching.
MUSL doesn't support IFUNC.
Runtime detection and switching might be useful for the kernel, based on whether SSE/AVX/vector state has been saved off for the current userspace process or not.
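A minimal sketch of what IFUNC-style dispatch looks like when done in library code on today's stable Rust (x86_64 only; the function names and the AVX2 example are hypothetical): the implementation is resolved once, at first call rather than at load time, and cached in a `OnceLock`.

```rust
use std::sync::OnceLock;

type SumFn = fn(&[u32]) -> u32;

// Portable fallback implementation.
fn sum_portable(data: &[u32]) -> u32 {
    data.iter().copied().sum()
}

// Stand-in for a hand-vectorised version; a real one would use
// #[target_feature(enable = "avx2")] and core::arch intrinsics.
fn sum_avx2(data: &[u32]) -> u32 {
    data.iter().copied().sum()
}

static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

fn sum(data: &[u32]) -> u32 {
    // The "resolver" runs once, like an IFUNC resolver, but at first
    // call rather than at load time; later calls go through the
    // cached function pointer.
    let f = SUM_IMPL.get_or_init(|| {
        if std::arch::is_x86_feature_detected!("avx2") {
            sum_avx2 as SumFn
        } else {
            sum_portable as SumFn
        }
    });
    f(data)
}

fn main() {
    println!("{}", sum(&[1, 2, 3, 4]));
}
```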
Another approach for SIMD: a ZST that represents which CPU features you have; holding an instance guarantees that you know what features you can use. Focused on ergonomics.
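A minimal sketch of that idea, with hypothetical type and function names: the token is zero-sized, can only be obtained after a successful runtime check, and is passed by value to APIs that need the feature, so the check happens once rather than at every call site.

```rust
/// Zero-sized proof that AVX2 is available on the current CPU.
#[derive(Clone, Copy)]
pub struct Avx2Token(());

impl Avx2Token {
    /// The only way to get a token is a successful runtime check.
    pub fn new() -> Option<Self> {
        if std::arch::is_x86_feature_detected!("avx2") {
            Some(Avx2Token(()))
        } else {
            None
        }
    }
}

/// Taking the token by value means the caller must have done the check;
/// a real implementation would call #[target_feature(enable = "avx2")]
/// code internally, with the token making that call sound.
pub fn sum(_token: Avx2Token, data: &[u32]) -> u32 {
    data.iter().copied().sum()
}

fn main() {
    if let Some(token) = Avx2Token::new() {
        println!("{}", sum(token, &[1, 2, 3]));
    }
}
```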
Tracepoints (e.g. `tracing`) rarely get updated; you read `RUST_LOG` once at startup and then you know what tracepoints are enabled. But tracepoints still check an atomic. It's perfectly predictable by the branch predictor but still needs a cache line and a BTB entry. Might be faster if we can use code patching.
Would be useful in userspace, not just the kernel.
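A minimal sketch of the pattern in question (illustrative names, not the actual `tracing` internals): each tracepoint guards its body with an atomic flag that is set once at startup, so the branch is predictable but still costs a load and a BTB entry that code patching could remove.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set once at startup, e.g. from RUST_LOG; never changes afterwards.
static TRACE_ENABLED: AtomicBool = AtomicBool::new(false);

fn init_tracing() {
    TRACE_ENABLED.store(std::env::var_os("RUST_LOG").is_some(), Ordering::Relaxed);
}

#[inline]
fn tracepoint(msg: &str) {
    // This load and branch run on every call, even though the answer
    // is known after startup.
    if TRACE_ENABLED.load(Ordering::Relaxed) {
        eprintln!("trace: {msg}");
    }
}

fn main() {
    init_tracing();
    for i in 0..3 {
        tracepoint(&format!("iteration {i}"));
    }
}
```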
C folks don't always consider changing the language.
Can we change the language for this?
Start out with a crate on crates.io for a simple static `if`, to see if it works out well and has good performance.
Open PR for proper tail calls. What is the blocker here?
How much optimization can we guarantee, vs best-effort / quality-of-implementation (QoI)?
Bikeshedding on how to spell it (e.g. `#[tail] return` vs `become` vs other possibilities).
Need to match the signature.
Could we make the tail call change just change drop order so that it's possible to optimize?
Should we question whether Rust's default ABI should be callee-cleanup or caller-cleanup? We don't support varargs for the Rust ABI (and if we do they may be via generics and monomorphization). We should try the experiment. If it's not better everywhere, we could perhaps let you opt into it for functions that want TCO?
Amanieu: On platforms where arguments are in registers you might not need to match the signature if the differences are in arguments passed in registers.
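A sketch of the kind of code the proposal targets, using mutual recursion as a stand-in for an interpreter dispatch loop. The `become` spelling shown in the comments is one of the candidates under discussion, not settled syntax; today the calls below are ordinary returns that LLVM may or may not turn into jumps.

```rust
fn is_even(n: u64) -> bool {
    // Proposed: `become is_odd(n - 1)` would *guarantee* a tail call,
    // and would require is_odd to have a matching signature.
    if n == 0 { true } else { is_odd(n - 1) }
}

fn is_odd(n: u64) -> bool {
    if n == 0 { false } else { is_even(n - 1) }
}

fn main() {
    // Without guaranteed tail calls, deep recursion like this relies on
    // the optimizer; in debug builds it can overflow the stack.
    println!("{}", is_even(10_000));
}
```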
MIR optimizations?
Doesn't have things like jump-threading.
Gary: Some optimization blockers. Taking the address of something can block many optimizations?
Dropping something may prevent some optimizations of it, because drop takes `&mut self`.
Folkert: If you're above a certain size, MIR optimization bails. Not sure why the threshold is there.
If you set `x` to 1 and then `match x`, it doesn't jump directly to the branch for `x` being 1.
Need more people working on MIR optimizations, including those LLVM already does.
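A minimal illustration of that missed optimization (a hypothetical example, not a reduced test case from the discussion): at the `match`, the value of `x` is fully determined by `flag`, so each predecessor could branch straight to its arm, but the MIR keeps the generic dispatch on `x`.

```rust
fn select(flag: bool) -> u32 {
    let x = if flag { 1 } else { 2 };
    // Jump threading would let the `flag == true` path jump directly
    // to the `1 =>` arm instead of re-testing `x`.
    match x {
        1 => 10,
        2 => 20,
        _ => 0,
    }
}

fn main() {
    println!("{}", select(true));
}
```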
Folkert: Trifecta might be interested in working on this?
Could be smarter about how we do MIR opts, similar to Cranelift, which does optimization via rewrite rules: "equality saturation" to reduce the size of the tree and do all the optimization possible within a fixed amount of time/fuel. Doesn't require picking a pass ordering.
Amanieu: it's different than equality saturation.
Amanieu: Probably not helpful to do in the Rust compiler. Primarily works with pure operations, not side effects (e.g. memory loads/stores).
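As a rough illustration of optimization via rewrite rules (a toy expression simplifier, nothing like Cranelift's actual ISLE/e-graph machinery): each rule maps one expression shape to a cheaper equivalent, and the optimizer applies rules until nothing changes instead of running a hand-ordered sequence of passes.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn rewrite(e: Expr) -> Expr {
    use Expr::*;
    match e {
        // Rule: x + 0  =>  x
        Add(x, y) if *y == Const(0) => rewrite(*x),
        // Rule: x * 1  =>  x
        Mul(x, y) if *y == Const(1) => rewrite(*x),
        // Rule: constant folding
        Add(x, y) => match (rewrite(*x), rewrite(*y)) {
            (Const(a), Const(b)) => Const(a + b),
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Mul(x, y) => match (rewrite(*x), rewrite(*y)) {
            (Const(a), Const(b)) => Const(a * b),
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

fn main() {
    // (x * 1) + 0  simplifies to  x
    let e = Expr::Add(
        Box::new(Expr::Mul(
            Box::new(Expr::Var("x".into())),
            Box::new(Expr::Const(1)),
        )),
        Box::new(Expr::Const(0)),
    );
    println!("{:?}", rewrite(e));
}
```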
The way that Rust references functions is not optimal. It assumes they might come from a shared object, and generates longer instruction sequences than it needs to. We should mark all our internal symbols as having hidden visibility. We should review symbol visibility and see if we're doing the right thing. Sounds like we might not be.
Symbol references across crates.
TLS model is very pessimistic by default. Makes rayon much slower (`__tls_get_addr` very high in profiles).
Many cases where LLVM isn't doing great codegen for Rust, and it needs fixing on the LLVM side. Not enough people working in that area.
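For the thread-local point above, a rough sketch of the pattern that shows up in profiles (a hypothetical counter; rayon's internals are different): in a position-independent build under the general-dynamic TLS model, each access like this can go through a `__tls_get_addr` call, whereas a tighter model such as initial-exec (rustc exposes a TLS model choice via an unstable `-Z tls-model` flag; flag name is an assumption here, check the unstable book) turns it into a fixed offset.

```rust
use std::cell::Cell;

thread_local! {
    // Each worker thread keeps its own counter.
    static COUNTER: Cell<u64> = Cell::new(0);
}

fn bump() -> u64 {
    COUNTER.with(|c| {
        // Under the general-dynamic TLS model this access may call
        // __tls_get_addr; under initial-exec it is a direct offset.
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    for _ in 0..9 {
        bump();
    }
    println!("{}", bump());
}
```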
Which companies are working on both LLVM and Rust? Could we talk to them about working on LLVM to improve Rust codegen?