# 2021-03-31 / 15:00
[#107](https://github.com/rust-lang/wg-async-foundations/issues/107)
* context
* got folks doing large distributed systems
* scientific computing
* projects
* studying fancy physics stuff
* solving partial differential equations on multicore, distributed systems
* large data, high perf, parallelization
* correctness and obviousness
* high scientific integrity
* determinism is really important
* lots of code maintained by astrophysicists, not professional programmers, relatively little support
* ingesting a high volume stream of events, parsing it, storing it into a large database
* or doing queries through a database and having to serve that up
* don't need to impl your own distributed algorithms but does require a lot of concurrency to keep up
* implemented in Go
* runtime with goroutines etc was really nice
* but felt unclear how this mapped to hardware
* would memory be in stack? heap?
* hard to predict performance
* but it did largely work well in practice
* implemented in F#
* since ported to C#, Python
* workflows
* used "async workflows" at C#
## async workflows
https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/asynchronous-workflows
```fsharp
open System.Net
open Microsoft.FSharp.Control.WebExtensions

let urlList = [ "Microsoft.com", "http://www.microsoft.com/"
                "MSDN", "http://msdn.microsoft.com/"
                "Bing", "http://www.bing.com"
              ]

let fetchAsync(name, url:string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
            | ex -> printfn "%s" (ex.Message)
    }

let runAll() =
    urlList
    |> Seq.map fetchAsync
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore
```
* big challenge
* interaction between many parallel tasks and one or two management tasks
* if you are writing a system reading from a Kafka server
* reading messages off
* need to keep track of when you've fully processed a message
* so you know, if there is a fault, how far back you must retry
* needs a coordinator tracking the state of ongoing processing
* sometimes blocking or stopping concurrent tasks
* to allow you to commit an offset
> So how we implemented the kafka client in Go was roughly: we had a goroutine which read messages from kafka, recorded the incoming message into a ring buffer with an “Inflight” state, then posted the message to a channel. Downstream go routines processed the message then posted the Success/Fail status to a channel and the Kafka client layer would read those messages and update the state in the Ring buffer.
>
> The ring buffer provided a max number of in flight messages and if that was hit then it would stop reading from Kafka
>
> The ring buffer would then take the largest contiguous block of Success messages and commit the offset.
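The bookkeeping the quote describes can be sketched in Rust. This is not the group's actual code; `InFlightTracker` is an invented name, the assumption of gap-free, monotonically increasing offsets is mine, and a real client would sit on top of a crate such as `rdkafka`. The idea is just: a fixed-size window records each message as in-flight, workers report success or failure, and the committable offset is the end of the largest contiguous run of successes.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Slot {
    Free,     // nothing tracked at this position
    InFlight, // read from Kafka, handed to a worker
    Success,  // processed; committable once everything before it is done
    Failed,   // processing failed; blocks the commit point
}

struct InFlightTracker {
    base_offset: u64, // lowest offset that has not been committed yet
    slots: Vec<Slot>, // ring buffer indexed by offset modulo capacity
    len: usize,       // number of occupied slots (back-pressure counter)
}

impl InFlightTracker {
    fn new(capacity: usize, first_offset: u64) -> Self {
        Self { base_offset: first_offset, slots: vec![Slot::Free; capacity], len: 0 }
    }

    fn slot_for(&self, offset: u64) -> usize {
        (offset % self.slots.len() as u64) as usize
    }

    /// Record a newly read message. Returns false when the window is full,
    /// which is the signal to stop reading from Kafka.
    fn start(&mut self, offset: u64) -> bool {
        if self.len == self.slots.len() {
            return false;
        }
        let idx = self.slot_for(offset);
        self.slots[idx] = Slot::InFlight;
        self.len += 1;
        true
    }

    /// A worker reports the outcome for one message.
    fn finish(&mut self, offset: u64, success: bool) {
        let idx = self.slot_for(offset);
        self.slots[idx] = if success { Slot::Success } else { Slot::Failed };
    }

    /// Pop the largest contiguous block of successes from the front of the
    /// window; returns the next offset that is now safe to commit, if any.
    fn committable(&mut self) -> Option<u64> {
        let mut advanced = 0u64;
        while (advanced as usize) < self.len {
            let idx = self.slot_for(self.base_offset + advanced);
            if self.slots[idx] != Slot::Success {
                break;
            }
            self.slots[idx] = Slot::Free;
            advanced += 1;
        }
        if advanced == 0 {
            return None;
        }
        self.base_offset += advanced;
        self.len -= advanced as usize;
        Some(self.base_offset)
    }
}

fn main() {
    let mut tracker = InFlightTracker::new(8, 100);
    assert!(tracker.start(100));
    assert!(tracker.start(101));
    tracker.finish(101, true); // 101 is done, but 100 is still in flight
    assert_eq!(tracker.committable(), None);
    tracker.finish(100, true);
    assert_eq!(tracker.committable(), Some(102)); // 100 and 101 both done
}
```

In the full pattern, the reader task calls `start` before handing each message to workers over a channel, stops reading when `start` returns `false`, and a coordinator calls `finish` as `(offset, success)` results come back, committing whatever `committable` returns.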
## notes on physics applications
* https://www.github.com/clemson-cal
* data pattern
* not as simple as mapping across all tasks
* a light DAG, evaluated in rounds
* kind of browsed code
* found smol, tokio
* first tried message passing and channels
* launched 1 thread per task
* had them sending synchronous messages
* for students learning and coming into this system
* background: fortran, python
* learning curve was steep
* still "fighting" the language (~5 months in)
* async not really the source of problems
* worked on error handling
* plumbed errors through the async modules
* took a bit to grok how things were working
* figuring out how to work with `Ok` and `Err` etc
* used to just throwing exceptions in C++
* read "The Book"
* key challenge
* how to share the futures so that arbitrary other tasks can pull things out of a matrix and await them
* use `shared()` from `futures::FutureExt` (see the sketch below)
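A minimal sketch of that `shared()` approach, assuming a tokio runtime and the `futures` crate; the map name, the block type, and `spawn_stage` are invented for illustration (the real code lives in the clemson-cal repos). Each block's result is produced by a spawned task, stored in a map as a `Shared` future, and any downstream task can clone a handle out of the map and `.await` it as many times as it likes.

```rust
use std::{collections::HashMap, sync::Arc};

use futures::future::{BoxFuture, FutureExt, Shared};

// Per-block output, wrapped in Arc so the Clone that `Shared` requires is cheap.
type BlockResult = Arc<Vec<f64>>;
type SharedStage = Shared<BoxFuture<'static, BlockResult>>;

/// Spawn one task per block and keep a sharable handle to each result,
/// so any downstream task can await any upstream block, any number of times.
fn spawn_stage(blocks: Vec<(u64, Vec<f64>)>) -> HashMap<u64, SharedStage> {
    let mut outputs = HashMap::new();
    for (index, data) in blocks {
        let handle = tokio::spawn(async move {
            // the real solver step for one block would go here
            Arc::new(data)
        });
        // `.shared()` lets the same future be awaited from many places;
        // the JoinHandle error is flattened away here for brevity.
        let fut: SharedStage = async move { handle.await.expect("task panicked") }
            .boxed()
            .shared();
        outputs.insert(index, fut);
    }
    outputs
}

#[tokio::main]
async fn main() {
    let stage = spawn_stage(vec![(0, vec![1.0, 2.0]), (1, vec![3.0])]);

    // Two different consumers can both await block 0 by cloning the handle.
    let a = stage[&0u64].clone().await;
    let b = stage[&0u64].clone().await;
    assert_eq!(a, b);
}
```

Wrapping the block data in `Arc` is the same move as the `Arc` bullet in the story outline below: the shared future hands out clones of its output, so the output itself needs to be cheap to clone.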
## story
Erich will take a stab at writing this PR.
* Character:
* Barbara, early in her career
* Outline:
* Started with launching a bunch of threads
* Perf was subpar
* Explored tokio and async
* Replaced threads with `runtime.spawn`
* Used async futures and `.await` to create dependency graphs between tasks
* But realized you can't `.await` a future more than once, and you needed access to arbitrary stages
* made a [map](https://github.com/clemson-cal/app-kilonova/blob/eb38c6bb66d780e51954a52ddd469f14f08a0e16/src/scheme.rs#L32-L39) but realized cloning it was expensive
* introduced `Arc` to get around that
* used `shared`
* don't love all the `.map` and `.unwrap` calls, noisy
* Initially, when hitting errors, just panicked out of convenience
* If a background thread panics, it crashes all the other threads, so there was this cacophony of stacktraces
* People were sending Slack messages with pages full of errors
* Converted to `Result` because knew that was the right way
* didn't know you could collect an `Iterator<Item = Result<T, E>>` into a `Result<Collection<T>, E>` (see the sketch at the end of this section)
* tasked Grace with it
* took 3 or 4 days of banging her head against it to thread all the types, get `?` in the right places, etc
* Things scale excellently, but
* compilation time is a problem on simpler laptops
* would like 2 or 3 seconds, but 30 seconds to a minute is common
* could have quite a lot to do with async
* the student Niklaus can't really hack on the core parts of the system
* took 5 months to get up to speed
* initial investment was non-trivial
* What are the morals of the story?
* Writing async code to do DAGs of computation is doable, but the pattern is not necessarily obvious and takes some tinkering to discover.
* The libraries are tuned for I/O and may not be optimal.
* Error handling and fault handling was not obvious, panics in particular were difficult to manage.
* Using async Rust brings a fair amount of "non-essential" complexity to the problem.
* What are the sources for this story?
* Covers the experiences of Jonathan Zrake's group at Clemson.
* Why did you choose Barbara to tell this story?
* Fit the backgrounds. Barbara isn't a perfect fit, but we can't have two Graces, and Jonathan knew some Rust.
* How would this story have played out differently for the other characters?
* ..
* Would Rayon cover this case?
* Not quite, Rayon doesn't support arbitrary DAGs. You could do a full map pattern but you would get uneven CPU utilization towards the end of a phase.
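
For the error-handling bullet in the outline above, a small illustration of the `collect` conversion Grace eventually needed; `parse_all` and its inputs are made up, the point is only that `collect` can target `Result<Vec<T>, E>` directly.

```rust
/// Parsing every entry either yields all values or the first error,
/// because `collect` can build a `Result<Vec<_>, _>` from an iterator of `Result`s.
fn parse_all(inputs: &[&str]) -> Result<Vec<f64>, std::num::ParseFloatError> {
    inputs
        .iter()
        .map(|s| s.parse::<f64>()) // each item is a Result
        .collect()                 // Iterator<Item = Result<T, E>> -> Result<Vec<T>, E>
}

fn main() {
    assert_eq!(parse_all(&["1.0", "2.5"]), Ok(vec![1.0, 2.5]));
    assert!(parse_all(&["1.0", "oops"]).is_err());
}
```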