<h1><p style="font-family: Courier New;letter-spacing: 1px;">METAMORPHOSIS: AKT I</p></h1>
> A case study of how to implement Kotlin's
> [`Grouping`][Grouping] in Rust
<details>
<summary>TOC</summary>
[TOC]
</details>
<h2><p style="font-family: Courier New;letter-spacing: 1px;">PREFACE</p></h2>
This is the first episode of Metamorphosis, a series to explore the Recreational Programming through several unique ways, e.g. Type-Driven Programming. A recreational programming does not bound to any actual limiations, that is, it's basically a programming style where you enjoy when you programs, which could be applied to various fields of programming, e.g. Competitive Programming.
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
AKT I - GROUPING
</p></h2>
Several days ago, I wonder what [`Grouping`][Grouping] implementation would be look like, so I decided to find out on Kotlin's official documentation and its repository, and to fully understand what would it be like to implement in a non-gc and memory-safe enviornment, I decided to implement it in Rust.
To my surprise, this is the definition and initilization of [`Grouping`][Grouping]:
```kt!
public interface Grouping<T, out K> {
/** Returns an [Iterator] over the elements of the source of this grouping. */
public fun sourceIterator(): Iterator<T>
/** Extracts the key of an [element]. */
public fun keyOf(element: T): K
}
public inline fun <T, K> Array<out T>.groupingBy(crossinline keySelector: (T) -> K): Grouping<T, K> {
return object : Grouping<T, K> {
override fun sourceIterator(): Iterator<T> = this@groupingBy.iterator()
override fun keyOf(element: T): K = keySelector(element)
}
}
```
Ok, let's not discuss some Kotlin-specific features such as [`crossinline`][Inline Function] and [`inline`][Inline Function] keyword, but we'll need to understand what this code is about, so let's get start with the interface definition.
As you may noticed, [`Grouping`][Grouping] is defined as neither a [class] or an [abstract class], which means that it could be treated as a [trait], but additionally it indicates the initialization of [`Grouping`][Grouping] could only be [anonymous object] (consider that the official implementation of `Array<out T>.groupingBy`). And yes, this is already a issue in our way since Rust does not have any faeture close to [anonymous object], everything must be initialized out of a [struct], but other than this issue, everything seems just fine at this moment, as you can notice the definition of [`Grouping`][Grouping] only requires 2 functions to be implemented, which are `sourceIterator` and `keyOf`.
So that's all? Of course not, but before we advance furthermore, let's begin to implement our own Grouping in Rust!
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
GET RUSTY!
</p></h2>
```rs!
pub trait Grouping<T, K>
{
fn source_iterator<'a>(&'a self) -> impl Iterator<Item = &'a T>
where
T: 'a;
fn key_of(&self, element: &T) -> K;
}
```
Most stuff here are identical to original implementation, but you might already noticed that `source_iterator` does have some kind of quirky symbol in the function signature. The quirky thing defined between function name and parameter list is called [lifetime], which is used to notify compiler a reference's lifetime for further borrow check invalidation, but that's another topic. In summary, the definition of [lifetime] `'a` is to tell that the Iterator we're returning iterates a reference to value with type `T`, but also, the reference stays alive alongside the [`Grouping`][Grouping] itself. Notice that the reason we use reference instead of owned data to iterate is that we care about the reusability (which will be explained in next section.)
Okay at this point we'll now need to consider the actual implementation for our [trait] [`Grouping`][Grouping], specifically, for the struct that implements it, we'll name it `GroupingImpl` for now.
Ideally, here's some considerations we have take into account for later implementation.
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
THE IMPLEMENTATION OF STRUCT
</p></h2>
In this case study, we tends to match Kotlin's use case (we'll later have another refined version that deviades to Rust's use case). Since Kotlin does not usually creates new value on each invocation, we only need to constraint that the internal underlying data stays immutable. Additionally, in some scenarios, Kotlin tends to reuse a class (notice that this requires the previous prerequisite to be satisfied), so we'll just store a vector of data. Let's define our actual `GroupingImpl` struct first!
```rust!
pub struct GroupingImpl<'ks, T, K>
{
raw: Vec<T>,
key_selector: Box<dyn Fn(&T) -> K + 'ks>,
}
```
Let's break down this code, in order to satisfy trait [`Grouping`][Grouping]'s function `source_iterator`, we have to either guarantee that the iterator is reusable or we can constantly create iterator from somewhere, the former option is not ideal due to the bound requirement of clone-able iterator does also requires its iterating item is also clone-able, which is not always applicable to all types, so we'll take the later option. In the above code, we store a vector, and vector can be constantly converted into [`Iterator`][Iterator].
For the next field `key_selector`, this is used for satisfy trait function `key_selector`, to map a value to key. It's type signature is a little bit complex, so let's starts with the most-inner type `Fn`. A `Fn` type is a closure type, which is dyanmically sized (because the information carried by clsoure is unknown at compile time), so that would require us to store it somewhere (usually in the heap) in order to guarantee the size of `GroupingImpl` could be determined in compile time, that's where [`Box`][Box] type comes to rescue. A [`Box`][Box] stores a unique pointer to the heap, and it's sized, which allows us to store something unsizeable in a struct. But you may also noticed the [lifetime] behind the closure type `'ks`, it's explicitly annotated in order to not to let compiler to determines closure's lifetime, which by default would be `'static` (requires to be alive alongside the program itself), so this relax's our closure's lifetime contraint.
In order to make our actual implementor struct met the requirement of [`Grouping`][Grouping] trait, we'll also need to implement all the required functions (`source_iterator` and `key_of`), see below:
```rust!
impl<'ks, T, K> Grouping<T, K> for GroupingImpl<'ks, T, K>
{
fn source_iterator<'src>(&'src self) -> impl Iterator<Item = &'src T>
where
T: 'src,
{
(&self.raw).into_iter()
}
fn key_of(&self, element: &T) -> K {
self.key_selector.as_ref()(element)
}
}
```
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
CONV/ENIENT CONV/ERSION
</p></h2>
Keep in mind, Rust does not have any kind of [extension function] like Kotlin or C#, so that means we'll have to find another way to conveniently converts a data sequence into [`Grouping`][Grouping]. Fortunately, in Rust, we use a trait trick to implement a extension function on foreign or local declared types. We start by defining the conversion function's signature like below:
```rust!
pub trait IntoGrouping<'ks, T, K>
{
fn grouping_by(self, key_selector: impl Fn(&T) -> K + 'ks) -> GroupingImpl<'ks, T, K>;
}
```
This trait allows its implementor to take 1 closure as `key_selector` function, and consume themselves then returns a `GroupingImpl`. We can now implement this trait on the `Vec<_>` for convenient conversion like below:
```rust!
impl<'a, 'ks, T, K> IntoGrouping<'ks, T, K> for Vec<T>
{
fn grouping_by(self, key_selector: impl Fn(&T) -> K + 'ks) -> GroupingImpl<'ks, T, K> {
GroupingImpl {
raw: self,
key_selector: Box::new(key_selector),
}
}
}
```
To use the `grouping_by` function we defined in trait, we can just use below code to convert it!
```rust!
let v = vec!["one", "two", "three", "four", "five"];
let g = v.grouping_by(|t| t.chars().next().unwrap().to_ascii_uppercase());
// Now variable g is grouped by element's first character in uppercase
```
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
THE ESSENTIAL: AGGREGATE
</p></h2>
Now here's the hot take, let's start implement default grouping operations. Introducing the most important function of all: [`aggregate`][aggregate].
Before we advance to Rust's implementation, let's have a look on Kotlin's [`aggregate`][aggregate] to have a better understanding what it does:
```kotlin!
public inline fun <T, K, R> Grouping<T, K>.aggregate(
operation: (key: K, accumulator: R?, element: T, first: Boolean) -> R
): Map<K, R> {
return aggregateTo(mutableMapOf<K, R>(), operation)
}
public inline fun <T, K, R, M : MutableMap<in K, R>> Grouping<T, K>.aggregateTo(
destination: M,
operation: (key: K, accumulator: R?, element: T, first: Boolean) -> R
): M {
for (e in this.sourceIterator()) {
val key = keyOf(e)
val accumulator = destination[key]
destination[key] = operation(key, accumulator, e, accumulator == null && !destination.containsKey(key))
}
return destination
}
```
Let's not get confused with the difference of `aggregate` and `aggregateTo` functions, they're essentially the same while `aggregateTo` function provides more control on the result storage on the destination data structure. As you can see `aggregate` / `aggregateTo` function collects values that shares same key into separate data sequences, then stores the data sequences into a `Map`. In general, it groups value, that is.
So how do we do it in Rust? Well, difficult as it may seems but actually straightforward:
```rust!
fn aggregate<R>(&self, operation: impl Fn(&K, Option<R>, &T) -> R) -> HashMap<K, R> {
let mut m = HashMap::new();
for item in self.source_iterator() {
let key = self.key_of(item);
let value = m.remove(&key).map_or_else(
|| operation(&key, None, item),
|accumulator| operation(&key, Some(accumulator), item),
);
m.insert(key, value);
}
m
}
```
But there's still some important detail to notice while converting Kotlin's code into Rust's: In Kotlin, we don't have to consider data's ownership, but not in Rust. Notice that the reason we a reference to key is to reduce the cloning.
Overall, the function looks simple, and the operations are essentially same.
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
JUST ANOTHER AGGREGATED FUNCTIONS
</p></h2>
The remaining functions, e.g. `fold_with_key`, `reduce`, `each_count`, are basically `aggregate` function with additional operations / modification, we'll not discuss about their operations (assuming you're already familiar with basic data sequence operations).
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
FINALE OF AKT I
</p></h2>
That's all about our attempt to convert Kotlin's [`Grouping`][Grouping] into Rust code, as you read over these, you'll notice more aspects what Rust cares and what Kotlin doesn't really care about.
Generally speaking, we can learn more about the philosophy behind each programming languages, and tries to achieve same thing while keep the essential idea of certain subject.
In the next akt, we'll completely dive into Rust's ideaology and follows Rust's [`Iterator`][Iterator]'s best practice to refine our work, and also figure out potential flaws in our current work.
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
TRIVIA
</p></h2>
- The font used in headlines is Courier New, which references to the typewriter font used in album cover of Pink Floyd's `The Early Years`.
- Section name `CONV/ENIENT CONV/ERSION` is also a reference to the naming style used in Pink Floyd's `The Early Years`, in original design, the common suffix is `/ATION`
- Section name `JUST ANOTHER AGGREGATED FUNCTIONS` is also a reference to Pink Floyd's song `Just Another Brick In The Wall (Part 1-3)`
<h2><p style="font-family: Courier New;letter-spacing: 1px;">
REFERENCES
</p></h2>
<h3><p style="font-family: Courier New;letter-spacing: 1px;">
SOURCE CODE
</p></h3>
- [`GitHub`](https://github.com/ChAoSUnItY/metamorphosis/blob/main/src/akt1.rs)
<h3><p style="font-family: Courier New;letter-spacing: 1px;">
COMMON
</p></h3>
- [`Trait`][Trait]
<h3><p style="font-family: Courier New;letter-spacing: 1px;">
KOTLIN SPECIFIC
</p></h3>
- [`Grouping`][Grouping]
- [`Inline Functions`][Inline Function]
- [`Class`][Class]
- [`Abstract Class`][Abstract Class]
- [`Anonymous Object`][Anonymous Object]
- [`Aggregate`][Aggregate]
- [`Extension Function`][Extension Function]
<h3><p style="font-family: Courier New;letter-spacing: 1px;">
RUST SPECIFIC
</p></h3>
- [`Struct`][Struct]
- [`Lifetime`][Lifetime]
- [`Ownership`][Ownership]
- [`Iterator`][Iterator]
- [`Box`][Box]
[Grouping]: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-grouping/
[Inline Function]: https://kotlinlang.org/docs/inline-functions.html
[Class]: https://kotlinlang.org/docs/classes.html
[Abstract Class]: https://kotlinlang.org/docs/classes.html#abstract-classes
[Anonymous Object]: https://kotlinlang.org/docs/object-declarations.html#inheriting-anonymous-objects-from-supertypes
[Trait]: https://en.wikipedia.org/wiki/Trait_(computer_programming)
[Struct]: https://doc.rust-lang.org/rust-by-example/custom_types/structs.html
[Lifetime]: https://doc.rust-lang.org/rust-by-example/scope/lifetime.html
[Aggregate]: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/aggregate.html
[Ownership]: https://doc.rust-lang.org/rust-by-example/scope/move.html
[Extension Function]: https://kotlinlang.org/docs/extensions.html#extension-functions
[Iterator]: https://doc.rust-lang.org/std/iter/trait.Iterator.html
[Box]: https://doc.rust-lang.org/std/boxed/struct.Box.html