Reactor: Simple Variants

Introduction

The original reactor document is here. In this document, I'll present a full system: if you're new to reactor, you shouldn't necessarily need to reference the original document. With that said, what's new?

Conventional Improvements

Additional safety in 1-for-1s.
- We reintroduce the direct color play signals of sieve.
- We allow delayed play signals not promising the position of any connectors.
- Directly attributable to timotree.
Good touch information given by reactive clues.
- We bias our touch to be on previously-untouched cards.
- We bias our touch to be rightward.
- This synergizes with 1-for-1 safety: touched cards tend to be ones otherwise hard to make act.
- Inspired by targeting in the Indirect Signaling System which is in development by akashnil.
Dry board safety.
- We play with an early-game lock assumption.
- The hardest theoretically winnable decks involve locked players. A lock assumption optimizes conventions for those decks. That said, it's plausible my proposed solution (no chop until 10 points) goes too far in certain contexts, especially if clues are low.
Care with finesse targeting.
- When necessary, we want finesses to work as often as possible. We also want to keep them from being necessary.
Allowance for color clues to occasionally get two discards.
- While rare, sometimes a color clue doesn't make sense as two plays, no matter what we assume. In these instances, it's nice to allow for the clue to provide two discards.
Clue naming.
- Instead of our 1-for-1s being called referential sieve clues as in the original document, we simply call them stable so that the name contrasts with reactive.

Considerations

Safety comes at the cost of efficiency in situations where that safety is unnecessary. In optimizing for safety to this extent, the claim that this convention is an improvement expresses a belief that with solid play, the team will be totally fine in (almost) all situations where that safety is unnecessary.

An L

I expect this version performs significantly worse in null. This is due to both the stable and reactive clues suffering extra in that variant relative to the original Reactor conventions.

Principles

The Reactive Principle

The basic premise of Reactor holds: this is a system designed with 3p in mind. The next player without a safe play is the reacter, and the other player is the receiver. When a clue is given, the reacter expects that it is providing them with a safe action. Clues given to the reacter's hand are called stable; the reacter takes their information, and nothing additional happens. Clues given to the receiver are reactive: they promise the receiver a safe action, but the reacter gets one too: they react by playing or discarding a card.

Good Touch Principle

This is just good. Most cards in the deck are good. Most systems should play with a weak good touch principle: if a card in Alice's hand is known to be playable or trash, but it is not known which, playing that card is the default action should Alice choose to not give a clue.

Sieve Chop

When chop exists, it is the leftmost untouched card in hand.

Stable Clues

Stable clues are clues given on the reacter's hand. They can occasionally be given to fix otherwise good-touch playables.

Fill-in clues

If a clue fills-in a previously-touched card, revealing it playable, trash, or a duplicate of a card in the receiver's hand, it is a signal to act on that card, with no further meaning.

Color play clues

Color clues say to play the leftmost newly-touched card. If the receiver holds a playable card of that color, it is assumed to be a help-yourself delayed play clue; the reacter should help get that card (or cards) to play to become unlocked. Shoutouts to timo for this excellent idea: it is a main reason this version returns to direct color play clues.

Rank discard clues

Rank clues say to discard the untouched card to the left. When multiple cards are referred to, the signal is on the leftmost referred-to card, except chop (if it exists) has lowest priority. If chop is the only untouched referred-to card, the clue is a lock signal.

Rank Trash

If there are two non-chop cards to the left, it is simply a direct discard clue.

Color Trash

As of now, this is undefined.

Reactive Clues

Reactive clues get an action from each of the other players.

If you want to give a clue getting two untouched cards to play, count the total number of untouched cards to the left of each. If one or both cards are touched, the count for that hand should instead include all untouched cards plus the touched cards to the left of the target. You should get a number between 0 and 4 for each hand. Add them, and take the result mod 5.

Starting from the right of the receiver's hand, count over that many untouched cards, moving on to touched cards if you run out of untouched ones. The card after that is the one you want to focus. A color clue touching that card and none of the ones you counted over will work, at least as far as the slot math is concerned.
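The slot math above can be sketched in a few lines. This is only an illustration under stated assumptions: hands are lists of booleans (True means touched) with index 0 as the leftmost slot, every hand holds 5 cards, and a touched target is read as ranking after every untouched card in its hand. The function names are hypothetical.

```python
HAND_SIZE = 5  # 3p hand size, matching the "mod 5" in the text


def target_value(touched, idx):
    """Count for a target card, taken from the left (index 0 = leftmost)."""
    if not touched[idx]:
        # Untouched target: untouched cards to its left.
        return sum(1 for t in touched[:idx] if not t)
    # Assumed reading: all untouched cards, plus touched cards to its left.
    untouched_total = sum(1 for t in touched if not t)
    return untouched_total + sum(1 for t in touched[:idx] if t)


def focus_slot(receiver_touched, value):
    """Index of the card reached after counting over `value` cards from the
    right: untouched cards first, then touched cards."""
    untouched = [i for i, t in enumerate(receiver_touched) if not t]
    touched = [i for i, t in enumerate(receiver_touched) if t]
    order = untouched[::-1] + touched[::-1]
    return order[value]


def choose_focus(reacter_touched, reacter_target,
                 receiver_touched, receiver_target):
    """Slot in the receiver's hand that the clue should focus."""
    total = (target_value(reacter_touched, reacter_target)
             + target_value(receiver_touched, receiver_target)) % HAND_SIZE
    return focus_slot(receiver_touched, total)
```

For example, with both hands fully untouched, a reacter target in slot 2 (index 1) and a receiver target in slot 3 (index 2) give 1 + 2 = 3, so the clue should focus the receiver's slot 2 (index 1), which has three untouched cards to its right.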

An Equivalent Formulation

The Clue Focus

The focus is the rightmost newly-touched card, if one exists. If none exists, it is the rightmost re-touched card.
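As a small sketch of the focus rule, assuming a hand is described by two parallel lists of booleans (which cards were touched before the clue, and which cards the clue touches) with index 0 as the leftmost slot; the function name is hypothetical:

```python
def clue_focus(previously_touched, touched_by_clue):
    """Return the index of the focused card, or None if the clue touches nothing."""
    pairs = list(zip(previously_touched, touched_by_clue))
    newly = [i for i, (before, now) in enumerate(pairs) if now and not before]
    if newly:
        return max(newly)  # rightmost newly-touched card
    retouched = [i for i, (before, now) in enumerate(pairs) if now and before]
    return max(retouched) if retouched else None  # rightmost re-touched card
```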

Rank Clues

The target can be a play or a discard. The priority:

Leftmost untouched playable card
Leftmost touched playable card
Leftmost trash (card on the stacks or duplicated)
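The rank clue priority above amounts to a tiered search. Here is a minimal sketch, assuming each card is described by three flags as seen by the clue giver; the `Card` type and `rank_clue_target` name are illustrative, not part of the conventions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Card:
    touched: bool
    playable: bool = False
    trash: bool = False  # already on the stacks, or duplicated


def rank_clue_target(hand: List[Card]) -> Optional[Tuple[int, str]]:
    """Return (index, action) for the highest-priority target, leftmost first."""
    tiers = [
        ("play", lambda c: c.playable and not c.touched),
        ("play", lambda c: c.playable and c.touched),
        ("discard", lambda c: c.trash),
    ]
    for action, matches in tiers:
        for i, card in enumerate(hand):  # index 0 = leftmost slot
            if matches(card):
                return i, action
    return None  # no target in this hand
```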

Color Clues

The target is almost always a play. The priority:

Leftmost untouched playable card
Leftmost touched playable card
Newest available finesse combination
Leftmost trash

The action

If the clue is a color clue, the actions will be two plays or (very occasionally) two discards. If the clue is a rank clue, the actions will be one play and one discard.

The focused card's priority value determines the priority values of the two targets. If the focus is previously-untouched, count untouched cards to its right. If it's previously-touched, count all untouched cards plus the previously-touched cards to its right. This is the focus priority value.

If the target is previously-untouched, count previously-untouched cards to its left. If it is previously-touched, count all untouched cards plus the previously-touched cards to its left. This determines the target priority value in both the reacter's and the receiver's hands.

Finally, the equation all players must make true is simple:

focus priority value = (reacter's target priority value + receiver's target priority value) mod 5
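From the reacter's seat, the equation is solved for the one unknown: their own target. A hedged sketch, reading the sum as taken mod 5 (as in the slot math earlier) and assuming 5-card hands given as lists of booleans (True means touched, index 0 leftmost); both function names are hypothetical.

```python
HAND_SIZE = 5  # 3p hand size


def reacter_target_value(focus_value, receiver_target_value):
    """Solve focus = (reacter + receiver) mod 5 for the reacter's value."""
    return (focus_value - receiver_target_value) % HAND_SIZE


def card_with_target_value(touched, value):
    """Map a target priority value back to a slot in the reacter's hand.

    Untouched cards are ordered left to right first, then touched cards
    left to right (assumed reading of the counting rule).
    """
    untouched = [i for i, t in enumerate(touched) if not t]
    touched_slots = [i for i, t in enumerate(touched) if t]
    return (untouched + touched_slots)[value]
```

For instance, if the reacter sees a focus priority value of 3 and the receiver's best target has value 2, the reacter's own value is 1. With only their leftmost card touched, that points at index 2 of their hand (their second untouched card).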

The Early Game

Until the team has 10/25 points, chops do not exist; all discards must be instructed. Get good. The turn after the team reaches 10/25 points, all players without known actions have chops set to their leftmost untouched card, and players play with sieve-style chops from there. This threshold might need to shift a bit, but a value around there has heuristic justification and did pretty well in hypos.
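As a rough sketch of this rule, the function below returns a player's chop or None. It simplifies one detail: it ignores the one-turn delay after reaching 10 points and switches chops on as soon as the score hits the threshold. The names `CHOP_THRESHOLD` and `chop` are illustrative only.

```python
from typing import Optional, Sequence

CHOP_THRESHOLD = 10  # points out of 25; the text notes this may need tuning


def chop(score: int, touched: Sequence[bool],
         has_known_action: bool = False) -> Optional[int]:
    """Index of the chop (0 = leftmost slot), or None if no chop exists."""
    if score < CHOP_THRESHOLD or has_known_action:
        return None  # early game, or the player already has something to do
    for i, is_touched in enumerate(touched):
        if not is_touched:
            return i  # sieve-style chop: leftmost untouched card
    return None  # every card is touched
```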

Clarifications

Finesse Targeting

Finesses between Bob and Cathy become available for one of three reasons:

Bob just drew the connector
Cathy just drew the 1-away
The connector just became playable

When Bob is asked to play into a finesse, he should ask

Does it involve a slot 1 drawn since Alice's last turn?
Does it involve a color whose stack changed since Alice's last turn?

If the answer to any of these questions is yes, that is the priority finesse target. If more than one applies, the priority is: Bob's slot 1, Cathy's slot 1, then matching color.

If all are no, Bob should then assume prompt (play matching touched card) over finesse, then choose the leftmost card in Cathy's hand.

Overall, here is Bob's priority order, which should be not that bad:

1. Play just-drawn slot 1
2. Play into the just-drawn slot 1
3. Play into a card with just-played color
4. Play touched card
5. Play into leftmost possible target
6. Discard a card to indicate leftmost trash in Cathy's hand.
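Bob's priority order can be read as a first-match lookup. A minimal sketch, where the option labels are made-up identifiers for the six entries above and working out which options are actually available is left to the caller:

```python
BOB_PRIORITY = [
    "play_just_drawn_slot_1",
    "play_into_just_drawn_slot_1",
    "play_into_just_played_color",
    "play_touched_card",
    "play_into_leftmost_target",
    "discard_to_indicate_trash",
]


def bobs_reaction(available):
    """Return the highest-priority option present in `available`, else None."""
    for option in BOB_PRIORITY:
        if option in available:
            return option
    return None
```

For example, if both a prompt and a finesse into a just-played color are consistent with the clue, `bobs_reaction({"play_touched_card", "play_into_just_played_color"})` picks the just-played color.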

Free Choice

Prior to chop's existence, a free choice convention would be good to tell cards to discard from the left.

Chop: why?

The best chop systems involve chop-moving cards.

H-group chop-moves nothing by default
KRU chop-moves nothing, but uses universal rank saves: cards are easy to chop-move with clues
Sieve and Ref Sieve initially chop-move 12 cards (3p) and expect a chop move from every action.
Turbo chop-moves 12 cards (3p) and expects a chop move every time at least one action is given.
The KPP-cup-winning H-group-based team chop-moved the entire starting hand (or maybe just 12 cards?).
Ref Sieve and Turbo players routinely flirt with the idea of having locked starting hands.

I propose something further. I claim that in easy variants, we can afford to spend clues on all our early discards as well as our plays. Let's do the following.

Chop-move all of the first 25 cards in the deck.
Chop-move one additional card for each of those cards that ends in the discard pile.
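One way to make the two rules concrete is to track cards by their 0-based position in the deck order. This is a sketch under an assumed reading: only discards from among the first 25 cards extend the window, and the names below are hypothetical.

```python
INITIAL_CHOP_MOVED = 25  # the first 25 cards of the deck


def chop_moved_count(discarded_deck_positions):
    """How many cards from the top of the deck count as chop-moved."""
    extra = sum(1 for p in discarded_deck_positions if p < INITIAL_CHOP_MOVED)
    return INITIAL_CHOP_MOVED + extra


def is_chop_moved(deck_position, discarded_deck_positions):
    """True if the card at this 0-based deck position is chop-moved."""
    return deck_position < chop_moved_count(discarded_deck_positions)
```

With two of the first 25 cards in the discard pile, the window grows to 27 cards, so the card at deck position 26 is still chop-moved while the one at position 27 is not.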

One perspective on this is that we assume by default players have no safe discards until we know for sure someone has one. Both of these formulations are approximately equivalent to what is in the doc: no chops until ten points.

Final thoughts on safety

Everything here about safety makes a big assumption: that we will not run out of clues. In general, this is the fundamental downside to care in clue meanings. H-live discusses a version of this concept as care with clue efficiency: the ratio between numbers of cards left to play to clues left available. I prefer to think of a game of Hanabi as optimizing both our plays and our discards, making this framing of clue efficiency less useful.

Ideas

Stable Clue Information

One thing we've discussed for stable clues is using them to provide information to the receiver as well as an action to the reacter. I think this is an extremely promising idea but I don't have a good idea of how to implement it. As such, it is left out of these conventions.

Finesse Targeting

I think 1, 2, and 3 could reasonably be permuted in any way. 5 could be simply something else. 6 could come before 5 or even 4. Yet so it is written and so it shall be.

Awkward Receiver Hands

It would be nice if Alice could give 1-for-1s to Cathy even if Bob has no safe action when Cathy's hand is awkward: That is, Bob has no way to get a safe action from her with a single stable clue.

Delayed Signaling Toxicity Assumption

Suppose Alice has no safe action, and so she gives a clue. Before her next turn, she receives a 1-for-1 she apparently could have gotten last turn (it says something other than telling a newly-playable card to play). She could assume it is trying to get a different action in some way.

One Option:

Color Clues
- When a color clue newly-touches multiple cards, play the 2nd-leftmost.
- When a color clue newly-touches one card or 0 cards, play the rightmost untouched card in hand.
- This handles all situations with 3 previously-touched cards
- This handles all situations with 1-2 previously-touched cards except when all cards share a color.
- This handles many situations with no untouched cards in hand. A counterexample has to have one of the following:
  - Alice has only two colors in hand
  - Alice's slot 1, 2, and 3 are the same color.
- All of these examples (except the last) are mathematically impossible to improve upon without fundamentally rethinking the conventional structure.
When a rank clue touches multiple cards, focus invert with left-bias. Become willing to discard a newly-touched card.
When a rank clue touches a single card, discard ??

Unnecessary Stable Play Toxicity Assumption

A stable play clue is unnecessary if it could have been gotten reactively. This being defined is a benefit to stable play clues being direct and with color: knowing what card is playing lets the reacter decide far more often whether the clue could have been gotten reactively. That said, a toxicity assumption isn't currently in the doc, but maybe it should be.

The Endgame

Endgames are sometimes awkward with Reactor. I think we could define the endgame as ~15+ points or 2- pace, then use that to adjust conventions accordingly. As with stable clue information, I don't have strong beliefs about how to implement this.

Alternative to Chopless

See here for an alternative solution to the chop problem: https://hackmd.io/G48yGOEYSHGM5hiIux4MbA

Both of these conventions are trying to get at a similar problem: when players are locked, it is important to be cycling the deck in order to draw the cards needed to unlock them. The difference is largely in how situations when two players have no trash are handled.

To compare scenarios, call a player hardlocked if they hold no playables or trash and have no OK discards. Call them softlocked if they hold no playables or trash but have at least one OK (inter-hand duplicate or 3/4 BDR) discard. To further analyze, we could break down softlocked states into even smaller cases depending.
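The two definitions can be written as a tiny classifier. This is just a restatement of the paragraph above: "unlocked" is a label added here for players who do hold a playable or trash card, and the function name is hypothetical.

```python
def lock_state(has_playable_or_trash: bool, ok_discard_count: int) -> str:
    """Classify a player as 'unlocked', 'softlocked', or 'hardlocked'."""
    if has_playable_or_trash:
        return "unlocked"  # label assumed for illustration
    # No playables or trash: what matters is whether any discard is OK
    # (inter-hand duplicate or 3/4 BDR).
    return "softlocked" if ok_discard_count > 0 else "hardlocked"
```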

Scenarios Chopless performs better

Bob is hardlocked, Cathy is softlocked
Bob and Cathy are hardlocked (but both do bad here)

Scenarios Semi-spookiness performs better

Bob is softlocked, Cathy is hardlocked, and Bob's slot 1 is an OK discard
Bob and Cathy are softlocked, and Bob's slot 1 is an OK discard

Hypotheticals

Upon review, these have OK but imperfect decision-making between conventional actions. I think they demonstrate the system's strength well.

p2v52s1 (Clue Starved)
