# Galacticard Architectures
## Good Coding Practices
- Optimize for simplicity
- There is almost no reason for something to not be simple.
- A simple approach isn't necessarily quicker to implement.
- Be Defensive
- Expect things to fail, think about what should happen when things fail, because they will.
- Generally we should fail in such a way that our platform can keep operating as much as possible.
- Don't assume things will happen in order.
- Be [idempotent](https://en.wikipedia.org/wiki/Idempotence) whenever possible (see the sketches after this list).
- Write tests
- Test failure modes.
- Unit tests are sometimes good for critical, core code.
- But mostly write end-to-end/integration tests: they're easy to write, and they double as smoke tests against emulated and production services (a test sketch follows this list).
- Tests shouldn't have expectations about time, timing, or data persistence to avoid mysterious failures.
- Do Code Reviews
- Are any unit tests necessary?
- Did they write an integration test?
- Are failure modes tested?
- Did they handle all obvious points of failure to fail gracefully as possible?
- Did any endpoint APIs change in a backwards-incompatible way?
- Is there sufficient logging?
- Any new dependencies?
- Can the solution be made more simple?
- Monitoring
- Think about error rates rather than individual errors.
- Also think about classes of errors that can be grouped together and their effects.
- Think about what % of operations must fail, over what period of time, to trigger an incident.
- We cannot start an incident for every single error that happens.
- If we follow all our guidelines, we should have a good idea on what and how things will fail.
- Be more sensitive about operations that cause wholesale failure of core competencies, less sensitive about failures that do not directly impact core competencies.
- You'll potentially be waking people up in the middle of the night, and you'll have to work with that person in the future. You will BE that person eventually.
- Minimize external dependencies
- Only add a dependency when it makes our lives a lot easier.
- Every dependency we add expands the surface area of code we don't fully understand, or want to understand.
- Every dependency is another thing we need to track and manage updates for.
- Logging
- Log everything you would need to diagnose some issue when it happens.
- Think about the information you'd need if you were the person on call diagnosing an issue at 3am. You do not want to have to do deploys during an incident to collect additional logging.
- Make it easy to do the right thing, and hard to do the wrong thing
- Make code reusable.
- Make code easily testable.
- Make code predictable.
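To make the idempotency guidance above concrete, here is a minimal sketch (the `vendorEvents` collection name and payload shape are illustrative, not part of our schema): the vendor's event ID becomes the document ID, so redelivering the same event is a harmless no-op instead of a double-write.
```ts
import { getFirestore } from "firebase-admin/firestore";

// Assumes firebase-admin's initializeApp() already ran at module load.
// Record a vendor event exactly once, keyed by the vendor's event ID.
export async function recordVendorEvent(eventId: string, payload: unknown): Promise<void> {
  const db = getFirestore();
  try {
    // create() fails if the document already exists, unlike set().
    await db.collection("vendorEvents").doc(eventId).create({
      payload,
      receivedAt: new Date(),
    });
  } catch (err: any) {
    // gRPC code 6 is ALREADY_EXISTS: we've seen this event before, swallow it.
    if (err?.code !== 6) throw err;
  }
}
```
Similarly, a hedged sketch of the integration-test style described above, using Node's built-in test runner against an emulated endpoint (the URL, path, and payload are assumptions): it asserts on status and response shape, never on timing or pre-existing data.
```ts
import test from "node:test";
import assert from "node:assert/strict";

// Hypothetical base URL for the local functions emulator.
const BASE = process.env.API_BASE ?? "http://localhost:5001/demo-project/us-central1";

test("creating a purchase returns the standard envelope", async () => {
  const res = await fetch(`${BASE}/api/purchases`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ amount: 100 }),
  });

  // Assert on status and shape, never on timing or on what is already persisted.
  assert.ok(res.ok);
  const body = (await res.json()) as Record<string, unknown>;
  assert.ok("data" in body);
  assert.ok(Array.isArray(body.errors));
});
```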
### Functions
Functions, whether webhooks, crons, or pubsub workers, can only run for a limited amount of time. We can adjust how much time is allowed, but there is a hard limit we cannot exceed (something like 10 minutes). We highly prefer functions which execute in the smallest amount of time possible; less than 1 second is highly encouraged. To that end:
- Prefer functions that do one thing and do it well.
- Functions should perform a constant chunk of work that takes a predictable and constant-ish amount of time to resolve.
- Functions that need to operate across an unknown amount of work should use chunking and paging during the entire process (see the sketch after this list):
- Reduces memory footprint, we pay more the more memory we use.
- Prevents accidentally running out of memory causing funny failure states.
- If your function runs out of time (or memory), it should be able to be re-executed continuously until it finishes the pending work.
- The maximum number of workers available for each function should be highly limited. The best value would be a maximum of 1 worker executing at once (or 0!).
- For workers that process offline tasks (pubsub or cronjobs), we should have a good idea of our tolerance for delays in that work being completed. If the workload is highly sensitive to delayed work, we can increase the number of workers to meet that expectation. If the workload is less sensitive, it might be acceptable to keep the worker pool small and let the work get processed slowly.
- Don't use Firestore triggers; they reduce our visibility into how much work is left to process, how long that work is being delayed, and our ability to manage those queues.
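A minimal sketch of that chunk-and-re-enqueue pattern (the topic, collection, and field names are all hypothetical; assumes the v1 `firebase-functions` API): each invocation processes one page and, if a full page came back, publishes a cursor so a fresh invocation picks up where it left off.
```ts
import * as functions from "firebase-functions";
import { initializeApp } from "firebase-admin/app";
import { getFirestore, FieldPath } from "firebase-admin/firestore";
import { PubSub } from "@google-cloud/pubsub";

initializeApp();

const PAGE_SIZE = 100;                  // constant-ish chunk per invocation
const TOPIC = "reconcile-transactions"; // hypothetical topic name

// Processes one page of pending documents, then re-enqueues itself with a
// cursor. If an invocation dies, re-publishing the message resumes the work.
export const reconcileWorker = functions.pubsub.topic(TOPIC).onPublish(async (msg) => {
  const { cursor } = (msg.json ?? {}) as { cursor?: string };
  const db = getFirestore();

  let query = db
    .collection("transactions")
    .where("status", "==", "pending")
    .orderBy(FieldPath.documentId())
    .limit(PAGE_SIZE);
  if (cursor) query = query.startAfter(cursor);

  const page = await query.get();
  for (const doc of page.docs) {
    // Idempotent: re-marking an already-reconciled doc changes nothing.
    await doc.ref.update({ status: "reconciled" });
  }

  // A full page means there may be more work: hand it to a fresh invocation.
  if (page.size === PAGE_SIZE) {
    await new PubSub().topic(TOPIC).publishMessage({
      json: { cursor: page.docs[page.size - 1].id },
    });
  }
});
```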
### Webhooks
- Webhooks are a workload category that is highly sensitive to latency. Most applications will expect a webhook to be handled more or less immediately. This limits the types of things we can do in a webhook.
- If the client (likely an external one) needs the work to happen immediately, we likely need a larger pool of workers available for that purpose.
- If the client doesn't need the work to finish immediately, we can throw an error saying to try again later, or schedule it on a pubsub worker and help the client figure out when the work is complete.
- Use a consistent response format:
Document example:
```json
{
  "data": {},
  "errors": [{
    "message": "error happened",
    "code": 10002
  }],
  "meta": {}
}
```
Collections example:
```json
{
  "data": [{}, {}, {}],
  "errors": [{
    "message": "error happened",
    "code": 10001
  }],
  "meta": {
    "cursor": "aaaaaaa",
    "docCount": 1092,
    "next": "/api/thing?cursor=aaaaaa"
  }
}
```
- Think about how latency tolerant the clients of your webhook are.
- Do they need the request to be handled immediately? Will they retry the request if we don't have any resources available?
- If they are highly sensitive and need the request handled immediately, can we hand the work off to a pubsub worker and finish the bulk of the work later?
- Use HTTP status codes, but consider using internal error codes to handle subtlety when communicating with clients.
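Putting those pieces together, a hedged sketch of a latency-sensitive webhook that acknowledges quickly and defers the heavy work to a pubsub worker (the function name, topic, and error codes are illustrative):
```ts
import * as functions from "firebase-functions";
import { PubSub } from "@google-cloud/pubsub";

export const cardEventWebhook = functions.https.onRequest(async (req, res) => {
  if (req.method !== "POST") {
    // HTTP status for transport-level problems, internal code for subtlety.
    res.status(405).json({ data: {}, errors: [{ message: "method not allowed", code: 10405 }], meta: {} });
    return;
  }

  // Defer the bulk of the work to a pubsub worker; just queue it here.
  await new PubSub().topic("card-events").publishMessage({ json: req.body });

  // 202 Accepted: the work is queued; the client can poll for completion.
  res.status(202).json({ data: { status: "queued" }, errors: [], meta: {} });
});
```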
### Front-end
TBD
### Contracts
TBD
## Technical Architecture
Below are the technologies that will power GalacticCard and their responsibilities in our technology stack.
### Front End: Web - NextJS
- JavaScript with the option of upgrading progressively to TypeScript if we choose.
- Handles all user-facing interactions.
- Allows users to authenticate with their wallet
- Allows users to link and authenticate with third-party centralized identity providers like Google, Facebook.
- Allows users to interact with our contract to control refilling behavior, depositing and withdrawal of USDC to fund their USD card.
- Performs authenticated reads directly from the datastore
- Uses Firestore's `onSnapshot` for real-time updates
- Does NOT perform writes to datastore, all writes gated through authenticated webhook calls to the backend APIs.
- Has SSR, but it can be completely disabled for simpler hosting.
- Marketing portions might need SSR for SEO, but the app portion of the site may not.
- https://gist.github.com/tannerlinsley/65ac1f0175d79d19762cf06650707830
- https://dev.to/apkoponen/how-to-disable-server-side-rendering-ssr-in-next-js-1563
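As an illustration of the direct-read pattern above, a minimal sketch using the web SDK's `onSnapshot` (the `ledgers` collection is hypothetical; assumes the Firebase app is already initialized on the client):
```ts
import { getFirestore, doc, onSnapshot } from "firebase/firestore";

// Subscribe to the caller's ledger document; Firestore pushes every change,
// no polling required. Returns the unsubscribe function for cleanup.
export function watchLedger(uid: string, render: (data: unknown) => void): () => void {
  const db = getFirestore();
  return onSnapshot(doc(db, "ledgers", uid), (snap) => {
    if (snap.exists()) render(snap.data());
  });
}
```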
### Backend: API - Authenticated HTTP Endpoints over TLS
- REST-ish API implemented with Firebase's HTTPS functions.
- Authenticated requests via Firebase Auth identity (ID) tokens.
- Return as much data as the client needs to fully render a view or component.
- Handle authenticated webhook events from card vendors.
- Handle write operations to datastore.
- Handle write operations to PubSub queues.
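A hedged sketch of the authentication step (the endpoint name and error codes are illustrative; the token check itself uses firebase-admin's `verifyIdToken`):
```ts
import * as functions from "firebase-functions";
import { initializeApp } from "firebase-admin/app";
import { getAuth } from "firebase-admin/auth";

initializeApp();

export const getProfile = functions.https.onRequest(async (req, res) => {
  // Expect "Authorization: Bearer <Firebase Auth ID token>".
  const header = req.headers.authorization ?? "";
  const idToken = header.startsWith("Bearer ") ? header.slice(7) : null;

  if (!idToken) {
    res.status(401).json({ data: {}, errors: [{ message: "missing token", code: 10401 }], meta: {} });
    return;
  }

  try {
    const decoded = await getAuth().verifyIdToken(idToken);
    // decoded.uid identifies the caller; only touch documents they own.
    res.status(200).json({ data: { uid: decoded.uid }, errors: [], meta: {} });
  } catch {
    res.status(401).json({ data: {}, errors: [{ message: "invalid token", code: 10402 }], meta: {} });
  }
});
```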
### Backend/Datastore: Contract - EVM Solidity Contract
- We require an EVM-compatible layer 2 that supports ERC20s.
- Currently deployed on Polygon.
- Standard solidity contract.
- Extends the ERC20 specification.
- **CHANGING** Allows users to "deposit" USDC tokens and receive GalacticTokens in return.
- Users cannot transfer GalacticTokens to other wallets.
- Emits on-chain Events about GalacticToken movement.
- Source-of-truth for GalacticToken balances.
- **CHANGING** Custodial entity "owning" the USDC transferred in and out.
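Since the contract's events are the source of truth, our ingestion code would read them from the chain. A hedged sketch with ethers v6 (the `Deposit` event signature is an assumption about the contract's interface, not its actual ABI):
```ts
import { Contract, JsonRpcProvider } from "ethers"; // ethers v6

// Assumed event shape; the real contract's ABI may differ.
const ABI = ["event Deposit(address indexed wallet, uint256 amount)"];

export async function fetchDeposits(rpcUrl: string, address: string, fromBlock: number) {
  const provider = new JsonRpcProvider(rpcUrl);
  const galactic = new Contract(address, ABI, provider);

  // On-chain events are the source of truth for GalacticToken movement.
  const events = await galactic.queryFilter(galactic.filters.Deposit(), fromBlock);
  return events.map((e: any) => ({
    wallet: e.args.wallet as string,
    amount: e.args.amount as bigint,
    block: e.blockNumber,
  }));
}
```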
### Authentication - Firebase Auth
- SIWE determines primary identity
- Authentication state managed by Firebase
- Firebase Auth allows for linking 3rd-party identity providers for account recovery (twitter, google, facebook, email).
- The wallet will be the primary identifier, and we will strongly encourage secondary identities.
### Asynchronous Operations - Firebase functions
- Cron worker to periodically ingest chain transactions and operations executed by accounts enrolled in our program (to calculate fraud scores)
- PubSub workers to react to signals from our backend API as needed
- Fetching exhaustive transaction history for a wallet in preparation of calculating a fraud score.
- Calculating fraud scores.
- For example: when new applications, profiles, etc. are created.
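A minimal sketch of the cron side (the schedule and names are illustrative; assumes the v1 `firebase-functions` API):
```ts
import * as functions from "firebase-functions";

export const ingestChainActivity = functions.pubsub
  .schedule("every 15 minutes")
  .onRun(async () => {
    functions.logger.info("chain ingestion tick");
    // 1. Page through enrolled wallets (chunked, per the Functions guidelines).
    // 2. Fetch their recent on-chain transactions.
    // 3. Publish one "score this wallet" pubsub message per wallet.
  });
```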
### Datastore - Firestore
- Schemaless
- Front end will read directly.
- API will write.
- Flexible rules that allow us to interact with identity/authentication state to gate access to only documents "owned" by the identity performing the reads.
- Allows for real-time updates via `onSnapshot`
- Has triggers, but our opinion is that **they should be avoided in deference to PubSub workers whenever possible, which is always.**
- Might be where we store our copy of a "ledger" containing fiat & crypto transactions associated with a user/wallet
### Configuration
- Environment variables are a good choice
- Firebase has its own configuration / secret management for functions
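A small sketch of how the two might combine (the `VENDOR_API_KEY` variable and `vendor.key` config path are hypothetical):
```ts
import * as functions from "firebase-functions";

// Prefer a plain environment variable; fall back to Firebase function config
// set via `firebase functions:config:set vendor.key=...`.
const vendorKey = process.env.VENDOR_API_KEY ?? functions.config().vendor?.key;

if (!vendorKey) {
  throw new Error("vendor API key is not configured");
}
```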
### Administration / Management
- Write scripts to perform regular maintenance tasks. It should be easy for another developer to run the script if needed.
- Write scripts to remediate issues, data repair, playbooks, etc...
## Data Flow
There are many ways to schedule the execution of some work in the Firebase system.
1. Webhooks can be called directly by our front end (or other webhooks)
2. Webhooks can create PubSub tasks that are consumed by a pool of workers
3. Webhooks can create Firestore documents which can execute Firestore triggers. **Not Recommended**.
4. Cron functions can be scheduled to run on a periodic basis.
We've utilized all of these in the past and they are all valid approaches. One area we've leaned on heavily in the past is Firestore triggers.
Most of our Firestore triggers are executed either when a new document is created or when an existing document is updated. In rarer cases, a trigger may be called when a document is destroyed.
This is all very useful and we believe triggers are simply using PubSub topics/subscriptions in the background. However, compared to directly using PubSub events, _we lack critical visibility into the underlying topics/subscriptions_.
Thus using Firestore triggers has several downsides:
1. We cannot know how much work is pending on the topic backing the function.
2. We cannot purge the underlying topic if needed.
3. We cannot know how long work is lingering on the topic before being executed.
4. Firestore triggers are extremely firebase coupled.
Thus we would recommend that, in all cases where a trigger would be useful, we take one of the following approaches.
#### Manually publish a PubSub message to a topic we directly control after adding or updating a document.
**Upsides**
- We fully control what happens, and have maximum insight into the topics and subscriptions feeding work into our system.
- This approach is not tied to Firestore and is applicable to any system or platform we choose to use in the future.
**Downsides**
- It's possible the function could write the document to Firestore and fail to publish the appropriate message to a topic, either through sloppy code or due to some failure mode.
- Care would have to be taken to ensure we handle failure modes and idempotency of the original function/webhook.
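A hedged sketch of this approach (the collection, topic, and the client-supplied `applicationId` are illustrative):
```ts
import * as functions from "firebase-functions";
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { PubSub } from "@google-cloud/pubsub";

initializeApp();

export const createApplication = functions.https.onRequest(async (req, res) => {
  const db = getFirestore();

  // A client-supplied ID keeps retries idempotent: re-running overwrites the
  // same document and re-publishes the same message.
  const ref = db.collection("applications").doc(req.body.applicationId);
  await ref.set({ ...req.body, createdAt: new Date() });

  // Publish explicitly to a topic we own instead of relying on a trigger.
  // If this fails, the caller sees an error and can safely retry the whole call.
  await new PubSub().topic("application-created").publishMessage({
    json: { applicationId: ref.id },
  });

  res.status(201).json({ data: { id: ref.id }, errors: [], meta: {} });
});
```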
## Data Architecture
Firebase's Firestore is organized around the idea of **collections** and **documents**.
Collections may exist at the top-level namespace of the datastore or may exist as a property of a document. The latter version is generally called a **sub-collection**. Collections can be directly addressed as such:
- `/purchases` - The Purchases collection
- `/purchases/123` - A Purchase document identified by `123`
- `/purchases/123/transactions` - The Transactions sub-collection of purchase `123`
- `/purchases/123/transactions/abc` - A transaction, identified by `abc`, belonging to the purchase `123`
In this case there is a single collection named `/purchases` and a variable number of `transactions` collections, situated beneath each document in the `purchases` collection.
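In code, those same paths fall out of the admin SDK's chaining (a sketch, assuming firebase-admin is initialized):
```ts
import { getFirestore } from "firebase-admin/firestore";

const db = getFirestore();

const purchases    = db.collection("purchases");           // /purchases
const purchase     = purchases.doc("123");                 // /purchases/123
const transactions = purchase.collection("transactions");  // /purchases/123/transactions
const transaction  = transactions.doc("abc");              // /purchases/123/transactions/abc
```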
Collections and documents can effectively be nested perpetually, but for the sake of simplicity it's often better to limit the depth.
### When to use a collection versus a sub-collection
Generally, all sub-collections _could be_ top-level collections, but this requires that each document in those collections have a unique identifier.
**Using sub-collections is _generally_ preferred as they lead to a more organized data structure, both in how our code is represented and how developers think about the data.**
There are really no speed or cost benefits to one approach over the other; the primary effect is on how we write our code and reason about it. There are some other advantages to sub-collections, though:
- Sub-collections have their own rate limit for writes; each sub-collection has its own index.
- Sub-collections can simplify security rules.
- Sub-collections mean that the document identifiers don't need to be globally unique.
**TODO:** Make the pros and cons clearer, and include "wider compatibility" as an advantage of top-level collections over sub-collections.
The biggest downside to using sub-collections is that deleting nested data is a pain. Generally we prefer the approach of never-deleting things, so that shouldn't be a huge issue unless there is bad data that is causing issues.
The biggest downside to using root-level collections is that the more hierarchical data we add, the more complex it becomes to reason about our data. The primary use-case for top-level collections is many-to-many relationships.
Resources:
- https://firebase.blog/posts/2019/06/understanding-collection-group-queries
- https://firebase.google.com/docs/firestore/query-data/queries#collection-group-query
- https://firebase.google.com/docs/firestore/manage-data/structure-data
- https://stackoverflow.com/questions/47193903/advantages-of-firestore-sub-collections
---