## The DATABASE
```mermaid
flowchart LR
A([Projects]) --> |1: New string to translate| B
B{{Localization.Service}} --> |2: Send new string to translate to Crowdin| C{Crowdin}
C -.-> |3: Webhooks return translated data| B
B --> |4: Store the returned data in our database| D[(Internal Database)]
```
To store every translated string on our side, we need a database that can handle a **large** number of requests.
**BUT...** why do we want it on our side?
- We can't send many requests to Crowdin to fetch the latest translations: their API enforces several rate limits.
- We can't rely on an in-memory database alone: after every reboot we would have to recover all the data from Crowdin, which brings us back to the rate-limit problem above.
- For speed and performance, it's better to keep the source of truth close to the Localization Service.
- We need one extra piece of data with each translation: its state. Depending on whether a string has already been approved or was only suggested by Crowdin's AI, we store that state on our side so we can show the user an "AI translation state percentage".
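
As a rough illustration of the last point, the percentage could be derived from the stored state like this. It is only a sketch: the `TranslationState` values, the `StoredTranslation` shape and the function name are assumptions made for the example, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum


class TranslationState(Enum):
    # Hypothetical state names, chosen for this example only.
    APPROVED = "approved"          # validated by a reviewer
    AI_SUGGESTED = "ai_suggested"  # proposed by Crowdin's AI, not approved yet


@dataclass(frozen=True)
class StoredTranslation:
    key: str
    text: str
    state: TranslationState


def ai_translation_percentage(translations: list[StoredTranslation]) -> float:
    """Return the share (0 to 100) of translations still in the AI-suggested state."""
    if not translations:
        return 0.0
    ai_count = sum(1 for t in translations if t.state is TranslationState.AI_SUGGESTED)
    return 100.0 * ai_count / len(translations)


page = [
    StoredTranslation("ty_sku_1001-title", "Grand vaisseau 10 places", TranslationState.APPROVED),
    StoredTranslation("ty_sku_1001-description", "Achetez-le", TranslationState.AI_SUGGESTED),
]
print(ai_translation_percentage(page))  # 50.0
```
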
When choosing the database, several criteria matter: high availability, performance, Turbulent's existing knowledge, and so on.
Keep in mind that the Localization Service will have a cache between the services and the database, which reduces the load on the database.
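
One possible shape for that cache is a read-through cache, sketched below. The class name, the `db_reads` counter and the plain dictionary standing in for the database are all assumptions for illustration; the real cache may well live in a dedicated service.

```python
from typing import Callable, Optional


class ReadThroughCache:
    """Serve translations from memory and only hit the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db
        self._entries: dict[str, str] = {}
        self.db_reads = 0  # kept only to illustrate the reduced database load

    def get(self, key: str) -> Optional[str]:
        if key in self._entries:
            return self._entries[key]
        self.db_reads += 1
        value = self._load_from_db(key)
        if value is not None:
            self._entries[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, for example when a webhook delivers an updated translation."""
        self._entries.pop(key, None)


database = {"RSI_faq_title": "Foire aux questions"}  # stand-in for the real database
cache = ReadThroughCache(database.get)

cache.get("RSI_faq_title")
cache.get("RSI_faq_title")
print(cache.db_reads)  # 1
```

The second read is answered from memory, so the database is queried once. Calling `invalidate` forces the next read to go back to the database.
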
### 1. 🟢 Use two databases: a document store and a key / value store
On RSI we can identify two kinds of data to translate:
- tokens (from templates, for example, which behave like key / value pairs)
- pages (from the comm-link, which can be stored as a document, i.e. a set of tokens)
To optimize storage and request speed, we can split this data across two databases, each with its own retrieval method.
For a single token, a key / value store is the best option. For a page, however, the whole translation can be stored in one document, and the key can be part of the URL, for example:
```
comm-link/spectrum-dispatch/19462-Whitleys-Guide-San-toky-i
```
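
A minimal sketch of deriving such a key from a page URL, assuming the key is simply the URL path without its leading and trailing slashes. The function name and the `example.com` host are placeholders.

```python
from urllib.parse import urlsplit


def document_key_from_url(url: str) -> str:
    """Use the URL path, without leading or trailing slashes, as the document key."""
    return urlsplit(url).path.strip("/")


# example.com is a placeholder host, not the real site.
key = document_key_from_url(
    "https://example.com/comm-link/spectrum-dispatch/19462-Whitleys-Guide-San-toky-i"
)
print(key)  # comm-link/spectrum-dispatch/19462-Whitleys-Guide-San-toky-i
```
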
```mermaid
flowchart LR
A{{Localization.Service}} --> B[(Key / Value DB)]
A{{Localization.Service}} --> C[(Document DB)]
B --> A
C --> A
```
- **Advantages:**
  - Retrieving all translated strings for a page can be faster than looking up each key of that page in the key / value store
  - A page / set of tokens can represent more than a page, such as a full TySku object
- **Disadvantages:**
  - Two different logics to implement: one for tokens, one for pages / sets of tokens
  - More code to maintain because we are using two different databases
###### *Explanation of the two storage methods*
- ###### 💾 **Database #1 KEY / VALUE**
```
KEY | VALUE
account_settings_rsvp_title | Titre RSVP
RSI_faq_title | Foire aux questions
```
- ###### 💾 **Database #2 Document NO-SQL**
```
KEY | VALUE
ty_sku_1001 | {"title": "Grand vaisseau 10 places", "description": "Achetez-le"}
ty_merch_9987 | {"title": "Hoodie XXL", "description": "Devenez un pilote"}
```
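
The split between the two stores could look roughly like the sketch below. Plain dictionaries stand in for the two database clients, and the class and method names are hypothetical.

```python
from typing import Optional


class TwoStoreTranslations:
    """Route single tokens to a key/value store and pages to a document store."""

    def __init__(self) -> None:
        # Plain dicts stand in for the two database clients.
        self._tokens: dict[str, str] = {}
        self._documents: dict[str, dict[str, str]] = {}

    def put_token(self, key: str, text: str) -> None:
        self._tokens[key] = text

    def put_document(self, key: str, fields: dict[str, str]) -> None:
        self._documents[key] = dict(fields)

    def get_token(self, key: str) -> Optional[str]:
        return self._tokens.get(key)

    def get_document(self, key: str) -> Optional[dict[str, str]]:
        # One lookup returns every translated field of the page or object.
        return self._documents.get(key)


store = TwoStoreTranslations()
store.put_token("RSI_faq_title", "Foire aux questions")
store.put_document("ty_merch_9987", {"title": "Hoodie XXL", "description": "Devenez un pilote"})

print(store.get_token("RSI_faq_title"))              # Foire aux questions
print(store.get_document("ty_merch_9987")["title"])  # Hoodie XXL
```

The sketch also makes the maintenance cost visible: every read and write path exists twice, once per store.
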
### 2. 🟢 Use a single database as a key / value store
Instead of using two databases and differentiating by data type, we can store everything in the same database engine. Whether that engine is a document store or a key / value store, the same logic handles both cases: a token is stored as is, and a page is split into tokens.
```mermaid
flowchart LR
A{{Localization.Service}} --> B[(Key / Value DB)]
B --> A
```
- **Advantages:**
  - Lighter architecture
  - Less code to maintain than Solution 1
  - If the load requires it, replication can be easy with key sharding: place **ty_sku_\*** in Replica-1 and **ty_merch_\*** in Replica-2
- **Disadvantages:**
  - Retrieving a full page means looking up each of its keys, which can be slower than the single document read of Solution 1
  - Need to find a way to create & retrieve keys depending on the object, page, etc.
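
The key sharding mentioned in the advantages could be sketched as a small routing table. The rule list, the fallback replica and the function name are assumptions made for this example. In practice the database engine or its client may handle the routing itself.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: key pattern -> replica name, checked in order.
SHARD_RULES = [
    ("ty_sku_*", "Replica-1"),
    ("ty_merch_*", "Replica-2"),
]
DEFAULT_SHARD = "Replica-1"  # assumption: unmatched keys fall back to one replica


def shard_for(key: str) -> str:
    """Return the replica that should hold the given key."""
    for pattern, shard in SHARD_RULES:
        if fnmatchcase(key, pattern):
            return shard
    return DEFAULT_SHARD


print(shard_for("ty_sku_1001-title"))    # Replica-1
print(shard_for("ty_merch_9987-title"))  # Replica-2
```
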
###### *Explanation of the single storage method*
- ###### 💾 **Database KEY / VALUE**
```
KEY | VALUE
account_settings_rsvp_title | Titre RSVP
RSI_faq_title | Foire aux questions
ty_sku_1001-title | Grand vaisseau 10 places
ty_sku_1001-description | Achetez-le
ty_merch_9987-title | Hoodie XXL
ty_merch_9987-description     | Devenez un pilote
```
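
A minimal sketch of how an object could be split into tokens and rebuilt, assuming the `<object key>-<field>` naming used in the table above. The function names are hypothetical, and a dictionary stands in for the key / value store.

```python
def flatten_document(object_key: str, fields: dict[str, str]) -> dict[str, str]:
    """Turn one document into `<object_key>-<field>` key/value entries."""
    return {f"{object_key}-{name}": text for name, text in fields.items()}


def rebuild_document(object_key: str, store: dict[str, str]) -> dict[str, str]:
    """Collect every entry of one object back into a single mapping."""
    prefix = f"{object_key}-"
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}


store = {"RSI_faq_title": "Foire aux questions"}
store.update(flatten_document("ty_sku_1001", {
    "title": "Grand vaisseau 10 places",
    "description": "Achetez-le",
}))

print(rebuild_document("ty_sku_1001", store))
# {'title': 'Grand vaisseau 10 places', 'description': 'Achetez-le'}
```

Here `rebuild_document` scans the whole dictionary. With a real key / value store, the service would more likely keep the list of field names per object, or use a prefix scan if the engine offers one.
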
### ⚖️ Choice of database (Key / Value store)
### 🟢 Redis:
**Functionality:**
- In-memory key-value store, providing high-speed data access due to data storage in memory.
**Capabilities:**
- Used for caching, session management, message queues, and supports advanced data structures.
**Scalability:**
- Redis can be scaled horizontally using clustering.
**Sharding:**
- Supports sharding for horizontal scaling.
**Advantages:**
- Extremely fast read and write operations.
- Support for advanced data structures.
- Highly suitable for caching and real-time applications.
- ✅ A lot of in-house knowledge about this database engine at Turbulent
**Disadvantages:**
- Data size is limited by available RAM.
- Not ideal for large-scale data storage.
### Comparison table
| Criteria                   | Redis | ScyllaDB | ArangoDB  |
|----------------------------|-------|----------|-----------|
| Scalability (easy) | x | ~ | |
| Better for Key-value store | x | x | x |
| High performance | x | x | |
| Price (cheap) | x | x | x |
| Sharding | x | x | x |
| Replication | x | x | x |
| Data persistence           | ~     | x        | x         |
| Multiple GET | x | x | |
| Single Threaded            | x     |          |           |
| TURBULENT KNOWLEDGE | x | | x |
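
To illustrate the "Multiple GET" row, the sketch below fetches several tokens in one round trip. It assumes a client exposing an `mget(keys)` method that returns values in key order with `None` for missing keys, which is the shape of `Redis.mget` in redis-py. The `SupportsMget` protocol, `fetch_tokens` and `FakeClient` are hypothetical names, and the fake client only exists so the example runs without a server.

```python
from typing import Optional, Protocol, Sequence


class SupportsMget(Protocol):
    def mget(self, keys: Sequence[str]) -> list[Optional[str]]: ...


def fetch_tokens(client: SupportsMget, keys: Sequence[str]) -> dict[str, Optional[str]]:
    """Fetch several tokens in one round trip; missing keys map to None."""
    values = client.mget(list(keys))
    return dict(zip(keys, values))


class FakeClient:
    """In-memory stand-in so the sketch runs without a database server."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data

    def mget(self, keys: Sequence[str]) -> list[Optional[str]]:
        return [self._data.get(k) for k in keys]


client = FakeClient({"RSI_faq_title": "Foire aux questions"})
print(fetch_tokens(client, ["RSI_faq_title", "missing_key"]))
# {'RSI_faq_title': 'Foire aux questions', 'missing_key': None}
```
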
### 🔴 ArangoDB:
**Functionality:**
- Multi-model NoSQL database supporting document, graph, and key-value models.
**Capabilities:**
- Offers the flexibility to work with various data models.
**Scalability:**
- ArangoDB supports horizontal scaling through data partitioning (sharding).
**Sharding:**
- Built-in support for sharding.
**Advantages:**
- Support for multiple data models.
- Horizontal scalability for distributed applications.
- Joins and transactions across data models.
**Disadvantages:**
- Smaller user base compared to some other databases.
- Limited third-party tools and libraries.
- Less knowledge at Turbulent compared to Redis
### 🟡 ScyllaDB:
**Functionality:**
- NoSQL database compatible with Apache Cassandra, based on a wide-column model.
**Capabilities:**
- Offers high availability, low latency, and supports Cassandra Query Language (CQL).
**Scalability:**
- ScyllaDB uses data partitioning (sharding) and replication for high availability and linear scalability.
**Sharding:**
- Built-in support for sharding.
**Advantages:**
- High availability and low latency.
- Compatibility with Cassandra, easy migration.
- Suitable for real-time applications.
**Disadvantages:**
- Requires some configuration for optimum performance.
- Smaller community compared to Cassandra.
- Never used inside Turbulent
### ⚖️ Choice of database (NoSQL & documents)
### 🟢 MongoDB:
**Functionality:**
- Document-oriented NoSQL database.
**Capabilities:**
- Suitable for applications with flexible and scalable data schemas.
**Scalability:**
- MongoDB can be easily scaled horizontally using sharding.
**Sharding:**
- Built-in support for sharding.
**Advantages:**
- Flexible schema, easy horizontal scaling.
- Good performance for read-heavy workloads.
- Rich querying capabilities.
**Disadvantages:**
- Reads served by secondary replicas are eventually consistent, which may not be suitable for all applications.
- Indexing and data modeling complexity.
### 🟡 Couchbase:
**Functionality:**
- Distributed NoSQL in-memory database supporting key-value and document-oriented models.
**Capabilities:**
- Offers high-performance caching, multi-master replication, and indexing for efficient data retrieval.
**Scalability:**
- Couchbase achieves horizontal scalability using data partitioning (sharding) and replication.
**Sharding:**
- Built-in support for sharding.
**Advantages:**
- High-performance data retrieval.
- Horizontal scalability and multi-master replication.
- Flexible data modeling.
**Disadvantages:**
- Configuration complexity, requiring careful planning.
- Smaller community compared to some other databases.
### 🔴 ArangoDB:
**Functionality:**
- Multi-model NoSQL database supporting document, graph, and key-value models.
**Capabilities:**
- Offers the flexibility to work with various data models.
**Scalability:**
- ArangoDB supports horizontal scaling through data partitioning (sharding).
**Sharding:**
- Built-in support for sharding.
**Advantages:**
- Support for multiple data models.
- Horizontal scalability for distributed applications.
- Joins and transactions across data models.
**Disadvantages:**
- Smaller user base compared to some other databases.
- Limited third-party tools and libraries.
- Less knowledge at Turbulent compared to Redis
### Comparison table
| Criteria                      | Couchbase | MongoDB | ArangoDB  |
|-------------------------------|-----------|---------|-----------|
| Scalability (easy) | x | x | x |
| High performance | | | |
| Price (cheap) | x | x | x |
| Replication | x | x | x |
| Can do BOTH (K/V & Documents) | x | | x |
| TURBULENT KNOWLEDGE | | x | x |