# Matrix Scalability
###### tags: `Lale` `Matrix` `Synapse`
Most of the performance of **Lale** chat will depend on the Matrix server.
Currently, we are running Synapse (the Matrix homeserver) as *a single monolithic Python process*. Synapse's architecture is currently quite [RAM hungry](https://github.com/matrix-org/synapse#help-synapse-is-slow-and-eats-all-my-ram-cpu): it caches a lot of recent room data and metadata in RAM in order to speed up common requests.

Python’s Global Interpreter Lock ([GIL](https://wiki.python.org/moin/GlobalInterpreterLock)) means that Python can essentially only use one CPU core at a time, so starting more threads doesn’t help with scalability; we have to run multiple processes instead.
This architecture will **eventually collapse** as the number of users and the amount of activity in Synapse grow.
The best practice, and also the most common solution, is to break the system down into **microservices**. Current Matrix/Synapse already has this ability and is still under rapid development.
## Auth Service
Synapse calls its pluggable auth services `Password auth providers`. Password auth providers offer a way for server administrators to integrate their Synapse installation with an existing authentication system. If I understand correctly, `Lalepass` is the password auth provider in our architecture. We may try to separate `Synapse` and the `Password auth provider` into microservices (different machines).
Synapse supports this (running auth on a different machine) through [a configuration file](https://matrix-org.github.io/synapse/develop/usage/configuration/homeserver_sample_config.html), `homeserver.yaml`, under `password_providers`.
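As a rough sketch, the relevant part of `homeserver.yaml` would look like the snippet below. The module name `lalepass_auth_provider.LalepassAuthProvider` and its config keys are placeholders I made up for illustration, not the real Lalepass integration.

```yaml
# homeserver.yaml -- sketch only: "lalepass_auth_provider" and its config
# keys are hypothetical placeholders, not the real Lalepass module name.
password_providers:
  - module: "lalepass_auth_provider.LalepassAuthProvider"
    config:
      enabled: true
      # The provider would talk to the Lalepass service running on its own machine.
      endpoint: "https://lalepass.example.com/api/auth"
```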
## Database Cluster
A database cluster is implemented in almost every enterprise architecture that handles many transactions. The idea is to create a `Master-Slave` database cluster: the `Master` handles the write transactions (the POST APIs, though technically it can serve GETs as well) while the `Slave` databases handle the read transactions (GET APIs; in Synapse, mostly the `/sync` API).
The `Master` and `Slave` databases are deployed on different machines, and the Master-Slave connection is achieved through Postgres replication.
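On the Synapse side, pointing the homeserver at a Postgres instance on another machine only requires the `database` section of `homeserver.yaml`. The hostnames and credentials below are placeholders, not our real setup.

```yaml
# homeserver.yaml -- database section; host, user and password are placeholders.
# The primary ("Master") Postgres instance lives on its own machine; the
# standby ("Slave") replicas are kept in sync via Postgres streaming replication.
database:
  name: psycopg2
  args:
    user: synapse_user
    password: CHANGE_ME
    database: synapse
    host: db-primary.example.internal
    port: 5432
    cp_min: 5
    cp_max: 10
```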
Related articles:
* [Synapse uses database from different machine (but not master-slave database)](https://www.mytinydc.com/en/blog/matrix-synapse-shift-database-to-another-server/)
* [PostgreSQL Replication](https://www.enterprisedb.com/postgres-tutorials/postgresql-replication-and-automatic-failover-tutorial)
* [Synapse Replication Architecture](https://matrix-org.github.io/synapse/develop/replication.html)
## Workers
If we want to use database replication, then we must use workers: additional Synapse processes, ideally running on different machines. [This documentation](https://matrix-org.github.io/synapse/develop/workers.html) gives a lot of information about Synapse workers.
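As an illustration, each worker gets a small YAML config file of its own. The names, ports and paths below are placeholders, and the exact keys depend on the Synapse version we end up running.

```yaml
# generic_worker1.yaml -- sketch of a single worker; names, ports and paths
# are placeholders and the exact keys depend on the Synapse version.
worker_app: synapse.app.generic_worker
worker_name: generic_worker1

# How this worker talks to the main Synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

# Which client/federation endpoints this worker serves, and on which port.
worker_listeners:
  - type: http
    port: 8083
    resources:
      - names: [client, federation]

worker_log_config: /etc/matrix-synapse/generic_worker1_log_config.yaml
```

A reverse proxy then routes the endpoints listed in the workers documentation (for example `/sync`) to this worker's listener port instead of the main process.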
## Storage Service
If Lale is targeting millions of users, then running the `Storage service` on a different machine is a must. The configuration lives under the `media_storage_providers` setting ([ref](https://matrix-org.github.io/synapse/develop/usage/configuration/homeserver_sample_config.html)).
However, I would suggest buying [AWS S3 (Amazon Simple Storage Service)](https://aws.amazon.com/s3/): then we basically no longer need to worry about running out of space for user media.
I also found [a module](https://github.com/matrix-org/synapse-s3-storage-provider) that connects Synapse to `AWS S3`.
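Based on that module's README, wiring it up is again a `media_storage_providers` entry in `homeserver.yaml`. The bucket name, region and credentials below are placeholders.

```yaml
# homeserver.yaml -- offload media to S3 via the synapse-s3-storage-provider
# module; the bucket name, region and credentials are placeholders.
media_storage_providers:
  - module: s3_storage_provider.S3StorageProviderBackend
    store_local: true
    store_remote: true
    store_synchronous: true
    config:
      bucket: lale-media            # placeholder bucket name
      region_name: ap-southeast-1   # placeholder region
      access_key_id: AWS_ACCESS_KEY_ID
      secret_access_key: AWS_SECRET_ACCESS_KEY
```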
## Redis
Implement Redis as the pub/sub bus that Synapse uses to shuttle replication traffic between the main process and the workers; it is the recommended way to run a worker deployment. (To be continued....)
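In Synapse this is a small block in `homeserver.yaml`; Redis itself would run on its own machine. The host and password below are placeholders.

```yaml
# homeserver.yaml -- enable Redis so the main process and the workers can
# exchange replication traffic over pub/sub; host and password are
# placeholders for a Redis instance running on its own machine.
redis:
  enabled: true
  host: redis.example.internal
  port: 6379
  password: CHANGE_ME
```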
## Overall
This is the overall Synapse architecture if we implement all of the above.

### Notes
* Ideally, everything runs on a different machine: **one machine, one service**.
* We may use [AWS (Amazon Web Services)](https://aws.amazon.com/) for everything.
    * Pro: we don't have to set up our own servers.
    * Pro: if something is not suitable for our purpose, we can unsubscribe at any time.
    * Con: we don't host our own servers.
* In order to implement any or all of the above, we must first upgrade Synapse to **the latest stable version**.
## Milestones
It would be too overwhelming to implement everything at once, so the work is split into milestones.
* M1: Upgrade Synapse to **the latest stable version**.
* M2: **Dedicated server** for Synapse (no other services running on the Synapse machine).
* M3: Move the **Database**, **Media Storage**, and **Auth** onto different machines.
* M4: Implement **Database Replication** & **Workers** (this will be tough).
* M5: Implement **AWS S3** and **Redis**.
### Comparison
We can take a look at how other IM services achieve good performance.
* [Facebook Chat Architecture](https://www.slideshare.net/udayslideshare/facebook-chat-architecture)
* [The architecture behind chatting on LINE LIVE](https://engineering.linecorp.com/en/blog/the-architecture-behind-chatting-on-line-live/)
* [Matrix.org Architecture](https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-scalability)
### Others
* [Complete Guide to the Chat Architecture](https://yellow.systems/blog/guide-to-the-chat-architecture)
* [Understanding the Architecture & System Design of a Chat Application](https://www.cometchat.com/blog/chat-application-architecture-and-system-design)
* [How to Make a Messaging App like WhatsApp, Telegram, Slack](https://www.simform.com/how-to-build-messaging-app-whatsapp-telegram-slack/)
* [An Extensive Guide to Messaging App Development](https://yalantis.com/blog/messaging-apps-development-telegram-whatsapp-others-work/)
* [Conquering Highest Scalability of an Enterprise Chat Application Using Kubernetes](https://dzone.com/articles/conquering-highest-scalability-of-an-enterprise-ch)
**Some articles suggest using WebSockets & REST APIs, but we depend heavily on Matrix Synapse here, which only implements a REST API over HTTP.**