# Matrix Scalability
###### tags: `Lale` `Matrix` `Synapse`
Most of the performance of **Lale** chat will depend on the Matrix server.
Currently, we are running Synapse (the Matrix homeserver) as *a single monolithic Python process*. Synapse's architecture is currently quite [RAM hungry](https://github.com/matrix-org/synapse#help-synapse-is-slow-and-eats-all-my-ram-cpu): it caches a lot of recent room data and metadata in RAM in order to speed up common requests.

Python’s Global Interpreter Lock ([GIL](https://wiki.python.org/moin/GlobalInterpreterLock)) means that Python can essentially only use one CPU core at a time, so starting more threads doesn’t help with scalability; we have to run multiple processes instead.
This architecture will **eventually collapse** as the number of users and the amount of activity in Synapse grow.
The best practice, and also the most common solution, is to break the system down into **microservices**. Current Matrix/Synapse already has this ability and is still under rapid development.
## Auth Service
Synapse calls its pluggable auth services `Password auth providers`. Password auth providers offer a way for server administrators to integrate their Synapse installation with an existing authentication system. If I understand correctly, `Lalepass` is the password auth provider in our architecture. We may try to separate `Synapse` and the `Password auth provider` into microservices (different machines).
Synapse supports this (running auth on a different machine) through [a configuration file](https://matrix-org.github.io/synapse/develop/usage/configuration/homeserver_sample_config.html), `homeserver.yaml`, under `password_providers`.
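As a rough sketch, the relevant part of `homeserver.yaml` would look like the snippet below. The module name `lalepass_auth_provider.LalepassAuthProvider` and its config keys are placeholders I made up for illustration, not the real Lalepass integration.

```yaml
# homeserver.yaml -- sketch only: "lalepass_auth_provider" and its config
# keys are hypothetical placeholders, not the real Lalepass module name.
password_providers:
  - module: "lalepass_auth_provider.LalepassAuthProvider"
    config:
      enabled: true
      # The provider would talk to the Lalepass service running on its own machine.
      endpoint: "https://lalepass.example.com/api/auth"
```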
## Database Cluster
A database cluster is implemented in almost every enterprise architecture that handles many transactions. The idea is to create a `Master-Slave` database cluster: the `Master` handles the write transactions (the POST APIs, though technically it can serve GETs as well) while the `Slave` databases handle the read transactions (GET APIs; in Synapse, mostly the `/sync` API).
The `Master` and `Slave` databases are deployed on different machines, and the Master-Slave connection is achieved through Postgres replication.
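On the Synapse side, pointing the homeserver at a Postgres instance on another machine only requires the `database` section of `homeserver.yaml`. The hostnames and credentials below are placeholders, not our real setup.

```yaml
# homeserver.yaml -- database section; host, user and password are placeholders.
# The primary ("Master") Postgres instance lives on its own machine; the
# standby ("Slave") replicas are kept in sync via Postgres streaming replication.
database:
  name: psycopg2
  args:
    user: synapse_user
    password: CHANGE_ME
    database: synapse
    host: db-primary.example.internal
    port: 5432
    cp_min: 5
    cp_max: 10
```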
Related articles:
* [Synapse uses database from different machine (but not master-slave database)](https://www.mytinydc.com/en/blog/matrix-synapse-shift-database-to-another-server/)
* [PostgreSQL Replication](https://www.enterprisedb.com/postgres-tutorials/postgresql-replication-and-automatic-failover-tutorial)
* [Synapse Replication Architecture](https://matrix-org.github.io/synapse/develop/replication.html)
## Workers
If we want to use database replication, then we must use workers: additional Synapse processes, ideally running on different machines. [This documentation](https://matrix-org.github.io/synapse/develop/workers.html) gives a lot of information about Synapse workers.
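As an illustration, each worker gets a small YAML config file of its own. The names, ports and paths below are placeholders, and the exact keys depend on the Synapse version we end up running.

```yaml
# generic_worker1.yaml -- sketch of a single worker; names, ports and paths
# are placeholders and the exact keys depend on the Synapse version.
worker_app: synapse.app.generic_worker
worker_name: generic_worker1

# How this worker talks to the main Synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

# Which client/federation endpoints this worker serves, and on which port.
worker_listeners:
  - type: http
    port: 8083
    resources:
      - names: [client, federation]

worker_log_config: /etc/matrix-synapse/generic_worker1_log_config.yaml
```

A reverse proxy then routes the endpoints listed in the workers documentation (for example `/sync`) to this worker's listener port instead of the main process.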
## Storage Service
If Lale is targeting millions of users, then running the `Storage service` on a different machine is a must. The configuration lives under the `media_storage_providers` setting ([ref](https://matrix-org.github.io/synapse/develop/usage/configuration/homeserver_sample_config.html)).
However, I would suggest buying [AWS S3 (Amazon Simple Storage Service)](https://aws.amazon.com/s3/): then we basically no longer need to worry about running out of space for user media.
I also found [a module](https://github.com/matrix-org/synapse-s3-storage-provider) that connects Synapse to `AWS S3`.
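Based on that module's README, wiring it up is again a `media_storage_providers` entry in `homeserver.yaml`. The bucket name, region and credentials below are placeholders.

```yaml
# homeserver.yaml -- offload media to S3 via the synapse-s3-storage-provider
# module; the bucket name, region and credentials are placeholders.
media_storage_providers:
  - module: s3_storage_provider.S3StorageProviderBackend
    store_local: true
    store_remote: true
    store_synchronous: true
    config:
      bucket: lale-media            # placeholder bucket name
      region_name: ap-southeast-1   # placeholder region
      access_key_id: AWS_ACCESS_KEY_ID
      secret_access_key: AWS_SECRET_ACCESS_KEY
```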
## Redis
Implement Redis as the pub/sub bus that Synapse uses to shuttle replication traffic between the main process and the workers; it is the recommended way to run a worker deployment. (To be continued....)
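In Synapse this is a small block in `homeserver.yaml`; Redis itself would run on its own machine. The host and password below are placeholders.

```yaml
# homeserver.yaml -- enable Redis so the main process and the workers can
# exchange replication traffic over pub/sub; host and password are
# placeholders for a Redis instance running on its own machine.
redis:
  enabled: true
  host: redis.example.internal
  port: 6379
  password: CHANGE_ME
```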
## Overall
This is the overall Synapse architecture if we implement all of the above.

### Notes
* Ideally, everything runs on a different machine: **one machine, one service**.
* We may use [AWS (Amazon Web Services)](https://aws.amazon.com/) for everything.
    * Pro: we don't have to set up our own servers.
    * Pro: if something is not suitable for our purpose, we can unsubscribe at any time.
    * Con: we don't host our own servers.
* In order to implement any or all of the above, we must first upgrade Synapse to **the latest stable version**.
## Milestones
It would be too overwhelming to implement everything at once, so the work is split into milestones.
* M1: Upgrade Synapse to **the latest stable version**.
* M2: **Dedicated server** for Synapse (no other services running on the Synapse machine).
* M3: Move the **Database**, **Media Storage**, and **Auth** onto different machines.
* M4: Implement **Database Replication** & **Workers** (this will be tough).
* M5: Implement **AWS S3** and **Redis**.
### Comparison
We can take a look at how other IM services achieve good performance.
* [Facebook Chat Architecture](https://www.slideshare.net/udayslideshare/facebook-chat-architecture)
* [The architecture behind chatting on LINE LIVE](https://engineering.linecorp.com/en/blog/the-architecture-behind-chatting-on-line-live/)
* [Matrix.org Architecture](https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-scalability)
### Others
* [Complete Guide to the Chat Architecture](https://yellow.systems/blog/guide-to-the-chat-architecture)
* [Understanding the Architecture & System Design of a Chat Application](https://www.cometchat.com/blog/chat-application-architecture-and-system-design)
* [How to Make a Messaging App like WhatsApp, Telegram, Slack](https://www.simform.com/how-to-build-messaging-app-whatsapp-telegram-slack/)
* [An Extensive Guide to Messaging App Development](https://yalantis.com/blog/messaging-apps-development-telegram-whatsapp-others-work/)
* [Conquering Highest Scalability of an Enterprise Chat Application Using Kubernetes](https://dzone.com/articles/conquering-highest-scalability-of-an-enterprise-ch)
**Some articles suggest using WebSockets & REST APIs, but we depend heavily on Matrix Synapse here, which only implements a REST API over HTTP.**