
# Galerinha

## Business analysis

Galerinhas, as a whole, has several business domains that talk to each other. In our first conversation, Ruben spoke to us only about the reward system. We see that as a good first step, because building a platform whose parts are independent of each other is a good and safer approach. Of course, in our tech-envy minds, "microservices" is the first word to pop up. But I want us to look beyond the microservices concept and at some pillars of the Galerinhas platform, starting with the users of the platform.

Galerinhas will have two kinds of users: the parents and the suppliers, who are Galerinhas' partners.

### Parents

Parents will look at Galerinhas as a trustworthy source of entertainment. This goal is a subjective one, and the responsibility for it lies with the marketing and branding team. Still, the best marketing team with the most fantastic campaign cannot be effective if our infrastructure does not meet its service level agreements.

Another point is that parents will have to look at Galerinhas the way they look at Disney products. Disney does everything with excellence: every interaction is planned and tested to become the best version of itself. The goal here is for the experience to be as close as possible to the dreamed one and, in some cases, better than the dreamed one.

From the paragraphs above, we can draw some conclusions about our technological choices:

- Before any version of the infrastructure is built, it must have an SLA.
- The frontend (web and mobile) will be the touchpoint with the parent, so it has to be more than OK: it has to wow whoever sees and interacts with it.

### Partners

If the parents are at the center of Galerinhas, the partners surround them with stunning offers. Just as the biggest challenge for Uber/99 is bringing in new drivers, I see the challenge for Galerinhas as showing partners that we are a one-of-a-kind platform. So for partners, we will need to offer a stable platform.
It does not need to be as charming as the parents' one, but it must inspire confidence and provide data to support the evolution of their offers.

Extracting the conclusions about our technical goals again:

- Before any version of the infrastructure is built, it must have an SLA.
- Right from the beginning, we need to keep our historical data secure and in a place where the data team can easily handle it in the near future and use it to provide information to the partners.

Let's keep those pillars in mind as we move forward to talk about the platform itself.

## Architecture

![](https://i.imgur.com/6bb8pN9.jpg)

The first step here is to keep the users in mind when drawing the domain. So we need a Parents API, which the mobile and web clients will use, and a Partner API for the admin and partner clients. We omitted the admin user from the business analysis because it has no substantial impact on our choices, but we will refer to this user here.

### Overview

Those APIs are wide open to the world. They use a JWT authorization mechanism: the token in the header carries the user's authenticity and access control level. Behind the authorization, we have two kinds of applications: containerized ones, running on top of Fargate, and serverless ones, running on Lambda. Each application has its own database.

Before diving deep into the backend, that is, what lies behind the APIs, I want to talk about the frontend.

### Mobile

Our capabilities here use Flutter, a Google framework that builds the Android and iOS apps from the same code base. Big companies have switched from React Native to Flutter, such as Nubank, which made this choice in mid-2019. Those capabilities include automated testing, both unit and end-to-end, which together act as a regression test for a more confident release routine.
### Web

Our web capabilities are mostly in the React and Vue frameworks. Although we know and have experience with Angular, we believe the first two are the right choice for a healthier product and for a better pool of talent when expanding the Badico team or Galerinhas' tech team.

That said, we know the MVP/beta is being built on Angular, and we are not saying that needs to change right away. But we believe a safer approach is to start new features in one of the recommended frameworks.

Our frontend capabilities also include automated tests, but they do not stop there. We are skilled at building monorepo architectures, which fit here because the Partner Portal and the Parent Portal can share the same components.

With the frontend out of the way, we have a bigger picture to analyze: the backend.

### APIs

As we have said, the API is the entry point to the whole backend. But, against common sense, we can't expose our business API directly to the web. That part of the code has to be concerned with the business rules and their implications, and anything that clutters this goal has to be stripped away. The business rules live in the backend application, which does not concern itself with authentication or even access control. If a request arrives with a valid body, the application makes it happen.

Those other concerns belong to the gateway in front of our API. The API Gateway sits between the client and one or more apps, each of which has only one concern. One of those apps is the authenticator: an app that deals only with authentication rules and that the API Gateway uses to validate each incoming request. If a request is unauthenticated, it is blocked before it reaches the business app.

We have experience working with three kinds of APIs: two of them we build from the ground up, and one we only consume.

#### RESTful API

RESTful is the go-to choice for a project in its early days.
It provides good working knowledge and has a lot of standards already in place. Communication between frontend and backend will be a no-brainer, and it also works very well for backend-to-backend communication when the need arises.

The last point is that we follow all the good practices of REST, such as clear status code handling, using the right verb for the right job, and using HATEOAS as the engine for declaring all the resources of the API. We do not just expose a POST route and read every parameter from the body, which is the common "POSTful" API.

#### GraphQL API

GraphQL is the choice when the frontend and backend start to take their mature form. It is like a graduation: a good approach is to put a GraphQL server alongside the already established RESTful API, then start moving new features to it and remodeling the old resources on it. A good rule of thumb for when to start the GraphQL API is when you need a v2 of the RESTful API. Instead of rebuilding it in a RESTful way, you expose the new contract in GraphQL.

GraphQL gives the frontend a lot of control over network requests, and it lets views be modeled more closely to their API counterparts.

#### SOAP API

This is the one nobody likes to name. Every experienced engineer has had to integrate with one of these, and it's a pain. SOAP was the 2000s answer to communication between services: a smart-pipe approach, so smart that it binds the engineer's actions with several constraints and makes everything more cluttered. We integrate with SOAP, but we don't build it.

So the API Gateway is in our toolbox. Let's move on to another topic.

### WebHooks

WebHooks use the API Gateway's underlying resources and change just the delivery strategy: instead of a synchronous answer, they deliver an asynchronous one. The asynchronous way is essential for communication between business apps, like the Rewards API and the Parents API. A good way to see a webhook is as a notification channel.
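
The notification-channel idea can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `WebhookDispatcher` class, the `reward.granted` event, and the callback URL are hypothetical, and the transport is injected as a plain function so the example runs without a network (in a real setup it would be an HTTP POST made by a background worker).

```python
import json
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

# A transport receives (url, json_body). In production this would be an
# HTTP POST; here it is injected so the sketch runs offline.
Transport = Callable[[str, str], None]


class WebhookDispatcher:
    """Hypothetical notification layer: publish now, deliver later."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[str]] = defaultdict(list)
        self._pending: Deque[Tuple[str, str]] = deque()

    def subscribe(self, event_type: str, callback_url: str) -> None:
        self._subscribers[event_type].append(callback_url)

    def publish(self, event_type: str, payload: dict) -> int:
        """Queue one delivery per subscriber and return immediately."""
        body = json.dumps({"type": event_type, "data": payload}, sort_keys=True)
        urls = self._subscribers.get(event_type, [])
        for url in urls:
            self._pending.append((url, body))
        return len(urls)

    def deliver_pending(self, send: Transport) -> int:
        """Drain the queue; a worker would call this outside the request path."""
        delivered = 0
        while self._pending:
            url, body = self._pending.popleft()
            send(url, body)
            delivered += 1
        return delivered


# Usage: the Rewards side publishes, the Parents side is notified later.
dispatcher = WebhookDispatcher()
dispatcher.subscribe("reward.granted", "https://parents-api.example/hooks/rewards")
queued = dispatcher.publish("reward.granted", {"parent_id": "p-1", "points": 50})

received = []
dispatcher.deliver_pending(lambda url, body: received.append((url, json.loads(body))))
```

The publisher never waits for the subscriber: `publish` only queues the notification, and delivery happens in a separate step. That separation is what makes the exchange asynchronous.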
If SOAP was the smart pipe, the more modern approach with RESTful and WebHooks takes a dumb-pipe approach and connects smart services to it. The WebHook is a strategy for using and exposing a notification layer on top of an API.

## Databases

We will always recommend using databases as a service, before even thinking about which database to choose. The only case where this does not hold is a mature and well-established project. This choice is based on costs and risks. Taking on the responsibility of operating a database is a big challenge, even for a specialized team with only that responsibility in hand. Maybe in the future we can build that team, but for today a DaaS works fine and gives us the velocity to deliver value.

As for which database, we have good parameters for choosing first between NoSQL and SQL. AWS has good options in both, and our team has a lot of experience building and supporting applications on those DaaS offerings. We can go even further: we have some of the best working knowledge of DynamoDB in Brazil, and with AWS Aurora we have experience connecting thousands of Lambdas to it.

## Infrastructure

Throughout this document we have touched on pieces of infrastructure in one form or another, but we want to take a special look at infrastructure itself: not the applications, but the pieces the applications run on top of. We divide the infrastructure topic into two distinct parts, DevOps and SRE.

### DevOps

This contraction of "development operations", or "developers and operations", means a lot of things to a lot of people. Here I want to use it as a proxy for infrastructure that builds infrastructure. So any automation script that builds a local environment is in this category. We know that the community definition also carries a cultural meaning, but that cultural piece is already the basis of Badico Engineering.
So in the paragraphs below, we are going to talk about actions.

In this category, we want to place some practices that, at a minimum, save time, like:

- A fully fledged local environment one command away. With just one command, the developer must be able to run a local environment that behaves as closely as possible to the cloud one.
- A continuous integration pipeline. With this, we want to be able to ship to cloud environments on every pushed commit. The pipeline also runs tests that verify the safety of that version.
- Continuous Deployment and Continuous Delivery. These two are often mixed up, but we can adopt just one of them. Continuous Deployment says that every newly integrated change goes to production. Continuous Delivery says that at any time we can send the integrated code to production. Which one to choose is a business decision.
- A fully fledged cloud environment one command away. It is the same as the local environment, but instead of running on a developer's machine it runs in the cloud. For this, we use the infrastructure-as-code approach.

To enable these practices we have some tools at our disposal, like:

- CodePipeline
- CodeBuild
- CodeDeploy
- GitLab
- GitHub Actions
- CodeClimate
- Seed
- Terraform

### SRE

If DevOps is about automating infrastructure, Site Reliability Engineering is about designing and maintaining that infrastructure. In this last topic, we are going to talk about service level objectives from an infrastructure viewpoint.

Reliability means different things for different parts of an application. Netflix is a video streaming platform, which means I have to be able to watch a video whenever I am connected. The promotions service can be down, the suggestion service can be down, the originals service can be down, but the streaming service has to stay up.
The right amount of reliability is almost a subjective art. Only through experience, taking into account the good and bad decisions made in the past, can we be confident that we are making the right ones for today, while also knowing that some of them will not look so good tomorrow. So the SRE team has to be able to keep track of every decision made from the start of an application's development. This knowledge has to be available not just to help with incidents (they will arrive sometime) but also to build, together with incident postmortems, a picture of the application's state in the head of every team member. That way, everyone will be in a position to come up with new solutions, spot mistakes before they become incidents, and so on.

This is a team with production in mind. Before anything else is done, production has to be standing on its feet. A good SRE engineer will design with resilience in mind. Some of its components are listed below:

- Elasticity: the capacity to scale down to handle a few requests and up to handle lots of them, with the same efficiency.
- Fault tolerance: the ability of an app to keep running even when errors happen.
- Self-healing: once an error has occurred, the app deals with it without human intervention.
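
As one small illustration of fault tolerance and self-healing, the sketch below retries a failing operation with exponential backoff instead of waiting for a human to step in. It is only a sketch under stated assumptions: `call_with_retry`, its parameters, and the `flaky_reward_lookup` function are hypothetical names, and a real service would likely combine this with timeouts, circuit breakers, and platform-level health checks.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Usage: a hypothetical dependency that fails twice, then recovers.
calls = {"count": 0}


def flaky_reward_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the waits instead of sleeping, so the example runs instantly.
waits: List[float] = []
result = call_with_retry(flaky_reward_lookup, max_attempts=4, sleep=waits.append)
```

The caller sees a successful result even though two attempts failed, which is the behavior the fault tolerance and self-healing items describe. When the attempts run out, the error is raised so that monitoring and people can take over.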