Boosting your backend application
===

###### tags: `backend development`

# Boosting the development productivity

--> Building a good project boilerplate

Boilerplate? In computer programming, boilerplate is the sections of code that have to be included in many places with little or no alteration. Such boilerplate code is particularly salient when the programmer must include a lot of code for minimal functionality.

## Choosing an application framework

- No frameworks at all
    - Build / assemble things by yourself
- Micro frameworks
    - Features: routing, request handling, [response marshalling], error handling, ability to write custom hooks / middlewares
    - Python: Flask, Falcon, Bottle
    - Go: gin
- Full feature frameworks
    - Features: basic features, ready middlewares, logging, db abstraction, MVC, template engine, validator, auth
    - Python: Django, FastAPI, Tornado, Pyramid
    - Go: gin, echo
- Why framework: Reduce duplication, reduce boilerplate code, share patterns between team members
- How to build/find a good project boilerplate
- Refs:
    - https://www.technolush.com/blog/micro-vs-full-stack-frameworks#:~:text=Micro%20frameworks%20are%20smaller%20in,more%20than%205k%20code%20lines.
    - https://www.quora.com/What-is-the-difference-between-framework-and-micro-web-framework-like-bottle-or-flask

## Common features

- Configuration: defaults, command line arguments, env vars, config files (yaml, json, ini, toml, envfile), remote config (etcd, consul).
    - Precedence (highest first): explicit call to Set --> flag --> env --> config --> key/value store --> default
- Routing: grouping, module, blueprint (different auth schema, logging...), versioning (/v1/, /v2/)
- Request handling: header params, path params, query params & request body, binding, validation
- Response marshalling: wrapping response, DTO
    ```python
    # Pseudocode: the handler returns a status plus the DTO; the framework wraps them.
    return HttpOK, user
    ```
- Logging: performance
- Error handling
    ```python
    from fastapi import HTTPException  # assuming FastAPI

    raise HTTPException(status_code=404, detail="Item not found")
    ```
- Testing:
    - testing with db and integration services
    - mock db, mock services
    - db session: tear down or not

## Common patterns

- API Documentation: document-first vs code-first
- DB ORM and migrations: database-first vs code-first, built-in migration vs dedicated migration tool
    - ORM: basic queries, CRUD, basic relationships (be aware of lazy loading vs eager loading)
    - Complex joins --> raw query
    - Migration: sql vs code
    - Be careful with "SELECT *" when the table has very large columns that are not necessary for the result. Also: model.save() --> updates all columns? --> how to update only selected columns

    ```python
    # peewee playhouse.migrate style; migrator and the *_field objects are defined elsewhere
    migrate(
        migrator.add_column('some_table', 'title', title_field),
        migrator.add_column('some_table', 'status', status_field),
        migrator.drop_column('some_table', 'old_column'),
    )
    ```
- Convention vs Configuration
- Dependency Injection
- CLI commands & background tasks
- Middlewares (plugins, hooks): Auth, Security (cors, csrf), Logging, Rate limit, gzip, rewrite, timeout --> many of them can be handled by the gateway/proxy --> KISS
- Bigger project: clean architecture? Trade-off: complexity & flexibility

# Boosting the application performance

- Refs
    - https://betterprogramming.pub/improving-backend-application-performance-4e1b6c050ec8
- Tuning the internal backend application, not system scaling (SRE)
- Prepared for scale: stateless
- Benchmarking
    - jmeter
    - locust
- DB: Keep the transactional tables small
- Steps: https://stackify.com/java-performance-tuning/
    1. "Don’t optimize before you know it’s necessary", unless the optimization costs little time and does not make the code harder to read and maintain
        - how fast your application code has to be
        - which parts of your application are too slow and need to be improved
    2. Profiler vs. bad guesses: use a profiler to find the real bottleneck
    3. Work on the biggest bottleneck first
- DB tuning: https://www.alibabacloud.com/blog/how-to-optimize-mysql-queries-for-speed-and-performance-on-alibaba-cloud-ecs_593872
    - DB indexing, partitioning and sharding, db connection pool & statement pooling
    - DB connection pool
    - Improve Query Performance to Reduce Data Storage Load --> using orm --> print debug queries for further investigation
- Design Patterns: Cache-Aside Pattern https://blog.cdemi.io/design-patterns-cache-aside-pattern/
- Be careful with logging: log level, size
- Caching: --> "don’t fetch it twice in your code" --> using a middleware with minimal changes in code. Everything can be cached: cache the return values of functions
    - Distributed caching
    - Invalidate cache: by keys / tags (e.g. invalidate all cached data related to a specific user when the user's role changes)
    - Middleware: use hash of request (params) as the key
    - Measure hit rate
    - More about Caching
        - Cache Data at the Application Level
        - Caching Data at the Machine/Server Level
        - Use an External Cache to Reduce Data Storage IO
- Backgrounding, Multi-Threading, and Asynchronous IO --> Async (coroutines), concurrency: race conditions --> cache with lock, db lock...
    - The Amazing Redis https://medium.com/swlh/the-amazing-redis-620a621f3b2
      Locks and Counters: Redis is a great place for storing synchronization locks and counters mainly because of its thread safe atomic operations.
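
The caching notes above (cache-aside, request-hash keys, invalidation by user, measuring the hit rate) can be sketched roughly as follows. This is a minimal illustration that assumes a plain in-process dict rather than a distributed cache; `CacheAside`, `make_key`, `get_or_load`, `invalidate_prefix` and `load_user_orders` are hypothetical names, not part of any library.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class CacheAside:
    """Tiny in-process cache-aside helper that also tracks its hit rate."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(prefix: str, params: Dict[str, Any]) -> str:
        # Hash the sorted request params so equal requests share one key.
        raw = json.dumps(params, sort_keys=True, default=str)
        return f"{prefix}:{hashlib.sha256(raw.encode()).hexdigest()}"

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = loader()          # cache miss: go to the real data source
        self._store[key] = value  # then populate the cache
        return value

    def invalidate_prefix(self, prefix: str) -> int:
        # Drop every entry under a prefix, e.g. all cached data of one user.
        doomed = [k for k in self._store if k.startswith(prefix + ":")]
        for k in doomed:
            del self._store[k]
        return len(doomed)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


db_calls = []  # records every trip to the (fake) database


def load_user_orders(user_id: int, page: int) -> list:
    db_calls.append((user_id, page))
    return [f"order-{user_id}-{page}"]


cache = CacheAside()
key = cache.make_key("user:42", {"page": 1})
first = cache.get_or_load(key, lambda: load_user_orders(42, 1))   # miss
second = cache.get_or_load(key, lambda: load_user_orders(42, 1))  # hit
print(len(db_calls), cache.hit_rate())
```

In a real service the dict would typically be replaced by an external cache such as Redis, and concurrent misses on the same key would need a lock, which this sketch leaves out.
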
- Scaling: parallelism
- Replace Multiple Fetches With a Single Multi-Get to Improve IO: even with a fast store like Redis, the difference between get and mget is significant https://medium.com/@jychen7/redis-get-pipeline-vs-mget-6e41aeaecef
    - Example: querying one page of 10 products with promotion data can be done in 3 ways:
        1. Left join promotion to product --> 1 query
        2. Select 10 products, then query the details for each product --> 11 queries --> supported by the ORM
        3. Select 10 products, then run 1 more query for the promotion data where product_id in (list of 10 ids)
    - Also: Batch your writes
- Stroll Through the Code: simply reading and analyzing your code, looking for language-specific coding practices --> don't overdo it, refactor only when it's obvious
    - Data structure: everyone knows this but tends to overlook it while coding: list vs set / map
    - Python: yield vs return
    - Java: string builder vs string concatenation
- Concurrency across threads creates the opportunity for parallel processing, but it takes a keen understanding of Amdahl's law
- Just use whatever language you’re most productive in since developer time is usually more expensive than computing power.
- Cannot improve db more --> Database Changes: NewSQL, NoSQL
    - MySQL --> TiDB
    - PostgreSQL --> CockroachDB
    - MongoDB
    - ClickHouse
    - Elasticsearch
- FastAPI: https://levelup.gitconnected.com/fastapi-vs-express-js-vs-flask-vs-nest-js-benchmark-5e14d518cc00 --> asyncpg + orjson/ujson
    - https://travisluong.medium.com/fastapi-vs-fastify-vs-spring-boot-vs-gin-benchmark-b672a5c39d6c
    - Python json: https://dollardhingra.com/blog/python-json-benchmarking/
    - FastAPI: fast, but not fast out of the box --> asyncpg + orjson
- Logging affects performance.
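
The second and third ways of loading a page of 10 products with promotion data can be compared with a small sketch. It assumes an in-memory SQLite database from the standard library; the `product` and `promotion` tables, their columns, and both function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE promotion (product_id INTEGER, discount INTEGER);
""")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(i, f"product-{i}") for i in range(1, 11)])
# Only products with odd ids get a promotion in this toy data set.
conn.executemany("INSERT INTO promotion VALUES (?, ?)",
                 [(i, 5 * i) for i in range(1, 11, 2)])

PAGE_SQL = "SELECT id, name FROM product ORDER BY id LIMIT 10"


def page_with_n_plus_one(conn):
    """Way 2: one query for the page, then one query per product."""
    products = conn.execute(PAGE_SQL).fetchall()
    queries = 1
    result = []
    for pid, name in products:
        row = conn.execute(
            "SELECT discount FROM promotion WHERE product_id = ?", (pid,)
        ).fetchone()
        queries += 1
        result.append((pid, name, row[0] if row else None))
    return result, queries


def page_with_in_query(conn):
    """Way 3: one query for the page, one IN (...) query for all promotions."""
    products = conn.execute(PAGE_SQL).fetchall()
    if not products:
        return [], 1
    ids = [pid for pid, _ in products]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT product_id, discount FROM promotion "
        f"WHERE product_id IN ({placeholders})",
        ids,
    ).fetchall()
    discounts = dict(rows)  # product_id -> discount
    return [(pid, name, discounts.get(pid)) for pid, name in products], 2


slow, slow_queries = page_with_n_plus_one(conn)
fast, fast_queries = page_with_in_query(conn)
print(slow_queries, fast_queries, slow == fast)
```

Both functions return the same rows; the difference is the number of round trips, which matters far more against a networked database than against the in-memory one used here.
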
- https://www.techempower.com/benchmarks/#section=data-r21&hw=ph&test=query
- https://github.com/tiangolo/fastapi/issues/1664
- Language specific tuning: https://aglowiditsolutions.com/blog/python-optimization/
- DB indices best practices: https://arctype.com/blog/mysql-indexing-best-practices/
- https://www.arctype.com/blog/optimize-sql-query/
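
For the notes on DB indexing and on printing queries for further investigation, one quick check is to look at the query plan before and after adding an index. The sketch below assumes SQLite from the standard library instead of MySQL or PostgreSQL; the `orders` table, the index name and the `query_plan` helper are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE user_id = ?"


def query_plan(conn, sql, params):
    """Return the plan SQLite reports for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (7,))  # full table scan expected
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, QUERY, (7,))   # should now use the index

print(plan_before)
print(plan_after)
```

MySQL and PostgreSQL offer the same kind of check through their own `EXPLAIN` statements, with different output formats.
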