# Part 1

## Load balancing a Docker Swarm cluster with nginx

As stated in the assignment, the first part was for us to implement a load balancer in a cluster of nodes that runs the microservices of the hotelapp. In our case this was done using a Docker Swarm cluster and an nginx proxy load balancer.

### Implementation

The nginx load balancer is implemented by adding it as a service in the `docker-nginx.yml` file, which is used to deploy the Docker stack to the swarm.

#### Choosing a microservice to load balance

After inspecting the CPU usage of each microservice when deployed with the docker-compose file (which contains the initial microservice implementation from hw3), we noticed that the most resource-intensive microservice was `search`. This is expected, since examining the lua workload files (that the benchmarking tool uses) shows that they explicitly send requests to the search backend service. We chose to scale the `search` service to 3 instances in a cluster of 7 nodes (a sketch of the relevant stack-file entries is included at the end of this part).

![](https://i.imgur.com/FQVfxP2.png)

### Nginx configuration

Some minor changes needed to be made to the given configuration for the proxy load balancer to work properly. First of all, nginx needs to use the swarm's DNS resolver so that it resolves the `search` host to one of the scaled instances. Then we set up the nginx server so that it forwards gRPC requests to one of the scaled instances, which is resolved as explained above. The requests are distributed in a round-robin fashion, which is the default.

```nginx
events { }
http {
    resolver 10.0.9.1;

    upstream search_server {
        server search:8082;
    }

    server {
        listen 8582 http2;

        location / {
            grpc_pass grpc://search_server;
        }
    }
}
```

### Changes to the frontend microservice

`frontend` needs to contact `nginx` instead of `search` directly. We modify `cmd/frontend/main.go` as follows:

```go
var (
	port        = flag.Int("port", 8080, "The service port")
	addr        = flag.String("addr", "0.0.0.0", "Address of the service")
	jaegeraddr  = flag.String("jaeger", "jaeger:6831", "Jaeger address")
	profileaddr = flag.String("profileaddr", "profile:8081", "Address of the profile service")
	// Changed from search:8082 to nginx:8582
	searchaddr = flag.String("searchaddr", "nginx:8582", "Address of the nginx load balancer for the search service")
)
```

### Verifying it works

To check that the load balancer is working properly, we first start a benchmark using the `wrk2` tool. While it's running, we check each host's CPU usage using ansible:

```bash
ansible all -i ./hosts -a "docker stats --no-stream"
```

*Note: the `hosts` file contains the hostnames of all nodes in the swarm.*

Which returns the following:

![](https://imgur.com/3US3rvr.png)

Looks like the load is evenly distributed between the `search.1`, `search.2`, and `search.3` instances. The load balancer works correctly.
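For reference, below is a minimal sketch of what the relevant `docker-nginx.yml` entries could look like under this setup. The nginx image tag and the config mount path are assumptions for illustration; the application image, the `entrypoint`, and the `replicas: 3` setting follow what we describe above.

```yaml
# docker-nginx.yml (excerpt) -- an illustrative sketch, not the exact file
version: "3.8"
services:
  nginx:
    image: nginx:latest                        # assumption: stock nginx image
    ports:
      - "8582:8582"                            # the port frontend now talks to
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro  # assumption: mounts the configuration shown above

  search:
    image: andreas16700/hotelapp_micros        # the single app image from our previous assignments
    entrypoint: search
    deploy:
      replicas: 3                              # the three scaled search instances
```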
# Part 2

## MongoDB-based and Memcached-based implementations of `rate`

As per the assignment, we were told to edit the files `docker-compose.yml` and `internal/rate/mongodb.go`, so we assume the MongoDB-based implementation is the one that uses Memcached for caching, and the Memcached-based implementation is the one that uses the `internal/rate/memdb.go` file.

### Docker Images

From the previous assignments, all the microservices are served from a single image (`andreas16700/hotelapp_micros` on Docker Hub) and each microservice is launched with its appropriate `entrypoint`:

```yaml
search:
  image: andreas16700/hotelapp_micros
  entrypoint: search
  container_name: 'hotel_app_search'
  ports:
    - "8082:8082"
  restart: always

geo:
  image: andreas16700/hotelapp_micros
  container_name: 'hotel_app_geo'
  entrypoint: geo
  ports:
    - "8083:8083"
  restart: always
```

We implement the MongoDB-based version of `rate` similarly to the extension of `profile` in lab 10. The function

```go
func (db *DatabaseSession) GetRates(hotelIds []string) (RatePlans, error)
```

first checks Memcached:

```go
for _, id := range hotelIds {
	// first check memcached
	item, err := db.MemcClient.Get(id)
```

If there's no hit, it fetches from the database:

```go
	} else if err == memcache.ErrCacheMiss {
		// memcached miss, set up mongo connection
		log.Infof("Memcached miss: hotel_id == %v\n", id)
		session := db.MongoSession.Copy()
		defer session.Close()
		c := session.DB("rate-db").C("inventory")

		rate := new(pb.RatePlan)
		queryReq := bson.M{"hotelId": id}
		query := c.Find(queryReq)
```

### Issues

We encountered an issue here: once set up and running, the `rate` service would sometimes restart after simply visiting the frontend.

### Investigating

We attempted to debug the `rate` microservice, which proved to be not very simple, but quite fun.

### Debugging a microservice

Debugging the service locally means that both the local service and the Docker services have to be modified: the local service should connect to the MongoDB and Memcached databases through published ports (e.g. `mongodb-rate:27017` should be changed to `localhost:27017`), and the other services have to be modified similarly so that they reach the locally running instance (e.g. `search` should connect to `localhost:8084` instead of `rate:8084`).

### Debugging straight from Docker

It turns out there's a much easier way that doesn't require modifying the actual programs, which makes everything easier since the previously pushed image doesn't need to be modified.

#### Debugging with dlv

Two things need to be done in order to debug a Go microservice running in Docker. The `docker-compose.yml` needs to be modified so that the `rate` service is specified as follows:

```yml
rate:
  build:
    context: .
    args:
      - DB=mongodb
  container_name: 'hotel_app_rate'
  security_opt:
    - "seccomp:unconfined"
  cap_add:
    - SYS_PTRACE
  ports:
    - "8084:8084"
    - "4000:4000"
  command: dlv --listen=:4000 --headless=true --api-version=2 --accept-multiclient exec rate # 'rate' is the path to the compiled app in the container
  depends_on:
    - mongodb-rate
    - memcached-rate
  restart: always
```

This specifies that the image should be built with the `Dockerfile` in the current directory, which in turn needs to be modified so that dlv is installed and the app is built without optimizations and run under dlv:

```docker
RUN go install github.com/go-delve/delve/cmd/dlv@latest
...
RUN go build -tags ${DB} -gcflags="all=-N -l" ./cmd/...
EXPOSE 40000
CMD ["dlv", "--listen=:40000", "--headless=true", "--api-version=2", "--accept-multiclient", "exec", "rate"]
```

(The `command` in `docker-compose.yml` overrides the `CMD` above, so dlv ends up listening on port 4000, which is the port we publish.)

That's it! Now, using our favorite IDE, we can connect to the running service and debug it normally through port 4000:

![](https://i.imgur.com/XWBd6ml.png)
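If an IDE isn't available, the same headless dlv server can also be reached with the command-line dlv client. A quick sketch (the breakpoint location is only an example; it assumes `GetRates` lives in `internal/rate/mongodb.go`):

```bash
# attach to the headless dlv server the container publishes on port 4000
dlv connect 127.0.0.1:4000

# then, at the (dlv) prompt, breakpoints work as usual, e.g.:
#   (dlv) break mongodb.go:73
#   (dlv) continue
```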
Visiting the frontend while having set a breakpoint in the `GetRates` function, we can see that the rates of the hotels with IDs 5, 3, 1, 6, 2 are being requested.

![](https://i.imgur.com/8yT62av.png)

We've broken up the query to MongoDB for easier debugging:

```go
rate := new(pb.RatePlan)
queryReq := bson.M{"hotelId": id}
query := c.Find(queryReq)
err := query.One(&rate)
```

Evaluating the expression `query.Count()` gives us zero:

![](https://i.imgur.com/DLSHyfP.png)

This is a problem, because the program assumes a `RatePlan` for every `hotelID` and fatally exits if no record exists in MongoDB, which explains the restarting. Now how could this happen? The data fed to the microservices comes from json files in `data/medium/`. We can confirm that while there are records for hotelIDs 1 through 80 in `geo.json` and `hotels.json`, `inventory.json` only contains data for fewer hotelIDs, among which hotelIDs 5 and 6 are missing. For the purposes of the exercise we decided, instead of fatally crashing or fundamentally changing the whole app, to insert a sample rate when no data is found for a given hotelID:

```go
err := query.One(&rate)
if err != nil {
	log.Warnf("No rate for hotel id %v, returning sample rate", id)
	rt := pb.RoomType{
		BookableRate:       109,
		TotalRate:          124,
		TotalRateInclusive: 144,
		Code:               "KNG",
		Currency:           "",
		RoomDescription:    "",
	}
	s := pb.RatePlan{
		HotelId:  id,
		Code:     "RACK",
		InDate:   "2015-04-09",
		OutDate:  "2015-04-24",
		RoomType: &rt,
	}
	rate = &s
}
ratePlans = append(ratePlans, rate)
```

### More Issues

Inside `GetRates` again, for **hotelID 3** a rate does exist in MongoDB:

![](https://i.imgur.com/b1zvFKo.png)

*(The data shown is from connecting to the mongodb microservice.)*

It *appears* to be found and assigned successfully to the `rate` variable, and no error is reported, but only the `Code` field is non-empty:

![](https://i.imgur.com/OEHcU0T.png)

*Execution is suspended at line 73 as shown above.*

Debugging inside `func (q *Query) One(result interface{}) (err error)` shows that the unmarshalling doesn't return an error even though only the `Code` field is filled. That causes all sorts of other problems, considering that `HotelId` is empty and `RoomType` is nil.

![](https://i.imgur.com/7mEYFIA.png)

Continuing the execution, there's a fatal panic:

![](https://i.imgur.com/HCpS8j8.png)

Following the thread, it all begins in `GetRates` again, at the line where it sorts the rates:

```go
func (s *Rate) GetRates(ctx context.Context, req *pb.Request) (*pb.Result, error) {
	...
	sort.Sort(ratePlans)
	...
}
```

Which at some point calls the `Less` function:

```go
func (r RatePlans) Less(i, j int) bool {
	return r[i].RoomType.TotalRate > r[j].RoomType.TotalRate
}
```

in which the debugger shows that something's very wrong:

![](https://i.imgur.com/cFWDpKD.png)

At this point we decided to run the benchmark anyway, because there was not enough time to investigate further and maybe there wouldn't be much point to it: the MongoDB-based implementation does communicate with the database, which would be reflected in the performance evaluation.

Running the benchmarks did not generate useful data. We tried running with variable threads, variable connections, and variable request load. Many of the `wrk2` runs returned `-nanus` thread latency, and the other latencies were all over the place, ranging from milliseconds to seconds and back to milliseconds between runs. We include some of our generated data in the `data/mongodb` and `data/memdb` directories for their respective `rate` implementations.
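For reference, the runs were roughly of the following form; the workload script path, the frontend address, and the specific numbers are placeholders rather than the exact values we used:

```bash
# sweep threads (-t), connections (-c), and offered request rate (-R) with the wrk2 binary;
# <frontend-address> and the lua script path are placeholders
./wrk -t 4 -c 16 -d 60s -L \
    -s ./path/to/hotelapp-workload.lua \
    http://<frontend-address> -R 100
```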