Currently, HP-H users are facing many performance issues that affect system usability and prevent them from working with the system.
## Problem Statement:
Users of HP-H are currently experiencing significant performance issues which are affecting system usability and hindering work processes.

## Detailed Issues:
- Digital Lab tiles load slowly and occasionally fail to load.
- A direct connection shows a slight improvement, but reliability remains a concern.
- Frequent errors arise from failed connections to the cloud service (tiles-hub), which is responsible for serving tiles to users over the internet.
## What we did:
#### Server Resource Assessment:
- Internet speed: adequate (download: 3.5 Gbps, upload: 1.3 Gbps).
- CPU and RAM: appear to be sufficient. Digital Lab consumes 12 GB of RAM for caching.

#### Scale up cloud services:
Although the cloud services were not showing any unhealthy status, we doubled the resources allocated to the tiles-hub service to ensure high performance.

#### Load Testing
We conducted a load test on HP-H with 10 concurrent users over 20 seconds, resulting in a rate of 368 tiles served per second via the Digital Lab server.
> The test was performed over the HP-H VPN during working hours.
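
For reference, if the 368 tiles per second figure is an average over the whole run, it corresponds to roughly 7,360 tiles in 20 seconds, or about 37 tiles per second per user. The sketch below shows one way such a throughput figure could be measured. It is not the tool used for the test above; `run_load_test` and the `fetch_tile` callable are hypothetical names, and `fetch_tile` is assumed to return `True` when a tile is served successfully.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(fetch_tile, users=10, duration_s=20.0):
    """Run `users` concurrent workers for `duration_s` seconds.

    `fetch_tile` is a hypothetical callable that requests one tile and
    returns True on success. Returns (tiles_served, tiles_per_second).
    """
    start = time.monotonic()
    deadline = start + duration_s

    def worker(_):
        served = 0
        # Keep requesting tiles until the test window closes.
        while time.monotonic() < deadline:
            if fetch_tile():
                served += 1
        return served

    with ThreadPoolExecutor(max_workers=users) as pool:
        total = sum(pool.map(worker, range(users)))

    elapsed = time.monotonic() - start
    return total, total / elapsed
```

Running the same measurement both over the VPN and over a direct connection, and outside working hours, would make the numbers easier to compare.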

#### Analyse logs and monitoring:
**Agent logs**
The Agent logs contain many errors caused by failed connections to the cloud service (tiles-hub).
This indicates a network connection error between the on-premise service (Agent) and the cloud service (tiles-hub).
Since we cannot see these failing requests in our logs, they may be getting blocked by the HP-H network infrastructure.
```
2023-09-27 10:33:27.670 +02:00 [ERR] HubConnection reconnecting due to an error.
System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.
2023-09-27 10:38:13.957 +02:00 [ERR] HubConnection reconnecting due to an error.
System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.
2023-09-27 10:38:14.010 +02:00 [ERR] HubConnection reconnecting due to an error.
System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.
2023-09-27 10:38:14.010 +02:00 [ERR] Failed to serve a tile
```
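
To see how these errors cluster over time, the log lines can be grouped by minute and message. The following is a minimal sketch that assumes every Agent log entry follows the timestamp and level layout visible in the excerpt above; `errors_per_minute` is a hypothetical helper, and `SAMPLE` simply repeats the excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
# 2023-09-27 10:33:27.670 +02:00 [ERR] HubConnection reconnecting due to an error.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3} [+-]\d{2}:\d{2} \[(\w+)\] (.*)$"
)

SAMPLE = [
    "2023-09-27 10:33:27.670 +02:00 [ERR] HubConnection reconnecting due to an error.",
    "System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.",
    "2023-09-27 10:38:13.957 +02:00 [ERR] HubConnection reconnecting due to an error.",
    "System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.",
    "2023-09-27 10:38:14.010 +02:00 [ERR] HubConnection reconnecting due to an error.",
    "System.TimeoutException: Server timeout (5000,00ms) elapsed without receiving a message from the server.",
    "2023-09-27 10:38:14.010 +02:00 [ERR] Failed to serve a tile",
]


def errors_per_minute(lines):
    """Count [ERR] entries per (minute, message); other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "ERR":
            counts[(match.group(1), match.group(3))] += 1
    return counts


for (minute, message), count in sorted(errors_per_minute(SAMPLE).items()):
    print(minute, count, message)
```

Exception detail lines (such as the `System.TimeoutException` lines) do not start with a timestamp, so they are ignored rather than counted twice.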
**Browser logs (direct connection):**
Many errors reported from clients' browsers indicate that requests are not being received by the on-premise Digital Lab server in HP-H (https://patho-hph.depot.pathozoom.com).

**Indexing logs**
There is a potential correlation between the performance issues and the indexing of new slides.
This could happen for the following reasons:
- Network bandwidth being consumed.
- NAS resources being utilized.
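
One way to check this suspected correlation is to measure what share of the error timestamps fall inside indexing runs. The sketch below assumes that error times and indexing start/end times can be extracted from the respective logs; `share_of_errors_during_indexing` is a hypothetical helper, not part of the existing tooling.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def share_of_errors_during_indexing(
    error_times: Iterable[datetime],
    indexing_windows: List[Tuple[datetime, datetime]],
) -> float:
    """Return the fraction of errors that occurred while indexing was running."""
    errors = list(error_times)
    if not errors:
        return 0.0
    inside = sum(
        any(start <= t <= end for start, end in indexing_windows)
        for t in errors
    )
    return inside / len(errors)
```

The result only hints at a correlation when it is clearly higher than the share of the observed period that indexing occupied. For example, if indexing ran for 10% of the day but 60% of the errors fall within those runs, the link is worth investigating further.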

## What we are doing:
- Reduce the load on the NAS:
  - Better source directory enumeration (short-term fix).
  - Implement incoming folder (long-term solution).
- Better handling of failing tiles:
  When tile serving from the Agent fails, it creates many errors and exceptions. This might slightly affect availability, but it should not be the main factor behind the performance issues.
- More monitoring
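
As an illustration of the "better handling of failing tiles" item, one possible approach is a bounded retry with exponential backoff, so that a short connection drop produces a single error after a few attempts instead of a burst of exceptions. This is only a sketch of the idea in Python, not the Agent's actual implementation; `fetch_with_retry` and the `fetch_tile` callable are hypothetical, and `fetch_tile` is assumed to raise `TimeoutError` when the connection fails.

```python
import time


def fetch_with_retry(fetch_tile, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Call `fetch_tile`, retrying on TimeoutError with exponential backoff.

    The delay doubles after each failed attempt (0.1 s, 0.2 s, ...).
    The last error is re-raised once all attempts are used up.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch_tile()
        except TimeoutError as exc:
            last_error = exc
            # Do not sleep after the final attempt; fail straight away.
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise last_error
```

The `sleep` parameter is injectable so the retry behaviour can be tested without real delays. Whether retries are appropriate depends on how long a user can reasonably wait for a tile.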