# Planning for the future of Post Award's technical infrastructure
## Context
### What we do in Post Award
The Post Award services are focused on gathering M&E data from grant recipients in a consistent, cross-fund-compatible format, then making it available to DLUHC users so that it can be analysed to assess how effective the funding has been.
Funds have historically gathered M&E data in different ways and asked different questions, limiting the ability to compare the effectiveness of funding between funds. In Post Award, one of our primary goals should be to prevent the collection of this 'messy' data, and our services must support this.
### Implementation detail
The current Post Award technical (app) infrastructure roughly follows a microservices-based approach, and is split in the following way:
- `data-store`, which is a Python Flask app exposing an API which fronts the main postgres database.
- `submit`, which is a Python Flask app serving HTML to users that need to submit M&E reporting via Excel spreadsheets.
- `data-frontend`, which is a Python Flask app serving HTML to internal DLUHC users that want to access M&E data to analyse the effectiveness of our funding programmes and projects.
These are the three core Post Award apps. We make use of a number of other components that are broadly shared with Pre Award.
- `authenticator`, which is a Python Flask app exposing an API for authenticating users via Microsoft AD.
- `account-store`, which is a Python Flask app that generates and returns data about user accounts.
- `funding-service-design-utils`, which is a shared utils library for common functionality.
---
Historically, M&E data has been collected by funding teams using Excel spreadsheets with limited consistency of structure and questions across funds. To develop our service, `submit` has been built with the goal of flexibly handling submissions via Excel spreadsheets. However, each fund that we support needs us to build a custom schema for the fund's spreadsheet, and despite best efforts to create a generalised system, a lot of custom code and transformation logic has been built for the two funds we currently support (Towns Fund and Pathfinders). This will limit how quickly we can onboard future funds, potentially preventing us from scaling to meet the needs of the department.
As a team, we have broadly agreed that we need to stop supporting submission of the full set of M&E data via spreadsheets and instead move to a 'hybrid' model. The hybrid model will seek to obtain as much of the reporting data as possible via web forms, which can provide a better user experience through improved validation, a simpler and more structured journey, and improved data quality. We believe some of the data will still need, or be better suited, to be collected via spreadsheet, primarily the financial sections that are most easily represented as tabular data. There are limited existing patterns for collecting tabular data via web forms, and Excel allows more advanced functionality, such as formulas for calculating data, that some users find useful. The collection of this tabular data should be much simpler in nature than what we currently support through the spreadsheet processing pipelines.
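As a rough illustration of how much simpler that could be, a fixed-layout financial table can be read in a handful of lines. This is a hypothetical sketch, assuming a sheet named `Financials` with three known columns, rather than anything from the current pipelines:

```python
from dataclasses import dataclass
from decimal import Decimal

from openpyxl import load_workbook


@dataclass
class SpendRow:
    project_id: str
    period: str
    amount: Decimal


def extract_spend(path: str) -> list[SpendRow]:
    # data_only=True reads cached formula results rather than the
    # formula strings, so users can still use Excel formulas.
    wb = load_workbook(path, data_only=True)
    ws = wb["Financials"]  # hypothetical sheet name
    rows = []
    # Assumes a single header row and three fixed columns - no per-fund
    # schema or bespoke transformation logic.
    for project_id, period, amount in ws.iter_rows(min_row=2, max_col=3, values_only=True):
        if project_id is None:
            break  # stop at the first blank row
        rows.append(SpendRow(str(project_id), str(period), Decimal(str(amount))))
    return rows
```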
### Report progress on funded projects (RFP)
A lot of recent design work has focused on the next iteration of submitting M&E data. This has mainly been represented as a new submission patterns service, tentatively called 'Report progress on funded projects'.
https://www.figma.com/file/2dH3cosIBd2Fr7LUFvJBZN/%2F%2F-Submit-%2F%2F-Submit-Monitoring-Data?type=design&node-id=1871%3A148479&mode=design&t=5nVUXGji5mEggsNo-1
### Considerations for the future
General non-functional requirements:
- Minimise the time to onboard new funds
- Minimise development overhead
- Minimise the cost of making mistakes
- Maximise flexibility so that we can iterate quickly
- Minimise the amount of boilerplate required - focus on supporting business logic over building out frameworks/re-implementing solved problems
Specific considerations:
- Can we make use of the form builder/form runner that Pre Award already use? How would we effectively use our data standards / enforce a single source of truth (eg for M&E questions)? A sketch of what a shared question definition could look like follows this list.
- Our integration with Microsoft AD seems to be a significant pain point. Managing users and permissions (access to submit for funds) is slow, and 2FA/user account support is a recurring source of friction. Group & role management is done via DLUHC IT, which introduces lead times outside of our control. While getting 2FA/user auth for 'free' is nice, the user experience seems poor and we have limited ability to customise it. However, Entra ID is DLUHC's preferred option for auth unless we can justify using something else.
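To make the single-source-of-truth idea concrete, a shared question definition might look something like the sketch below. The shape and field names are entirely hypothetical, not the real form-builder schema; the point is that form-builder config (and any residual spreadsheet schema) could be generated from one versioned store of M&E questions:

```python
# Hypothetical shared store of M&E question definitions.
QUESTION_DEFINITIONS = {
    "outputs.jobs_created": {
        "label": "How many jobs has this project created?",
        "type": "integer",
        "required": True,
        "funds": ["towns-fund", "pathfinders"],
    },
}


def to_form_component(question_id: str) -> dict:
    """Project a question definition into a form-builder component.

    The output shape here is illustrative only - the real form-builder
    schema would need to be confirmed with Pre Award.
    """
    question = QUESTION_DEFINITIONS[question_id]
    return {
        "name": question_id,
        "title": question["label"],
        "type": "NumberField" if question["type"] == "integer" else "TextField",
        "options": {"required": question["required"]},
    }
```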
## Possible future paths
### 1. Continue work on the `submit` microservice
We could extend this service further to support two modes of submission:
- full spreadsheet
- hybrid
We would probably add some flag/data on funds to say which of the two flows they go through and pivot users down each journey depending on which fund they're submitting for.
This probably works quite well if we integrate it with the form builder, as we can use that to gather the full JSON blob that acts as a 'submission', and then just do the processing of that JSON on the submit service.
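A minimal sketch of that pivot, assuming a hypothetical `submission_mode` flag on the fund and a stubbed `get_fund` lookup (neither exists today, and the journey endpoints are placeholders):

```python
from dataclasses import dataclass

from flask import Blueprint, redirect, url_for

bp = Blueprint("submit", __name__)


@dataclass
class Fund:
    code: str
    submission_mode: str  # hypothetical flag: "spreadsheet" or "hybrid"


def get_fund(fund_code: str) -> Fund:
    """Stub for illustration; in practice this would query data-store."""
    return Fund(code=fund_code, submission_mode="hybrid")


@bp.route("/submit/<fund_code>")
def start_submission(fund_code: str):
    fund = get_fund(fund_code)
    if fund.submission_mode == "hybrid":
        # New web-form journey, backed by the form builder/runner.
        return redirect(url_for("submit.hybrid_journey", fund_code=fund_code))
    # Existing full-spreadsheet journey.
    return redirect(url_for("submit.spreadsheet_upload", fund_code=fund_code))
```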
#### Pros
- No new microservices to maintain
- Possibility to re-use parts of the existing spreadsheet ETL pipelines.
- {{more stuff here}}
#### Cons
- Implicit support for the spreadsheet flow may encourage or allow the business to compromise with future funds, and may discourage migrating those funds to the new/preferred hybrid model.
- A lot of boilerplate code to build out when working across separate API+frontend services to shuffle data back and forth, plus additional considerations when dealing with eg user permissions.
- {{more stuff here}}
### 2. Build a new Flask frontend microservice for RFP
Follow on with the existing 'microservices' architecture, but create a clear separation of concerns between the existing 'submit' which handles the full spreadsheet flow and the new 'hybrid' RFP approach that is primarily based around web forms.
#### Pros
- We have a clean slate to build the new iteration of submit from, not restricted or influenced by the existing spreadsheet flow.
- We create a clear separation between the old and new flows, which may help us justify not doing too much additional feature work on the old one. Essentially it moves to some kind of maintenance mode.
#### Neutral
- Unable to re-use (without duplicating) any of the existing spreadsheet ETL pipelines/logic.
#### Cons
- We have another microservice to maintain, which increases the cost of chore work - updating dependencies, keeping things in sync, building and integrating API endpoints, testing, deployment pipelines, real cost of servers/infrastructure (albeit a tiny factor).
### 3. Migrate existing microservices to a Flask monolith, extend from there
There is some evidence that a microservices-based approach was adopted prematurely in FSD, and that has an ongoing cost:
- REST API endpoints have to be designed carefully and built out, and network latency is introduced between services on every request.
- Duplicate sets of dependencies have to be tracked and updated for eg security patches.
- Multiple pipelines have to be maintained.
- Local development tends to be worse, with limited intellisense/etc between repositories.
- Technical complexity increases, eg logging and tracing between applications when investigating and debugging issues.
- etc
#### Pros
- A monolith generally provides more flexibility with development of new features.
- Less need for complex technical deployment processes, distributed tracing, etc.
- Improved developer experience should mean easier onboarding of new developers and less boilerplate/glue overhead.
- Requests will be served (slightly) faster due to removal of network overhead within a single request.
- Retain existing DB/data model/data. A lot of work has gone into this.
#### Cons
- A chunk of work needs to take place to combine data-store/data-frontend/submit. This needs scoping out for feasibility and the best approach to take.
- We need to come up with an appropriate migration plan, which may either include a complex zero-downtime plan or agreeing some planned downtime with the business (eg to migrate DNS, VPCs, etc).
- We will be migrating the existing spreadsheet ETL pipelines, which may mean we are implicitly supporting that code. It could lead to business compromise with future funds to use more of that logic than we would ideally like.
- Harder to split out a public API in future, as there is no clear separation
- Need to re-implement developer admin tasks which are currently segregated by AWS access, eg re-ingesting or downloading source spreadsheets
- Would probably require another IT healthcheck/TDA approval due to change of public/admin interfaces
### 4. Move existing microservices to maintenance mode, build new Flask monolith
I don't think I know a good reason why we'd do this. Maybe it makes sense if:
- we think combining the existing apps into a monolith would be too difficult
- we really want to start from scratch
#### Pros
- Create a clear separation between the spreadsheet flow and the new 'hybrid' approach.
- Re-using existing technologies should mean we can easily apply best practice/lessons learned, with no need to learn a new framework.
- Access to govuk-frontend-jinja and govuk-frontend-wtforms packages, which are well supported and tested for building Flask apps with GOV.UK Frontend components and styling.
#### Cons
- We'd need to reproduce the data-store DB model. Do we have any reason to do this? Do we think it's not fit for the future?
- Would need to migrate existing data-store DB to the new database
- We may end up building a lot of boilerplate/glue features, or integrate a number of third-party packages to manage things like users, permissions, admin panels, etc. Many plugins for things around this exist, but are in various states of activity and maintenance.
### 5. Move existing microservices to maintenance mode, build new Django monolith
There is some indication that in the future we are likely to need to manage users/roles/groups/permissions in the new hybrid/web form approach. This is because user testing has shown that different people may need to interact with the RFP service, and some people are responsible for different parts of M&E reporting data. There may be people that fill in just the admin section, or outcomes+outputs, or financial data, or people who are allowed to sign off the entire submission as accurate+truthful. Django takes a 'batteries-included' approach vs Flask, so it has things like user models, authentication, forms, admin panels etc baked in. This can provide a lot of power for very little developer work.
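As a sketch of what that buys us, Django's built-in auth supports custom per-model permissions that map naturally onto the section-level responsibilities above. The model, app label and permission names here are illustrative, not a proposed schema:

```python
from django.contrib.auth.decorators import permission_required
from django.db import models


class Submission(models.Model):
    fund = models.CharField(max_length=64)
    signed_off = models.BooleanField(default=False)

    class Meta:
        # Custom permissions map onto the responsibilities seen in user
        # research; groups/roles are then managed in the built-in admin.
        permissions = [
            ("edit_admin_section", "Can edit the admin section"),
            ("edit_outcomes_outputs", "Can edit outcomes and outputs"),
            ("edit_financials", "Can edit financial data"),
            ("sign_off_submission", "Can sign off the full submission"),
        ]


# 'reporting' is an assumed app label for this sketch.
@permission_required("reporting.sign_off_submission", raise_exception=True)
def sign_off(request, submission_id):
    ...  # only users/groups granted the permission reach this view
```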
#### Pros
- Django has a lot of baked-in 'answers' to common business requirements like user auth and admin panels.
- All of the previously-mentioned monolith benefits around reduced developer overhead, better developer experience, faster requests, easier debugging, etc.
- Django is generally a more popular and more widely used framework - recruiting and onboarding new devs may be easier.
#### Cons
- There is no actively maintained equivalent of `govuk-frontend-wtforms` for easily building GOV.UK Frontend-styled forms in Django. There is a historical attempt by MoJ, but it is now archived and may not be compatible with the latest Django. We may need to build this ourselves (a minimal sketch of what that involves follows this list).
- The team is less familiar with Django, which takes a very structured approach to building services that is different to Flask's more freeform nature. It would take some time for the team to learn this.
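For a sense of the gap, hand-rolling GOV.UK styling on a Django form is possible via widget attributes, as in the sketch below, but every component, hint and error state needs the same manual treatment that `govuk-frontend-wtforms` handles for us in Flask:

```python
from django import forms


class ProjectUpdateForm(forms.Form):
    # GOV.UK classes applied by hand; hints, error summaries and more
    # complex components (dates, radios) would all need similar wiring.
    project_name = forms.CharField(
        label="Project name",
        widget=forms.TextInput(attrs={"class": "govuk-input"}),
    )
    jobs_created = forms.IntegerField(
        label="Jobs created this period",
        widget=forms.NumberInput(attrs={"class": "govuk-input govuk-input--width-5"}),
    )
```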
### 6. Move existing microservices to maintenance mode, build new microservices
I don't think any of us think this is worth exploring so I'm not even going to go there.
### 7. Use Lambdas to process webhooks from form-builder
We could use this as an opportunity to try serverless architecture patterns, which were not available on PaaS, to do a lightweight integration with the form-runner.
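A minimal sketch of the shape this could take, assuming an API Gateway-fronted Lambda; the webhook payload and the downstream `persist_submission` call are assumptions, not the real form-runner contract:

```python
import json


def persist_submission(submission: dict) -> None:
    """Hypothetical downstream call, eg data-store's API or an SQS queue."""
    ...


def handler(event, context):
    """Receive a form-runner webhook and hand the submission on."""
    submission = json.loads(event["body"])
    # eg validate the payload, then persist it - this is where the
    # duplication-of-logic question in the cons below bites.
    persist_submission(submission)
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}
```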
#### Pros
- Possibility of lightweight integration with the form-builder
- Encapsulation of form-builder-specific logic
- Webhooks suit the event-driven nature of Lambda
#### Cons
- If writing directly to the data-store database, we would need to replicate much of data-store's logic in the Lambda to access the DB?
- The new paradigm would need its own approaches for logging/monitoring etc
- Inconsistent with most of the rest of the tech stack across Pre and Post Award (although Pre Award has one Lambda currently)
- Possibly complex technical considerations to integrate this with the development workflow.