## Abstract

*NBI* as a product has some notable shortcomings: a difficult configuration setup, slow content correction turnaround, the split-brain components problem, the existence of two parallel engines to compute recommendations, high data processing cost, subpar overall decision making, and a lack of adaptability to engagement or effectiveness, among others. The following architecture explanation describes the proposed v1.0 and v1.1 MVP architecture, which should remedy at least two of these problems (the two parallel engines, the high data processing cost) in the best possible way, and mitigate another two (the configuration setup, the content correction turnaround) as far as possible before perhaps needing a slight extension in a later architecture. In the following sections, we briefly describe each problem before discussing how each can be attacked independently, without losing the ability to merge the solutions, in a phased manner, into a single, complete v1.x; the concluding section covers that merge. This [drawIO diagram](https://app.diagrams.net/#G1uM_IMGTSSJWHNZ9Ao3JfbpUxtISof_rR) should help create the picture.

## The problems

The following are the problems targeted in MVP 1.x, together with their proposed independent solutions.

### 1. Configuration setup

Currently an *NBI* is an abstracted data structure which couples a relevant _insight_ with a logically relevant _action_. Configuring an *NBI* then becomes a problem of configuring the "assets" (message language and images) and product properties of each _action_, _insight_ and *NBI*. We currently use "smart" Google Sheets for all of this configuration. The problem with such a unified configuration surface is the inability to do role-based access management, e.g. letting a __PM__ tweak all the properties while letting a __TPM / DM__ configure only the content. Another big problem is sheet management, strained by keeping too many versions (to approximate the access separation above) and by too many hands touching the sheet. The lack of proper checks in such a sheet also contributes to the overall problem. Not to mention, the sheet itself is a mammoth, which makes it difficult to read, edit and explain, and leaves it vulnerable to errors.

A proposed solution to these problems is to develop a feature on _LaunchPad (LP)_ that lets us do this configuration there. A simple version would be to ingest the current *NBI* master sheet (with all possible combinations of *NBIs*) into the _LP_ database as a master config list, allow a __DM__ to set up a pilot with a valid NBI objective filter (EE, LeadGen, Paper), and then let them edit each contained _NBI_, _action_ and _insight_. This would also mean the ability to do a quick validity check on these components (e.g. every _action_ should have a benefit score). The [NBI metadata and model management PRD](https://docs.google.com/document/d/1htggxgpFg9gye12wXy6AZmIIvBJzQmz03OqZUr2kd3g/edit) describes a couple of phases to solve some of these problems, primarily the sheet's massiveness and correctness. That should be a good starting point, albeit the approach could be different.
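To make these validity checks concrete, here is a minimal sketch of what ingestion-time validation on _LP_ could look like. This is an assumed example, not the actual implementation: the field names (`benefit_score`, `objective`, `assets`, etc.) are illustrative placeholders rather than the real master-sheet columns.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for ingested master-sheet rows; the real column
# names and required fields would come from the actual NBI master sheet.
@dataclass
class Action:
    action_id: str
    benefit_score: float | None = None          # every action should carry one
    assets: dict = field(default_factory=dict)  # message language, images

@dataclass
class Insight:
    insight_id: str
    assets: dict = field(default_factory=dict)

@dataclass
class NBI:
    nbi_id: str
    objective: str       # e.g. "EE", "LeadGen", "Paper"
    insight: Insight
    action: Action

VALID_OBJECTIVES = {"EE", "LeadGen", "Paper"}

def validate_nbi(nbi: NBI) -> list[str]:
    """Return human-readable problems for one NBI; empty means valid."""
    problems = []
    if nbi.objective not in VALID_OBJECTIVES:
        problems.append(f"{nbi.nbi_id}: unknown objective {nbi.objective!r}")
    if nbi.action.benefit_score is None:
        problems.append(f"{nbi.nbi_id}: action {nbi.action.action_id} "
                        "has no benefit score")
    if not nbi.insight.assets:
        problems.append(f"{nbi.nbi_id}: insight {nbi.insight.insight_id} "
                        "has no assets configured")
    return problems

def validate_master_config(nbis: list[NBI]) -> dict[str, list[str]]:
    """Run validation over the whole ingested list; keep only failures."""
    report = {nbi.nbi_id: validate_nbi(nbi) for nbi in nbis}
    return {nbi_id: probs for nbi_id, probs in report.items() if probs}
```

Running checks like these at ingestion time, rather than inside the sheet, would surface invalid combinations before a __DM__ can attach them to a pilot.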
### 2. Content correction turnaround

Another important requirement from the same _LP_-based solution is to allow the user (__DM__) to render the *NBI* emails (for some "mock" combinations) and HER sections based on the pilot configuration. One important workflow could be to allow the user to export these _display contents_ as images, as _HTML content_ or even directly as _Figma_ files, so as to allow a quick correction turnaround with the __TPM__ and __the client__. Even better would be to automate this flow when the PR for the changes is merged.

### 3. Two parallel engines

We currently have two engines, the *NBI* engine and the *Reco Engine*, which do very similar work: calculating relevant insights for a user and matching them with the best possible user actions (a.k.a. recommendations), before eventually sorting them by relevance and freshness. From these, the backend process can then cherry-pick the best $n$ based on some objectives. This parallelism can potentially double the cost of maintenance, setup and execution. The two engines could also contradict each other, which makes it all the riskier.

A proposed solution to this problem is to let the *NBI* engine do the work for both needs. This would need two pieces of preparation:

- A formal document describing the gaps (and fixes) between the two engines, so as to enable migration to the *NBI* engine
- A rework of the *NBI* data preparation task to make it more reliable and cost effective

We know from the past that most actions / recommendations are already in the *NBI* engine. The biggest gaps are likely the insights supported by *NBI* and the algorithm used for ranking relevancy, both of which can be eased into the MVP without being too fussy. One key task in the migration would be to create a recommendation entry in the already complicated *NBI* sheet, as well as to backfill / migrate the *NBI* definitions of recommendations for all existing pilots that are to be migrated (which is recommended for maintainability).

### 4. High data processing cost

The rework of *NBI* data preparation and execution is needed not just for reliability but also to improve the overall cost per user per home. Currently we spend as much as 3-4 cents per user per home per year. This number should ideally go down to 1 cent per home per year or even less. Apart from reliability, we would also need to answer some questions, such as:

- Is it OK if the *NBI* computations are not done immediately on a survey submission, but are instead delayed by, say, 24 hours after the disagg computations?
- Would we need to wait for the *Hybrid 2.0* implementation, or do we need to run the Hybrid engine (in the appliance disaggregation flow) anyway in the system post migration?
- How do we trigger the notification processor after running NBIs?
- How do we continue to do the CDC we were doing before?

Most of these answers should be similar to those for the questions asked when designing cost aggregation for the TOU rate plan, but some of the questions might be new. Cost per home is also a critical aspect of the data preparation and execution engine. We would need to do a good bit of data engineering optimization, ranging from a revamped dataset partitioning strategy to a revamped UDF execution strategy to ensuring that the generated code is vectorized. This too would need to be formalized before taking any further actions.
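As a rough illustration of the UDF-execution and partitioning directions, the sketch below contrasts a row-wise Python UDF with a vectorized `pandas_udf` in PySpark and writes the result partitioned by a coarse key. This is a minimal, assumed example: the schema (`home_id`, `region`, `usage_kwh`), the scoring logic and the output path are placeholders, not the actual NBI pipeline.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("nbi-prep-sketch").getOrCreate()

# Hypothetical per-home input; the real schema would come from disagg output.
df = spark.createDataFrame(
    [("h1", "west", 12.0), ("h2", "east", 30.5), ("h3", "west", 7.25)],
    ["home_id", "region", "usage_kwh"],
)

# Row-wise UDF: one Python call (and serialization round trip) per row.
@F.udf(DoubleType())
def benefit_score_slow(usage_kwh):
    return usage_kwh * 0.1          # placeholder scoring logic

# Vectorized pandas UDF: one call per Arrow batch, column-at-a-time math.
@F.pandas_udf(DoubleType())
def benefit_score_fast(usage_kwh: pd.Series) -> pd.Series:
    return usage_kwh * 0.1          # same placeholder logic, vectorized

scored = df.withColumn("benefit", benefit_score_fast("usage_kwh"))

# Revamped partitioning sketch: write by a coarse key so downstream NBI
# runs can prune to just the partitions they need.
scored.write.mode("overwrite").partitionBy("region").parquet("/tmp/nbi_scores")
```

The vectorized variant processes whole Arrow batches instead of making one Python call per row, which is typically where much of the per-home UDF cost goes; the actual partition key and scoring logic would fall out of the formalization exercise above.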
Across all the activities needed to execute on and resolve problems 3 and 4, the following are the key tasks (in no particular order, but hopefully complete):

- creating a suitable testing repertoire for the reco migration and the execution engine migration
- moving current recos to the NBI engine (config migration and LP rework)
- working on *NBI* execution optimization (optimization)
- cleaning up unnecessary data pipeline processes and moving them into the new engine (code migration)
- working on post-migration data pipeline fixes (ETL fixes)
- creating pre- and post-migration reports of engine relevancy

## Conclusion

In order to ensure backward compatibility and parity, we would want to keep the specification of each *NBI* component the same as before. Assuming we do that, most of the tasks mentioned above can be done phase-wise and independently. We propose to split MVP 1.x into two versions.

### v1.0

This version would cover only the *LP*-based configuration setup and content management aspects, i.e. problems 1 and 2. As mentioned before, this would not be the complete set of changes needed on *LP*. *LP* would need further changes in the future when:

- supporting the reco migration, where we would need further backward compatibility with the recos already supported on *LP*
- supporting the new, reformed NBI engine 3.0, which would be able to automatically find all relevant actions for a user's insights (and apply other NBI-related content logic), as well as support parameterization

Another aspect of this version is that we could use this "quarter" to formalize the developments to be undertaken in v1.1, e.g. the reco engine migration strategy and the data preparation abstraction and optimization strategy.

### v1.1

This version would entail, as mentioned just above, working on the Reco engine migration to the NBI engine, as well as moving the data pipeline actions that run post disagg into an "optimized" NBI data preparation engine.

As part of the execution strategy, I am suggesting consecutive two-month development cycles to work on these MVPs, e.g. deliver v1.0 in 2 months, v1.1 in 4 months and the remainder in 6 months.

## Future

Some of the things that will remain and would need to be handled in the future are:

- a better self-evaluating, adaptable NBI engine
- an improved ranking engine to make better decisions and unify "NBI" activities
- a Billing Cost Optimizer engine
- an LLM-based data exploration engine (maybe far-fetched)

## References

1. PRD - NBI Metadata and Model Management [(link)](https://docs.google.com/document/d/1htggxgpFg9gye12wXy6AZmIIvBJzQmz03OqZUr2kd3g/edit)
2. NBI Phases architecture diagram [(link)](https://app.diagrams.net/#G1uM_IMGTSSJWHNZ9Ao3JfbpUxtISof_rR)