*World Tree Studios*
*Aug 16, 2023*
SOW for creation.space, build-out of the **AI-chaining Workflow Execution Engine**
*[Previous document](https://hackmd.io/hyZd9xT4QFKpbECRDvANDQ?view)*
# Intention
Make a robust system for chaining multiple AI / media transformation operations together.
Initial functionality should be a hard-coded chain that emulates a writing style from social media posts, parses news, and generates social media posts on that news in the given writing style.
This gives us the opportunity to create something useful in a short amount of time, and at the same time make progress on the pipeline execution feature.
## Social Media Post Generator
- User enters profiles on various social media platforms, and system emulates the voice / writing style of those accounts.
- System monitors various news sources, parses articles, and generates social media posts in the voice of the user
## AI Transformation Task Chain
These operations will be called via remote HTTP-based API calls. They are all asynchronous and have possible failure states.
Each service should be able to have API keys. Payment amounts should be tracked.
Functionality for retrying should be included; if any operation is not idempotent we should have special dispensation for resetting to some initial state.
A "task" is a single operation, with various statuses: e.g. not started, in progress, complete, fail.
Each operation has one or more inputs and a single output. (Update this spec if we determine there are operations with multiple outputs.)
A "pipeline" or "job" is a series of tasks connected to each other. The output of one job may be piped to the input of one or more succeeding jobs.
Job has the number of tasks within it, and status of each task.
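As a rough sketch of these concepts (names and fields are illustrative assumptions, not a final schema), the core records might look like:

```typescript
// Illustrative sketch only; names and fields are assumptions, not a final schema.
type TaskStatus = "not_started" | "in_progress" | "complete" | "failed";

interface Task {
  id: string;
  operation: string;     // which remote API call this task performs
  inputIds: string[];    // one or more inputs (upstream outputs or initial inputs)
  outputId?: string;     // the single output, once the task completes
  status: TaskStatus;
  retryCount: number;    // supports retry / reset-to-initial-state handling
}

interface Job {
  id: string;
  tasks: Task[];         // the number of tasks and the status of each fall out of this
  status: TaskStatus;
}
```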
# Plan
Notes
- LlamaHub, LlamaIndex - pull in data from various sources
- Compile posts and prompt for voice
- Ask user for "style" of post (witty, funny, etc)
- Prompt includes the article to post on
- Output the post
## Running a job
The job is selected by the user. This can be through landing at a specific URL or selecting from a list.
A job starts with its input. Initial inputs may come from the user, whether typed, captured from the microphone or video camera, or supplied as an uploaded picture, video, or audio file.
Once the initial input is provided, the pipeline runs. The user should be able to view the status of the job run as it progresses.
## The Builder: Defining Jobs
Power users should be able to define their own pipelines.
### Pipeline Code Repo
My recommendation here is to start with a repository of pipeline code, and have each pipeline be in a named folder with code instructions for chaining together the tasks of the pipeline. This allows us to arrive at the maximum utility for power users without up-front effort defining a user interface to chain together jobs.
The official repository should accept pull requests, which are vetted by the staff for security.
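As a sketch of what a pipeline folder could contain (the `createPipelineBuilder` helper and the operation names are hypothetical, not an existing API; see the builder-function sketch under Pipeline Code Parser below):

```typescript
// pipelines/news-to-social-post/index.ts
// Illustrative only; the builder helper and operation names are hypothetical.
import { createPipelineBuilder } from "../../lib/builder";

const p = createPipelineBuilder("news-to-social-post");

// Chain: summarize an article, then rewrite the summary in the user's voice.
const summary = p.task("summarize_text", ["input:article_text"]);
const post = p.task("write_post_in_voice", [summary, "input:voice_samples"]);
p.output(post);

export default p.build();
```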
### Builder UI
We start with an experimental UI spike with a capped number of hours, using GPT to help generate a vector-based workflow creation engine, in order to implement a "low-code" builder user interface that generates the pipelines.
## Future Work
Add additional metadata for particular job statuses.
- "in progress" can have uploaded bytes, total size, number of files, percent complete, etc.
- "fail" can include retry count.
# Required Pieces
Gather a list of initial root tasks / API calls (Taino has this list)
- What API Call & how to invoke it
- What inputs are required
- What output is rendered
## Infrastructure
Which service will run these background jobs?
### Compute for Background Execution
One option is a FaaS-style system that handles long-running tasks and async task switching, and does not have the "cold start" problem.
The other main option is operating a VPS. This may compare favorably in terms of cost, but carries some maintenance overhead.
Research will need to be done to choose an appropriate system.
### Task / Pipeline State Storage
A data store will be needed to enqueue a task, track its state, and point to where the intermediate value is saved. We will use our main DB for this. We need to define the table structure and queries.
Supabase can stream data updates via websocket for realtime status tracking. Research will need to be done to determine an existing library to use to manage jobs and tasks, their statuses, and retries.
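A minimal sketch of the realtime piece, assuming supabase-js v2 and a hypothetical `task_runs` table:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// Stream status updates for one pipeline run; table and column names are assumptions.
supabase
  .channel("task-run-status")
  .on(
    "postgres_changes",
    { event: "UPDATE", schema: "public", table: "task_runs", filter: "pipeline_run_id=eq.123" },
    (payload) => {
      console.log("task status changed:", payload.new);
    }
  )
  .subscribe();
```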
### Data Storage
The output from a task should be saved somewhere. Text can be saved in the data store. Binary files should be saved in a blob store such as S3.
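For the blob store, a minimal sketch using the AWS SDK v3 S3 client (the bucket name and key scheme are assumptions):

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Save a binary task output and return the key, which the task state row points to.
async function saveTaskOutput(runId: string, taskId: string, body: Buffer, contentType: string) {
  const key = `runs/${runId}/tasks/${taskId}/output`;
  await s3.send(
    new PutObjectCommand({
      Bucket: "creation-space-task-outputs", // hypothetical bucket name
      Key: key,
      Body: body,
      ContentType: contentType,
    })
  );
  return key;
}
```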
## Cost Analysis
Record the cost of each AI API call so we can calculate the approximate cost of a particular pipeline. Also factor in infrastructure and storage costs.
## Proposed set of Task types
Choose a subset of Replicate models.
- Text -> Text
- Text -> Image
- Image -> Image
- Image -> Text
- Audio -> Text
- Text -> Audio
- Video -> Text
- Text -> Video
- Video -> Audio
- Image -> Video
### Text Transformation: Prompt templates
Curate a set of prompt templates that can take structured input.
Start with a single blob of text. Later take multiple labeled text fields.
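A minimal sketch of the later, multi-field version (the template text, placeholder syntax, and field names are illustrative):

```typescript
// Illustrative prompt-template helper; placeholder syntax and fields are assumptions.
type TemplateFields = Record<string, string>;

function renderTemplate(template: string, fields: TemplateFields): string {
  // Replace {{field}} placeholders with the labeled text fields.
  return template.replace(/\{\{(\w+)\}\}/g, (_, key: string) => fields[key] ?? "");
}

const socialPostTemplate = `Write a {{style}} social media post about the article below,
matching the voice of these sample posts.

Voice samples:
{{voiceSamples}}

Article:
{{article}}`;

const prompt = renderTemplate(socialPostTemplate, {
  style: "witty",
  voiceSamples: "<user's pasted posts>",
  article: "<summarized news article>",
});
```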
# Implementation
## Social Media Post Generator
- User pastes in top samples of writings in "their voice".
- Keep track of total number of tokens, with a max.
- Pull in news from defined sources.
- Summarize, present user with options of news to post about.
- Create a post about a particular article the user selects.
### Database Storage
- Data model to store user voice posts.
- Data model to store news articles, and generated summaries.
- Store generated posts.
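A rough sketch of these data models (table and field names are assumptions, not a final schema):

```typescript
// Illustrative row shapes only; names are assumptions to be refined.
interface VoicePost {
  id: string;
  userId: string;
  body: string;         // one sample post pasted in by the user
  tokenCount: number;   // used to enforce the total-token max
}

interface NewsArticle {
  id: string;
  source: string;       // which defined news source it came from
  url: string;
  body: string;
  summary?: string;     // generated summary shown on the dashboard
}

interface GeneratedPost {
  id: string;
  userId: string;
  articleId: string;
  body: string;
  status: "draft" | "approved" | "posted";
}
```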
### UI
Post Generation dashboard for users to review news summaries and choose which ones to post about.
## Pipeline
### Pipeline Data Structure
- Define the DB table Structure for saving a Pipeline Definition.
- Query the DB tables and build up an in-memory data structure for each Pipeline (the Pipeline Definition).
- Data structure should contain everything needed to call the task, receive the output, reference the output for the next task, and/or surface the output as a final product.
- Save intermediate state of a particular Pipeline Run: what stages have been run, status of each, inputs and outputs of each task.
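A rough sketch of the row shapes this implies, covering both the Pipeline Definition and the state of a particular run (table and column names are assumptions):

```typescript
// Illustrative only; table and column names are assumptions to be refined.
interface PipelineRow {
  id: string;
  name: string;
}

interface PipelineTaskRow {
  id: string;
  pipelineId: string;
  operation: string;        // which API call to make and how to invoke it
  inputRefs: string[];      // upstream task ids or named initial inputs
  isFinalOutput: boolean;   // surface this task's output as a final product
}

interface PipelineRunRow {
  id: string;
  pipelineId: string;
  status: "not_started" | "in_progress" | "complete" | "failed";
}

interface TaskRunRow {
  id: string;
  pipelineRunId: string;
  taskId: string;
  status: "not_started" | "in_progress" | "complete" | "failed";
  outputPointer?: string;   // where the intermediate value lives (DB text or blob key)
  retryCount: number;
}
```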
## Pipeline Code Parser
- Loop over each pipeline folder, using the folder name as the pipeline name.
- Parse code and run it. The function calls should create an in-memory data structure defining the pipeline (see above). This should be persisted into the database. The repo can be parsed on a regular basis.
- Do a "mirror" type of sync. If a folder has been renamed or removed, remove the pipeline. Warning: if UI builder is active, repo will not contain a complete list of pipelines, so rework the deletion logic.
- Define a set of functions that, when called, build up the right Pipeline Definition (sketched after this list).
- Parse out the code in the repo, inject it into an execution environment that contains the right libraries.
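A minimal sketch of such builder functions, matching the hypothetical `createPipelineBuilder` helper shown in the Pipeline Code Repo section above:

```typescript
// Hypothetical builder exposed to pipeline repo code; the parser calls build()
// and persists the resulting Pipeline Definition to the database.
interface TaskDef { id: string; operation: string; inputs: string[] }
interface PipelineDefinition { name: string; tasks: TaskDef[]; outputs: string[] }

export function createPipelineBuilder(name: string) {
  const def: PipelineDefinition = { name, tasks: [], outputs: [] };
  let counter = 0;

  return {
    // Register a task; the returned id can be used as an input to later tasks.
    task(operation: string, inputs: string[]): string {
      const id = `${operation}_${counter++}`;
      def.tasks.push({ id, operation, inputs });
      return id;
    },
    // Mark a task's output as a final product of the pipeline.
    output(taskId: string) {
      def.outputs.push(taskId);
    },
    build(): PipelineDefinition {
      return def;
    },
  };
}
```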
## Pipeline "low-code" Builder
(Determine if there is an appropriate open source flow-based programming model to plug in here. So far my research indicates that there is not an easy-to-modify drop-in system here.)
- Vector-based UI. Determine whether to draw on Canvas or Vector-on-DOM.
- Draw the task as a box: inputs on left, output on right.
- Connect tasks together according to Pipeline Definition data structure. Draw Lines, add Bezier Curves.
- Drag Task Boxes to different points on the canvas. Extend the canvas size when dragging to the edge.
- Scroll around the canvas with the mouse and/or via dragging.
- Drag from Output circle to Input circle, and update the Pipeline Definition accordingly.
- Future: Interactively execute Tasks and view progress on this page
- Alternative UI: [Composer](https://www.composer.trade/) (scroll down to the code editor), [Scratch](https://scratch.mit.edu/projects/editor/?tutorial=getStarted)
## Pipeline Execution Engine
- Long-running worker that pulls from a queue (see the sketch below)
- Or keep its own queue and allow triggering via authenticated HTTP call
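A minimal sketch of the first option, assuming BullMQ with Redis and a hypothetical `executePipelineRun` entry point:

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// Enqueue a pipeline run, e.g. from an authenticated HTTP handler.
const runs = new Queue("pipeline-runs", { connection });
await runs.add("run", { pipelineRunId: "123" });

// Long-running worker that pulls runs off the queue and executes them.
new Worker(
  "pipeline-runs",
  async (job) => {
    // Load the Pipeline Definition, walk its tasks in order, and persist
    // each task's status and output pointer as it completes.
    await executePipelineRun(job.data.pipelineRunId); // hypothetical engine entry point
  },
  { connection }
);
```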
## Infrastructure
- Task manager library (Bull/BullMQ, etc.)
- Data storage engine for Pipeline Definitions.
- Data storage engine for Pipeline Execution State: queue, current status
- Compute provider: Vercel functions, Netlify functions, Cloudflare Workers, or VPS (Digital Ocean, AWS, GCloud).
- Cron job: Trigger execution on a time-based interval, and/or in response to a database change.
# Challenges
- Context length: including voice and article both in the prompt will almost certainly overrun the context size, leading to sub-optimal behavior.
## Possible Solutions:
- Track the number of tokens in the input prompt, and use a longer-context model if necessary (see the sketch below).
- Have user choose best posts/articles, so it doesn't go over N tokens.
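A rough sketch of the token-tracking approach from the first bullet; the 4-characters-per-token heuristic, the context limit, and the model names are assumptions:

```typescript
// Rough token-budget check; the heuristic, limit, and model names are assumptions.
const STANDARD_CONTEXT_TOKENS = 4096;

function estimateTokens(text: string): number {
  // Crude heuristic: roughly 4 characters per token for English text.
  return Math.ceil(text.length / 4);
}

function pickModel(voiceSamples: string, article: string): string {
  const promptTokens = estimateTokens(voiceSamples) + estimateTokens(article);
  // Fall back to a longer-context model when the combined prompt would overrun.
  return promptTokens > STANDARD_CONTEXT_TOKENS ? "gpt-3.5-turbo-16k" : "gpt-3.5-turbo";
}
```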
## Follow-up Features
- Parse social media directly, pull in user-generated content to parse into a "voice".
# Requests
- Need set of news sources to parse.
- Scrape major prompt template providers for template sets.
# Work
These estimated hours are slightly optimistic given that we'll be trying to automate as much as possible with AI tools.
## Social Media Post Generator
RSS Parser
Dashboard UI: Place for users to paste in their "voice" posts. Place for user to review relevant news.
## Pipeline Definition
Create the pipeline definition data structure, and create the tables and queries to store it in the database. Write queries for loading records from the DB and parsing the data structure into memory: 11 hrs
## Pipeline Execution Engine
Run the pipeline specified in the Pipeline Definition. Define initial inputs, keep track of the state of each run, retry if necessary, pipe output from the previous task to the next task: 25 hours
## Path 1: Builder UI Spike
Prototype a drag & drop interface: 18 hours max
- Review work product at the halfway mark and after this time-boxed work, and determine whether to pivot to Pipeline Code Repo.
- If we stay with this path, use the hours from Code Repo to complete this Builder UI.
Design, Design Review, Re-styling of prototype for Drag & Drop interface: 18 hrs
## Path 2: Pipeline Code Repo
Create a working source code repository, and code format, for defining jobs. Create execution environment, implement a Pipeline Parser to render the repo into pipeline objects saved in our DB, write up documentation for usage: 18 hrs
## Infrastructure
Research, choose, and deploy infrastructure. Choose and deploy compute platform, cron job, remote triggering mechanism, software installation: 11 hrs
Observability Platform: Dashboard to see how many jobs in progress, cost per job, which ones failed: 10 hours
## Prompting
Find sources of prompts, collect set of prompt templates: 3 hrs
# Totals
My rate has been $300/hr. However, if there is willingness to include in-kind equity, I can go as low as $200/hr.
11 + 27 + 18 + 10 + 18 + 11 + 10 + 3 = 108 hrs
108 hrs * $200/hr = $21600
# Milestones
1. Pipeline Definition stored in DB and loaded into data structure; Execution Engine able to run the pipeline.
2. Pipeline Creation: ability for power users to create custom pipelines (either via UI builder or code repository).
3. Pipeline Infrastructure: Auto-run pipelines with background execution, Observability Platform, Prompt Templates.
## Payment
Suggest: 25% up front, and 25% upon completion of each milestone.