Prompt Engineering Framework
Introduction
What is prompt engineering?
- Prompt engineering is the process of designing a set of queries that reliably achieve a specific result from an AI model.
- What is prompt engineering good for?
- Standardizing knowledge
- Engineered prompts can serve as repeatable data processing pipelines that can be integrated with business processes.
- Generating Insights
- Engineered prompt pipelines can help discover patterns and relationships between data points in the supplied context.
- Once standardized pipelines are in place, each run can enrich the context, opening up new lead-generation opportunities.
- Staying up to date
- Prompt pipelines can run whenever data is updated or on a schedule; live data can also be included to trigger other components (a scheduling sketch follows this list).
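A minimal sketch of the scheduled-run idea in Python; `run_pipeline` is a hypothetical placeholder, and a production deployment would typically use cron or a workflow scheduler rather than a sleep loop:

```python
import time

def run_pipeline() -> None:
    # Hypothetical placeholder: a real pipeline would fetch the updated data,
    # build the engineered prompt, and send it to the model.
    print("pipeline run complete")

# Re-run the pipeline on a fixed schedule (here, once per hour).
while True:
    run_pipeline()
    time.sleep(60 * 60)
```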
Prompt Engineering Best Practices
- Remember GIGO (Garbage In, Garbage Out)
- Prompt engineering is entirely dependent on the quality of the data fed to the model.
- Unstructured, irrelevant, or incomplete data can diminish the predictive performance of the model.
- Rich data and independent data sources can increase performance and create emergent properties.
- Emergent properties are features manifested by the model that were not explicitly developed but appear from the interaction of the model with the data.
The Data Processing Pipeline
To achieve consistent and repeatable results with prompt engineering, we must structure queries so that data is processed effectively and the correct context is applied to each individual query.
Data intake:
Data intake is the most important part of the process because of the GIGO principle: the model can only surface insights already present in the data, and noise can have the opposite effect.
- To minimize GIGO we must (a validation sketch follows this list):
- Clearly determine the shape of the data that will be given to the model; inconsistent data sources produce inconsistent results.
- Determine which variables will be analyzed.
- Handle missing data explicitly.
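A minimal sketch of shape enforcement at intake; the record fields (`source`, `text`, `category`) are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntakeRecord:
    # Illustrative record shape; the real fields depend on the data source.
    source: str
    text: str
    category: Optional[str] = None  # a variable selected for analysis

def validate_record(raw: dict) -> Optional[IntakeRecord]:
    """Enforce one shape on incoming data and handle missing fields explicitly."""
    text = (raw.get("text") or "").strip()
    if not text:
        return None  # drop empty records rather than feeding noise to the model
    return IntakeRecord(
        source=raw.get("source", "unknown"),
        text=text,
        category=raw.get("category"),  # missing values stay visible as None
    )

raw_rows = [{"text": "Q3 sales dipped in EMEA", "source": "crm"}, {"text": "  "}]
records = [r for r in map(validate_record, raw_rows) if r is not None]
```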
Data Preprocessing
- Once data has been taken in and split into individual chunks, it must be turned into embeddings for vector-space storage.
- This is handled by existing pipelines standardized by the LangChain library.
- For multimodal analysis, several data stores must be created, one per modality (a sketch of the pipeline follows this list).
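A minimal sketch of such a pipeline, assuming OpenAI embeddings and a FAISS store (each requires its own package and, for OpenAI, an API key); the input file name is hypothetical:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Split the intake output into chunks sized for embedding.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("intake_output.txt").read())  # hypothetical file

# Embed the chunks and store the vectors for similarity search.
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieval later pulls the most relevant chunks back into the prompt context.
results = store.similarity_search("quarterly revenue trends", k=3)
```

For multimodal analysis, building one such store per modality (text, tables, image captions) keeps each embedding space internally consistent.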
Orchestration:
- Once the number of contexts grows beyond one, the models must be orchestrated to achieve the desired result; a routing sketch follows.
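A minimal routing sketch; `retrieve_sales` and `retrieve_support` are hypothetical stand-ins for the per-context retrievers built during preprocessing:

```python
from typing import Callable

# Hypothetical retrievers; in practice these would wrap the vector stores
# built in the preprocessing step.
def retrieve_sales(query: str) -> str:
    return "top sales passages for: " + query

def retrieve_support(query: str) -> str:
    return "top support passages for: " + query

RETRIEVERS: dict[str, Callable[[str], str]] = {
    "sales": retrieve_sales,
    "support": retrieve_support,
}

def orchestrate(query: str, context_name: str) -> str:
    """Route the query to the correct context, then assemble the final prompt."""
    context = RETRIEVERS[context_name](query)
    return f"Context:\n{context}\n\nQuestion: {query}"

print(orchestrate("Why did churn rise in Q3?", "support"))
```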
Approaches to orchestration:
Thinking in sets
- When designing data structures, it is important to understand that most of the added value comes from aggregating multiple entries in the database. With a properly aggregated context, each new entry can significantly increase the predictive potential of the model (a sketch follows).
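A small sketch of the set-based idea, with made-up illustrative figures: instead of prompting entry by entry, the whole set is aggregated into one context so the model can reason across it:

```python
# Illustrative entries only; real data would come from the intake pipeline.
entries = [
    {"region": "EMEA", "q3_growth": 0.04},
    {"region": "APAC", "q3_growth": 0.11},
    {"region": "AMER", "q3_growth": -0.02},
]

# Aggregate the whole set into a single context block for the prompt.
context = "\n".join(f"{e['region']}: Q3 growth {e['q3_growth']:+.0%}" for e in entries)
prompt = (
    "Given the aggregated figures below, identify regional outliers "
    "and suggest one hypothesis for each.\n\n" + context
)
print(prompt)
```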