
Prompt Engineering Framework

Introduction

What is prompt engineering?

  • Prompt engineering is the process of designing a set of queries to achieve a specific result using AI models.
    • What is prompt engineering good for?
      • Standardizing knowledge
        • Engineered prompts can serve as repeatable data processing pipelines that can be integrated with business processes.
      • Generating Insights
        • Engineered prompt pipelines can help discover patterns and relationships between data points in context.
        • Once standardized pipelines are created, the process can enrich the context, leading to new lead generation opportunities.
      • Staying up to date
        • Prompt pipelines can run when data is updated or on a schedule. Furthermore, live data can be included in order to trigger other components.

Prompt Engineering Best Practices

  • Remember GIGO (Garbage In, Garbage Out)

    • Prompt engineering is entirely dependent on the quality of the data fed to the model.
      • Unstructured, irrelevant, or incomplete data can diminish the predictive performance of the model.

      • Rich data and independent data sources can increase performance and create emergent properties.

        • Emergent properties are features manifested by the model that were not explicitly developed but appear from the interaction of the model with the data.

The data processing pipeline.

In order to achieve consistent and repeatable results with prompt engineering, we must structure the queries so that data is processed effectively and the correct context is applied to each individual query.

  • Unstructured Data => Disambiguated Data => Data Embeddings => Context

  • (Data Intake) ––– (Data Preprocessing) ––– (Data Processing) ––– (Report Generation)

    • Each step must achieve a specific goal and enable the following step to do the same. To achieve this, each step must have its own solutions and models; a rough sketch of these stages follows below.
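
As a rough illustration, the four stages above can be treated as composable functions. The sketch below uses hypothetical names and placeholder implementations; it only shows how each stage hands its output to the next.

```python
# Illustrative pipeline skeleton: each stage is a function whose output feeds the next.
from typing import List


def intake(raw: str) -> str:
    """Data intake: accept raw, unstructured text from a source."""
    return raw.strip()


def preprocess(text: str) -> List[str]:
    """Data preprocessing: disambiguate and split into individual chunks/variables."""
    return [chunk for chunk in text.split("\n\n") if chunk]


def process(chunks: List[str]) -> List[List[float]]:
    """Data processing: turn chunks into embeddings (stubbed with a placeholder)."""
    return [[float(len(chunk))] for chunk in chunks]


def report(chunks: List[str], embeddings: List[List[float]]) -> str:
    """Report generation: assemble the context handed to the model."""
    return "\n".join(chunks)


raw_document = "First paragraph of source data.\n\nSecond paragraph of source data."
chunks = preprocess(intake(raw_document))
context = report(chunks, process(chunks))
```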

Data intake:

Data intake is the most important part of the process because of the GIGO principle: the model can only work with the insights already present in the data, and noise can have the opposite effect.

  • In order to minimize GIGO we must:
    • Clearly determine the shape of the data that will be given to the model.
      • Inconsistent data sources = inconsistent results
      • Determine the different variables to be analyzed
      • Handle missing data (see the intake sketch below)
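
A minimal intake sketch, assuming a hypothetical "note" record with a few illustrative fields; the point is to fix the shape of the data up front and make missing values explicit instead of guessing.

```python
# Sketch of an intake schema; field names are illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeRecord:
    source_id: str             # which data source the record came from
    text: str                  # the raw content handed to the pipeline
    author: Optional[str]      # variables to be analyzed; may be missing
    created_at: Optional[str]


def normalize(raw: dict) -> IntakeRecord:
    """Coerce a raw dict into the agreed shape, making missing data explicit."""
    return IntakeRecord(
        source_id=raw.get("source_id", "unknown"),
        text=raw.get("text", "").strip(),
        author=raw.get("author"),          # keep None rather than guessing
        created_at=raw.get("created_at"),
    )
```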

Data Preprocessing

  • Data Preprocessing takes in unstructured data and parses it into the individual variables that our pipeline will use for multimodal analysis.

    Multimodal analysis benefits

    • Multimodal analysis allows for a greater depth of understanding of a given topic
      • Data modes are determined by the available input.
      • Thread each mode into its own pipeline.
      • Individual threads are called on by an orchestrator model or in a predetermined sequence through the pipeline (see the routing sketch below).
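
A minimal routing sketch, assuming two hypothetical modes (text and tables); each mode is threaded into its own handler, and either a predetermined sequence or an orchestrator decides which handlers run.

```python
# Sketch of mode routing; the handlers are stubs standing in for real pipelines.
from typing import Callable, Dict


def handle_text(payload: str) -> str:
    return f"text summary of: {payload[:40]}"


def handle_table(payload: str) -> str:
    return f"table statistics for: {payload[:40]}"


MODE_HANDLERS: Dict[str, Callable[[str], str]] = {
    "text": handle_text,
    "table": handle_table,
}


def route(mode: str, payload: str) -> str:
    """Thread an input into the pipeline registered for its mode."""
    handler = MODE_HANDLERS.get(mode)
    if handler is None:
        raise ValueError(f"No pipeline registered for mode: {mode}")
    return handler(payload)
```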

Data processing

  • Once data has been taken in and split into individual processes, it must be turned into embeddings for vector-space storage.
    • This is handled by existing pipelines standardized by the LangChain library (a minimal sketch follows below).
    • For multimodal analysis, several data stores must be created, one per modality.
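
A minimal embedding sketch with LangChain, assuming an OpenAI embedding model and a local FAISS store; import paths differ between LangChain versions, and a multimodal setup would repeat this step with a separate store for each modality.

```python
# Minimal embedding sketch (assumes langchain-openai, langchain-community,
# and faiss-cpu are installed and OPENAI_API_KEY is set).
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

chunks = [
    "First preprocessed chunk of the document.",
    "Second preprocessed chunk of the document.",
]

embeddings = OpenAIEmbeddings()                    # embedding model
text_store = FAISS.from_texts(chunks, embeddings)  # one store per modality

# Retrieval later pulls the relevant chunks back into the prompt context.
matches = text_store.similarity_search("what does the document say about X?", k=2)
```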

Orchestration:

  • Once the number of contexts increases beyond one, there is a need to think about orchestrating the models in order to achieve the desired result.

Approaches to orchestration:

  • Sequential models:

    • A LangChain chain is, as the name implies, a series of models in which each step uses the output of the previous step as its input. This allows "live" knowledge to be created as the original prompt is mutated into different outputs as it moves through the chain.
    • LangChain chains allow pipelines with predefined outputs to be embedded into software processes.
      • By integrating reactivity into the model outputs, we can use the results of a query to trigger effects in the front end or database, and then mutate data that can be used as a follow-up query.
        • By chaining these processes we can create AI-enhanced functions and outputs for our code, beyond a simple chat interface (see the chain sketch below).
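
A sketch of a two-step chain using LangChain's expression language, in which the second prompt consumes the output of the first. The model name and prompt texts are placeholders, and the exact import paths depend on your LangChain version.

```python
# Two-step chain sketch: summarize, then generate follow-up actions from the summary.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model name

summarize = (
    ChatPromptTemplate.from_template("Summarize the following notes:\n{text}")
    | llm
    | StrOutputParser()
)
follow_up = (
    ChatPromptTemplate.from_template("Suggest three follow-up actions based on:\n{summary}")
    | llm
    | StrOutputParser()
)

# The dict step feeds the first chain's output into the second prompt's variable.
chain = {"summary": summarize} | follow_up
result = chain.invoke({"text": "Raw meeting notes from the data intake step..."})
print(result)
```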
  • Agent based models
    Agent-based models introduce a further level of complexity to the system by allowing an orchestrator AI to invoke designated agents that act as subprocesses.

    • By creating attention layers, models can form more complex reasoning around the data being processed and therefore, in theory, produce more meaningful insights.
      • In practice, in order to have a good orchestrator agent, the individual subprocesses must be validated before adding extra layers of complexity (a conceptual sketch follows below).
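
A conceptual sketch of the orchestrator pattern in plain Python; the sub-agents here are stub functions standing in for validated subprocess pipelines, and in a real system the plan would come from the orchestrator model's output.

```python
# Orchestrator sketch: only validated sub-agents are registered and callable.
from typing import Callable, Dict, List, Tuple

SubAgent = Callable[[str], str]


def sentiment_agent(task: str) -> str:
    return f"sentiment analysis of: {task}"


def extraction_agent(task: str) -> str:
    return f"extracted entities from: {task}"


SUB_AGENTS: Dict[str, SubAgent] = {
    "sentiment": sentiment_agent,
    "extraction": extraction_agent,
}


def orchestrate(plan: List[Tuple[str, str]]) -> List[str]:
    """Run a plan produced by the orchestrator model: a list of (agent, task) pairs."""
    results = []
    for agent_name, task in plan:
        agent = SUB_AGENTS[agent_name]  # unknown agents fail loudly
        results.append(agent(task))
    return results


print(orchestrate([("extraction", "customer email"), ("sentiment", "customer email")]))
```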

Thinking in sets

  • When designing data structures, it is important to understand that most of the added value comes from the aggregation of multiple entries in the database. With a properly aggregated context, new entries can significantly increase the predictive potential of the model (see the aggregation sketch below).
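
A small sketch of set-oriented context building: rather than prompting on a single record, related entries are aggregated first and the aggregate becomes the model's context. The "orders" data and field names are hypothetical.

```python
# Aggregate related entries before handing them to the model as context.
from collections import defaultdict

orders = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "acme", "amount": 80.0},
    {"customer": "globex", "amount": 45.0},
]

totals = defaultdict(lambda: {"count": 0, "total": 0.0})
for order in orders:
    totals[order["customer"]]["count"] += 1
    totals[order["customer"]]["total"] += order["amount"]

# The aggregated view, not the individual rows, becomes the model's context.
context = "\n".join(
    f"{customer}: {stats['count']} orders, {stats['total']:.2f} total"
    for customer, stats in totals.items()
)
print(context)
```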