# Intel Innovation - Prediction Guard Booth Briefing
## Goal of the conversations
(1) Show people the demo, explain Prediction Guard, and answer questions
(2) Determine if the person is an ideal customer
(3) If they are an ideal customer, say something like:
- It sounds like Prediction Guard is able to solve some of the challenges you are facing in bringing your LLM use case to production
- Let me grab your contact information so we can follow up with next steps
- We recommend having a follow up call to discuss your use case a bit more and scope out a potential engagement
- Try to schedule something using [my scheduling link](https://savvycal.com/datadan/cad3eb63) on the spot. Otherwise, just try to get their contact info. Alternatively, you can have them fill out the enterprise inquiry form on our site.
If you do schedule on the spot, make sure you take down some kind of note about them (to give context to the call). You can jot this down on a notepad or put it directly in Notion.
If they ask, the engagement could look like a pilot project that we work on together (before scaling up to more) or just getting them access to and support in using the platform.
## Ideal Customer Profile / Story
Our ideal customer:
- Works at a mid-size to large company
- Is feeling pressure (e.g., because of a clear market opportunity, FOMO, or board pressure) to implement an "AI solution" of some kind
- Is looking to put $100k+ into such a solution
- Has tried prototyping a solution with ChatGPT or OpenAI's platform (or similar closed AI platforms like Cohere or Anthropic)
- Sees the value in what has been prototyped, but also sees risks related to unexpected/harmful outputs from models, leaks of private data, unstructured model outputs, model "lock-in", and/or cost
- Is frustrated because they have highly paid, senior technical staff who aren't sure how to solve this problem and get reliable AI solutions into production
Questions to tease this out:
- Has your company deployed any Large Language Model (LLM) solutions to production yet?
- Tell me a little bit about your vision for how AI can be applied within your company.
- What are the main challenges keeping you from releasing LLM-based solutions at scale?
- Have you tried to prototype any LLM solutions?
- When are you wanting to get your first or next LLM solution into production?
- How are you dealing with model hallucination and unreliability?
- Does your legal and/or security team have hesitations about using AI? Have they blocked internal uses of things like ChatGPT? Why?
## The Prediction Guard Pitch
Tired of seeing others take advantage of sophisticated AI tech without knowing how to reliably integrate it into your own company?
Frustrated that your highly paid, senior technical staff doesn't seem to know how to integrate and scale LLM solutions?
Feeling like your cool AI prototypes are blocked because of concerns from your legal or security teams?
We are helping companies overcome the hurdles they are facing as they bring LLM-driven applications into production. Specifically, we provide an easy-to-use, privacy-conserving platform that lets you get reliable and safe outputs from the latest Large Language Models (LLMs).
## Specific Prediction Guard Features
(see and point to the poster on the demo stand)
1. Get output from a whole variety of LLMs hosted in a privacy-conserving manner
2. Control LLM outputs to force specified types (integer, categorical, boolean, etc.) and structures (JSON)
3. Check the outputs of models for factuality, toxicity, and consistency
4. Filter LLM inputs for PII, sensitive information, and prompt injection vulnerabilities
5. Integrate safe, controlled LLMs with the best tooling from the AI community (LangChain, LlamaIndex, etc.)
This functionality is made available via a simple REST API and a Python client that is compatible with OpenAI's client, for easy migration from OpenAI prototypes.
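If someone at the booth wants a concrete picture, usage looks roughly like the sketch below. This is a hedged example: the model name and the exact shape of the `output` argument are illustrative, so defer to the current client docs for specifics.

```python
import os

import predictionguard as pg  # the Prediction Guard Python client

# Authenticate with a Prediction Guard access token.
os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# A completion call in the familiar OpenAI style. The `output` argument
# (shape illustrative) asks Prediction Guard to force a categorical type,
# so the response is guaranteed to be one of the listed labels.
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",  # one of the hosted open LLMs
    prompt="Classify the sentiment of this review as POS, NEG, or NEU: "
    "'The checkout flow was painless and fast.'",
    output={"type": "categorical", "categories": ["POS", "NEG", "NEU"]},
)

# The response mirrors OpenAI's completion format.
print(result["choices"][0]["text"])
```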
## Some example use cases that we are seeing for Prediction Guard
- **Data or Information Extraction from Unstructured Data** - Extracting patient information from transcriptions of medical conversations to automatically fill out medical forms. Annotating a large knowledge base of training documents with metadata (title, document category, etc.). Extracting commodity data from emails and company communications.
- **Chat or Question Answering Over Private, Company Data** - Answering technical questions from equipment manuals and SOPs. Answering ecommerce customer questions based on policy documents.
- **Content Generation** - Generating product description tags to improve SEO. Generating summaries of webinars for better visibility and search. Generating ecommerce copy.
- **Natural Language Based Automation** - Detecting translation work requests and automating machine translation and quality checking.
## Things to highlight if they come up
- We are HIPAA compliant
- We have deployed our production system on Intel's Gaudi2 processors in Intel Developer Cloud (and we are working directly with Intel engineers to optimize the system)
- Our API can be deployed in a customer's own infrastructure. However, most of our customers opt to have us host the platform, because we do not store any customer data and we have third-party compliance monitoring in place (e.g., for HIPAA)
## FAQs
- **Q: How do you ensure privacy?**
- A: We don't store any prompt data that you send to Prediction Guard, and all data is sent over encrypted connections. In cases where you need a completely air-gapped system, we are happy to discuss how you can host the Prediction Guard system in your own infrastructure.
- **Q: How does the factuality check work?**
- A: We use six different peer-reviewed methods for checking factual consistency. These are model-based methods that use language models trained specifically to detect factual inconsistencies between a candidate text and a reference text.
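If someone wants to see the check in action, a minimal sketch is below. The standalone factuality endpoint and the response fields shown are assumptions about the Python client as of this writing; verify the names against the docs.

```python
import predictionguard as pg

# Score how factually consistent a candidate answer is with a reference text.
# The endpoint name and response fields here are illustrative assumptions.
reference = "The Eiffel Tower is 330 meters tall and located in Paris."
candidate = "The Eiffel Tower, located in Paris, is about 330 meters tall."

result = pg.Factuality.check(reference=reference, text=candidate)

# A higher score indicates better factual consistency with the reference.
print(result["checks"][0]["score"])
```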
- **Q: What is toxicity and how do you detect it?**
- A: We use a language model fine-tuned on toxicity detection datasets to score the toxicity of LLM outputs.
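If a quick demo helps, toxicity scoring can be toggled on a completion call. A hedged sketch (the `output={"toxicity": True}` flag reflects how the client exposed this at the time of writing; confirm the current parameter name in the docs):

```python
import predictionguard as pg

# Ask Prediction Guard to score the completion for toxicity before returning
# it. The `output` flag shown is an assumption; confirm against the docs.
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt="Write a short, polite reply to a customer whose order arrived late.",
    output={"toxicity": True},
)

print(result["choices"][0]["text"])
```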
- **Q: How does the system check consistency?**
- A: We call a model concurrently multiple times and compare the outputs to make sure they are consistent. This is called self-consistency sampling and has been shown to de-risk LLM calls.
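The underlying idea is easy to sketch in plain Python. This is a conceptual illustration only (Prediction Guard does the concurrent sampling server-side); `generate` is a hypothetical stand-in for any LLM call:

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(generate: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Sample the model n times on the same prompt and keep the majority answer."""
    answers = [generate(prompt) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes <= n // 2:
        # No clear majority: the samples disagree, so flag the call as
        # unreliable instead of silently returning a dubious answer.
        raise ValueError("Samples disagree; treat this output as unreliable.")
    return answer
```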
- **Q: Can you call multiple models at once?**
- A: Yes
- **Q: Can you send multiple prompts at once?**
- A: Yes
- **Q: What models do you implement?**
- A: We currently have our own implementations of most of the popular open LLMs, including Llama 2 from Meta, and popular fine-tunes of those models like Nous-Hermes-Llama2 and WizardCoder.
- **Q: Can I use OpenAI GPT models with Prediction Guard?**
- A: Yes, you can optionally call OpenAI GPT models using the Prediction Guard API. This wraps the OpenAI calls with the safeguards and controls common to all Prediction Guard models.
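If they want to see how little changes, it is the same completion call with an OpenAI model string (the exact identifier below is illustrative; check the model list in the docs):

```python
import predictionguard as pg

# Same client and call shape as for the open models; only the model string
# changes. The "OpenAI-..." identifier is illustrative.
result = pg.Completion.create(
    model="OpenAI-text-davinci-003",
    prompt="Summarize the value of output validation for LLM apps in one sentence.",
)

print(result["choices"][0]["text"])
```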
- **Q: Can I bring my own model or use Prediction Guard to fine-tune my own model?**
- A: We have worked with a couple of clients to integrate custom models into the API. Right now that is possible, but the functionality isn't exposed in the production API. If you are interested, we would love to hear more and help make this happen. We also generally recommend that you first try some of the pre-trained LLMs in Prediction Guard with techniques like Retrieval Augmented Generation (RAG), as this can be very powerful!
- **Q: So how is this different from outlines, guardrails, or guidance?**
- A: These open source packages provide some really amazing control mechanisms for LLMs, which in certain cases overlap with our feature set. However, these projects are just that: open source projects. They don't provide hosting or integration of specific models, and certain problems like factuality, consistency, etc. still need to be addressed outside of what these packages offer. The tooling can get complicated fast, and we are making all of this a no-brainer with a turnkey solution.
- **Q: So how is this different from LangChain?**
- A: We love LangChain and have an official integration with LangChain merged into the project! LangChain provides convenience around orchestrating calls to LLMs, and Prediction Guard enables safe and trustworthy calls to LLMs. A match made in heaven!
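A quick sketch of what the integration looks like in practice (the import path and model name reflect the LangChain integration at the time of writing; verify both against the current LangChain docs):

```python
from langchain.chains import LLMChain
from langchain.llms import PredictionGuard
from langchain.prompts import PromptTemplate

# Wrap Prediction Guard as a LangChain LLM; the model name is illustrative.
llm = PredictionGuard(model="Nous-Hermes-Llama2-13B")

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer concisely: {question}",
)

# LangChain handles the orchestration; Prediction Guard handles the safe call.
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="Why validate LLM outputs before using them?"))
```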
- **Q: Do you support image or audio inputs?**
- A: We do not currently support image or audio inputs in the main Prediction Guard API. However, we are (via professional services) helping a couple of customers integrate audio and image inputs into multimodal LLM applications.
## Pricing
If people ask about pricing, I normally say something to the effect of:
- *"Sure, I'm happy to give you a a sense of our pricing. We don't charge per token input (similar to OpenAI or Cohere) as there isn't a way to scale that reasonably to enterprise use cases like information extraction over thousands of documents."*
- *"We charge a fixed fee (usually via an annual enterprise license) based on a volume cap of requests that makes sense for your company"*
- If they are pushing for an actual number, right now you can say:
- For example, for a monthly volume cap of 100k requests, we are charging $3500/month plus whatever services/support you are interested in.
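- If it helps to anchor that number: $3500 / 100,000 requests works out to about $0.035 per request at full utilization.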
- *"Then we can scope AI engineering and support services on top of that as desired. These are priced on a case-by-case basis."*