{%hackmd theme-dark %}

# DynoLearn: Dynamic Content Generation for OpenEARL (Early Intervention Resource Library) Project Log

## Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a technique where a large language model utilises retrieval of information, such as from a database or an API, to generate content not limited to its training set.

[Article on implementing a RAG in the langchain framework](https://towardsdatascience.com/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2)

## API Protocol

I have also found a standardised LLM agent API protocol called [AgentProtocol](https://agentprotocol.ai/protocol):

> The reason for creating the Agent Protocol was to provide a standardized way how interact with the agents. This is useful for automation, agent to agent communication, general UIs or dev tools.

## Evaluation Frameworks

I found two LLM evaluation frameworks: [LangSmith](https://python.langchain.com/docs/langsmith/), which is made by LangChain, and [DeepEval](https://docs.confident-ai.com/docs/getting-started), which is made by Confident-AI.

- LangSmith is in closed beta; I have applied to the waiting list but I don't know when I might get access.
- DeepEval is open source and available now, and Confident-AI also offers a free tier for their services, so I am leaning towards DeepEval.

According to Jeffrey Ip, the founder of Confident AI, [common metrics used to evaluate LLMs](https://www.confident-ai.com/blog/how-to-evaluate-llm-applications#Step%20Two%E2%80%8A%E2%80%94%E2%80%8AIdentify%20Relevant%20Metrics%20for%20Evaluation:~:text=Step%20Two%E2%80%8A%E2%80%94%E2%80%8AIdentify%20Relevant%20Metrics%20for%20Evaluation) are:

- Factual consistency
- Answer relevancy
- Coherence
- Toxicity
- RAGAS
- Bias

## 11/11/2023 – Working with langchain and expanding on the project plan

Today I was experimenting with the langchain framework. I found that I needed to learn how to use a few other libraries on top of it, such as the HuggingFace Transformers library and its pipeline API. The model didn’t perform consistently at generating a set amount of tasks, but it did at least demonstrate the ability to generate a set of tasks (see the sketch after the plan below).

The expanded project plan:

1. Find data for fine-tuning and evaluation.
2. Get a base model that can be fine-tuned and that works with the langchain library.
3. Fine-tune the model with the data.
4. Implement an agent using the fine-tuned model.
5. Implement the agent into a system.
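As a rough reference for the experiment, here is a minimal sketch of the kind of wiring involved: a local HuggingFace text-generation pipeline wrapped as a langchain LLM and driven by a prompt template. The model id is the one from the 04/10/2023 entry below, and the prompt wording, sampling parameters, and the `n`/`skill` inputs are illustrative assumptions rather than the exact code I ran.

```py
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Build a local HuggingFace text-generation pipeline and wrap it as a langchain LLM.
generator = pipeline(
    "text-generation",
    model="stabilityai/stablelm-3b-4e1t",  # model from the 04/10 entry; any causal LM should work
    trust_remote_code=True,
    device=0,                # first CUDA GPU
    max_new_tokens=200,
    do_sample=True,
    temperature=0.75,
    top_p=0.95,
    return_full_text=False,  # return only the generated continuation, not the prompt
)
llm = HuggingFacePipeline(pipeline=generator)

# Ask for a fixed number of tasks; in practice the model did not
# reliably stick to the requested count.
prompt = PromptTemplate.from_template(
    "List {n} short tasks that help a child learn to {skill}:\n1."
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(n=5, skill="get ready for school"))
```

The same `HuggingFacePipeline` wrapper should also slot into the custom agent from the 18/10/2023 entry below once a fine-tuned model is available.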
## 07/11/23 - More Medium blogs!

- [Composability in LLM Apps](https://betterprogramming.pub/composability-in-llm-apps-495f0f170874)
- [LLM Embeddings](https://pub.aimind.so/llm-embeddings-explained-simply-f7536d3d0e4b)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)

## 31/10/23 - Blog on working with innovative technologies

https://ntietz.com/blog/forefront-of-innovation/

## 18/10/2023 - Large Language Model Agent

The Langchain python framework allows you to build a large language model agent designed around completing a specific task, which could be useful for the project:

* https://python.langchain.com/docs/modules/agents/how_to/custom_llm_agent

## 09/10/2023 - Web technologies

Looking into web technologies, I remembered the experience I had with a few web apps that worked like native apps, and found more information about them here:

* https://developer.chrome.com/blog/capabilities/
* https://www.macrumors.com/how-to/use-web-apps-iphone-ipad/
* https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps

The benefit of these apps is twofold: they are easily accessible and secure, while also giving seamless functionality with the native system, such as offline capabilities and access to native hardware APIs, along with easier cross-platform development.

## 04/10/2023 - Learning about Large Language Models (LLMs) and testing one locally

I read a few articles about LLMs below:

* https://python.plainenglish.io/llm-large-language-model-a-comprehensive-overview-and-options-for-building-one-b180f508e182
* https://medium.com/@myschang/evaluation-of-large-language-model-llm-introduction-9343424ad253
* https://medium.com/the-llmops-brief/introduction-to-large-language-models-9ac028d34732

I have also read an article on Named Entity Recognition (NER), which could be used to help process the input to an LLM: https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition

Looking into different models, I went on to Huggingface.co and found a variety of them, separated into categories by what they were trained to do. I chose the text generation category as I felt that would suit the needs of the project best. One of the most popular and recent models on the site was the `stablelm-3b-4e1t` model made by stability.ai:

* https://huggingface.co/stabilityai/stablelm-3b-4e1t

In order to run this model locally I had to:

1. Set up a huggingface hub account and accept the terms and conditions to use the model.
2. Generate a read-only huggingface API key.
3. Log into the account using the API key with the huggingface cli: `huggingface-cli login --token $HUGGINGFACE_TOKEN` (a Python alternative is sketched after this list).
4. Install the CUDA toolkit for my GPU (Nvidia Geforce GTX 1060) along with the latest game ready drivers.
5. Install miniconda or anaconda.
6. Install python dependencies in a conda virtual environment.
7. Finally, after a restart, I was able to use everything locally.

Enabling developer mode on Windows was also suggested to increase performance.
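For step 3, the same login can be done from Python instead of the CLI. A small sketch using the `huggingface_hub` library, assuming the token is exported as `HUGGINGFACE_TOKEN` as in the CLI command above:

```py
import os
from huggingface_hub import login

# Equivalent to `huggingface-cli login --token $HUGGINGFACE_TOKEN`:
# authenticates this machine against the Hub with the read-only key from step 2.
login(token=os.environ["HUGGINGFACE_TOKEN"])
```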
I chose to use PyTorch for its large support of libraries and models, especially on huggingface:

https://www.v7labs.com/blog/pytorch-vs-tensorflow

After getting everything running, I used the template code and parameters from the model card:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    trust_remote_code=True,
    torch_dtype="auto",
)
model.cuda()  # move the model onto the GPU

inputs = tokenizer(
    "James needs to learn how to get ready for school, can chase from paw patrol help him?",
    return_tensors="pt",
).to("cuda")

# sample up to 100 new tokens with the model card's default sampling settings
tokens = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

Using the prompt *"James needs to learn how to get ready for school, can chase from paw patrol help him?"* elicited the response:

*"James needs to learn how to get ready for school, can chase from paw patrol help him? Help Chase from paw patrol to get ready for school. They need to prepare a new student, James. Let’s help them to choose clothes and pick up necessary stuff for school. Chase will help James to get ready for school. James needs to learn how to get ready for school, can chase from paw patrol help him?"*

Guides on setting up with CUDA:

* https://medium.com/@harunijaz/a-step-by-step-guide-to-installing-cuda-with-pytorch-in-conda-on-windows-verifying-via-console-9ba4cd5ccbef
* https://www.educative.io/answers/how-to-resolve-torch-not-compiled-with-cuda-enabled

Download sources:

* https://developer.nvidia.com/cuda-downloads
* https://pytorch.org/
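As a final sanity check after following the guides above, a minimal sketch (using only PyTorch's own API) to confirm the installed torch build was compiled with CUDA support and can actually see the GPU:

```py
import torch

# If this prints False, the installed torch build is likely CPU-only
# (the "Torch not compiled with CUDA enabled" error from the guide above).
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the GeForce GTX 1060
```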