# LangChain 101: An in-depth understanding of AI development with the framework
Artificial Intelligence, or AI, is the buzzword that has conquered the world of technology in recent years. It has revolutionized the approach towards development, automation, and operations. Nowadays, it is making life simpler for individuals, and every organization is trying to get a hold of AI and implement it in their day-to-day operations and application development. But what exactly is it, and how does it work?
AI, in simple terms, refers to the simulation of human intelligence in machines. It encompasses a range of technologies and applications, from simple automation and rule-based systems to the training of complex algorithms. When we mention using AI in our everyday lives, we refer to training complex algorithms to make decisions and predictions that make our lives easier and simpler.
Machine Learning, or ML, is the subset of AI that achieves this by training algorithms, also known as models, on data so they can make predictions or decisions without being explicitly programmed. As AI evolved, Large Language Models, or LLMs, came into existence. They are designed to generate human-like text and are trained on massive datasets spanning diverse text sources, using billions or even trillions of parameters (configurations learned during training). LLMs are central to developing AI applications, and to streamline that development, we tend to use frameworks that help with the process.
In this article, we will focus on LangChain, one such framework that helps with AI application development. Let's begin!
## What is LangChain?
LangChain is a Python framework that streamlines the development of AI applications by focusing on real-time data processing and integration with LLMs. It offers features for data communication and the generation of vector embeddings. Furthermore, it makes AI development more efficient for developers by simplifying interactions with LLMs.
The term "AI application" refers to an application that interacts with a learning model, but using such a model doesn't necessarily make the application intelligent. Neural networks, a subset of AI, are what make an application intelligent, and LLMs are built using them. An AI application combines the LLM's large body of pretrained knowledge with real-time processing: the data submitted by the user is taken as-is and fed to the LLM along with up-to-date information. This real-time processing of received data involves a number of steps.
Suppose you are browsing an e-commerce website and want to view an item; you click on it. That click is sent to the website and processed by an AI application, which decides the "suggested items" shown on the page. The application is fed the data of the item being viewed, the items in the cart, and the items the user previously viewed or showed interest in. This data is then fed to an LLM to produce the list of suggested items.
When building an application like that, you need to understand that there are a number of steps in the pipeline that need to be choreographed. These include the selection of services, the way data is fed into those services, and the shape that data will take. All of these are complex actions requiring APIs, data structures, network access, and so on, and LangChain as a framework aims to simplify them without you having to code all the little details. It provides pre-built libraries for popular LLMs: the developer supplies the credentials and the prompts and waits for the response.
![LangChain](https://hackmd.io/_uploads/BkmOeIQFA.png)
Now, LLM interaction is one of the many capabilities that LangChain possesses, and we are going to have a look at all of the key components. But before that, we will discuss the importance and working principle of the Python framework.
## What is the importance of LangChain?
LangChain is important for several reasons. Some of them are listed below:
### Effective integration of LLMs and applications
Effective integration of LLMs into real-world applications is a daunting task that presents a plethora of challenges. LangChain, with its ability to interact with LLMs, provides the framework and necessary tools to bridge this gap, helping developers and users harness the full potential of these models.
### Simplifying complex workflows
Integrating applications with LLMs is not always a single-step query process; it involves data preprocessing, interaction of that data with external systems, and post-processing. With LangChain, developers can define and systematically manage these complex workflows, which is essential for building robust applications.
### External integration and operability
Integrating LLMs with external tools, APIs, and databases is a massive advantage, and LangChain helps attain it by enabling the development of comprehensive solutions that involve multiple data sources and services.
### Efficiency and productivity
LangChain increases developer productivity many times over by providing utility functions, reusable components, and a structured approach towards developing AI applications. All of this reduces the time and effort needed to build sophisticated applications and lets developers focus on other tasks.
### Specific needs and use cases
LangChain has made a thumping impact across various sectors of the industry. It has made AI interaction easy and has enhanced its efficiency and accuracy across sectors such as customer service, content creation, and data analysis, in the form of chatbots and tools.
## How does LangChain work?
Now that we have a fair understanding of what LangChain is and why it matters, the obvious next step is to learn how the framework works. When you interact with an application, certain steps need to happen for a function to execute. For example, when you make a payment with your credit card, the card information is first gathered and secured. It is then transmitted to a processor, which authenticates the validity of the card and the availability of funds, and ultimately transfers the money from the card to the merchant's account. The same principle applies when interacting with an LLM.
In LangChain, the "chain" is responsible for creating a processing pipeline by putting AI actions together in order. Each action, or link in the chain, is a necessary step towards completing the set goal. To understand this, consider the pattern of a Retrieval-Augmented Generation (RAG) application. The pattern starts with the user submitting a question. An embedding is created from that text, followed by a search of the vector database to gain more context on the question. A prompt is then created from the original question and the context obtained from the retrieval. Finally, the prompt is submitted to the LLM, which returns an intelligent completion of the prompt as the response. All of these steps must happen in succession for the goal to be completed, and if any step fails, the entire pipeline stops.
The "chain" construct in LangChain attaches steps together in a specific way with a specific configuration. All of its libraries follow the same construct, which makes it easy to move steps around and create powerful pipelines.
## What are the key components of LangChain?
LangChain comprises a number of key components, each playing a distinct role in the overall functionality of the framework. These components collaborate with each other to enhance natural language processing tasks, giving the system an effective understanding of language and the ability to process it and generate human-like responses. All of them are listed below:
### LLMs
LLMs are the main pillars of LangChain. They provide the capability to understand prompts and generate responses. Trained on large datasets, they are designed to produce text that is logically correct and contextually relevant.
![Model I/O in LangChain](https://hackmd.io/_uploads/BJOgtMmY0.jpg)
Source: <a href="https://python.langchain.com/v0.1/docs/modules/model_io/" target="_blank">LangChain Documentation</a>
### Prompt Templates
In LangChain, prompt templates are designed for effective interactions with LLMs. They structure the input in a way that maximizes the model's ability to understand and respond to queries.
### Indexes
Indexes are LangChain's databases. They organize and store data in a structured manner, enabling efficient retrieval while the system processes language queries.
### Retrievers
Retrievers are another important component and work alongside indexes. They swiftly collect relevant information from the indexes based on the input query, ensuring that the generated response is informed and accurate.
### Output Parsers
Output parsers act as the processors of LLM-generated text. They process and refine the generated output into a format that is relevant and useful for the specific task at hand.
### Vector Store
The vector store is another critical component of the framework. It handles the embeddings of words or phrases into numerical vectors once the user submits a query. Embeddings are essential for tasks that involve the semantic analysis of language and understanding its nuances.
![Vector stores in LangChain](https://hackmd.io/_uploads/SyDZvMXFC.jpg)
Source: <a href="https://python.langchain.com/docs/modules/data_connection/vectorstores/" target="_blank">LangChain Documentation</a>
### Agents
Agents are the decision-making components in LangChain. They are responsible for determining the best course of action based on the input, the context, and the resources available within the system.
A discussion on LangChain is incomplete without highlighting its major features and benefits. We will briefly cover them in the following sections.
## What are the main features of LangChain?
LangChain has a long list of features, but it is important to call out the best ones. Below is a summary of the most popular features of the framework:
### Model Communication
As discussed earlier in this article, building an AI application means communicating with a language model. If your application follows a RAG pattern, you might generate vector embeddings, open a chat session with the model, or submit a prompt for the LLM to complete. LangChain is particularly focused on model communication and these common interactions, which makes it easy to create a complete solution. The [LangChain documentation](https://python.langchain.com/docs/integrations/llms/) lists the supported LLMs, and you can get a more detailed idea of models [here](https://python.langchain.com/docs/modules/model_io/models/).
Irrespective of the real-time data, the prompt used to communicate with the model has certain text that stays the same, so the application repeats the same action many times. To parameterize this common prompt text, LangChain has a dedicated prompts library. Prompts can contain placeholders for the parts that will be filled in before submission to the LLM. Instead of inefficiently replacing strings in the prompt yourself, you provide LangChain with a map of each placeholder and its value; the framework handles the replacement efficiently and gets the prompt ready for completion. After completion, the application post-processes the data before continuing, which may involve cleaning up characters or wrapping the completion in the parameters of an object class. LangChain's output parsers handle this easily by establishing a deep integration between the LLM's response and the custom classes within the application. Learn more about [Output Parsers](https://python.langchain.com/docs/modules/model_io/output_parsers/) and [Prompts](https://python.langchain.com/docs/modules/model_io/prompts/) in the official LangChain documentation.
### Data Retrieval
A model is trained on a dataset, and the size of that dataset largely determines the size of the model. The data a model is trained on is fixed; once training finishes, the model learns nothing further. In the design of a RAG application, the application supplements the input with more recent data to provide better context. This up-to-date context is sent to the LLM to produce an updated response, so the process involves continuous interaction with the LLM rather than a single quick query. The data also needs to be stored in a well-understood format so that retrieving it is simple and consistent.
Contextual data of this kind is not stored in typical relational databases, as it takes the form of vectors. A special kind of database, known as a vector store, handles the storing and retrieval of vectors, and this is where LangChain becomes useful: it provides integrations for all the popular vector stores. The framework also offers a number of libraries that help implement the RAG pattern in an application. Data comes from different sources, each following a different schema, to gain the right context. LangChain offers the `Document` object, which normalizes data coming from multiple sources; the data can then be passed as a `Document` between different chains in a structured manner.
### Chat Memory
Everyone is going ga-ga over chatbots, as they bring the reality of having a conversation with a language model under a defined flow. As the conversation progresses, the bot takes the conversation history into account to improve subsequent responses. This is called context, and it is very much needed during the interaction between an AI application and an LLM. The application needs to give the LLM historical information about the question being asked; otherwise, the result is an unproductive and dull conversation. This context also needs to stay in the moment: in a conversation between humans, the listener does not take notes or try to remember every single word. Doing so would take the fun out of the conversation and turn it into a lecture, and the same goes for interactions with LLMs.
LangChain provides a dedicated memory library with which the application can save the chat history to a fast lookup store. The data is stored in a way that offers very quick lookups, usually in either a NoSQL database or in-memory storage. Quickly retrieving the history of a conversation and injecting it into an LLM prompt is a powerful way to create an intelligent, contextual application.
## What are the advantages of using LangChain?
LangChain provides several benefits in the realm of natural language processing that help developers create AI applications with great ease. Some notable ones are listed below:
### Simplified development process
LangChain provides a framework for creating and managing workflows involving multiple steps and components. This makes the development of language processing systems more straightforward and organized. Chaining different components together reduces overall complexity and accelerates application development.
### Enhanced Functionality
LangChain's ability to integrate with various language models, external tools, APIs, and databases allows for richer and more comprehensive solutions offering sophisticated language processing. Agents add a layer of intelligence and adaptability by allowing decision-making based on real-time data and user interactions. All of this adds to the understanding and generation of human-like language, particularly in complex tasks.
### Customization and Flexibility
LangChain is highly customizable, allowing developers to tailor the framework to specific use cases. This adaptability makes it suitable for a wide range of applications performing tasks such as natural language processing, data analysis, and automation. The modular structure of the framework also makes it easier to scale applications by adding or removing components as required.
### Improved Efficiency and Productivity
Developing complex applications with LangChain is easy, time-saving, and comfortable thanks to its many ready-to-use components and abstractions. With its array of built-in utility functions, developers can focus on the actual work rather than repetitive coding.
### Versatility across sectors
LangChain can handle diverse language tasks, making it a versatile framework in the AI ecosystem. With its modular design of chains and agents, developers can increase reusability, which results in simpler maintenance and reduced redundancy. This adaptability makes it valuable across various sectors, including data analysis, content creation, and customer service.
## Examples of applications using LangChain
LangChain provides a powerful and adaptable framework for creating generative AI applications. These applications come in different types, and below are the types whose development the LangChain framework facilitates.
### Retrieval-Augmented Generation (RAG)
RAG is one of the most popular types of generative AI application and is creating all the buzz in modern-day tech. LangChain offers a number of scenarios that implement the RAG pattern. Learn more about them by visiting the `Q&A with RAG` section in the [use cases](https://python.langchain.com/docs/use_cases) area of the LangChain documentation.
### Chatbots
Chatbots are straightforward applications that provide direct access to LLM interactions. You can create a chatbot application using LangChain within a few minutes. Learn more from [this use case](https://python.langchain.com/v0.1/docs/use_cases/chatbots/) section on chatbots in the documentation.
### Synthetic Data Generation
Tests are written to ensure that the functions of a system or application run properly, or that the system can handle the load it will see in a production environment. These tests require a lot of data, which you can generate artificially with a generative AI application backed by an LLM. The [synthetic data generation](https://python.langchain.com/v0.1/docs/use_cases/data_generation/) example in the LangChain use cases documentation shows how to generate data using an LLM.
### Database Interaction
Natural language processing can help you interact with databases through SQL queries. Writing a whole lot of SQL may not be a fun experience, but dictating a question and having it transformed into an SQL query surely is. This can be done with an AI application built using the LangChain framework. This example of creating a [Q&A chain and agent over a SQL database](https://python.langchain.com/v0.1/docs/use_cases/sql/quickstart/) covers it in detail.
## Getting Started with LangChain
So far, we have covered most of the LangChain basics, and it's time to install it on our local machines. As discussed earlier, it is a Python framework, so installing the Python language is a mandatory prerequisite. Download Python from the [language's official download page](https://www.python.org/downloads/) and refer to [this documentation](https://www.geeksforgeeks.org/download-and-install-python-3-latest-version/) for installation.
Whether you want to add an AI touch to an existing Python application or are developing an application from scratch, you have to add the LangChain library to your requirements. Execute the following command in your terminal to install LangChain with your respective package manager:
For pip:
```bash
pip install langchain
```
For conda:
```bash
conda install langchain -c conda-forge
```
LangChain is available as a [Docker image](https://hub.docker.com/r/langchain/langchain) and hence, can be easily installed on cloud platforms. It is an open-source project and the source code is available on [GitHub](https://github.com/langchain-ai/langchain). You can download the source code and install it on your machine.
Follow the [quickstart guide](https://python.langchain.com/v0.1/docs/get_started/quickstart/) to get started with LangChain and explore the possibilities of development with this multipurpose framework.
## Final words and future
Throughout the article, we have covered the basics of LangChain and learned quite a lot about the framework. Wrapping everything up, we can see that it is an open-source framework that helps in the development of AI applications. With its simplified and streamlined development process, alongside robust customization of modules and agents, it provides an efficient and productive environment for developers and serves a wide range of market sectors. It can facilitate the development of different types of generative AI applications while enabling effective model communication, data retrieval, and the storage of chat memory for better context.
While the present usage of this framework is enormous in areas such as customer service, data analysis, and content creation, the future will see teams developing applications for other areas of the market. With that in mind, I hope you take all of the positives from this article and can't wait to try out LangChain for yourself and explore the possibilities of development with generative AI. Start building, bank on the endless ocean of knowledge on AI, and I will see you in the next one.