In this part of the course, we delve into the intricacies of Large Language Models (LLMs). We begin by exploring the historical context and fundamental concepts of artificial intelligence (AI), machine learning (ML), neural networks (NNs), and generative AI (GenAI). We then examine the core attributes of LLMs, focusing on their scale, extensive training on diverse datasets, and the role of model parameters, before turning to the types of challenges associated with using LLMs.
In the next section, we explore practical applications of LLMs across various domains, emphasizing their versatility in areas such as content generation, language translation, text summarization, and question answering. The section concludes with an analysis of the challenges encountered in deploying LLMs, covering essential aspects such as scalability, latency, and monitoring.
In summary, this part of the course provides a practical and informative exploration of Large Language Models, offering insights into their evolution, functionality, applications, challenges, and real-world impact.
Image Source: https://medium.com/womenintechnology/ai-c3412c5aa0ac
The terms mentioned in the image above have likely come up in conversations about ChatGPT. The visual representation offers a broad overview of how they fit into a hierarchy. AI is a comprehensive domain, where LLMs constitute a specific subdomain, and ChatGPT exemplifies an LLM in this context.
In summary, Artificial Intelligence (AI) is a branch of computer science that involves creating machines with human-like thinking and behavior. Machine Learning (ML), a subfield of AI, allows computers to learn patterns from data and make predictions without explicit programming. Neural Networks (NNs), a subset of ML, mimic the human brain's structure and are crucial in deep learning algorithms. Deep Learning (DL), a subset of NNs, is effective for complex problem-solving, as seen in image recognition and language translation technologies. Generative AI (GenAI), a subset of DL, can create diverse content based on learned patterns. Large Language Models (LLMs), a form of GenAI, specialize in generating human-like text by learning from extensive textual data.
Generative AI and Large Language Models (LLMs) have revolutionized the field of artificial intelligence, allowing machines to create diverse content such as text, images, music, audio, and videos. Unlike discriminative models, which classify existing data, generative AI models create new content by learning patterns and relationships from human-created datasets.
At the core of generative AI are foundation models: large AI models capable of multi-tasking, performing tasks such as summarization, Q&A, and classification out-of-the-box. These models, such as the widely known ChatGPT, can adapt to specific use cases with minimal training and generate content from minimal example data.
The training of generative AI often involves supervised learning, where the model is provided with human-created content and corresponding labels. By learning from this data, the model becomes proficient in generating content similar to the training set.
Generative AI is not a new concept. One notable example of early generative AI is the Markov chain, a statistical model introduced by Russian mathematician Andrey Markov in 1906. Markov models were initially used for tasks like next-word prediction, but their simplicity limited their ability to generate plausible text.
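To make this concrete, here is a minimal sketch in Python of a first-order Markov chain doing next-word prediction; the toy corpus is invented for illustration. Note how the model only ever looks one word back, which is exactly the simplicity that limits the plausibility of its output.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=10, seed=0):
    """Walk the chain, picking a random observed successor at each step."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat slept on the mat"
chain = build_chain(corpus)
print(generate(chain, "the"))
```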
The landscape has significantly changed over the years with the advent of more powerful architectures and larger datasets. In 2014, generative adversarial networks (GANs) emerged, using two models working together—one generating output and the other discriminating real data from the generated output. This approach, exemplified by models like StyleGAN, significantly improved the realism of generated content.
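Below is a minimal, hypothetical sketch of that adversarial setup, assuming PyTorch: a generator and a discriminator trained against each other on toy 1-D data. Real image GANs such as StyleGAN are vastly larger, but they follow the same loop.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0  # "real" data drawn from N(3, 0.5)
    fake = G(torch.randn(64, 8))           # generator maps noise to samples

    # Discriminator: label real data 1 and generated data 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: try to fool the discriminator into labeling fakes as real.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # should drift toward 3.0
```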
A year later, diffusion models were introduced, refining their output iteratively to generate new data samples resembling the training dataset. This innovation, as seen in Stable Diffusion, contributed to the creation of realistic-looking images.
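As a rough illustration, assuming NumPy, here is the closed-form forward (noising) process used by diffusion models; a trained network would learn to reverse these steps, iteratively refining pure noise back into a sample that resembles the training data.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # fixed variance (noise) schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal retained

def noisy_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0): blend the clean data with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = np.ones(4)
print(noisy_sample(x0, 10))   # early step: mostly signal
print(noisy_sample(x0, 999))  # final step: essentially pure noise
```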
In 2017, Google introduced the transformer architecture, a breakthrough in natural language processing. Transformers encode each word as a token, generating an attention map that captures relationships between tokens. This attention to context enhances the model's ability to generate coherent text, exemplified by large language models like ChatGPT.
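A minimal NumPy sketch of the scaled dot-product attention underlying transformers is shown below; the token embeddings are random stand-ins. Each token's query is compared with every token's key, and the resulting weights (the attention map) mix the value vectors into a context-aware representation.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 16))  # 5 tokens, 16-dim embeddings
out, attn_map = attention(tokens, tokens, tokens)  # self-attention
print(attn_map.round(2))  # each row is a distribution summing to 1
```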
The generative AI boom owes its momentum not only to larger datasets but also to diverse research advances. These approaches, including GANs, diffusion models, and transformers, showcase the breadth of methods contributing to the exciting field of generative AI.
The term "Large" in Large Language Models refers to the sheer scale of these models—both in terms of the size of their architecture and the vast amount of data they are trained on. The size matters because it allows them to capture more complex patterns and relationships within language. Popular LLMs like GPT-3, Gemini, Claude etc. have thousands of billion model parameters. In the context of machine learning, model parameters are like the knobs and switches that the algorithm tunes during training to make accurate predictions or generate meaningful outputs.
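To make "parameters" concrete, the hypothetical PyTorch snippet below counts the trainable weights and biases of a toy model; these are exactly the values the optimizer tunes during training, and LLMs scale the same idea up to hundreds of billions.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Embedding(50_000, 128),  # map a 50k-word vocabulary to 128-dim vectors
    nn.Linear(128, 512), nn.ReLU(),
    nn.Linear(512, 50_000),     # project back to scores over the vocabulary
)
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} trainable parameters")  # ~32 million; GPT-3 has ~175 billion
```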
Now, let's break down what "Language Models" mean in this context. Language models are essentially algorithms or systems that are trained to understand and generate human-like text. They serve as a representation of how language works, learning from diverse datasets to predict what words or sequences of words are likely to come next in a given context.
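As an illustration of this next-word prediction, assuming the Hugging Face `transformers` library and the publicly released GPT-2 weights, the sketch below asks a small language model for its most likely next tokens in a given context.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits       # shape: (batch, seq_len, vocab)
probs = logits[0, -1].softmax(dim=-1)     # distribution over the next token
top = probs.topk(5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(i)!r}: {p:.3f}")  # e.g. ' Paris' should rank highly
```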
The "Large" aspect amplifies their capabilities. Traditional language models, especially those from the past, were smaller in scale and couldn't capture the intricacies of language as effectively. With advancements in technology and the availability of massive computing power, we've been able to build much larger models. These Large Language Models, like ChatGPT, have billions of parameters, which are essentially the variables the model uses to make sense of language.
"大"所指的就是放大它們的能力。傳統的語言模型,特別是過去的那些模型,規模比較小,所以無法有效補捉模型的複雜性。隨著技術的進步與大量算力的可用性,我們已經能夠建立更大的模型。這些大型語言模型,像是ChatGPT,有數十億個參數,這些參數本質上就是模型用來理解語言的變數。
Take a look at the infographic from “Information is beautiful” below to see how many parameters recent LLMs have. You can view the live visualization here.
Image source: https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/
Training LLMs is a complex process that involves instructing the model to comprehend and produce human-like text. At its core, the model learns to predict the next token in a sequence and is penalized whenever its prediction differs from the actual text.
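A minimal sketch of this next-token-prediction objective, assuming PyTorch, is shown below; random token ids stand in for real text, so it demonstrates the mechanics rather than producing a useful model.

```python
import torch
import torch.nn as nn

vocab, dim, seq = 1000, 64, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    batch = torch.randint(0, vocab, (8, seq + 1))  # fake token ids
    inputs, targets = batch[:, :-1], batch[:, 1:]  # target = input shifted by one
    logits = model(inputs)                         # (8, seq, vocab) scores
    loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())  # with random data, loss hovers near log(vocab)
```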
The training process may vary depending on the specific type of LLM being developed, such as those optimized for continuous text or dialogue.
LLM performance is heavily influenced by two key factors:
Training a private LLM demands substantial computational resources and expertise. The duration of the process can range from several days to weeks, contingent on the model's complexity and dataset size. Commonly, cloud-based solutions and high-performance GPUs are employed to expedite the training process, making it more efficient. Overall, LLM training is a meticulous and resource-intensive undertaking that lays the groundwork for the model's language comprehension and generation capabilities.
After the initial training, LLMs can be easily customized for various tasks using relatively small sets of supervised data, a procedure referred to as fine-tuning.
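As a rough sketch of fine-tuning, assuming the Hugging Face `transformers` and `datasets` libraries, the snippet below adapts a small pretrained model to sentiment classification on a small labeled subset; the DistilBERT model and IMDB dataset are illustrative choices, not the course's prescription.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A "relatively small set of supervised data": 1,000 labeled reviews.
data = load_dataset("imdb", split="train").shuffle(seed=0).select(range(1000))
data = data.map(lambda x: tok(x["text"], truncation=True,
                              padding="max_length", max_length=128),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=data,
)
trainer.train()  # nudges the pretrained weights toward the new task
```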
There are three prevalent learning models:
We will be diving deep into each of these methods during the course.
LLMs are already being leveraged in various applications, showcasing the versatility and power of these models in transforming several domains. Here's how LLMs can be applied to specific cases:
Summary of popular LLM use-cases
| NO | Use case | Description |
|----|----------|-------------|
| 1 | Content Generation | Craft human-like text, videos, code, and images when provided with instructions |
| 2 | Language Translation | Translate text from one language to another |
| 3 | Text Summarization | Summarize lengthy texts, simplifying comprehension by highlighting key points |
| 4 | Question Answering and Chatbots | Provide relevant answers to queries, leveraging their vast knowledge |
| 5 | Content Moderation | Assist in content moderation by identifying and filtering inappropriate or harmful language |
| 6 | Information Retrieval | Retrieve relevant information from large datasets or documents |
| 7 | Educational Tools | Tutor, provide explanations, and generate learning materials |
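As a quick illustration of use case 3 (text summarization), assuming the Hugging Face `transformers` library, the sketch below runs the default summarization pipeline on a short passage; the pipeline downloads a default model on first use.

```python
from transformers import pipeline

summarizer = pipeline("summarization")  # loads a default summarization model
article = (
    "Large Language Models are trained on vast text corpora and can "
    "summarize documents, translate languages, answer questions, and "
    "generate content. Their scale lets them capture complex patterns "
    "in language, but it also makes them expensive to train and deploy."
)
print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```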
A sense of how generative AI models, especially LLMs, are being utilized can also be gleaned from the extensive array of startups operating in this domain. An infographic presented by Sequoia Capital highlighted these companies across diverse sectors, illustrating the versatile applications and the significant presence of numerous players in the generative AI space.
Although LLMs have undoubtedly revolutionized various applications, numerous challenges persist. These challenges are categorized into different themes:
Data Challenges:
Ethical and Social Challenges:
Technical Challenges:
Deployment Challenges: