In this section of the course on LLM Challenges, we identify two main areas of concern with LLMs: behavioral challenges and deployment challenges. Behavioral challenges include issues like hallucination, where LLMs generate fictitious information, and adversarial attacks, where inputs are crafted to manipulate model behavior. Deployment challenges encompass memory and scalability issues, as well as security and privacy concerns: LLMs demand significant computational resources for deployment and face risks of privacy breaches because of the vast datasets they process and the text they generate. To mitigate these challenges, we discuss strategies such as robust defenses against adversarial attacks, efficient memory management, and privacy-preserving training algorithms. Additionally, we go over techniques like differential privacy, model stacking, and preprocessing methods that are employed to safeguard user privacy and ensure the reliable and ethical use of LLMs across different applications.
We categorize the challenges into two main areas: managing the behavior of LLMs and the technical difficulties encountered during their deployment. Given the evolving nature of this technology, it's likely that current challenges will be mitigated, and new ones may emerge over time. However, as of February 15, 2024, these are the prominently discussed challenges associated with LLMs:
LLMs sometimes generate plausible but entirely fictitious information or responses, known as "hallucinations." This challenge is particularly harmful in applications requiring high factual accuracy, such as news generation, educational content, or medical advice, because hallucinations can erode trust in LLM outputs and lead to misinformation or potentially harmful advice being followed.
LLMs can be vulnerable to adversarial attacks, where inputs are specially crafted to trick the model into making errors or revealing sensitive information. These attacks can compromise the integrity and reliability of LLM applications, posing significant security risks.
Ensuring LLMs align with human values and intentions is a complex task. Misalignment can result from the model pursuing objectives that don't fully encapsulate the user's goals or ethical standards. Misalignment can lead to undesirable outcomes, such as generating content that is offensive, biased, or ethically questionable.
LLMs can be overly sensitive to the exact wording of prompts, leading to inconsistent or unpredictable outputs. Small changes in prompt structure can yield vastly different responses. This brittleness complicates the development of reliable applications and requires users to have a deep understanding of how to effectively interact with LLMs.
Deploying LLMs at scale involves significant memory and computational resource demands. Managing these resources efficiently while maintaining high performance and low latency is a technical hurdle. Scalability challenges can limit the ability of LLMs to be integrated into real-time or resource-constrained applications, affecting their accessibility and utility.
Protecting the data used by and generated from LLMs is critical, especially when dealing with personal or sensitive information. LLMs need robust security measures to prevent unauthorized access and ensure privacy. Without adequate security and privacy protections, there is a risk of data breaches, unauthorized data usage, and loss of user trust.
Let’s dig a little deeper into each of these issues and the existing solutions for them:
Hallucination refers to the model generating information that seems plausible but is actually false or made up. This happens because LLMs are designed to produce text that mimics the patterns they've seen in their training data, regardless of whether those patterns reflect real, accurate information. Hallucination is particularly harmful in RAG-based applications, where the model can generate content that is not supported by the retrieved data yet is very hard to spot.
Hallucination can arise from several factors:
How to detect and mitigate hallucinations?
Automated methods for identifying hallucinations are needed so that a model's performance can be assessed without constant manual checks. Below, we explore several popular research efforts focused on detecting such hallucinations; some of them also propose methods to mitigate them.
These are only two of the popular methods; the list is not comprehensive by any means:
SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models (link)
✅Hallucination Detection
❌Hallucination Mitigation
Image Source: https://arxiv.org/pdf/2303.08896.pdf
SelfCheckGPT uses the following steps to detect hallucinations:
A significant advantage of this method is that it operates without the need for external knowledge bases or databases, making it especially useful for black-box models where the internal data or processing mechanisms are inaccessible.
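At its core, SelfCheckGPT samples several additional responses for the same prompt and flags sentences of the main answer that the samples do not support, on the intuition that hallucinated facts are unlikely to be reproduced consistently. The sketch below illustrates that consistency check, using sentence-embedding similarity as a simple stand-in for the paper's scoring variants (BERTScore, QA, n-gram, NLI); `generate` is a hypothetical wrapper around whatever LLM is being checked.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def consistency_scores(prompt, main_answer_sentences, generate, n_samples=5):
    """Score each sentence of the main answer against stochastic re-samples.

    Sentences with low support (low max similarity to any sample) are
    candidate hallucinations.
    """
    # 1. Draw several stochastic samples for the same prompt.
    samples = [generate(prompt, temperature=1.0) for _ in range(n_samples)]
    sample_embs = embedder.encode(samples, convert_to_tensor=True)

    scores = []
    for sent in main_answer_sentences:
        sent_emb = embedder.encode(sent, convert_to_tensor=True)
        # 2. A sentence counts as "supported" if at least one sample agrees with it.
        support = util.cos_sim(sent_emb, sample_embs).max().item()
        scores.append((sent, support))
    return scores  # low support -> likely hallucination
```

Sentences with low support scores can then be surfaced for review or regenerated.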
✅Hallucination Detection
✅Hallucination Mitigation
Image Source: https://arxiv.org/pdf/2305.15852.pdf
This research presents a three-step pipeline to detect and mitigate hallucinations, specifically self-contradictions, in LLMs.
💡 Self-contradiction refers to a scenario where a statement or series of statements within the same context logically conflict with each other, making them mutually incompatible. In the context of LLMs, self-contradiction occurs when the model generates two or more sentences that present opposing facts, ideas, or claims, such that if one sentence is true, the other must be false, given the same context.
Here's a breakdown of the process:
The framework is extensively evaluated across four modern LLMs, revealing a significant prevalence of self-contradictions in their outputs. For instance, 17.7% of all sentences generated by ChatGPT contained self-contradictions, many of which could not be verified using external knowledge bases like Wikipedia.
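The detection step ultimately reduces to deciding whether two sentences generated for the same context are logically compatible. Below is a minimal sketch of that check; an off-the-shelf NLI classifier stands in here for whatever contradiction judge the pipeline uses, and the model name and threshold are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Off-the-shelf NLI model used as a simple contradiction judge.
NLI_NAME = "roberta-large-mnli"
nli_tok = AutoTokenizer.from_pretrained(NLI_NAME)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_NAME).eval()

def contradicts(sentence_a: str, sentence_b: str, threshold: float = 0.5) -> bool:
    """Return True if the NLI model assigns high probability to 'contradiction'."""
    inputs = nli_tok(sentence_a, sentence_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli_model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the index of the contradiction label from the model config.
    contra_idx = nli_model.config.label2id.get("CONTRADICTION", 0)
    return probs[contra_idx].item() > threshold
```

In the full pipeline, flagged sentence pairs are then revised so that the conflicting information is removed while the rest of the text is preserved.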
Adversarial attacks involve manipulating the LLM’s behavior by providing crafted inputs or prompts, with the goal of causing unintended or malicious outcomes. There are many types of adversarial attacks; we discuss a few here:
Adversarial attacks pose a significant challenge to LLMs by compromising model integrity and security. These attacks enable adversaries to remotely control the model, steal data, and propagate disinformation. Furthermore, LLMs' adaptability and autonomy make them potent tools for user manipulation, increasing the risk of societal harm.
Effectively addressing these challenges requires robust defenses and proactive measures to safeguard against adversarial manipulation of AI systems.
Several efforts have been made to develop robust LLMs and to evaluate them against adversarial attacks. One line of defense is to make the LLM insensitive to adversarial inputs so that it does not respond to them. An example is presented in the paper SmoothLLM, which works by perturbing multiple copies of a given input prompt at the character level and then aggregating the resulting predictions to identify adversarial inputs. Because adversarially generated prompts are fragile to character-level changes, SmoothLLM reduces the success rate of jailbreaking attacks on various widely used LLMs to less than one percent. Critically, this defensive strategy avoids unnecessary conservatism and offers demonstrable assurances regarding the mitigation of attacks.
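A minimal sketch of the SmoothLLM idea is shown below. `generate` and `is_jailbroken` are hypothetical hooks, the LLM call and a simple jailbreak check (for example, a refusal-keyword heuristic); the perturbation type, perturbation rate, and number of copies are illustrative.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly swap a fraction q of characters (one of SmoothLLM's perturbation types)."""
    chars = list(prompt)
    n_swap = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), n_swap):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothllm_defense(prompt, generate, is_jailbroken, n_copies=10, q=0.1):
    """Majority-vote over perturbed copies of the prompt."""
    responses = [generate(perturb(prompt, q)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    # If most perturbed copies do NOT elicit a jailbroken response,
    # return one of the benign responses; otherwise refuse.
    if sum(votes) <= n_copies // 2:
        benign = [r for r, v in zip(responses, votes) if not v]
        return benign[0]
    return "Request refused."
```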
Another mechanism to defend LLMs against adversarial attacks involves the use of a perplexity filter as presented in this paper. This filter operates on the principle that unconstrained attacks on LLMs often result in gibberish strings with high perplexity, indicating a lack of fluency, grammar mistakes, or illogical sequences. In this approach, two variations of the perplexity filter are considered. The first is a simple threshold-based filter, where a prompt passes the filter if its log perplexity is below a predefined threshold. The second variation involves checking perplexity in windows, treating the text as a sequence of contiguous chunks and flagging the text as suspicious if any window has high perplexity.
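A rough sketch of both filter variants, scoring prompts with GPT-2, is shown below; the threshold, window size, and whitespace-based chunking are illustrative simplifications of the paper's token-window formulation.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    if ids.shape[1] < 2:          # need at least two tokens to score
        return 0.0
    with torch.no_grad():
        loss = lm(input_ids=ids, labels=ids).loss  # mean NLL per token, in nats
    return loss.item()

def passes_filter(prompt: str, threshold: float = 5.0, window: int = 16) -> bool:
    """Reject prompts whose overall or windowed log-perplexity is suspiciously high."""
    if log_perplexity(prompt) > threshold:
        return False
    words = prompt.split()
    for i in range(0, len(words), window):
        chunk = " ".join(words[i:i + window])
        if chunk and log_perplexity(chunk) > threshold:
            return False
    return True
```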
A good starting point for reading about adversarial techniques is Greshake et al. 2023. The paper proposes a classification of attacks and their potential causes, as depicted in the image below.
Alignment refers to the ability of LLMs to understand instructions and generate outputs that align with human expectations. Foundational LLMs, like GPT-3, are trained on massive textual datasets to predict subsequent tokens, giving them extensive world knowledge. However, they may still struggle with accurately interpreting instructions and producing outputs that match human expectations. This can lead to biased or incorrect content generation, limiting their practical usefulness.
Alignment is a broad concept that can be described along various dimensions; one such categorization is given in this paper. The paper proposes multiple dimensions and sub-classes for ensuring LLM alignment. For instance, harmful content generated by LLMs can be categorized by whether the harm is incurred by individual users (e.g., emotional harm, offensiveness, discrimination), society (e.g., instructions for violent or dangerous behavior), or stakeholders (e.g., misinformation leading to wrong business decisions).
In broad terms, LLM Alignment can be improved through the following process:
Some popular aligned LLMs and benchmarks are listed in the image below.
Image Source: https://arxiv.org/pdf/2307.12966.pdf
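One common ingredient in this process is a reward model trained on human preference data, which scores candidate responses by how well they match human expectations. The sketch below uses such a reward model for simple best-of-n selection; the checkpoint name is only an example, and full alignment pipelines go further by fine-tuning the LLM against the reward signal (e.g., with RLHF).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Example preference/reward model; substitute your own checkpoint.
RM_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"
rm_tok = AutoTokenizer.from_pretrained(RM_NAME)
rm = AutoModelForSequenceClassification.from_pretrained(RM_NAME).eval()

def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Return the candidate response the reward model scores highest."""
    scores = []
    for cand in candidates:
        inputs = rm_tok(prompt, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**inputs).logits[0].item())
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```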
During the prompt engineering segment of our course, we explored various techniques for prompting LLMs. These sophisticated methods are needed because simply giving an LLM instructions the way we would give them to a human often doesn't work well. An overview of commonly used prompting methods is shown in the image below.
LLMs require precise prompting, and even slight alterations to a prompt can change their responses. This poses a challenge during deployment, as individuals unfamiliar with prompting methods may struggle to obtain accurate answers from LLMs.
Image Source: https://arxiv.org/pdf/2307.10169.pdf
In general, prompt brittleness in LLMs can be reduced by adopting the following high-level strategies:
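One such strategy is to not rely on a single phrasing at all: query the model with several paraphrases of the same question and aggregate the answers. A minimal sketch, assuming a hypothetical `generate` function and illustrative prompt templates:

```python
from collections import Counter

# Example prompt templates that phrase the same question differently (illustrative).
TEMPLATES = [
    "Answer concisely: {question}",
    "Q: {question}\nA:",
    "Please answer the following question.\n{question}",
]

def ensemble_answer(question, generate, templates=TEMPLATES):
    """Reduce prompt brittleness by majority-voting over prompt variants."""
    answers = [generate(t.format(question=question)) for t in templates]
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]
```

For free-form answers, majority voting over normalized strings is crude; the aggregation step can instead use semantic similarity or an LLM judge.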
In this section, we delve into the specific challenges related to memory and scalability when deploying LLMs, rather than focusing on their development.
Let's explore these challenges and potential solutions in detail:
Fine-tuning LLMs: Continuous fine-tuning of LLMs is crucial to ensure they stay updated with the latest knowledge or adapt to specific domains. Fine-tuning involves adjusting pre-trained model parameters on smaller, task-specific datasets to enhance performance. However, fine-tuning entire LLMs requires substantial memory, making it impractical for many users and leading to computational inefficiencies during deployment.
Solutions: One approach is to leverage systems like RAG, where retrieved information is supplied as context, enabling the model to draw on any knowledge base without updating its weights. Another solution is Parameter-Efficient Fine-Tuning (PEFT), such as adapters, which update only a small subset of model parameters, reducing memory requirements while maintaining task performance. Methods like prefix-tuning and prompt-tuning prepend learnable token embeddings to the input, facilitating efficient adaptation to specific datasets without the need to store and load an individually fine-tuned model for each task. All these methods have been discussed in our previous weeks’ content; please read through it for deeper insights.
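As a concrete illustration of PEFT, the sketch below wraps a base model with LoRA adapters using the Hugging Face `peft` library; the base model, target modules, and hyperparameters are illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_cfg = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection layer in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```

Only the small adapter matrices are trained and saved, so many task-specific adapters can share one frozen base model.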
Inference Latency: LLMs often suffer from high inference latencies due to low parallelizability and large memory footprints. This results from processing tokens sequentially during inference and the extensive memory needed for decoding.
Solution: Various techniques address these challenges, including efficient attention mechanisms. These mechanisms aim to accelerate attention computations by reducing memory bandwidth bottlenecks and introducing sparsity patterns to the attention matrix. Multi-query attention and FlashAttention optimize memory bandwidth usage, while quantization and pruning techniques reduce memory footprint and computational complexity without sacrificing performance.
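For example, weight quantization can be applied when loading a model for inference. The sketch below loads a model in 8-bit via `bitsandbytes` through the `transformers` API; the model name is illustrative, and this path assumes a CUDA GPU with `bitsandbytes` and `accelerate` installed.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization roughly halves memory use compared to fp16 weights.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",          # illustrative model choice
    quantization_config=bnb_cfg,
    device_map="auto",            # let accelerate place the layers
)
```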
Limited Context Length: Limited context length refers to the constraint on the amount of contextual information an LLM can effectively process during computations. This limitation stems from practical considerations such as computational resources and memory constraints, posing challenges for tasks requiring understanding longer contexts, such as novel writing or summarization.
Solution: Researchers propose several solutions to address limited context length. Efficient attention mechanisms, like Luna and dilated attention, handle longer sequences efficiently by reducing computational requirements. Length generalization methods aim to enable LLMs trained on short sequences to perform well on longer sequences during inference. This involves exploring different positional embedding schemes, such as Absolute Positional Embeddings and ALiBi, to inject positional information effectively.
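To make the positional-embedding idea concrete, the sketch below computes ALiBi-style attention biases: instead of adding positional embeddings to the inputs, a distance-proportional penalty is added to the attention logits, which tends to extrapolate better to sequences longer than those seen during training. This is a simplified version that assumes the number of heads is a power of two.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return ALiBi-style biases of shape (n_heads, seq_len, seq_len)."""
    # Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Relative offset (key position - query position); <= 0 for the causal part.
    pos = torch.arange(seq_len)
    offset = pos[None, :] - pos[:, None]              # (seq_len, seq_len)
    # Penalty grows linearly with distance to the left; positions to the right
    # (offset > 0) are masked out anyway in causal attention.
    return slopes[:, None, None] * offset[None, :, :]
```

The returned tensor is simply added to the raw attention scores before the softmax.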
Privacy risks in LLMs stem from their ability to process and generate text based on vast and varied training datasets. Models like GPT-3 can inadvertently capture and replicate sensitive information present in their training data, leading to potential privacy issues during text generation. Unintentional data memorization, data leakage, and the possible disclosure of confidential or personally identifiable information (PII) are significant challenges.
Moreover, when LLMs are fine-tuned for specific tasks, additional privacy considerations arise. Striking a balance between harnessing the utility of these powerful language models and safeguarding user privacy is crucial for ensuring their reliable and ethical use across various applications.
We review key privacy risks and attacks, along with possible mitigation strategies. The classification provided below is adapted from this paper, which categorizes privacy attacks as:
Image Source: https://arxiv.org/pdf/2402.00888.pdf
1. Gradient-Based Attack
In this attack, adversaries exploit access to gradients or gradient information to compromise the privacy and safety of deep learning models. Gradients, which indicate the direction of the steepest increase in a function, are crucial for optimizing model parameters during training to minimize the loss function.
To mitigate gradient-based attacks, several strategies can be employed:
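A central strategy is differentially private training: clip each example's gradient and add calibrated noise before the parameter update, so that no single training example can dominate (or be reconstructed from) the shared gradients. The sketch below is a simplified, illustrative DP-SGD step; production systems would instead use a library such as Opacus, which also tracks the cumulative privacy budget.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One simplified DP-SGD update: per-example gradient clipping + Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Compute, clip, and accumulate the gradient of each example separately.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Add calibrated Gaussian noise, average over the batch, and step.
    optimizer.zero_grad()
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / len(batch_x)
    optimizer.step()
```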
2. Membership Inference Attack
A Membership Inference Attack (MIA) aims to determine if a particular data sample was part of a machine learning model's training data, even without direct access to the model's parameters. Attackers exploit the model's tendency to overfit its training data, leading to lower loss values for training samples. These attacks raise serious privacy concerns, especially when models are trained on sensitive data like medical records or financial information.
Mitigating MIA in language models involves various mechanisms:
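To see why overfitting matters, here is a minimal sketch of the simplest loss-threshold variant of MIA: examples the model fits unusually well are guessed to be training members. Mitigations such as regularization, early stopping, and differentially private training work precisely by shrinking this train/non-train loss gap.

```python
import torch

def membership_guess(model, loss_fn, x, y, threshold):
    """Guess that (x, y) was a training example if the model's loss on it is low."""
    model.eval()
    with torch.no_grad():
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).item()
    return loss < threshold   # low loss -> likely seen during training
```

The threshold is typically calibrated on data known to be outside the training set.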
3. Personally Identifiable Information (PII) Attack
This attack involves the exposure of data that can uniquely identify individuals, either alone or in combination with other information. This includes direct identifiers like passport details and indirect identifiers such as race and date of birth. Sensitive PII encompasses information like name, phone number, address, social security number (SSN), financial, and medical records, while non-sensitive PII includes data like zip code, race, and gender. Attackers may acquire PII through various means such as phishing, social engineering, or exploiting vulnerabilities in systems.
To mitigate PII leakage in LLMs, several strategies can be employed:
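One common preprocessing strategy is to scrub or mask PII before data ever reaches the model, whether in training corpora or in prompts. A minimal regex-based sketch is below; the patterns are illustrative (US-style phone and SSN formats) and far from exhaustive, and production systems typically combine such rules with NER-based PII detectors that also catch names and addresses.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub_pii("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> "Contact Jane at [EMAIL] or [PHONE]."
```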