Try   HackMD

[Week 9] Challenges with LLMs

課程目錄
課程連結

ETMI5: Explain to Me in 5

In this section of the course on LLM Challenges, we've identified two main areas of concern with LLMs: behavioral challenges and deployment challenges. Behavioral challenges include issues like hallucination, where LLMs generate fictitious information, and adversarial attacks, where inputs are crafted to manipulate model behavior. Deployment challenges encompass memory and scalability issues, as well as security and privacy concerns. LLMs demand significant computational resources for deployment and face risks of privacy breaches due to their ability to process vast datasets and generate text. To mitigate these challenges, we discuss various strategies such as robust defenses against adversarial attacks, efficient memory management, and privacy-preserving training algorithms. Additionally we will go over techniques like differential privacy, model stacking, and preprocessing methods that are being employed to safeguard user privacy and ensure the reliable and ethical use of LLMs across different applications.

在LLM挑戰的課程章節中,我們確定了LLMs關注的兩個主要領域:行為挑戰和部署挑戰。行為挑戰包括幻覺,也就是LLM生成虛構訊息,與對抗性攻擊,也就是精心設計輸入來操縱模型行為。部署挑戰包括記憶體和可擴展性問題,以及安全性和隱私疑慮。LLMs需要大量的計算資源來部署,並且由於其處理大量資料集和生成文本的能力而面臨隱私洩露的風險。為了緩解這些挑戰,我們討論了各種策略,例如針對對抗性攻擊的穩健防禦、高效的記憶體管理和隱私保護訓練演算法。此外,我們還將討論差分隱私、模型堆疊和預處理方法等技術,這些技術用於保護使用者隱私並確保LLM在不同應用程式中的可靠度與合乎道德的使用。

Types of Challenges

We categorize the challenges into two main areas: managing the behavior of LLMs and the technical difficulties encountered during their deployment. Given the evolving nature of this technology, it's likely that current challenges will be mitigated, and new ones may emerge over time. However, as of February 15, 2024, these are the prominently discussed challenges associated with LLMs:

我們將挑戰分為兩個主要領域:管理LLMs的行為以及部署期間遇到的技術難點。鑑於這項技術不斷發展的性質,當前的挑戰很可能會得到緩解,並且隨著時間的推移可能會出現新的挑戰。然而,截至 2024年2月15日為止,這些是與LLMs相關的主要討論的挑戰:

A. Behavioral Challenges

1. Hallucination

LLMs sometimes generate plausible but entirely fictitious information or responses, known as "hallucinations." This challenge is particularly harmful in applications requiring high factual accuracy, such as news generation, educational content, or medical advice as hallucinations can erode trust in LLM outputs, leading to misinformation or potentially harmful advice being followed.

LLMs有時會生成一些看似合理但完全唬爛的信息或響應,我們將之稱為"幻覺"。這項挑戰在需要高度事實準確性的應用程式中特別有毛,例如新聞生成、教育內容或醫療建議,因為幻覺可能會削弱人們對LLM輸出的信任,導致錯誤信息或潛在有害的建議被採納。

2. Adversarial Attacks

LLMs can be vulnerable to adversarial attacks, where inputs are specially crafted to trick the model into making errors or revealing sensitive information. These attacks can compromise the integrity and reliability of LLM applications, posing significant security risks.

LLM可能容易受到對抗性攻擊,其輸入是經過精心設計的,主要是為了欺騙模型犯錯或洩露敏感信息。這些攻擊可能會損害LLM應用程式的完整性和可靠性,從而帶來重大的安全風險。

3. Alignment

Ensuring LLMs align with human values and intentions is a complex task. Misalignment can result from the model pursuing objectives that don't fully encapsulate the user's goals or ethical standards. Misalignment can lead to undesirable outcomes, such as generating content that is offensive, biased, or ethically questionable.

確保LLMs符合人類價值和意圖是一項複雜的任務。不一致可能是由於模型追求的目標沒有完全反映使用者的目標或道德標準。不一致可能會導致非預期的結果,像是生成令人反感、有偏見或道德上有問題的內容。

4. Prompt Brittleness

LLMs can be overly sensitive to the exact wording of prompts, leading to inconsistent or unpredictable outputs. Small changes in prompt structure can yield vastly different responses. This brittleness complicates the development of reliable applications and requires users to have a deep understanding of how to effectively interact with LLMs.

LLMs可能對提示的確切措辭過於敏感,導致不一致或不可預測的輸出。提示結構的微小變化可能會產生截然不同的響應。這種脆弱性使得開發可靠地應用程式變得複雜,並要求使用者深入了解如何有效地與LLMs互動。

B. Deployment Challenges

1. Memory and Scalability Challenges

Deploying LLMs at scale involves significant memory and computational resource demands. Managing these resources efficiently while maintaining high performance and low latency is a technical hurdle. Scalability challenges can limit the ability of LLMs to be integrated into real-time or resource-constrained applications, affecting their accessibility and utility.

大規模部署LLMs需要大量的記憶體和運算資源。有效管理這些資源,同時維持高效能且低延遲則是一個技術難點。可擴展性挑戰可能會限制LLMs整合到即時或資源受限的應用程式中的能力,從而影響其可訪問性和實用性。

2. Security & Privacy

Protecting the data used by and generated from LLMs is critical, especially when dealing with personal or sensitive information. LLMs need robust security measures to prevent unauthorized access and ensure privacy. Without adequate security and privacy protections, there is a risk of data breaches, unauthorized data usage, and loss of user trust.

Let’s dig a little deeper into each of issues and existing solutions for them

保護LLMs所使用與生成的資料是非常重要的,尤其是在處理個人或敏感信息時。LLMs需要穩健的安全措施來防止未經授權的存取並確保隱私。如果沒有足夠的安全和隱私保護,就會存在資料外洩、未經授權的資料使用和失去用戶信任的風險。

讓我們更深入地研究每個問題及其現有解決方案:

A1. Hallucinations

Hallucination refers to the model generating information that seems plausible but is actually false or made up. This happens because LLMs are designed to create text that mimics the patterns they've seen in their training data, regardless of whether those patterns reflect real, accurate information. Hallucination is particularly harmful in RAG based applications where the model can generate content that is not supported by data but it is very hard to decipher.

幻覺是指模型生成的信息看似合理,但實際上卻是錯誤的或捏造的。發生這種情況是因為LLMs主要是模仿他們在訓練資料中看到的模式來建立文本,無論這些模式是否反映真實、準確的信息。在基於RAG的應用程式中,模型可能會生成沒有資料支撐的內容,難以辨別,這種情況下的幻覺就特別有害。

Hallucination can arise from several factors:

  • Biases in Training Data: If the data used to train the model contains inaccuracies or biases, the model might reproduce these errors or skewed perspectives in its outputs.
  • Lack of Real-Time Information: Since LLMs are trained on data that becomes outdated, they can't access or incorporate the latest information, leading to responses based on no longer accurate data. This is the most common cause for hallucinations.
  • Model's Limitations: LLMs don't actually understand the content they generate; they just follow data patterns. This can result in outputs that are grammatically correct and sound logical but are disconnected from actual facts.
  • Overgeneralization: Sometimes, LLMs might apply broad patterns to specific situations where those patterns don't fit, creating convincing but incorrect information.

幻覺可能由以下幾個因素所引起:

  • 訓練資料中的偏差: 如果用於訓練模型的資料包含不正確或偏差的內容,那模型就可能會在其輸出中重現這些錯誤或偏頗的觀點。
  • 缺乏即時信息: 由於LLMs是在會過時的資料上訓練的,他們無法訪問或合併最新信息,這導致模型基於不再準確的資料做出響應。這是幻覺最常見的原因。
  • 模型的限制: LLMs其實不理解他們所生成的內容;它們就只是遵循資料模式。這可能會導致輸出語法正確且合乎邏輯,但與實際事實脫節的結果。
  • 過度泛化: 有時,LLMs可能會將廣泛的模式應用於這些模式不適合的特定情況,從而創建出令人信服但不正確的信息。

How to detect and mitigate hallucinations?

There's a need for automated methods to identify hallucinations in order to understand the model's performance without constant manual checks. Below, we explore various popular research efforts focused on detecting such hallucinations and some of them also propose methods to mitigate hallucinations.

我們需要一種自動化的方法來識別幻覺,以便在無需持續手動檢查的情況下了解模型的性能。下面,我們探討了各種專注於檢測此類幻覺的流行研究工作,其中一些還提出了減輕幻覺的方法。

These are only two of the popular methods, the list is not comprehensive by any means:

這邊就只是兩種流行的方法,這個清單無論如何並不代表全部:

  1. SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection
    for Generative Large Language Models (link)

    ✅Hallucination Detection

    ❌Hallucination Mitigation

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Source: https://arxiv.org/pdf/2303.08896.pdf

SelfCheckGPT uses the following steps to detect hallucinations

  1. Generate Multiple Responses: SelfCheckGPT begins by prompting the LLM to generate multiple responses to the same question or statement. This step leverages the model's ability to produce varied outputs based on the same input, exploiting the stochastic nature of its response generation mechanism.
  2. Analyze Consistency Among Responses: The key hypothesis is that factual information will lead to consistent responses across different samples, as the model relies on its training on real-world data. In contrast, hallucinated (fabricated) content will result in inconsistent responses, as the model doesn't have a factual basis to generate them and thus "guesses" differently each time.
  3. Apply Metrics for Consistency Measurement: SelfCheckGPT employs five different metrics to assess the consistency among the generated responses. Some of them are popular semantic similarity metrics like BERTScore, N-Gram Overlap etc.
  4. Determine Factual vs. Hallucinated Content: By evaluating the consistency of information across the sampled responses using the above metrics, SelfCheckGPT can infer whether the content is likely factual or hallucinated. High consistency across metrics suggests factual content, while significant variance indicates hallucination.

SelfCheckGPT使用以下步驟來偵測幻覺

  1. 產生多個回應: SelfCheckGPT首先提示LLM對同一問題或陳述生成多個回應。這個步驟利用模型基於相同輸入生成不同輸出的能力,利用其響應生成機制的隨機性。
  2. 分析回應之間的一致性: 關鍵的假設是,事實信息將導致不同樣本之間一致性的回應,因為該模型依賴於在真實世界資料上的訓練。相反,幻覺(捏造的)內容將導致不一致的回應,因為模型沒有事實基礎來生成它們,所以,每次的"猜測"都會不同。
  3. 應用一致性衡量指標: SelfCheckGPT採用五種不同的指標來評估生成的回應之間的一致性。其中一些是流行的語義相似性指標,像是BERTScore、N-Gram Overlap等。
  4. 確定事實內容與幻覺內容: 透過使用上述指標評估採樣回應中信息的一致性,SelfCheckGPT可以推斷內容是否可能是事實或幻覺。指標之間的高度一致性表示內容真實,而顯著差異則表示內容是幻覺。

A significant advantage of this method is that it operates without the need for external knowledge bases or databases, making it especially useful for black-box models where the internal data or processing mechanisms are inaccessible.

這種方法的一個顯著優點就是,它的運作不需要外部知識庫或資料庫,這使得它對於內部資料或處理機制無法存取的黑盒子模型特別有用。

  1. Self-Contradictory Hallucinations of LLMs: Evaluation, Detection, and Mitigation (link)

✅Hallucination Detection

✅Hallucination Mitigation

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Source: https://arxiv.org/pdf/2305.15852.pdf

This research presents a three-step pipeline to detect and mitigate hallucinations, specifically self-contradictions, in LLMs.

這個研究提出了一個三步驟流程來檢測與減輕LLM的幻覺,特別是自相矛盾的部份。

💡 Self-contradiction refers to a scenario where a statement or series of statements within the same context logically conflict with each other, making them mutually incompatible. In the context of LLMs, self-contradiction occurs when the model generates two or more sentences that present opposing facts, ideas, or claims, such that if one sentence is true, the other must be false, given the same context.

💡 自相矛盾是指同一個上下文中的一個陳述或一系列陳述在邏輯上相互衝突,從而使它們相互不相容(矛盾)。在LLM的背景下,當模型生成兩個或多個呈現相反事實、想法或主張的句子時,就會出現自相矛盾,這樣,在給定相同上下文的情況下,如果一個句子是正確的,則另一個句子一定是錯的。

Here's a breakdown of the process:

  1. Triggering Self-Contradictions: The process begins by generating sentence pairs that are likely to contain self-contradictions. This is done by applying constraints designed to elicit responses from the LLM that may logically conflict with each other within the same context.
  2. Detecting Self-Contradictions: Various existing prompting strategies are explored to detect these self-contradictions. The authors examine different methods that have been previously developed, applying them to identify when an LLM has produced two sentences that cannot both be true.
  3. Mitigating Self-Contradictions: Once self-contradictions are detected, an iterative mitigation procedure is employed. This involves making local text edits to remove the contradictory information while ensuring that the text remains fluent and informative. This step is crucial for improving the trustworthiness of the LLM's output.

以下是該過程的詳細說明:

  1. 觸發自相矛盾: 過程首先生成可能包含自相矛盾的句子對(sentence pairs)。這是通過應用一些約束來實現的,這些約束主要是引導LLM生成在相同上下文中可能存在邏輯衝突的回應。
  2. 偵測自相矛盾: 探討了各種現有的提示策略來偵測這些自相矛盾。作者審視了先前開發的不同方法,應用它們來識別LLM何時會生成兩個不能同時為真的句子。
  3. 減輕自相矛盾: 一旦偵測到自相矛盾,就會採用迭代緩解程序。這涉及到進行本地文本編輯來刪除矛盾信息,同時確保文本保持流暢且信息豐富。這個步驟對於提高LLM輸出的可信度來說是非常重要的。

The framework is extensively evaluated across four modern LLMs, revealing a significant prevalence of self-contradictions in their outputs. For instance, 17.7% of all sentences generated by ChatGPT contained self-contradictions, many of which could not be verified using external knowledge bases like Wikipedia.

該框架在四個現代大型語言模型中進行了廣泛的評估,揭示了它們的輸出中存在大量的自相矛盾。舉例來說,ChatGPT所生成的句子中有17.7%包含自相矛盾,其中很多是無法使用維基百科等外部知識庫來進行驗證的。

A2. Adversarial Attacks

Adversarial attacks involve manipulating the LLM’s behavior by providing crafted inputs or prompts, with the goal of causing unintended or malicious outcomes. There are many types of adversarial attacks, we discuss a few here:

  1. Prompt Injection (PI): Injecting prompts to manipulate the behavior of the model, overriding original instructions and controls.
  2. Jailbreaking: Circumventing filtering or restrictions by simulating scenarios where the model has no constraints or accessing a developer mode that can bypass restrictions.
  3. Data Poisoning: Injecting malicious data into the training set to manipulate the model's behavior during training or inference.
  4. Model Inversion: Exploiting the model's output to infer sensitive information about the training data or the model's parameters.
  5. Backdoor Attacks: Embedding hidden patterns or triggers into the model, which can be exploited to achieve certain outcomes when specific conditions are met.
  6. Membership Inference: Determining whether a particular sample was used in the training data of the model, potentially revealing sensitive information about individuals.

對抗性攻擊涉及透過提供精心設計的輸入或提示來操縱LLM的行為,其目的是造成非預期或惡意的結果。對抗性攻擊有很多種類型,我們在這裡討論其中幾種:

  1. 提示注入(Prompt Injection,PI):注入提示來操縱模型的行為,覆蓋原始指令和控制。
  2. 越獄(Jailbreaking):透過模擬模型沒有約束的場景或存取可以繞過限制的開發者模式來規避過濾或限制。
  3. 資料汙染(Data Poisoning):將惡意資料注入訓練集中,以在訓練或推理過程中操縱模型的行為。
  4. 模型逆向攻擊(Model Inversion):利用模型的輸出推斷有關訓練資料或模型參數的敏感信息。
  5. 後門攻擊(Backdoor Attacks):將隱藏的模式或觸發器嵌入模型中,當滿足特定條件時,可以利用這些模式或觸發器來實現某些結果。
  6. 成員推理(Membership Inference):確定模型的訓練資料中是否使用了特定樣本,可能會洩露個人的敏感信息。

Adversarial attacks pose a significant challenge to LLMs by compromising model integrity and security. These attacks enable adversaries to remotely control the model, steal data, and propagate disinformation. Furthermore, LLMs' adaptability and autonomy make them potent tools for user manipulation, increasing the risk of societal harm.

對抗性攻擊會損害模型的完整性和安全性,從而對LLM構成重大挑戰。這些攻擊使對手能夠遠端控制模型、竊取資料並傳播虛假的信息。此外,LLMs的適應性和自主性使其成為使用者操控的有力工具,增加了社會危害的風險。

Effectively addressing these challenges requires robust defenses and proactive measures to safeguard against adversarial manipulation of AI systems.

有效應對這些挑戰需要強而有力的防禦和積極主動的措施,以防止人工智慧系統受到對抗性操縱。

Several efforts have been made to develop robust LLMs and evaluate them against adversarial attacks. One approach to mitigating such attacks involves training the LLM to become accustomed to adversarial inputs, instructing it not to respond to them. An example of this is presented in the paper SmoothLLM, which functions by perturbing multiple copies of a given input prompt at the character level and then consolidating the resulting predictions to identify adversarial inputs. Leveraging the fragility of prompts generated adversarially to changes at the character level, SmoothLLM notably decreases the success rate of jailbreaking attacks on various widely-used LLMs to less than one percent. Critically, this defensive strategy avoids unnecessary caution and provides demonstrable assurances regarding the mitigation of attacks.

人們已經努力開發出幾個強大的LLMs並評估它們對抗對抗性攻擊的能力。緩解這類攻擊的一種方法就是是訓練LLMs習慣對抗性輸入,指示模型不要做出回應。論文SmoothLLM中提供了一個這樣的範例,該論文透過在字元等級擾動(perturb)給定輸入提示的多個副本,然後整合(consolidate)得到的預測結果來識別對抗性輸入。利用對抗性生成提示的脆弱性來應對字元層級的變化,SmoothLLM顯著地將各種對LLMs廣泛使用的越獄攻擊的成功率降低到不到1%。關鍵的部份在於,這種防禦策略避免了不必要的謹慎,並為減輕攻擊提供了明顯的保證。

Another mechanism to defend LLMs against adversarial attacks involves the use of a perplexity filter as presented in this paper. This filter operates on the principle that unconstrained attacks on LLMs often result in gibberish strings with high perplexity, indicating a lack of fluency, grammar mistakes, or illogical sequences. In this approach, two variations of the perplexity filter are considered. The first is a simple threshold-based filter, where a prompt passes the filter if its log perplexity is below a predefined threshold. The second variation involves checking perplexity in windows, treating the text as a sequence of contiguous chunks and flagging the text as suspicious if any window has high perplexity.

另一種保護LLMs面對對抗性攻擊的機制涉及使用困惑度過濾器(perplexity filter),如論文中所述。該過濾器的工作原理是,對LLMs的無約束攻擊通常會導致高困惑度的亂碼字串,表明了該字串缺乏流暢性、語法錯誤或不合邏輯的序列。在這個方法中,考慮兩種變體的困惑度過濾器。第一個是一種簡單的基於閾值的過濾器,如果提示的對數困惑度(log perplexity)低於預先定義的閾值,則提示通過過濾器。第二種變體涉及檢查視窗中的困惑度,將文本視為一系列連續區塊,如果任何視窗的困惑度很高,那就將文本標記為可疑。

A good starting point to read about Adversarial techniques is Greshake et al. 2023. The paper proposes a classification of attacks and potential causes, as depicted in the image below.

瞭解對抗性技術的一個很好的起點就是讀讀論文Greshake et al. 2023。該論文提出了攻擊和潛在原因的分類,如下圖所示。

image

A3. Alignment

Alignment refers to the ability of LLMs to understand instructions and generate outputs that align with human expectations. Foundational LLMs, like GPT-3, are trained on massive textual datasets to predict subsequent tokens, giving them extensive world knowledge. However, they may still struggle with accurately interpreting instructions and producing outputs that match human expectations. This can lead to biased or incorrect content generation, limiting their practical usefulness.

一致性是指LLM理解指令並生成符合人類期望的輸出的能力。基礎的LLMs,像是例如GPT-3,接受大量文本資料集的訓練以預測後續tokens,為它們提供廣泛的世界知識。然而,它們可能仍然會糾結在解釋指令並產生符合人類期望的輸出。這可能會導致生成有偏見或不正確的內容,從而限制其實際用途。

Alignment is a broad concept that can be explained in various dimensions, one such categorization is done in this paper. The paper proposes multiple dimensions and sub-classes for ensuring LLM alignment. For instance, harmful content generated by LLMs can be categorized into harms incurred to individual users (e.g., emotional harm, offensiveness, discrimination), society (e.g., instructions for creating violent or dangerous behaviors), or stakeholders (e.g., providing misinformation leading to wrong business decisions).

一致性是一個廣泛的概念,可以從多個維度來解釋,論文中提出了這樣的分類。該論文提出了多個維度和子類別來確保LLM的一致性。舉例來說,LLMs所生成的有害內容可以分為對個人使用者(例如,情緒傷害、冒犯性、歧視)、社會(例如,製造暴力或危險行為的指示)或利害關係人(例如,提供導致錯誤商業決策的錯誤信息)造成的傷害。

image

In broad terms, LLM Alignment can be improved through the following process:

  • Determine the most crucial dimensions for alignment depending on the specific use-case.
  • Identify suitable benchmarks for evaluation purposes.
  • Employ Supervised Fine-Tuning (SFT) methods to enhance the benchmarks.

Some popular aligned LLMs and benchmarks are listed in the image below

從廣義上來說,LLM一致性可以透過下面過程來改進:

  • 根據特定用例確定最重要的一致性維度。
  • 識別出適合評估目的的基準。
  • 採用Supervised Fine-Tuning (SFT)方法來提高基準。

下圖列出了一些熱門的切齊LLMs和基準的方法

image

Image Source: https://arxiv.org/pdf/2307.12966.pdf

A4. Prompt Brittleness

During the prompt engineering segment of our course, we explored various techniques for prompting LLMs. These sophisticated methods are essential because providing instructions similar to humans isn't suitable for LLMs. An overview of commonly used prompting methods is shown in the image below.

在我們課程的提示工程部分,我們探索了提示LLMs的各種技術。這些複雜的方法非常重要,因為提供類似人類的指令並不適合LLMs。常用的提示方法概述如下圖。

LLMs require precise prompting, and even slight alterations can impact LLMs, altering their responses. This poses a challenge during deployment, as individuals unfamiliar with prompting methods may struggle to obtain accurate answers from LLMs.

LLMs需要精確的提示,即使是輕微的改變也會影響LLMs,改變他們的響應。這導致在部署過程中構成了一個挑戰,因為不熟悉提示方法的人可能很難從LLMs那裡獲得準確的答案。

image

Image Source: https://arxiv.org/pdf/2307.10169.pdf

In general, prompt brittleness in LLMs can be reduced by adopting the following high level strategies:

  1. Standardized Prompts: Establishing standardized prompt formats and syntax guidelines can help ensure consistency and reduce the risk of unexpected variations.
  2. Robust Prompt Engineering: Invest in thorough prompt engineering, considering various prompt formulations and their potential impacts on model outputs. This may involve testing different prompt styles and formats to identify the most effective ones.
  3. Human-in-the-Loop Validation: Incorporate human validation or feedback loops to assess the effectiveness of prompts and identify potential brittleness issues before deployment.
  4. Diverse Prompt Testing: Test prompts across diverse datasets and scenarios to evaluate their robustness and generalizability. This can help uncover any brittleness issues that may arise in different contexts.
  5. Adaptive Prompting: Develop adaptive prompting techniques that allow the model to dynamically adjust its behavior based on user input or contextual cues, reducing reliance on fixed prompt structures.
  6. Regular Monitoring and Maintenance: Continuously monitor model performance and prompt effectiveness in real-world applications, updating prompts as needed to address any brittleness issues that may arise over time.

一般來說,可以通過採用下面高階策略來減少LLMs中的提示脆弱性:

  1. 標準化提示: 建立標準化提示格式和文法指南有助於確保一致性並降低意外變更的風險。
  2. 穩健的提示工程: 投資全面的提示工程,考慮各種提示公式及其對模型輸出的潛在影響。這可能涉及測試不同的提示風格和格式以確定最有效的方式。
  3. 人機互動驗證: 結合人工驗證或反饋循環來評估提示的有效性並在部署之前識別潛在的脆弱性問題。
  4. 多樣化的提示測試: 在不同的資料集和場景中測試提示,以評估其穩健性和泛化能力。這有助於發現在不同情況下可能出現的任何脆性問題。
  5. 自適應提示: 開發自適應的提示技術,允許模型根據使用者輸入或上下文提示動態調整其行為,減少對固定提示結構的依賴。
  6. 定期監控和維護: 持續監控實際應用中的模型效能和提示有效性,根據需求更新提示以解決可能隨時間出現的任何脆弱性問題。

B1. Memory and Scalability Challenges

In this section, we delve into the specific challenges related to memory and scalability when deploying LLMs, rather than focusing on their development.

在這一節中,我們將深入研究部署LLMs時,與記憶體和可擴展性相關的具體挑戰,而不是專注於它們的開發。

Let's explore these challenges and potential solutions in detail:

讓我們詳細探討這些挑戰和潛在的解決方案:

  1. Fine-tuning LLMs: Continuous fine-tuning of LLMs is crucial to ensure they stay updated with the latest knowledge or adapt to specific domains. Fine-tuning involves adjusting pre-trained model parameters on smaller, task-specific datasets to enhance performance. However, fine-tuning entire LLMs requires substantial memory, making it impractical for many users and leading to computational inefficiencies during deployment.

    Solutions: One approach is to leverage systems like RAG, where information can be utilized as context, enabling the model to learn from any knowledge base. Another solution is Parameter-efficient Fine-tuning (PEFT), such as adapters, which update only a subset of model parameters, reducing memory requirements while maintaining task performance. Methods like prefix-tuning and prompt-tuning prepend learnable token embeddings to inputs, facilitating efficient adaptation to specific datasets without the need to store and load individual fine-tuned models for each task. All these methods have been discussed in our previous weeks’ content. Please read through for deeper insights.

  2. 微調LLMs: LLM的持續微調對於確保它們與最新知識保持同步或適應特定領域來說非常重要。微調涉及在較小的、特定於任務的資料集上調整預訓練的模型參數以提高效能。然而,微調整個LLMs需要大量的記憶體,這對於許多使用者來說並不切實際,也導致部署期間計算效率低下。

解決方案: 一種方法是利用像是RAG這樣的系統,其信息可以做為上下文,讓模型能夠從任意知識庫中學習。另一種解決方案是Parameter-efficient Fine-tuning (PEFT),舉例來說,adapters,它就只會更新模型參數的一個子集,降低記憶體需求的同時也能維持任務效能。像是prefix-tuning和prompt-tuning等方法會將可學習的token預先嵌入到輸入中,促進對特定資料集的高效適應,而無需為每個任務儲存和載入單獨的微調模型。所有這些方法都已在我們前幾週的課程內容中討論過。請通讀以獲得更深入的見解。

  1. Inference Latency: LLMs often suffer from high inference latencies due to low parallelizability and large memory footprints. This results from processing tokens sequentially during inference and the extensive memory needed for decoding.

    Solution: Various techniques address these challenges, including efficient attention mechanisms. These mechanisms aim to accelerate attention computations by reducing memory bandwidth bottlenecks and introducing sparsity patterns to the attention matrix. Multi-query attention and FlashAttention optimize memory bandwidth usage, while quantization and pruning techniques reduce memory footprint and computational complexity without sacrificing performance.

  2. 推理延遲: 由於可平行性低和記憶體佔用大,LLMs經常會遇到推理高延遲的問題。這是由於在推理過程中按順序處理tokens以及解碼所需的大量記憶體造成的。

解決方案: 各種技術可以解決這些挑戰,包括高效的的注意力機制。這些機制主要是透過減少記憶體頻寬的瓶頸並向注意力矩陣引入稀疏模式來加速注意力的計算。 Multi-query attentionFlashAttention優化了記憶體頻寬的使用,同時quantization和[pruning](https://medium.com/@bnjmn_marie/freeze-and- prune-to-fine-tune-your-llm-with-apt-dc750b7bfbae)技術在不會犧牲效能的情況下可以減少記憶體佔用和計算複雜度。

  1. Limited Context Length: Limited context length refers to the constraint on the amount of contextual information an LLM can effectively process during computations. This limitation stems from practical considerations such as computational resources and memory constraints, posing challenges for tasks requiring understanding longer contexts, such as novel writing or summarization.

    Solution: Researchers propose several solutions to address limited context length. Efficient attention mechanisms, like Luna and dilated attention, handle longer sequences efficiently by reducing computational requirements. Length generalization methods aim to enable LLMs trained on short sequences to perform well on longer sequences during inference. This involves exploring different positional embedding schemes, such as Absolute Positional Embeddings and ALiBi, to inject positional information effectively.

  2. 有限的上下文長度: 有限的上下文長度是指LLM的計算過程中可以有效處理的上下文信息量的限制。這種限制源於實際考慮,像是計算資源和記憶體限制等,這為需要理解較長上下文的任務,像是小說寫作或摘要等任務帶來了挑戰。

解決方案: 研究人員提出了幾種解決方案來解決上下文長度限制的問題。高效率的注意力機制,像是Lunadilated attention,可以透過減少計算需求有效處理較長的序列。長度泛化方法主要是讓在短序列上所訓練的LLMs在推理過程中能夠在較長的序列上表現良好。這涉及探索不同的位置嵌入方案,例如絕對位置嵌入(Absolute Positional Embeddings)和 ALiBi,以有效地方式註入位置資訊。

B2. Privacy

Privacy risks stem from their ability to process and generate text based on vast and varied training datasets. Models like GPT-3 have the potential to inadvertently capture and replicate sensitive information present in their training data, leading to potential privacy concerns during text generation. Issues such as unintentional data memorization, data leakage, and the possibility of disclosing confidential or personally identifiable information (PII) are significant challenges.

隱私風險源自於它們根據大量多樣的訓練資料集處理和生成文本的能力。像GPT-3這樣的模型有可能無意中捕獲並複製訓練資料中所存在的敏感信息,從而導致文本生成過程中潛在的隱私問題。無意間的資料記憶、資料外洩以及洩露機密或個人識別資訊(personally identifiable information,PII)的可能性等問題是重大挑戰。

Moreover, when LLMs are fine-tuned for specific tasks, additional privacy considerations arise. Striking a balance between harnessing the utility of these powerful language models and safeguarding user privacy is crucial for ensuring their reliable and ethical use across various applications.

此外,當LLMs針對特定任務進行微調時,就會出現額外的隱私考量。在利用這些強大的語言模型的實用性和保護用戶隱私權之間取得平衡對於確保其在各種應用程式中的可靠度和合乎道德的使用來說就是很重要的一件事。

We review key privacy risks and attacks, along with possible mitigation strategies. The classification provided below is adapted from this paper, which categorizes privacy attacks as:

我們審查主要的隱私風險和攻擊,以及可能的緩解策略。以下提供的分類改編自論文,將隱私攻擊分類為:

image

Image Source: https://arxiv.org/pdf/2402.00888.pdf

  1. Gradient Leakage Attack:

In this attack, adversaries exploit access to gradients or gradient information to compromise the privacy and safety of deep learning models. Gradients, which indicate the direction of the steepest increase in a function, are crucial for optimizing model parameters during training to minimize the loss function.

To mitigate gradient-based attacks, several strategies can be employed:

  1. Random Noise Insertion: Injecting random noise into gradients can disrupt the adversary's ability to infer sensitive information accurately.
  2. Differential Privacy: Applying differential privacy techniques helps to add noise to the gradients, thereby obscuring any sensitive information contained within them.
  3. Homomorphic Encryption: Using homomorphic encryption allows for computations on encrypted data, preventing adversaries from accessing gradients directly.
  4. Defense Mechanisms: Techniques like adding Gaussian or Laplacian noise to gradients, coupled with differential privacy and additional clipping, can effectively defend against gradient leakage attacks. However, these methods may slightly reduce the model's utility.

在這種攻擊中,攻擊者利用對梯度或梯度信息的存取來危害深度學習模型的隱私和安全性。梯度表示函數最急劇增加的方向,對於在訓練過程中最佳化模型參數以最小化損失函數至關重要。

為了減輕基於梯度的攻擊,可以採用下面幾種策略:

  1. 隨機雜訊插入:在梯度中注入隨機噪點可以破壞攻擊者準確推斷敏感信息的能力。
  2. 差分隱私:應用差分隱私技術有助於向梯度加入噪點,從而掩蓋其中所包含的任何敏感信息。
  3. 同態加密:使用同態加密允許在加密數據上進行計算,防止攻擊者直接存取梯度。
  4. 防禦機制:在梯度中加入高斯或拉普拉斯噪點等技術,再加上差分隱私和額外的裁剪,可以有效地防禦梯度洩漏攻擊。然而,這些方法可能會稍微降低模型的實用性。

2. Membership Inference Attack

A Membership Inference Attack (MIA) aims to determine if a particular data sample was part of a machine learning model's training data, even without direct access to the model's parameters. Attackers exploit the model's tendency to overfit its training data, leading to lower loss values for training samples. These attacks raise serious privacy concerns, especially when models are trained on sensitive data like medical records or financial information.

Mitigating MIA in language models involves various mechanisms:

  1. Dropout and Model Stacking: Dropout randomly deletes neuron connections during training to mitigate overfitting. Model stacking involves training different parts of the model with different subsets of data to reduce overall overfitting tendencies.
  2. Differential Privacy (DP): DP-based techniques involve data perturbation and output perturbation to prevent privacy leakage. Models equipped with DP and trained using stochastic gradient descent can reduce privacy leakages while maintaining model utility.
  3. Regularization: Regularization techniques prevent overfitting and improve model generalization. Label smoothing is one such method that prevents overfitting, thus contributing to MIA prevention.

成員推理攻擊(Membership Inference Attack,MIA) 主要是確定特定資料樣本是否為機器學習模型訓練資料的一部分,即使沒有直接存取模型的參數。攻擊者利用模型過度擬合訓練資料的傾向,導致訓練樣本的損失值較低。這些攻擊會引發嚴重的隱私問題,特別是當模型是在醫療記錄或財務資訊等敏感資料上訓練時。

減輕語言模式中的MIA涉及多種機制:

  1. Dropout和模型堆疊:Dropout在訓練過程中會隨機刪除神經元連接,以減輕過度擬合。模型堆疊涉及使用不同的資料子集來訓練模型的不同部分,以減少整體過度擬合的傾向。
  2. 差分隱私(DP):基於DP的技術涉及資料擾動和輸出擾動,以防止隱私洩漏。配備DP並使用隨機梯度下降訓練的模型可以在維持模型實用性的同時減少隱私洩漏。
  3. 正規化:正規化技術可以防止過度擬合並提高模型的泛化能力。Label smoothing是一種防止過度擬合的方法,有助於防止MIA。

3. Personally Identifiable Information (PII) attack

This attack involves the exposure of data that can uniquely identify individuals, either alone or in combination with other information. This includes direct identifiers like passport details and indirect identifiers such as race and date of birth. Sensitive PII encompasses information like name, phone number, address, social security number (SSN), financial, and medical records, while non-sensitive PII includes data like zip code, race, and gender. Attackers may acquire PII through various means such as phishing, social engineering, or exploiting vulnerabilities in systems.

這種攻擊涉及洩露可以唯一識別個人的資料,無論是單獨的還是與其他信息結合的。這包括直接識別資料,像是護照詳細資料,與間接識別資料,像是種族和出生日期等。敏感性的PII包含像是姓名、電話號碼、地址、社會安全號碼(SSN)、財務和醫療記錄等訊息,而非敏感性的PII包含郵遞區號、種族和性別等資料。攻擊者可以透過網路釣魚、社交工程或利用系統漏洞等多種方式來取得PII。

To mitigate PII leakage in LLMs, several strategies can be employed:

  1. Preprocessing Techniques: Deduplication during the preprocessing phase can significantly reduce the amount of memorized text in LLMs, thus decreasing the stored personal information. Additionally, personal information or content identifying and filtering with restrictive terms of use can limit the presence of sensitive content in training data.
  2. Privacy-Preserving Training Algorithms: Techniques like differentially private stochastic gradient descent [(DP-SGD)](https://assets.amazon.science/01/6e/4f6c2b1046d4b9b8651166bbcd93/differentially-private-decoding-in-large-language-models.pdf#:~:text=While the intersection of DP and LLMs is fairly novel%2C the prominent approach&text=vate Stochastic Gradient Descent (DP-SGD) (Song et al.%2C 2013;) can be used during training to ensure the privacy of training data. However, DP-SGD may incur a significant computational cost and decrease model utility.
  3. PII Scrubbing: This involves filtering datasets to eliminate PII from text, often leveraging Named Entity Recognition (NER) to tag PII. However, PII scrubbing methods may face challenges in preserving dataset utility and accurately removing all PII.
  4. Fine-Tuning Considerations: During fine-tuning on task-specific data, it's crucial to ensure that the data doesn't contain sensitive information to prevent privacy leaks. While fine-tuning may help the LM "forget" some memorized data from pretraining, it can still introduce privacy risks if the task-specific data contains PII.

為了減少LLMs中的PII洩漏,可以採取多種策略:

  1. 預處理技術:預處理階段的重複資料刪除可以顯著減少LLMs中記憶的文字量,從而減少儲存的個人信息。此外,個人信息或帶有使用限制條款的內容識別和過濾可以限制訓練資料中敏感內容的存在。
  2. 隱私保護訓練演算法:可以在訓練過程中使用差分隱私隨機梯度下降等技術 [(DP-SGD)](https://assets.amazon.science/01/6e/4f6c2b1046d4b9b86511666662cd93/ Differentially-private-decoding-inbb large-language-models.pdf#:~:text=While%20the%20intersection%20of%20DP%20and%20LLMs%20is%20fairly%20novel%2C%20the%20prominent%20approach&text=ochvate%20StientD. (DP%2DSGD)%20(Song%20et%20al.%2C%202013;)以確保訓練資料的隱私性。不過,DP-SGD可能會導致大量的計算成本並降低模型的效用。
  3. PII清理:這涉及過濾資料集以從文本中消除PII,通常利用命名實體識別(Named Entity Recognition,NER)來標記PII。然而,PII清理方法可能在保留資料集的實用性和準確刪除所有PII方面面臨挑戰。
  4. 微調注意事項:在對特定任務資料進行微調時,確保資料不包含敏感性信息以防止隱私外洩至關重要。雖然微調也許有助於LM"忘記"一些預訓練中記憶的資料,不過如果特定任務的資料包含PII,它仍然會帶來隱私風險。

Read/Watch These Resources (Optional)

  1. The Unspoken Challenges of Large Language Models - https://deeperinsights.com/ai-blog/the-unspoken-challenges-of-large-language-models
  2. 15 Challenges With Large Language Models (LLMs)- https://www.predinfer.com/blog/15-challenges-with-large-language-models-llms/

Read These Papers (Optional)

  1. https://arxiv.org/abs/2307.10169
  2. https://www.techrxiv.org/doi/full/10.36227/techrxiv.23589741.v1
  3. https://arxiv.org/abs/2311.05656