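Hello. This is a personal tone modeling dataset.

I'm not from an academic background and this project isn't designed for publication; it's more like a sandbox where I explore how emotion, intent, and tone interact across non-linear linguistic contexts. My intuition for language runs fairly strong, but I don't know much of the technical terminology, so this is a first attempt at turning feeling into structure.

All sentences are written by me and manually annotated with the following fields:

- **Tone Type**
- **Tone Weight**
- **Intent**
- **Misuse Tolerance** (how likely a sentence is to be misinterpreted)

If you're interested in NLP, pragmatics, tone simulation, or modeling fuzzy human expressions, or you simply think this is fun, feel free to explore the annotation scheme below or reach out.

This is version 0.1: no structure is final, and neither am I.

HT-01

# Tone Simulation Dataset Ver.0.1

This page records my personal experiment in building a tone-shift corpus. It mainly takes apart phenomena such as "mismatch between tone and meaning", "emotional turns", and "pragmatic tolerance".

All sentences are my own, and each annotation has four fields:

- **Tone Type**: tone category (can be mixed)
- **Tone Weight**: tone strength (0 to 1)
- **Intent**: the speaker's motive (inferred)
- **Misuse Tolerance**: risk of being misunderstood (tolerance rate) (5 to 1)

# Tone Annotation System Overview Ver.0.1

This page describes the annotation system for the HT-01 tone simulation dataset.
Every sentence is annotated according to the definitions on this page, and the structure is meant to work for both human readers and NLP models.
Field design principles: high interpretability, strong contextual flexibility, and room for analyzing nonlinear shifts.

---

## Annotation Fields at a Glance

| Field | Description | Data type | Suggested range |
|----------|------|----------|----------|
| **Tone Type** | The sentence's tone style and psychological tendency; can be compound | String / list | Free naming; suggested categories are listed below |
| **Tone Weight** | Numeric tone strength (for model use) | Float | 0.0 to 1.0 |
| **Tone Intensity** | Tone strength level as perceived by a human | Integer | 1 (faint) to 5 (extreme) |
| **Intent** | The motive or purpose behind the utterance, inferred from the speaker's state of mind | Short phrase / clause | Keep it concise and specific, e.g. "refusing concern" or "testing the other person's reaction" |
| **Misuse Tolerance** | How much risk the sentence carries of being misunderstood | Float | 5.0 (no tolerance at all) to 1.0 (almost never misunderstood) |

---

### Parameter Definitions

**Tone Weight**

A numeric value (0.0–1.0) indicating how emotionally intense or psychologically charged a sentence feels, regardless of its surface wording. It measures how *loud* the sentence "feels" beneath the text.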
- `0.0` = neutral, emotionally flat
- `1.0` = overwhelming, emotionally charged, difficult to ignore

This is designed for computational readability (machine learning, NLP model calibration).

---

**Misuse Tolerance**

A numeric estimate (5.0–1.0) of how likely the sentence is to be misinterpreted by either humans or machines.

- `5.0` = almost guaranteed to be misunderstood if taken literally
- `1.0` = very unlikely to be misinterpreted

This is useful for identifying emotionally ambiguous statements, social masking, sarcasm, or passive-aggressive speech patterns.

___

These values help define the "semantic distortion risk" and "emotional signal strength" of each sentence, making this dataset suitable for studying conversational realism, emotional modeling, or tone-aware dialogue agents.

___

## Suggested Tone Type Categories (can be combined)

| Category | Definition and when to use it |
|----------|----------------|
| 偽裝堅強 (fake composure) | Calm on the surface while actually suppressing inner emotion |
| 壓抑悲傷 (suppressed sadness) | Holding pain back so that emotion does not spill over |
| 情緒斷裂 (emotional rupture) | Meaning and tone become discontinuous within the sentence |
| 假裝無事 (pretending nothing is wrong) | Clearly uneasy but flat in tone, sending a false signal of reassurance |
| 試探挑釁 (probing provocation) | A covertly provocative tone that tests the other person's reaction |
| 情感滑移 (emotional slippage) | The emotional direction changes rapidly within one sentence (e.g. "joy → anger → feigned calm") |
| 情緒空殼 (emotional shell) | The emotional tone is noticeably thin, or the meaning is deliberately hollowed out (suited to philosophical sentences) |
| 無名壓力 (unnamed pressure) | Situational pressure is present but unstated, and the speaker does not name its source |

> *Add custom categories as needed.*
>
> A long descriptive phrase works too; **what matters is that a human can read it first**.

---

## Sample #001

### Original Sentence

> **我很好,我沒事……別管我。**
> *"I'm fine. I'm okay... just leave me alone."*

---

### Tone Annotation

| Field | Description |
|--------------|---------------------|
| **Tone Type** | Fake composure + suppressed sadness (偽裝堅強 + 壓抑悲傷) |
| **Tone Weight** | 0.9 |
| **Intent** | To conceal emotional instability and sever the emotional link for self-protection |
| **Misuse Tolerance** | 0.2 (highly prone to misinterpretation as literal) |

---

### Observational Notes

This sentence contains a sharp tonal dislocation: the speaker appears calm but is internally unstable. The ellipsis ("…") subtly marks a pressure point, signaling that the speaker is not okay, though the sentence insists otherwise.
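In casual conversations, this line is often misread as a closure rather than a cry for closeness. The speaker's emotion is suppressed to the breaking point, and the intent is to shut the signal off rather than to express stability. Without clear contextual marking, the line is easily mistaken for a neutral statement, which can throw off the response of a model or a human.

---

### Possible Use Cases

- Testing NLP models for misclassification at the fuzzy boundaries of tone
- Training detection of emotional turns at the pragmatic level
- Tuning dialogue engines for emotional realism

---

## Additional Notes (author's murmur)

This is my first "nonlinear tone annotation" entry.
I'm not sure what this corpus will be used for later, but I know these sentences should be written down and observed.
If you have also written a tone dataset, or tried to take apart the structure of language and emotion, feel free to say hi.

By the way, the introduction above was all written by GPT, which I put to work for me. Haha, not having to lift a finger feels great. (I could write some of it myself, but I'm lazy... and I'm only just getting started. I still have a pile of questions, though. Who is Misuse Tolerance actually written for? If it is for both AI and people, the two sides don't tolerate the same things, so do I need a dual track, with a numeric Tone Weight for models and a human-readable Tone Intensity for people? What a hassle...) The corpus itself is my own work, of course. I have written 49 more entries, including but not limited to everyday, philosophical, extreme-metaphor (?), and multi-layered emotional sentences.

Also, GPT says almost nobody annotates Intent. Is that true?

Anyway, I'll update when I get around to it, so expect irregular updates. I'm just a busy second-year university student; if people leave comments, I'll update more (?).

Special thanks to GPT for the English translation, haha.

___

> #### About this Dataset

While most NLP datasets focus on what people say (sentiment, emotion), this one focuses on how people say it.

Tone modeling here refers to identifying the speaker's attitude, suppression level, masking behaviors, and tonal distortions. It's closer to pragmatics than to psychology.

This dataset attempts to quantify *how language bends under emotional pressure*.

---

## Pragmatic Deconstruction of Nonlinear Phrasing

Most sentences in this dataset exhibit *nonlinear phrasing*. "Nonlinear" here does not refer to temporal sequence or narrative jump cuts, but to pragmatic layering: tone, motive, emotion, and pragmatic goal interleave and mask one another (for example, a cold tone carrying a strong intent, or a tonal crack at a weak sentence ending).

These sentences do not follow a single emotional or communicative trajectory. Instead, they present overlapping vectors of tone, intent, and implied emotional states.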
Typical features include:

- **Desynchronization of tone and literal meaning** (e.g., calm wording with suppressed emotional urgency)
- **Emotion masking** (e.g., saying “I’m fine” as a defensive shut-off)
- **Implied intent without explicit context** (requiring reader or model inference)
- **Tonally unstable syntax** (e.g., ellipses, repetition, abrupt clause shifts)

Such sentences cannot be categorized by keywords alone. They require contextual interpretation and an awareness of the speaker’s probable emotional posture.

This dataset aims to simulate these layers explicitly, making them visible and quantifiable for both human and machine interpreters.
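
To make the annotation fields concrete for machine use, here is a minimal Python sketch of how one record could be stored and checked against the ranges documented in the field table. The class name `ToneAnnotation`, the function `validate`, and the snake_case field names are hypothetical and not part of the dataset; the sketch also assumes both ends of each range are inclusive and treats Tone Intensity as optional, since Sample #001 does not provide it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToneAnnotation:
    """One annotated sentence (hypothetical layout mirroring the field table)."""
    sentence: str
    tone_type: List[str]                  # free-form labels, may be combined
    tone_weight: float                    # documented range: 0.0 to 1.0
    intent: str                           # short phrase describing the inferred motive
    misuse_tolerance: float               # documented range: 1.0 to 5.0
    tone_intensity: Optional[int] = None  # documented range: 1 to 5, optional here


def validate(record: ToneAnnotation) -> List[str]:
    """Return a list of problems; an empty list means the record fits the documented ranges."""
    problems = []
    if not record.tone_type:
        problems.append("tone_type needs at least one label")
    if not 0.0 <= record.tone_weight <= 1.0:
        problems.append(f"tone_weight {record.tone_weight} is outside 0.0 to 1.0")
    if not record.intent.strip():
        problems.append("intent is empty")
    if not 1.0 <= record.misuse_tolerance <= 5.0:
        problems.append(f"misuse_tolerance {record.misuse_tolerance} is outside 1.0 to 5.0")
    if record.tone_intensity is not None and not 1 <= record.tone_intensity <= 5:
        problems.append(f"tone_intensity {record.tone_intensity} is outside 1 to 5")
    return problems


# Sample #001, with the values exactly as annotated in this document.
sample_001 = ToneAnnotation(
    sentence="我很好,我沒事……別管我。",
    tone_type=["偽裝堅強", "壓抑悲傷"],
    tone_weight=0.9,
    intent="To conceal emotional instability and sever the emotional link",
    misuse_tolerance=0.2,
)

for problem in validate(sample_001):
    print(problem)  # prints: misuse_tolerance 0.2 is outside 1.0 to 5.0
```

Run on Sample #001 as annotated, this check accepts every field except Misuse Tolerance: the value 0.2 falls outside the documented 5.0 to 1.0 scale, which may be worth reconciling in a later version. Returning a list of problems rather than raising an error is only one possible design; it lets a whole batch of records be reviewed in a single pass.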