# Detecting Inappropriate Content with OpenAI's Moderation Model

## Introduction

The Moderation model is a free tool provided by OpenAI for screening so-called "inappropriate content". The detailed list of prohibited uses is at https://openai.com/policies/usage-policies. The tool currently works best on English and may be less reliable for other languages. Users can use it to identify inappropriate content and act on it, for example by **filtering** out the message.

The moderation endpoint flags the following categories of content:

|CATEGORY|DESCRIPTION|
|-|-|
|hate|Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category.|
|hate/threatening|Hateful content that also includes violence or serious harm towards the targeted group.|
|self-harm|Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.|
|sexual|Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).|
|sexual/minors|Sexual content that includes an individual who is under 18 years old.|
|violence|Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.|
|violence/graphic|Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.|

## Usage

```
curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
```

```python
import os

import requests

# Read the API key from the environment rather than hard-coding it.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "input": "I want to kill them."
}

# Send the text to the moderation endpoint.
response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())
```

output:

```
{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0', 'model': 'text-moderation-004', 'results': [{'flagged': True, 'categories': {'sexual': False, 'hate': False, 'violence': True, 'self-harm': False, 'sexual/minors': False, 'hate/threatening': False, 'violence/graphic': False}, 'category_scores': {'sexual': 9.530887e-07, 'hate': 0.18386647, 'violence': 0.8870859, 'self-harm': 1.7594473e-09, 'sexual/minors': 1.3112696e-08, 'hate/threatening': 0.003258761, 'violence/graphic': 3.173159e-08}}]}
```

As the output shows, the predicted score for the **violence** category is high (about 0.89), and the input is flagged accordingly.

<br/>

---

## Usability in Other Languages

For me, the most important question is how well it works on Chinese.
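
When comparing inputs in different languages, it can help to reduce each response to the flagged categories and the highest scores, since the raw dictionary is hard to read at a glance. The following is a minimal sketch under stated assumptions: `summarize_moderation` and its `top_n` parameter are hypothetical names introduced here (not part of the OpenAI API), and the code assumes the response has the same shape as the output shown in the usage section. It runs offline on that sample response.

```python
from typing import Any


def summarize_moderation(response: dict[str, Any], top_n: int = 3) -> dict[str, Any]:
    """Reduce a moderation response to its flag, flagged categories, and top scores.

    Hypothetical helper: assumes the response shape shown in the usage section.
    """
    # This sketch only looks at the first entry of "results".
    result = response["results"][0]

    # Names of the categories whose boolean flag is True.
    flagged_categories = sorted(
        name for name, hit in result["categories"].items() if hit
    )

    # The top_n (category, score) pairs, highest score first.
    top_scores = sorted(
        result["category_scores"].items(),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]

    return {
        "flagged": result["flagged"],
        "flagged_categories": flagged_categories,
        "top_scores": top_scores,
    }


# Sample response copied from the output in the usage section.
sample = {
    "id": "modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0",
    "model": "text-moderation-004",
    "results": [{
        "flagged": True,
        "categories": {
            "sexual": False, "hate": False, "violence": True,
            "self-harm": False, "sexual/minors": False,
            "hate/threatening": False, "violence/graphic": False,
        },
        "category_scores": {
            "sexual": 9.530887e-07, "hate": 0.18386647,
            "violence": 0.8870859, "self-harm": 1.7594473e-09,
            "sexual/minors": 1.3112696e-08,
            "hate/threatening": 0.003258761,
            "violence/graphic": 3.173159e-08,
        },
    }],
}

summary = summarize_moderation(sample)
print(summary["flagged_categories"])  # ['violence']
for name, score in summary["top_scores"]:
    print(f"{name}: {score:.4f}")
```

Passing `response.json()` from the request in the usage section to the same helper would, under the same shape assumption, give a comparable summary for any other input, including Chinese text.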