This walkthrough assumes you already have the Dify service up and running.
Whisper ASR Webservice is a very handy tool: its author has already wrapped speech-to-text as a web API, so on the application side ordinary API calls are all you need.
The fastest way to start the service is with Docker:
```bash
# CPU
docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest

# GPU
docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
```
For the GPU image, make sure your CUDA version is compatible; if the latest tag fails to start, fall back to an older one, depending on your hardware.
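Once the container is up, you can sanity-check the service before wiring it into Dify. A minimal sketch, assuming the /asr endpoint and its audio_file form field as listed on the service's Swagger page (verify the parameter names on your own /docs; test.wav is a placeholder file):

```bash
# POST a local audio file to the ASR endpoint and ask for plain-text output.
curl -X POST "http://localhost:9000/asr?task=transcribe&output=txt" \
  -F "audio_file=@test.wav"
```

If this returns the transcription, the service is ready to be registered as a Dify tool.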
Once the Docker container is running, open http://<your-ip>:9000/docs in a browser. If all goes well, you should see the Swagger UI page shown below:
(figure: the Swagger UI page, with the openapi.json link highlighted in a red box)
Click the openapi.json link in the red box in the figure above. A new tab opens containing one long blob of JSON. Copy it out and pretty-print it in whatever editor you like; then make two changes so that Dify can import it successfully.
First, insert a `servers` block between the `info` and `paths` sections.
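A minimal skeleton of that edit is shown below; the `info` contents are placeholders for whatever your copied openapi.json already contains, and `<your-ip>` is the host where the container runs:

```json
{
  "openapi": "3.0.0",
  "info": {
    "title": "Whisper ASR Webservice",
    "version": "1.0.0"
  },
  "servers": [
    { "url": "http://<your-ip>:9000" }
  ],
  "paths": {}
}
```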
Second, fix the content type inside `requestBody`: change `multipart/form-data` to `multipart/form-data; boundary=----WebKitFormBoundarydzemBAPhdeDfTCfR`.
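In the spec this change looks roughly like the fragment below; the schema object is a placeholder for whatever your openapi.json already defines, and only the content-type key changes:

```json
"requestBody": {
  "content": {
    "multipart/form-data; boundary=----WebKitFormBoundarydzemBAPhdeDfTCfR": {
      "schema": { "type": "object" }
    }
  }
}
```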
For more on model selection and detailed configuration, see the official Whisper ASR Webservice documentation.
Back in Dify, we can define a custom tool. First, click Tools.
Click Custom, then click Create Custom Tool.
Give the tool a name and paste in the openapi.json you just adjusted; the custom tool is created.
You can now call this custom speech-to-text tool directly from a Dify workflow.
Set a few parameters on the tool node.
Then test the application.
Finally, feed the transcribed text to an LLM node to turn it into a Q&A set.
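A rough prompt sketch for that LLM node; the wording is illustrative rather than the exact prompt used, and `{{transcript}}` stands for the output variable of the upstream speech-to-text node:

```text
You are given the transcript of a customer-service call.
Generate a Q&A set from it: for each key fact in the transcript,
write one question a reader might ask, followed by a complete
answer grounded only in the transcript.

Transcript:
{{transcript}}
```

With the demo call transcript, the generated Q&A set looked like this: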
Q: What information does a customer need to provide when contacting the tax refund service?
A: When contacting the tax refund service, the customer first needs to show the back of the tax refund guidance slip to the service agent, because it carries key information such as the refund number, which is essential for verifying and processing the refund. The customer also needs to give the account number the refund should be transferred into, so the money can be deposited correctly once the refund is approved.
Q: How long does the refund process take?
A: The whole refund process takes about one week. During that time the relevant departments process the information the customer provided and complete the verification and approval procedures.
Q: How is the customer notified once the refund has been deposited?
A: Once the refund has been deposited into the customer's designated account, the relevant unit notifies the customer by SMS, so the customer knows right away how the refund has been handled.
Q: How should customers get in touch to ask about or provide refund-related information?
A: Customers who need to ask about or provide refund-related information can call the service center directly. During the call, the agent guides the customer to show the back of the tax refund guidance slip to obtain the required details and records information such as the account number the customer wants the refund transferred into.
Q: How do service agents typically respond when they receive a refund request?
A: Based on the transcript, the agents come across as professional and friendly. They tell the customer how to find the refund number, collect the transfer account details, confirm that all necessary information has been recorded, and express their thanks. They also encourage the customer to wait patiently for the refund to be processed and wish them well.
If you're interested, you could go one step further and turn the Q&A set into a host-and-guest dialogue, and you'd have a podcast.
And if the source is a company meeting recording, attendees can listen to the boss's loving reminders every day.