# Qwen 3 ASR Service - Documentation

## 1. Service Overview

Qwen 3 ASR is a speech-to-text (Automatic Speech Recognition) service that converts audio recordings into text. It is built on Alibaba's **Qwen3-ASR-1.7B** model, served via vLLM for high-throughput GPU inference.

| Field | Value |
|-------|-------|
| Model | Qwen/Qwen3-ASR-1.7B (1.7 billion parameters) |
| Default language | Vietnamese (`vi`) |
| Supported audio formats | WAV, MP3, M4A, FLAC, OGG, AAC (auto-converted to 16 kHz mono WAV) |
| Production URL | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/` |
| Swagger UI | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/docs` |
| Cluster / Namespace / Sub Namespace | aifarm / halong / astro |
| Authentication | None required |

---

## 2. How to Use the Service

Two ways to call the API are described below: curl and n8n. Each has different strengths depending on your use case. The Swagger UI listed in the overview table can also be used to try requests interactively from a browser.

### 2.1 curl (CLI)

Best for: scripting, quick testing from a terminal, CI/CD pipelines.

**Chunked mode (recommended for speed):**

```bash
curl -X POST \
  "https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
  -F "file=@/path/to/audio.mp3" \
  -F "process_type=chunked"
```

**Full mode (recommended for short recordings that need high accuracy):**

```bash
curl -X POST \
  "https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
  -F "file=@/path/to/audio.wav" \
  -F "process_type=full"
```

**With a custom request ID for tracing:**

```bash
curl -X POST \
  "https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
  -F "file=@/path/to/audio.wav" \
  -F "process_type=chunked" \
  -F "request_id=my-trace-id-123"
```

| Strengths | Weaknesses |
|-----------|------------|
| Simplest method, no extra tooling | Not visual, harder to inspect responses |
| Easily scriptable and automatable | Manual file path management |
| Available on any Linux/Mac terminal | |

### 2.2 n8n (Workflow Automation)

Best for: automated workflows, chaining ASR with downstream processing (e.g., summarization, translation, notification).

**HTTP Request node configuration:**

| Setting | Value |
|---------|-------|
| Method | POST |
| URL | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio` |
| Send Body | ON |
| Body Content Type | **Multipart Form Data** |
| Timeout | `300000` (5 minutes, under Options > Timeout) |

**Body Parameters:**

| Name | Parameter Type | Value |
|------|---------------|-------|
| `file` | n8n Binary Data | Binary data from a previous node (e.g., Read Binary File, HTTP Request, Google Drive) |
| `process_type` | String | `chunked` or `full` |

**Network requirement:** The n8n server must have firewall access to `aifarm.mservice.com.vn` on port 443. The domain resolves to the public IP `210.245.72.36`. If your n8n instance is on an internal network, ensure the whitelist rule targets this resolved IP, not just the internal IP.

An `ETIMEDOUT` error means the network path is blocked. Verify with your IT/network team that the n8n server's outgoing IP is whitelisted to reach `210.245.72.36:443`.

| Strengths | Weaknesses |
|-----------|------------|
| Fully automatable workflows | Requires an n8n instance |
| Can chain with other nodes (Slack, email, database) | Network whitelist required from internal networks |
| Supports scheduling and event triggers | |

---

## 3. API Specification

### 3.1 Transcribe Audio

**Endpoint:** `POST /internal/halong-qwen-asr/v1/asr/transcribe/audio`

**Content-Type:** `multipart/form-data`

**Request fields:**

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `file` | binary | Yes | - | Audio file (WAV, MP3, M4A, FLAC, OGG, AAC) |
| `process_type` | string | No | `full` | `chunked` or `full` (see the comparison in section 3.2) |
| `request_id` | string | No | auto UUID | Custom trace/correlation ID. Also accepted via the `X-Request-ID` header. |

**Success response (200):**

```json
{
  "text": "transcribed text content here",
  "language": "vi",
  "request_id": "your-trace-id-or-auto-uuid"
}
```

**Error response (4xx / 5xx):**

```json
{
  "error": "Human-readable error description",
  "request_id": "trace-id"
}
```

For 429 (rate limit) responses, an additional `retry_after_seconds` field is included:

```json
{
  "error": "Too many requests. Server is at maximum capacity. Please retry.",
  "retry_after_seconds": 5,
  "request_id": "trace-id"
}
```

### 3.2 Processing Modes: Chunked vs Full

| Aspect | `chunked` | `full` |
|--------|-----------|--------|
| **How it works** | Splits audio into 30-second chunks with 2-second overlap, processes all chunks in parallel, then merges results | Sends the entire audio as a single request to the model |
| **Max audio duration** | 30 minutes | 10 minutes |
| **Speed** | Faster for long audio (parallel processing) | Slower (single sequential call) |
| **Accuracy** | May have minor artifacts at chunk boundaries | Higher accuracy (no boundary effects) |
| **Concurrency limit** | 25 simultaneous requests | 5 simultaneous requests |
| **Timeout** | 30 seconds per chunk | 300 seconds total |
| **Best for** | Long recordings, batch processing, call center audio | Short recordings where accuracy matters most |
| **Trade-off** | Speed over perfection | Perfection over speed |

**When to use which:**

- Audio under 30 seconds: either mode works equally well (no chunking occurs).
- Audio 30 seconds to 10 minutes: use `full` if accuracy is critical, `chunked` if speed matters.
- Audio 10 to 30 minutes: must use `chunked` (`full` rejects audio over 10 minutes).

### 3.3 Healthcheck

**Endpoint:** `GET /internal/halong-qwen-asr/healthcheck`

**Success (200):**

```json
{"status": "ok"}
```

**Failure (500):**

```json
{"detail": "service is not available"}
```

Use this endpoint to verify the service is running before sending transcription requests.

---

## 4. Grafana Monitoring

**Grafana Dashboard URL:** https://aifarm.mservice.com.vn/grafana/d/halong-qwen-asr-prod-asr-metrics/halong-qwen-asr-prod-asr?orgId=1&from=now-1h&to=now&timezone=browser&var-ds=default&refresh=30s

**Grafana VictoriaLogs URL:** https://aifarm.mservice.com.vn/grafana/d/halong-qwen-asr-prod-asr-logs/halong-qwen-asr-prod-asr-logs?orgId=1&from=now-1h&to=now&timezone=browser&var-DS_VICTORIALOGS=PD775F2863313E6C7&var-query=&refresh=30s

### 4.1 Dashboard Panels

Grafana provides two dashboards for this service:

**Metrics Dashboard (Prometheus):**

| Panel | What it shows |
|-------|---------------|
| P99 Latency (ms) | 99th percentile end-to-end transcription latency over 5-minute windows |
| Request Rate (req/min) | Total and error request rates |
| Error Rate (%) | Percentage of failed requests |
| Audio Duration Distribution | P50, P90, P99 of submitted audio duration |

**Logs Dashboard (VictoriaLogs):**

| Panel | What it shows |
|-------|---------------|
| Log Volume by Level | Count of INFO / WARNING / ERROR logs over time |
| Errors by Error Code | Breakdown of errors by ASR error code |
| Transcription Events | Completed and failed transcriptions with chunk details and timing |
| Top Error Codes | Bar chart of most frequent error codes |
| Per-Chunk vLLM Events | Individual chunk processing logs |
| All Raw Logs | Unfiltered log search |

### 4.2 Prometheus Metrics

All metrics are exposed on port 9090 and scraped by VictoriaMetrics.

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `halong_qwen_asr_latency` | Histogram | `path` | End-to-end transcription latency (seconds) |
| `halong_qwen_asr_http_requests_total` | Counter | `path`, `status` | Total requests. `status`: `all` or `error` |
| `halong_qwen_asr_http_errors_total` | Counter | `path`, `status_code` | Failed requests by HTTP status code |
| `halong_qwen_asr_audio_duration_seconds` | Histogram | `path` | Duration of submitted audio (seconds) |
| `halong_qwen_asr_queue_depth` | Gauge | - | Requests currently waiting in the concurrency queue |
| `halong_qwen_asr_queue_wait_seconds` | Histogram | `path` | Time spent waiting in the queue before processing starts |
| `halong_qwen_asr_ffmpeg_convert_seconds` | Histogram | - | Audio format conversion latency |
| `halong_qwen_asr_ffmpeg_convert_total` | Counter | `status` | Conversion attempts. `status`: `success` or `failed` |

### 4.3 Key Alerts to Watch

- **Queue depth rising** (`halong_qwen_asr_queue_depth` > 10): service is approaching capacity.
- **Error rate spike** (`halong_qwen_asr_http_errors_total`): check error codes in the logs dashboard.
- **P99 latency increase** (`halong_qwen_asr_latency`): may indicate GPU contention or longer-than-usual audio.

---

## 5. Error Codes Quick Reference

All error codes are in the range ASR-6000 to ASR-6099.

| Code | HTTP Status | Meaning | What to do |
|------|-------------|---------|------------|
| ASR-6000 | 500 | Healthcheck failed: vLLM backend is down | Wait and retry. If persistent, escalate to the Astro team. |
| ASR-6001 | 500 | Unexpected vLLM proxy error | Retry. If persistent, check service logs. |
| ASR-6002 | 503 | vLLM backend timed out | Audio may be too complex or service is overloaded. Retry later. |
| ASR-6003 | 503 | vLLM backend unreachable | Service is restarting or the pod is down. Wait and retry. |
| ASR-6004 | varies | vLLM returned an HTTP error | Check the response body for details. |
| ASR-6010 | - | vLLM process failed to start | Internal startup error. Escalate to the Astro team. |
| ASR-6030 | - | One or more chunks failed (partial result returned) | The response may be incomplete. Retry with `full` mode if accuracy matters. |
| ASR-6040 | 400 | Blank / empty audio file | Provide a valid, non-empty audio file. |
| ASR-6041 | 415 | Unsupported or corrupted audio format | Use a supported format: WAV, MP3, M4A, FLAC, OGG, AAC. |
| ASR-6042 | 400 | Audio exceeds max duration | `chunked`: max 30 min. `full`: max 10 min. Trim or split the audio. |
| ASR-6043 | 429 | Concurrency limit reached | Retry after the number of seconds indicated in `retry_after_seconds`. |
| ASR-6044 | - | ffmpeg audio conversion failed | The audio file may be corrupted. Try a different format. |

---

## 6. Performance (Accuracy and Latency of System)

[Qwen 3 ASR performance benchmark - 45 GB VRAM A100](https://docs.google.com/spreadsheets/d/19iyvdd_1KeeL77zRIMIbfohXS9Raqfnps_C_e5HhuG4/edit?usp=sharing)

*For questions or issues, contact the Astro team (NLP-Research/bao.nguyen14).*
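
**Example: choosing `process_type` from audio duration (sketch)**

The "when to use which" guidance in section 3.2 can be expressed as a small client-side helper. This is a minimal sketch and not part of the service: the function name `choose_process_type` and the `accuracy_critical` flag are hypothetical. Only the 10-minute and 30-minute limits come from this document.

```python
# Duration limits taken from section 3.2 of this document.
FULL_MAX_SECONDS = 10 * 60      # `full` rejects audio over 10 minutes
CHUNKED_MAX_SECONDS = 30 * 60   # `chunked` rejects audio over 30 minutes


def choose_process_type(duration_seconds: float, accuracy_critical: bool = False) -> str:
    """Return "full" or "chunked" for a given audio duration.

    Hypothetical client-side helper; the service itself does not expose it.
    """
    if duration_seconds <= 0:
        raise ValueError("audio duration must be positive")
    if duration_seconds > CHUNKED_MAX_SECONDS:
        # Neither mode accepts this audio; the caller has to trim or split it.
        raise ValueError("audio exceeds the 30-minute limit of chunked mode")
    if duration_seconds > FULL_MAX_SECONDS:
        # Between 10 and 30 minutes only chunked mode is accepted.
        return "chunked"
    # Up to 10 minutes both modes work: prefer accuracy only when asked to.
    return "full" if accuracy_critical else "chunked"
```

The returned string can be passed as the `process_type` form field of the transcribe request.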

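
**Example: retrying on 429 using `retry_after_seconds` (sketch)**

Section 3.1 and error code ASR-6043 say that a 429 response carries a `retry_after_seconds` field. The sketch below shows one way a client might honor it. It is an illustration under assumptions: `transcribe_with_retry` and its `send` callback are hypothetical names, `send` is assumed to perform the HTTP call and return `(status_code, parsed_json_body)`, and the fallback wait of 5 seconds and the limit of 4 attempts are arbitrary choices.

```python
import time
from typing import Any, Callable, Dict, Tuple

# (HTTP status code, parsed JSON body) as returned by the caller-supplied function.
Response = Tuple[int, Dict[str, Any]]


def transcribe_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send() until it succeeds, waiting between 429 responses.

    Only 429 is retried here; any other non-200 status raises immediately.
    """
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        last_error = f"HTTP {status}: {body.get('error', 'unknown error')}"
        if status != 429 or attempt == max_attempts:
            break
        # Wait for the interval the server asked for (assumed fallback: 5 s).
        sleep(float(body.get("retry_after_seconds", 5)))
    raise RuntimeError(last_error)
```

In practice `send` would wrap a multipart POST to the transcribe endpoint with whichever HTTP client the caller already uses. Injecting `sleep` keeps the helper easy to test without real delays.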