# Qwen 3 ASR Service — Documentation
## 1. Service Overview
Qwen 3 ASR is a speech-to-text (Automatic Speech Recognition) service that converts audio recordings into text. It is built on Alibaba's **Qwen3-ASR-1.7B** model, served via vLLM for high-throughput GPU inference.
| Field | Value |
|-------|-------|
| Model | Qwen/Qwen3-ASR-1.7B (1.7 billion parameters) |
| Default language | Vietnamese (`vi`) |
| Supported audio formats | WAV, MP3, M4A, FLAC, OGG, AAC (auto-converted to 16 kHz mono WAV) |
| Production URL | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/` |
| Swagger UI | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/docs` |
| Cluster / Namespace / Sub Namespace | aifarm / halong / astro |
| Authentication | None required |
---
## 2. How to Use the Service
This section covers two ways to call the API: curl and n8n. Each has different strengths depending on your use case.
### 2.1 curl (CLI)
Best for: scripting, quick testing from the terminal, CI/CD pipelines.
**Chunked mode (recommended for speed):**
```bash
curl -X POST \
"https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
-F "file=@/path/to/audio.mp3" \
-F "process_type=chunked"
```
**Full mode (recommended for short recordings where accuracy matters most):**
```bash
curl -X POST \
"https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
-F "file=@/path/to/audio.wav" \
-F "process_type=full"
```
**With custom request ID for tracing:**
```bash
curl -X POST \
"https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio" \
-F "file=@/path/to/audio.wav" \
-F "process_type=chunked" \
-F "request_id=my-trace-id-123"
```
| Strengths | Weaknesses |
|-----------|------------|
| Simplest method, no extra tooling | Not visual, harder to inspect responses |
| Easily scriptable and automatable | Manual file path management |
| Available on any Linux/Mac terminal | |
### 2.2 n8n (Workflow Automation)
Best for: automated workflows, chaining ASR with downstream processing (e.g., summarization, translation, notification).
**HTTP Request node configuration:**
| Setting | Value |
|---------|-------|
| Method | POST |
| URL | `https://aifarm.mservice.com.vn/internal/halong-qwen-asr/v1/asr/transcribe/audio` |
| Send Body | ON |
| Body Content Type | **Multipart Form Data** |
| Timeout | `300000` (5 minutes, under Options > Timeout) |
**Body Parameters:**
| Name | Parameter Type | Value |
|------|---------------|-------|
| `file` | n8n Binary Data | Binary data from a previous node (e.g., Read Binary File, HTTP Request, Google Drive) |
| `process_type` | String | `chunked` or `full` |
**Network requirement:**
The n8n server must be able to reach `aifarm.mservice.com.vn` on port 443. The domain resolves to the public IP `210.245.72.36`. If your n8n instance runs on an internal network, make sure the firewall whitelist rule targets this public IP rather than an internal address.
An `ETIMEDOUT` error usually means the network path is blocked. Ask your IT/network team to confirm that the n8n server's outgoing IP is allowed to reach `210.245.72.36:443`.
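
To tell a blocked network path apart from an n8n misconfiguration, a plain TCP probe run on the n8n host can help. The following is a minimal sketch using only the Python standard library; the function name `can_reach` and the injectable `connect` parameter are illustrative choices, not part of the service.

```python
import socket


def can_reach(host="aifarm.mservice.com.vn", port=443, timeout=5,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print("reachable" if can_reach() else "blocked or unreachable")
```

A `False` result from the n8n host suggests the firewall rule is missing; a `True` result points back at the node configuration instead.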
| Strengths | Weaknesses |
|-----------|------------|
| Fully automatable workflows | Requires n8n instance |
| Can chain with other nodes (Slack, email, database) | Network whitelist required from internal networks |
| Supports scheduling and event triggers | |
---
## 3. API Specification
### 3.1 Transcribe Audio
**Endpoint:** `POST /internal/halong-qwen-asr/v1/asr/transcribe/audio`
**Content-Type:** `multipart/form-data`
**Request fields:**
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `file` | binary | Yes | — | Audio file (WAV, MP3, M4A, FLAC, OGG, AAC) |
| `process_type` | string | No | `full` | `chunked` or `full` (see comparison below) |
| `request_id` | string | No | auto UUID | Custom trace/correlation ID. Also accepted via `X-Request-ID` header. |
**Success response (200):**
```json
{
"text": "transcribed text content here",
"language": "vi",
"request_id": "your-trace-id-or-auto-uuid"
}
```
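
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper names `build_multipart` and `transcribe` are hypothetical; the URL, the field names, and the 300-second timeout come from this document. Treat it as a starting point under those assumptions rather than an official client.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

BASE_URL = "https://aifarm.mservice.com.vn/internal/halong-qwen-asr"


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    # Media type is guessed from the extension; whether the server
    # inspects it is not documented, so this is an assumption.
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    for name, value in fields.items():
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    chunks.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, process_type="full", request_id=None, timeout=300):
    """POST an audio file and return the parsed JSON response body."""
    fields = {"process_type": process_type}
    if request_id is not None:
        fields["request_id"] = request_id
    with open(path, "rb") as fh:
        body, content_type = build_multipart(fields, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{BASE_URL}/v1/asr/transcribe/audio",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so the error bodies described next have to be read from the exception object.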
**Error response (4xx / 5xx):**
```json
{
"error": "Human-readable error description",
"request_id": "trace-id"
}
```
For 429 (rate limit) responses, an additional field is included:
```json
{
"error": "Too many requests. Server is at maximum capacity. Please retry.",
"retry_after_seconds": 5,
"request_id": "trace-id"
}
```
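
The `retry_after_seconds` field lends itself to a simple client-side retry loop. The sketch below assumes a hypothetical `send` callable that performs one request and returns `(http_status, parsed_json_body)`; the 5-second fallback when the field is missing and the attempt cap are also assumptions, not documented service behavior.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (http_status, body_dict).
    Returns the last (http_status, body_dict) pair.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Honor the server's suggested wait; fall back to 5 s if absent.
        sleep(body.get("retry_after_seconds", 5))
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.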
### 3.2 Processing Modes: Chunked vs Full
| Aspect | `chunked` | `full` |
|--------|-----------|--------|
| **How it works** | Splits audio into 30-second chunks with 2-second overlap, processes all chunks in parallel, then merges results | Sends the entire audio as a single request to the model |
| **Max audio duration** | 30 minutes | 10 minutes |
| **Speed** | Faster for long audio (parallel processing) | Slower (single sequential call) |
| **Accuracy** | May have minor artifacts at chunk boundaries | Higher accuracy (no boundary effects) |
| **Concurrency limit** | 25 simultaneous requests | 5 simultaneous requests |
| **Timeout** | 30 seconds per chunk | 300 seconds total |
| **Best for** | Long recordings, batch processing, call center audio | Short recordings where accuracy matters most |
| **Trade-off** | Speed over perfection | Perfection over speed |
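
To make the chunking arithmetic concrete, the sketch below computes window boundaries for 30-second chunks with a 2-second overlap. It assumes each chunk starts 28 seconds after the previous one and that the last chunk is simply truncated at the end of the audio; the service's actual boundary rule is not documented here and may differ.

```python
def chunk_windows(duration_s, chunk_s=30.0, overlap_s=2.0):
    """Return a list of (start, end) windows in seconds covering the audio.

    Assumes a fixed stride of chunk_s - overlap_s between chunk starts.
    """
    if duration_s <= 0:
        return []
    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


# A 70-second file yields three overlapping windows under these assumptions.
print(chunk_windows(70))
```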
**When to use which:**
- Audio under 30 seconds: either mode works equally well (no chunking occurs).
- Audio 30 seconds to 10 minutes: use `full` if accuracy is critical, `chunked` if speed matters.
- Audio 10 to 30 minutes: must use `chunked` (`full` rejects audio over 10 minutes).
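
The selection rules above can be encoded in a small helper. This is a sketch: the function name and the `accuracy_critical` flag are illustrative, and for audio under 30 seconds it returns `full` only because that is the API default.

```python
FULL_MAX_S = 10 * 60     # `full` limit from the comparison table
CHUNKED_MAX_S = 30 * 60  # `chunked` limit from the comparison table


def choose_process_type(duration_s, accuracy_critical=False):
    """Pick a process_type value for audio of the given duration in seconds."""
    if duration_s > CHUNKED_MAX_S:
        raise ValueError("audio exceeds 30 minutes; trim or split it first")
    if duration_s > FULL_MAX_S:
        return "chunked"  # `full` rejects audio over 10 minutes
    if duration_s < 30 or accuracy_critical:
        return "full"
    return "chunked"
```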
### 3.3 Healthcheck
**Endpoint:** `GET /internal/halong-qwen-asr/healthcheck`
**Success (200):**
```json
{"status": "ok"}
```
**Failure (500):**
```json
{"detail": "service is not available"}
```
Use this endpoint to verify the service is running before sending transcription requests.
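
A pre-flight check along these lines could be sketched as follows. The function name `is_healthy` and the injectable `opener` parameter are illustrative; the URL and the expected body come from this section.

```python
import json
import urllib.request

HEALTH_URL = "https://aifarm.mservice.com.vn/internal/halong-qwen-asr/healthcheck"


def is_healthy(url=HEALTH_URL, timeout=5, opener=urllib.request.urlopen):
    """Return True only for a 200 response whose body is {"status": "ok"}."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200 and json.load(resp).get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers HTTP errors (such as the 500 above), connection
        # failures, and timeouts; ValueError covers a non-JSON body.
        return False
```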
---
## 4. Grafana Monitoring
**Grafana Dashboard URL:** https://aifarm.mservice.com.vn/grafana/d/halong-qwen-asr-prod-asr-metrics/halong-qwen-asr-prod-asr?orgId=1&from=now-1h&to=now&timezone=browser&var-ds=default&refresh=30s
**Grafana VictoriaLogs URL:** https://aifarm.mservice.com.vn/grafana/d/halong-qwen-asr-prod-asr-logs/halong-qwen-asr-prod-asr-logs?orgId=1&from=now-1h&to=now&timezone=browser&var-DS_VICTORIALOGS=PD775F2863313E6C7&var-query=&refresh=30s
### 4.1 Dashboard Panels
The Grafana dashboard provides two views:
**Metrics Dashboard (Prometheus):**
| Panel | What it shows |
|-------|---------------|
| P99 Latency (ms) | 99th percentile end-to-end transcription latency over 5-minute windows |
| Request Rate (req/min) | Total and error request rates |
| Error Rate (%) | Percentage of failed requests |
| Audio Duration Distribution | P50, P90, P99 of submitted audio duration |
**Logs Dashboard (VictoriaLogs):**
| Panel | What it shows |
|-------|---------------|
| Log Volume by Level | Count of INFO / WARNING / ERROR logs over time |
| Errors by Error Code | Breakdown of errors by ASR error code |
| Transcription Events | Completed and failed transcriptions with chunk details and timing |
| Top Error Codes | Bar chart of most frequent error codes |
| Per-Chunk vLLM Events | Individual chunk processing logs |
| All Raw Logs | Unfiltered log search |
### 4.2 Prometheus Metrics
All metrics are exposed on port 9090 and scraped by VictoriaMetrics.
| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `halong_qwen_asr_latency` | Histogram | `path` | End-to-end transcription latency (seconds) |
| `halong_qwen_asr_http_requests_total` | Counter | `path`, `status` | Total requests. `status`: `all` or `error` |
| `halong_qwen_asr_http_errors_total` | Counter | `path`, `status_code` | Failed requests by HTTP status code |
| `halong_qwen_asr_audio_duration_seconds` | Histogram | `path` | Duration of submitted audio (seconds) |
| `halong_qwen_asr_queue_depth` | Gauge | — | Requests currently waiting in concurrency queue |
| `halong_qwen_asr_queue_wait_seconds` | Histogram | `path` | Time waiting in queue before processing starts |
| `halong_qwen_asr_ffmpeg_convert_seconds` | Histogram | — | Audio format conversion latency |
| `halong_qwen_asr_ffmpeg_convert_total` | Counter | `status` | Conversion attempts. `status`: `success` or `failed` |
### 4.3 Key Alerts to Watch
- **Queue depth rising** (`halong_qwen_asr_queue_depth` > 10): service is approaching capacity.
- **Error rate spike** (`halong_qwen_asr_http_errors_total`): check error codes in logs dashboard.
- **P99 latency increase** (`halong_qwen_asr_latency`): may indicate GPU contention or longer-than-usual audio.
---
## 5. Error Codes Quick Reference
All error codes are in the range ASR-6000 to ASR-6099.
| Code | HTTP Status | Meaning | What to do |
|------|-------------|---------|------------|
| ASR-6000 | 500 | Healthcheck failed — vLLM backend is down | Wait and retry. If persistent, escalate to the Astro team. |
| ASR-6001 | 500 | Unexpected vLLM proxy error | Retry. If persistent, check service logs. |
| ASR-6002 | 503 | vLLM backend timed out | Audio may be too complex or service is overloaded. Retry later. |
| ASR-6003 | 503 | vLLM backend unreachable | Service is restarting or the pod is down. Wait and retry. |
| ASR-6004 | varies | vLLM returned an HTTP error | Check the response body for details. |
| ASR-6010 | — | vLLM process failed to start | Internal startup error. Escalate to the Astro team. |
| ASR-6030 | — | One or more chunks failed (partial result returned) | The response may be incomplete. Retry with `full` mode if accuracy matters. |
| ASR-6040 | 400 | Blank / empty audio file | Provide a valid, non-empty audio file. |
| ASR-6041 | 415 | Unsupported or corrupted audio format | Use a supported format: WAV, MP3, M4A, FLAC, OGG, AAC. |
| ASR-6042 | 400 | Audio exceeds max duration | `chunked`: max 30 min. `full`: max 10 min. Trim or split the audio. |
| ASR-6043 | 429 | Concurrency limit reached | Retry after the number of seconds indicated in `retry_after_seconds`. |
| ASR-6044 | — | ffmpeg audio conversion failed | The audio file may be corrupted. Try a different format. |
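
Because the documented error body carries a message but no explicit code field, a client may need to decide what to do from the HTTP status alone. The mapping below is one possible reading of the table; the function name and the action labels are hypothetical.

```python
RETRYABLE_STATUSES = {429, 500, 503}  # ASR-6043, ASR-6000/6001, ASR-6002/6003
CLIENT_FIX_STATUSES = {400, 415}      # ASR-6040/6042, ASR-6041


def next_action(http_status):
    """Map an HTTP status from the transcribe endpoint to a coarse action."""
    if http_status == 200:
        return "ok"
    if http_status in RETRYABLE_STATUSES:
        return "retry"
    if http_status in CLIENT_FIX_STATUSES:
        return "fix_input"
    # For example ASR-6004, whose status varies: read the response body.
    return "inspect"
```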
---
## 6. Performance (Accuracy and Latency of System)
[Qwen 3 ASR performance benchmark - 45 GB VRAM A100](https://docs.google.com/spreadsheets/d/19iyvdd_1KeeL77zRIMIbfohXS9Raqfnps_C_e5HhuG4/edit?usp=sharing)
*For questions or issues, contact the Astro team (NLP-Research/bao.nguyen14).*