alicloud-ai-audio-tts (community)

v1.0.0

About this Skill

Suited to voice assistant agents that need text-to-speech through Alibaba Cloud services. Part of the Alibaba Cloud skills collection (Qwen, Wan, and others).

cinience cinience
Updated: 3/5/2026

Installation
> npx killer-skills add cinience/alicloud-skills

Agent Capability Analysis

The alicloud-ai-audio-tts MCP Server by cinience is an open-source community integration for Claude and other AI agents. It adds text-to-speech generation through Alibaba Cloud Model Studio.

Ideal Agent Persona

Perfect for Voice Assistant Agents needing advanced text-to-speech capabilities with Alibaba Cloud services.

Core Value

Lets agents generate speech audio with Alibaba Cloud's Qwen text-to-speech (TTS) models through the DashScope Python SDK. The skill's generate_tts.py script is checked with Python's py_compile module before use.

Capabilities Granted for alicloud-ai-audio-tts MCP Server

Generating audio links for voice-assisted applications
Creating sample audio files for TTS model testing
Validating request payloads for alicloud-ai-audio-tts API calls

Prerequisites & Limits

  • Requires Alibaba Cloud account and credentials
  • Requires Python 3 (the validation step runs py_compile, and the SDK is installed with pip)
  • Dependent on alicloud-ai-audio-tts API availability and pricing

Category: provider

Model Studio Qwen TTS

Validation

bash
mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-tts/validate.txt is generated.
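
The same check can be run from Python rather than the shell. The following is a minimal sketch, not part of the skill: the function name validate_script is hypothetical, and it assumes the pass criteria above (marker file written only when compilation succeeds).

```python
import py_compile
from pathlib import Path


def validate_script(script: Path, out_dir: Path) -> bool:
    """Byte-compile `script`; write validate.txt only when compilation succeeds.

    Hypothetical helper mirroring the shell command above.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        # doraise=True raises PyCompileError on syntax errors instead of
        # printing them to stderr and returning normally.
        py_compile.compile(str(script), doraise=True)
    except (py_compile.PyCompileError, OSError):
        return False
    (out_dir / "validate.txt").write_text("py_compile_ok\n", encoding="utf-8")
    return True
```

Called as validate_script(Path("skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py"), Path("output/alicloud-ai-audio-tts")), it returns True when the script compiles.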

Output And Evidence

  • Save generated audio links, sample audio files, and request payloads to output/alicloud-ai-audio-tts/.
  • Keep one validation log per execution.

Critical model names

Use one of the recommended models:

  • qwen3-tts-flash
  • qwen3-tts-instruct-flash
  • qwen3-tts-instruct-flash-2026-01-26

Prerequisites

  • Install SDK (recommended in a venv to avoid PEP 668 limits):
bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).
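
The lookup order described above (environment first, credentials file second) might be sketched as follows. This is an illustration only: resolve_api_key is a hypothetical helper, and the INI layout with a [default] section is an assumption based on the comment in the quick start.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[Path] = None,
                    profile: str = "default") -> Optional[str]:
    """Return the API key from the environment, else from the credentials file."""
    env_key = os.environ.get("DASHSCOPE_API_KEY")
    if env_key:
        return env_key  # env takes precedence

    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    # interpolation=None so a '%' inside a key is not treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    return parser.get(profile, "dashscope_api_key", fallback=None)
```

The function returns None when neither source provides a key, so the caller can fail with a clear message before making a request.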

Normalized interface (tts.generate)

Request

  • text (string, required)
  • voice (string, required)
  • language_type (string, optional; default Auto)
  • instruction (string, optional; recommended for instruct models)
  • stream (bool, optional; default false)

Response

  • audio_url (string, when stream=false)
  • audio_base64_pcm (string, when stream=true)
  • sample_rate (int, 24000)
  • format (string, wav or pcm depending on mode)
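
One way to represent this interface in code is a pair of dataclasses. The sketch below is an assumption about how a caller might model it; the class names TtsRequest and TtsResponse and the to_call_kwargs method are hypothetical, while the field names and defaults come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TtsRequest:
    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None  # recommended for instruct models
    stream: bool = False

    def __post_init__(self) -> None:
        # text and voice are the two required fields.
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.voice.strip():
            raise ValueError("voice is required")

    def to_call_kwargs(self) -> dict:
        """Build keyword arguments, omitting instruction when it is unset."""
        kwargs = {
            "text": self.text,
            "voice": self.voice,
            "language_type": self.language_type,
            "stream": self.stream,
        }
        if self.instruction:
            kwargs["instruction"] = self.instruction
        return kwargs


@dataclass(frozen=True)
class TtsResponse:
    audio_url: Optional[str] = None          # set when stream=False
    audio_base64_pcm: Optional[str] = None   # set when stream=True
    sample_rate: int = 24000
    format: str = "wav"                      # "wav" or "pcm" depending on mode
```

For example, TtsRequest(text="Hello", voice="Cherry").to_call_kwargs() yields the four non-optional keys with language_type set to "Auto".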

Quick start (Python + DashScope SDK)

python
import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)
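
The quick start only prints the URL. To keep the audio as a file, the URL can be downloaded with the standard library. This is a sketch under assumptions: save_audio is a hypothetical helper, and the 30-second timeout is an arbitrary choice rather than a documented value.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(audio_url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Stream the content at `audio_url` into `dest` and return the path."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(audio_url, timeout=timeout) as resp, open(dest, "wb") as fh:
        # copyfileobj copies in blocks, so the audio is never held fully in memory.
        shutil.copyfileobj(resp, fh)
    return dest
```

A call such as save_audio(audio_url, Path("output/alicloud-ai-audio-tts/audio/line.wav")) would write the result under the skill's output directory; the file name line.wav is only an example.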

Streaming notes

  • stream=True returns Base64-encoded PCM chunks at 24 kHz.
  • Decode each chunk, then play it or append it to a PCM buffer.
  • The response contains finish_reason == "stop" when the stream ends.
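
The decode-and-concatenate step can be sketched with the standard library alone. The helper name pcm_chunks_to_wav is hypothetical, and the sample layout (16-bit, mono) is an assumption; only the 24 kHz rate comes from the notes above, so check references/api_reference.md before relying on it.

```python
import base64
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks_b64: Iterable[str], sample_rate: int = 24000) -> bytes:
    """Decode Base64 PCM chunks, join them, and wrap the result in a WAV container."""
    pcm = b"".join(base64.b64decode(chunk) for chunk in chunks_b64 if chunk)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Collecting the chunks until finish_reason == "stop" and then calling this function once gives a playable file; for live playback, each decoded chunk would instead be handed to the audio device as it arrives.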

Operational guidance

  • Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
  • Use language_type consistent with the text to improve pronunciation.
  • Use instruction only when you need explicit style/tone control.
  • Cache by (text, voice, language_type) to avoid repeat costs.
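
Two of these points, caching and splitting, lend themselves to small helpers. The sketch below is illustrative: cache_key and split_text are hypothetical names, and the 300-character limit is a placeholder, not a documented service limit.

```python
import hashlib
import json
import re
from typing import List


def cache_key(text: str, voice: str, language_type: str = "Auto") -> str:
    """Stable SHA-256 key over (text, voice, language_type)."""
    payload = json.dumps([text, voice, language_type], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

When instruction is set it changes the generated audio, so it probably belongs in the cache key as well; that extension is left out here to match the tuple given above.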

Output location

  • Default output: output/alicloud-ai-audio-tts/audio/
  • Override base dir with OUTPUT_DIR.
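
Resolving the output directory might look like the sketch below. It assumes OUTPUT_DIR replaces only the leading output component of the default path; the source does not spell this out, and audio_output_dir is a hypothetical name.

```python
import os
from pathlib import Path


def audio_output_dir() -> Path:
    """Return the audio output directory, creating it if needed."""
    # Assumption: OUTPUT_DIR overrides the base "output" directory only.
    base = Path(os.environ.get("OUTPUT_DIR") or "output")
    out = base / "alicloud-ai-audio-tts" / "audio"
    out.mkdir(parents=True, exist_ok=True)
    return out
```

With OUTPUT_DIR unset or empty, this resolves to output/alicloud-ai-audio-tts/audio relative to the current working directory.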

References

  • references/api_reference.md for parameter mapping and streaming example.

  • Realtime mode is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.

  • Voice cloning/design are provided by skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/.

  • Source list: references/sources.md
