Groq API
Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).
Quick Start
Installation
```bash
# Python
pip install groq

# TypeScript/JavaScript
npm install groq-sdk
```
Environment Setup
```bash
export GROQ_API_KEY=<your-api-key>
```
Basic Chat Completion
Python:
```python
from groq import Groq

client = Groq()  # Uses GROQ_API_KEY env var

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```
TypeScript:
```typescript
import Groq from "groq-sdk";

const client = new Groq();

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
```
Model Selection
| Use Case | Model | Notes |
|---|---|---|
| Fast + cheap | llama-3.1-8b-instant | Best for simple tasks |
| Balanced | llama-3.3-70b-versatile | Quality/cost balance |
| Highest quality | openai/gpt-oss-120b | Built-in tools + reasoning |
| Agentic | groq/compound | Web search + code exec |
| Reasoning | openai/gpt-oss-20b | Fast reasoning (low/med/high) |
| Vision/OCR | llama-4-scout-17b-16e-instruct | Image understanding |
| Audio STT | whisper-large-v3-turbo | Transcription |
| TTS | playai-tts | Text-to-speech |
See references/models.md for full model list and pricing.
Common Patterns
Streaming Responses
```python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
System Messages
```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ]
)
```
Async Client (Python)
```python
import asyncio
from groq import AsyncGroq

async def main():
    client = AsyncGroq()
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))
```
JSON Mode
```python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"}
)
```
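JSON mode guarantees JSON-formatted text in `message.content`, but it still arrives as a string; a minimal parsing sketch (the `raw` literal below stands in for a real response):

```python
import json

# Stand-in for response.choices[0].message.content under JSON mode
raw = '{"colors": ["red", "green", "blue"]}'

data = json.loads(raw)  # raises json.JSONDecodeError if the output is malformed
colors = data["colors"]
```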
Structured Outputs (JSON Schema)
Force output to match a schema. Two modes available:
| Mode | Guarantee | Models |
|---|---|---|
| `strict: true` | 100% schema compliance | openai/gpt-oss-20b, openai/gpt-oss-120b |
| `strict: false` | Best-effort compliance | All supported models |
Strict Mode (guaranteed compliance):
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"],
                "additionalProperties": False
            }
        }
    }
)
```
With Pydantic:
```python
import json

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": Person.model_json_schema()
        }
    }
)
person = Person.model_validate(json.loads(response.choices[0].message.content))
```
See references/structured-outputs.md for schema requirements, validation libraries, and examples.
Audio
Transcription (Speech-to-Text)
```python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",  # Optional: ISO-639-1 code
        response_format="verbose_json",  # json, text, verbose_json
        timestamp_granularities=["word", "segment"]
    )
print(transcription.text)
```
Translation (to English)
```python
with open("french_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=f
    )
print(translation.text)  # English text
```
Text-to-Speech
```python
response = client.audio.speech.create(
    model="playai-tts",
    input="Hello, world!",
    voice="Fritz-PlayAI",
    response_format="wav",  # flac, mp3, mulaw, ogg, wav
    speed=1.0  # 0.5 to 5
)
response.write_to_file("output.wav")
```
Vision
Process images with Llama 4 multimodal models. Supports up to 5 images per request.
Models: meta-llama/llama-4-scout-17b-16e-instruct (faster), meta-llama/llama-4-maverick-17b-128e-instruct (higher quality)
Image from URL
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```
Local Image (Base64)
```python
import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)
```
OCR / data extraction (combine vision with JSON mode; `base64_image` is a base64-encoded image string, e.g. from the helper above):
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and data as JSON"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]
    }],
    response_format={"type": "json_object"}
)
```
See references/vision.md for multi-image, tool use with images, and multi-turn conversations.
Tool Use
For tool calling patterns and examples, see references/tool-use.md.
Quick example:
```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
```
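To finish the loop, echo the assistant's tool calls back into the conversation, append each tool result as a `tool`-role message, and call the API again. A minimal sketch of the message bookkeeping (`get_weather` and the tool-call dict are stand-ins; the real SDK returns objects, not dicts):

```python
import json

def get_weather(location: str) -> dict:
    # Stand-in for a real weather lookup
    return {"location": location, "temp_c": 18}

messages = [{"role": "user", "content": "Weather in Paris?"}]

# Suppose the model returned one tool call (shape mirrors the SDK objects):
tool_call = {"id": "call_1", "name": "get_weather",
             "arguments": '{"location": "Paris"}'}

# 1. Echo the assistant turn that requested the tool
messages.append({"role": "assistant", "tool_calls": [{
    "id": tool_call["id"], "type": "function",
    "function": {"name": tool_call["name"], "arguments": tool_call["arguments"]},
}]})

# 2. Append the tool result, keyed by tool_call_id
result = get_weather(**json.loads(tool_call["arguments"]))
messages.append({"role": "tool", "tool_call_id": tool_call["id"],
                 "content": json.dumps(result)})

# 3. Call client.chat.completions.create(model=..., messages=messages,
#    tools=tools) again; the model now answers using the tool output.
```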
Built-in Tools
Use groq/compound or openai/gpt-oss-120b for built-in web search and code execution:
```python
response = client.chat.completions.create(
    model="groq/compound",
    messages=[{"role": "user", "content": "Search for latest Python news"}]
)
# Model automatically uses web search
```
Remote MCP Servers
Connect to third-party MCP servers for tools like Stripe, GitHub, web scraping. Use the Responses API:
```python
import os

import openai

client = openai.OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)

response = client.responses.create(
    model="openai/gpt-oss-120b",
    input="What models are trending on Huggingface?",
    tools=[{
        "type": "mcp",
        "server_label": "Huggingface",
        "server_url": "https://huggingface.co/mcp"
    }]
)
```
See references/tool-use.md for MCP configuration and popular servers.
Reasoning Models
Control how models think through complex problems.
Models: openai/gpt-oss-20b, openai/gpt-oss-120b (low/medium/high), qwen/qwen3-32b (none/default)
GPT-OSS with Reasoning Effort
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "How many r's in strawberry?"}],
    reasoning_effort="high",  # low, medium, high
    temperature=0.6,
    max_completion_tokens=1024
)

print(response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)
```
Qwen3 with Parsed Reasoning
```python
response = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "Solve: x + 5 = 12"}],
    reasoning_format="parsed"  # raw, parsed, hidden
)

print("Answer:", response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)
```
Hide Reasoning (GPT-OSS)
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    include_reasoning=False  # Hide reasoning in response
)
```
See references/reasoning.md for streaming, tool use with reasoning, and best practices.
Batch Processing
For high-volume async processing (24h-7d completion window):
```python
# 1. Create JSONL file with requests
# 2. Upload file
# 3. Create batch
batch = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# 4. Check status
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id)
```
See references/api-reference.md for full batch API details.
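Steps 1-2 (building the request file and uploading it) can be sketched as follows. The per-line request shape shown here (`custom_id`/`method`/`url`/`body`) follows the OpenAI-compatible batch format; verify the exact fields against references/api-reference.md:

```python
import json

# One JSON object per line; custom_id lets you match results to requests
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama-3.3-70b-versatile",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello", "What is 2+2?"])
]

with open("batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Then upload and use the returned id as input_file_id, e.g.:
# uploaded = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
# file_id = uploaded.id
```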
Prompt Caching
Automatically reduce latency and costs by 50% for repeated prompt prefixes. No code changes required.
Supported models: moonshotai/kimi-k2-instruct-0905, openai/gpt-oss-20b, openai/gpt-oss-120b, openai/gpt-oss-safeguard-20b
How it works:
- Place static content (system prompts, tools, examples) at the beginning
- Place dynamic content (user queries) at the end
- Cache automatically matches prefixes and applies 50% discount
- Cache expires after 2 hours without use
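The ordering rule above can be expressed as a small message-builder sketch (the prompt text is illustrative): keep the large static prefix identical across calls and put only the per-request query at the end, so every call shares the longest possible cacheable prefix.

```python
# Large, unchanging prefix: system prompt, policies, few-shot examples, etc.
STATIC_SYSTEM_PROMPT = "You are a support agent.\n" + "Policy text line.\n" * 100

def build_messages(user_query: str) -> list[dict]:
    # Static content first maximizes the cacheable prefix; the dynamic
    # query goes last so it never breaks the prefix match.
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
```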
Track cache usage:
```python
response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct-0905",
    messages=[{"role": "system", "content": large_system_prompt}, ...]
)

cached = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached}")  # 50% discount applied to these
See references/prompt-caching.md for optimization strategies and examples.
Content Moderation
Detect and filter harmful content using safeguard models.
Llama Guard 4
General content safety classification. Returns `safe`, or `unsafe` followed by a category code (e.g. `S1`) on the next line.
```python
response = client.chat.completions.create(
    model="meta-llama/Llama-Guard-4-12B",
    messages=[{"role": "user", "content": user_input}]
)

if response.choices[0].message.content.startswith("unsafe"):
    # Block or handle unsafe content
    pass
```
GPT-OSS Safeguard 20B
Prompt injection detection with custom policies. Returns structured JSON.
```python
response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-20b",
    messages=[
        {"role": "system", "content": injection_detection_policy},
        {"role": "user", "content": user_input}
    ]
)
# Returns: {"violation": 1, "category": "Direct Override", "rationale": "..."}
```
See references/moderation.md for complete policies, harm taxonomy, and integration patterns.
Error Handling
```python
from groq import Groq, RateLimitError, APIConnectionError, APIStatusError

client = Groq()

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    # Wait and retry with exponential backoff
    pass
except APIConnectionError:
    # Network issue
    pass
except APIStatusError as e:
    # API error (check e.status_code)
    pass
```
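The "retry with exponential backoff" branch can be fleshed out with a small generic helper (a sketch, not a Groq SDK feature; in real code, narrow the `except` to `groq.RateLimitError`):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure, retry with jittered exponential backoff
    (base_delay, 2x, 4x, ... plus random jitter up to base_delay)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow to RateLimitError in real code
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Usage (illustrative):
# response = with_backoff(lambda: client.chat.completions.create(
#     model="llama-3.3-70b-versatile",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```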
See references/audio.md for complete audio API reference including file handling, metadata fields, and prompting guidelines.
Resources