groq-api — for Claude Code

v1.0.0

About this skill

Best fit: Ideal for AI agents that need to build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec). Summary: Flux-Free-Gateway is a robust ecosystem for managing and consuming AI models in free tiers, based on the model rotation architecture popularized by @midudev. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

Features

Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).
TypeScript/JavaScript
npm install groq-sdk
export GROQ_API_KEY=<your-api-key>
Basic Chat Completion

# Core Topics

jesusjbriceno
Updated: 3/13/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 10/11

This page remains useful for teams, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
  • Quality floor passed for review
Review Score: 10/11
Quality Score: 70
Canonical Locale: en
Detected Body Locale: en


Why use this skill

Recommendation: groq-api helps agents build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec). Flux-Free-Gateway is a robust ecosystem for managing and consuming AI models in free tiers, based on the model rotation architecture popularized by @midudev.

Best suited for

Best fit: Ideal for AI agents that need to build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).

Actionable use cases for groq-api

Use case: Building applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec)
Use case: Using the TypeScript/JavaScript SDK
Use case: Installing the SDK with npm install groq-sdk

! Safety & Limitations

  • Limitation: Requires repository-specific context from the skill documentation
  • Limitation: Works best when the underlying tools and dependencies are already configured

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After The Review

Decide The Next Action Before You Keep Reading Repository Material

Killer-Skills should not stop at opening repository instructions. It should help you decide whether to install this skill, when to cross-check against trusted collections, and when to move into workflow rollout.

Labs Demo

Browser Sandbox Environment

⚡️ Ready to unleash?

Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.

Boot Container Sandbox

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is groq-api?

Best fit: Ideal for AI agents that need to build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec). Summary: Flux-Free-Gateway is a robust ecosystem for managing and consuming AI models in free tiers, based on the model rotation architecture popularized by @midudev. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

How do I install groq-api?

Run the command: npx killer-skills add jesusjbriceno/flux-free-gateway/groq-api. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for groq-api?

Key use cases include building applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec), using the TypeScript/JavaScript SDK, and installing it with npm install groq-sdk.

Which IDEs are compatible with groq-api?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for groq-api?

Limitation: Requires repository-specific context from the skill documentation. Limitation: Works best when the underlying tools and dependencies are already configured.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add jesusjbriceno/flux-free-gateway/groq-api. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use groq-api immediately in the current project.

! Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.

Upstream Repository Material


Upstream Source

groq-api

Install groq-api, an AI agent skill for AI agent workflows and automation. Review the use cases, limitations, and setup path before rollout.

SKILL.md
Supporting Evidence

Groq API

Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).

Quick Start

Installation

bash
# Python
pip install groq

# TypeScript/JavaScript
npm install groq-sdk

Environment Setup

bash
export GROQ_API_KEY=<your-api-key>

Basic Chat Completion

Python:

python
from groq import Groq

client = Groq()  # Uses GROQ_API_KEY env var

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

TypeScript:

typescript
import Groq from "groq-sdk";

const client = new Groq();

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);

Model Selection

| Use Case | Model | Notes |
| --- | --- | --- |
| Fast + cheap | llama-3.1-8b-instant | Best for simple tasks |
| Balanced | llama-3.3-70b-versatile | Quality/cost balance |
| Highest quality | openai/gpt-oss-120b | Built-in tools + reasoning |
| Agentic | groq/compound | Web search + code exec |
| Reasoning | openai/gpt-oss-20b | Fast reasoning (low/med/high) |
| Vision/OCR | llama-4-scout-17b-16e-instruct | Image understanding |
| Audio STT | whisper-large-v3-turbo | Transcription |
| TTS | playai-tts | Text-to-speech |

See references/models.md for full model list and pricing.

Common Patterns

Streaming Responses

python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

System Messages

python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ]
)

Async Client (Python)

python
import asyncio
from groq import AsyncGroq

async def main():
    client = AsyncGroq()
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))

JSON Mode

python
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"}
)

Structured Outputs (JSON Schema)

Force output to match a schema. Two modes available:

| Mode | Guarantee | Models |
| --- | --- | --- |
| strict: true | 100% schema compliance | openai/gpt-oss-20b, openai/gpt-oss-120b |
| strict: false | Best-effort compliance | All supported models |

Strict Mode (guaranteed compliance):

python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"],
                "additionalProperties": False
            }
        }
    }
)

With Pydantic:

python
import json

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Extract: John is 30"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": Person.model_json_schema()
        }
    }
)
person = Person.model_validate(json.loads(response.choices[0].message.content))

See references/structured-outputs.md for schema requirements, validation libraries, and examples.

Audio

Transcription (Speech-to-Text)

python
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",  # Optional: ISO-639-1 code
        response_format="verbose_json",  # json, text, verbose_json
        timestamp_granularities=["word", "segment"]
    )
print(transcription.text)

Translation (to English)

python
with open("french_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=f
    )
print(translation.text)  # English text

Text-to-Speech

python
response = client.audio.speech.create(
    model="playai-tts",
    input="Hello, world!",
    voice="Fritz-PlayAI",
    response_format="wav",  # flac, mp3, mulaw, ogg, wav
    speed=1.0  # 0.5 to 5
)
response.write_to_file("output.wav")

Vision

Process images with Llama 4 multimodal models. Supports up to 5 images per request.

Models: meta-llama/llama-4-scout-17b-16e-instruct (faster), meta-llama/llama-4-maverick-17b-128e-instruct (higher quality)
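The five-image limit above can be enforced with a small content builder before calling the API. This is a sketch; `build_image_content` is a hypothetical helper, not part of the SDK:

```python
def build_image_content(prompt, image_urls, max_images=5):
    """Build a multimodal content list: one text part plus up to
    max_images image_url parts (Llama 4 on Groq accepts at most 5)."""
    if len(image_urls) > max_images:
        raise ValueError(f"At most {max_images} images per request")
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content

content = build_image_content(
    "Compare these",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
```

The returned list plugs directly into the `content` field of a user message, as in the examples below.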

Image from URL

python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

Local Image (Base64)

python
import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)

OCR / Extract Data as JSON

python
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and data as JSON"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
        ]
    }],
    response_format={"type": "json_object"}
)

See references/vision.md for multi-image, tool use with images, and multi-turn conversations.

Tool Use

For tool calling patterns and examples, see references/tool-use.md.

Quick example:

python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation

Built-In Tools (Agentic)

Use groq/compound or openai/gpt-oss-120b for built-in web search and code execution:

python
response = client.chat.completions.create(
    model="groq/compound",
    messages=[{"role": "user", "content": "Search for latest Python news"}]
)
# Model automatically uses web search

MCP (Remote Tools)

Connect to third-party MCP servers for tools like Stripe, GitHub, web scraping. Use the Responses API:

python
import os

import openai

client = openai.OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)

response = client.responses.create(
    model="openai/gpt-oss-120b",
    input="What models are trending on Huggingface?",
    tools=[{
        "type": "mcp",
        "server_label": "Huggingface",
        "server_url": "https://huggingface.co/mcp"
    }]
)

See references/tool-use.md for MCP configuration and popular servers.

Reasoning Models

Control how models think through complex problems.

Models: openai/gpt-oss-20b, openai/gpt-oss-120b (low/medium/high), qwen/qwen3-32b (none/default)

GPT-OSS with Reasoning Effort

python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "How many r's in strawberry?"}],
    reasoning_effort="high",  # low, medium, high
    temperature=0.6,
    max_completion_tokens=1024
)

print(response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)

Qwen3 with Parsed Reasoning

python
response = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "Solve: x + 5 = 12"}],
    reasoning_format="parsed"  # raw, parsed, hidden
)

print("Answer:", response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)

Hide Reasoning (GPT-OSS)

python
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    include_reasoning=False  # Hide reasoning in response
)

See references/reasoning.md for streaming, tool use with reasoning, and best practices.

Batch Processing

For high-volume async processing (24h-7d completion window):

python
# 1. Create JSONL file with requests
# 2. Upload file
# 3. Create batch
batch = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# 4. Check status
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id)
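Step 1 above (building the JSONL request file) can be sketched as follows. The `custom_id`/`method`/`url`/`body` field names follow the OpenAI-compatible batch request shape and are an assumption here; confirm them against Groq's batch documentation:

```python
import json

# Hypothetical prompts to process in bulk
prompts = ["Summarize Python 3.13 changes", "Explain GIL removal"]

# One request object per line; custom_id lets you match results later
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama-3.3-70b-versatile",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

The resulting file is what you upload in step 2 to obtain `file_id`.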

See references/api-reference.md for full batch API details.

Prompt Caching

Automatically reduce latency and costs by 50% for repeated prompt prefixes. No code changes required.

Supported models: moonshotai/kimi-k2-instruct-0905, openai/gpt-oss-20b, openai/gpt-oss-120b, openai/gpt-oss-safeguard-20b

How it works:

  • Place static content (system prompts, tools, examples) at the beginning
  • Place dynamic content (user queries) at the end
  • Cache automatically matches prefixes and applies 50% discount
  • Cache expires after 2 hours without use
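The prefix-ordering rule above can be made concrete with a small helper that always places the static content first and the dynamic query last. This is a sketch; `build_cached_messages` is a hypothetical helper, not part of the SDK:

```python
def build_cached_messages(system_prompt, examples, user_query):
    """Order messages so the static prefix (system prompt plus few-shot
    examples) comes first and the dynamic user query comes last,
    maximizing the reusable prefix for prompt caching."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:  # static few-shot examples
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_query})  # dynamic part
    return messages

msgs = build_cached_messages(
    "You are a support bot.",
    [("How do I reset?", "Hold the button for 5 seconds.")],
    "My device won't turn on.",
)
```

Because only the trailing user message changes between calls, everything before it stays byte-identical and remains cacheable.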

Track cache usage:

python
response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct-0905",
    messages=[{"role": "system", "content": large_system_prompt}, ...]
)

cached = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached}")  # 50% discount applied to these

See references/prompt-caching.md for optimization strategies and examples.

Content Moderation

Detect and filter harmful content using safeguard models.

Llama Guard 4

General content safety classification. Returns safe or unsafe\nSX (category code).

python
response = client.chat.completions.create(
    model="meta-llama/Llama-Guard-4-12B",
    messages=[{"role": "user", "content": user_input}]
)

if response.choices[0].message.content.startswith("unsafe"):
    # Block or handle unsafe content
    pass
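The safe / unsafe\nSX verdict format described above can also be split into a verdict and an optional category code; a minimal parsing sketch (`parse_guard_verdict` is a hypothetical helper):

```python
def parse_guard_verdict(content):
    """Split a Llama Guard reply (verdict line, optional category
    line) into a (verdict, category) tuple; 'safe' has no category."""
    lines = content.strip().split("\n")
    verdict = lines[0].strip()
    category = lines[1].strip() if len(lines) > 1 else None
    return verdict, category

print(parse_guard_verdict("unsafe\nS9"))  # ('unsafe', 'S9')
print(parse_guard_verdict("safe"))        # ('safe', None)
```

The category code can then be mapped to the harm taxonomy in references/moderation.md.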

GPT-OSS Safeguard 20B

Prompt injection detection with custom policies. Returns structured JSON.

python
response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-20b",
    messages=[
        {"role": "system", "content": injection_detection_policy},
        {"role": "user", "content": user_input}
    ]
)
# Returns: {"violation": 1, "category": "Direct Override", "rationale": "..."}

See references/moderation.md for complete policies, harm taxonomy, and integration patterns.

Error Handling

python
from groq import Groq, RateLimitError, APIConnectionError, APIStatusError

client = Groq()

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    # Wait and retry with exponential backoff
    pass
except APIConnectionError:
    # Network issue
    pass
except APIStatusError as e:
    # API error (check e.status_code)
    pass
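The exponential-backoff comment above can be fleshed out with a generic retry wrapper. The helper below is a sketch, not part of the SDK, and the delay numbers are illustrative; in real use you would pass `retryable=(RateLimitError, APIConnectionError)`:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Retry `call` with exponential backoff plus jitter; re-raise
    the last error once max_retries attempts are exhausted."""
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Usage sketch: a flaky function stands in for the Groq API call
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
```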

See references/audio.md for complete audio API reference including file handling, metadata fields, and prompting guidelines.

Resources

Related skills

Looking for an alternative to groq-api or another community skill for your workflow? Explore these related open-source skills.

Show all

  • openclaw-release-maintainer (openclaw): Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 · 333.8k · Artificial Intelligence
  • widget-generator (f): Generate customizable widget plugins for the prompts.chat feed system · 149.6k · Artificial Intelligence
  • flags (vercel): The React Framework · 138.4k · Browser
  • pr-review (pytorch): Tensors and Dynamic neural networks in Python with strong GPU acceleration · 98.6k · Developer