semantik-plugin-development: how to develop Semantik plugins

v1.0.0

About this Skill

semantik-plugin-development is a skill for building plugins for Semantik, a self-hosted semantic search engine. It suits AI agents that need custom search integration, such as extending document ingestion, embedding, or reranking.

Features

Extends Semantik's capabilities for document ingestion, embedding, chunking, reranking, extraction, and AI agents
Supports protocol version 1.0.0 with backwards compatibility
Allows plugins to run in-process with the main Semantik app
Enables customization of Semantik's search engine functionality
Supports development of plugins for specific use cases, such as extraction and embedding

Author: jbmiller10
Updated: 3/7/2026

Installation
> npx killer-skills add jbmiller10/semantik-plugin-template/embedding-guide.md

Agent Capability Analysis

semantik-plugin-development by jbmiller10 is an open-source community skill for Claude and other AI agents. It guides an agent through developing plugins for Semantik.

Core Value

Lets agents create custom Semantik plugins against protocol version 1.0.0, covering chunking, extraction, and AI agent integration. Plugins stay compatible as long as they satisfy the protocol interface.

Capabilities Granted for semantik-plugin-development MCP Server

Extending document ingestion for specialized data formats
Creating custom embedding models for improved search results
Developing plugins that rerank search results based on specific criteria

Prerequisites & Limits

  • Plugins run in-process with the main Semantik app, requiring careful security considerations
  • Compatibility dependent on adherence to the Semantik protocol interface

Semantik Plugin Development

This skill helps you create plugins for Semantik, a self-hosted semantic search engine. Plugins extend Semantik's capabilities for document ingestion, embedding, chunking, reranking, extraction, and AI agents.

Protocol Version

Current Version: 1.0.0

Breaking changes to protocols increment the major version. Your plugins continue to work as long as they satisfy the protocol interface.

Security Note

Plugins run in-process with the main Semantik application (no sandboxing). Only install plugins you trust. See Security Guide for details.

Quick Start

Create a minimal connector plugin in 5 minutes:

python
# my_connector.py
from typing import ClassVar, Any, AsyncIterator
import hashlib

class MyConnector:
    PLUGIN_ID: ClassVar[str] = "my-connector"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    async def authenticate(self) -> bool:
        return True

    async def load_documents(self, source_id: int | None = None) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }

    @classmethod
    def get_config_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_secret_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION,
                "display_name": "My Connector", "description": "Custom connector"}

Plugin Types

| Type | Purpose | Key Method | Template |
| --- | --- | --- | --- |
| connector | Ingest documents from sources | load_documents() | connector.py |
| embedding | Convert text to vectors | embed_texts() | embedding.py |
| chunking | Split documents into chunks | chunk() | chunking.py |
| reranker | Reorder search results | rerank() | reranker.py |
| extractor | Extract entities/metadata | extract() | extractor.py |
| agent | LLM-powered capabilities | execute() | agent.py |

Development Approach

Protocol-Based (Recommended)

Use plain Python classes with no semantik imports. Plugins are validated by structural typing (duck typing):

python
from typing import ClassVar

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"  # or embedding, chunking, etc.
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"
    # ... implement required methods

Benefits:

  • Zero dependencies on semantik
  • Develop in separate repository
  • Distribute via PyPI or git
  • No version conflicts

ABC-Based (Advanced)

Inherit from semantik base classes when you need access to internal utilities:

python
from shared.connectors.base import BaseConnector

class MyConnector(BaseConnector):
    # ... inherit helper methods
    ...

Use when:

  • Building embedding plugins with GPU management
  • Need access to shared utilities
  • Developing internal/builtin plugins

Required Class Variables

Every plugin must define:

python
from typing import ClassVar, Any

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"     # Unique ID (lowercase, hyphens)
    PLUGIN_TYPE: ClassVar[str] = "connector"   # One of 6 types
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"    # Semantic version

Some plugin types require additional class variables:

| Type | Additional Variables |
| --- | --- |
| connector | METADATA (dict with name, description, icon) |
| embedding | INTERNAL_NAME, API_ID, PROVIDER_TYPE, METADATA |
| chunking | (none) |
| reranker | (none) |
| extractor | (none) |
| agent | (none) |
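The three required class variables can be checked before packaging. This is a minimal sketch under assumptions: the ID pattern (lowercase letters and digits joined by single hyphens) and the simplified `MAJOR.MINOR.PATCH` version pattern are guesses at what "lowercase, hyphens" and "semantic version" mean here, and `check_class_vars` is a hypothetical helper, not part of Semantik.

```python
import re

VALID_PLUGIN_TYPES = {
    "connector", "embedding", "chunking", "reranker", "extractor", "agent",
}
# Assumed ID format: lowercase letters/digits separated by single hyphens.
PLUGIN_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Simplified semantic version: MAJOR.MINOR.PATCH, no pre-release tags.
SEMVER_RE = re.compile(r"\d+\.\d+\.\d+")


def check_class_vars(plugin_cls: type) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    for name in ("PLUGIN_ID", "PLUGIN_TYPE", "PLUGIN_VERSION"):
        if not isinstance(getattr(plugin_cls, name, None), str):
            problems.append(f"{name} is missing or not a string")
    if problems:
        return problems
    if not PLUGIN_ID_RE.fullmatch(plugin_cls.PLUGIN_ID):
        problems.append("PLUGIN_ID should be lowercase with hyphens")
    if plugin_cls.PLUGIN_TYPE not in VALID_PLUGIN_TYPES:
        problems.append(f"PLUGIN_TYPE {plugin_cls.PLUGIN_TYPE!r} is not valid")
    if not SEMVER_RE.fullmatch(plugin_cls.PLUGIN_VERSION):
        problems.append("PLUGIN_VERSION should look like 1.0.0")
    return problems


class GoodPlugin:
    PLUGIN_ID = "my-plugin"
    PLUGIN_TYPE = "connector"
    PLUGIN_VERSION = "1.0.0"


class BadPlugin:
    PLUGIN_ID = "My_Plugin"
    PLUGIN_TYPE = "search"
    PLUGIN_VERSION = "1.0"
```

Calling `check_class_vars(GoodPlugin)` returns an empty list, while `BadPlugin` yields one problem per variable.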

Manifest Method

All plugins must implement get_manifest():

python
@classmethod
def get_manifest(cls) -> dict[str, Any]:
    return {
        "id": cls.PLUGIN_ID,
        "type": cls.PLUGIN_TYPE,
        "version": cls.PLUGIN_VERSION,
        "display_name": "My Plugin",
        "description": "What the plugin does",
        # Optional fields:
        "author": "Your Name",
        "license": "MIT",
        "homepage": "https://github.com/...",
        "requires": ["other-plugin"],  # Dependencies
        "capabilities": {},            # Plugin-specific capabilities
    }
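A manifest can be sanity-checked against the class variables it is supposed to mirror. The sketch below assumes the five keys listed before the "Optional fields" comment are the required ones; `check_manifest` and `DemoPlugin` are hypothetical names used only for illustration.

```python
from typing import Any

# Assumed required keys: those listed before the "Optional fields" comment.
REQUIRED_MANIFEST_KEYS = ("id", "type", "version", "display_name", "description")


def check_manifest(plugin_cls: type) -> list[str]:
    """Return problems found in plugin_cls.get_manifest(); empty means OK."""
    manifest: dict[str, Any] = plugin_cls.get_manifest()
    problems = [
        f"missing key: {key}"
        for key in REQUIRED_MANIFEST_KEYS
        if key not in manifest
    ]
    # The manifest should agree with the class-level constants.
    pairs = (("id", "PLUGIN_ID"), ("type", "PLUGIN_TYPE"), ("version", "PLUGIN_VERSION"))
    for key, attr in pairs:
        if key in manifest and manifest[key] != getattr(plugin_cls, attr, None):
            problems.append(f"manifest[{key!r}] does not match {attr}")
    return problems


class DemoPlugin:
    PLUGIN_ID = "demo-plugin"
    PLUGIN_TYPE = "connector"
    PLUGIN_VERSION = "1.0.0"

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {
            "id": cls.PLUGIN_ID,
            "type": cls.PLUGIN_TYPE,
            "version": cls.PLUGIN_VERSION,
            "display_name": "Demo Plugin",
            "description": "Illustrative plugin",
        }
```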

Configuration

Config Fields (UI)

Define configuration fields for the Semantik UI:

python
@classmethod
def get_config_fields(cls) -> list[dict[str, Any]]:
    return [
        {
            "name": "base_url",
            "type": "text",  # text, password, number, boolean, select
            "label": "Base URL",
            "description": "API endpoint",
            "required": True,
            "placeholder": "https://api.example.com",
        },
        {
            "name": "model",
            "type": "select",
            "label": "Model",
            "options": ["model-a", "model-b"],
            "default": "model-a",
        },
    ]
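To see how these field definitions might be used, here is a rough sketch that applies them to user-supplied values: it fills defaults, rejects missing required fields, and checks `select` options. How Semantik itself processes the fields is not documented here, so `apply_config_fields` is a hypothetical helper and its behavior is an assumption.

```python
from typing import Any

CONFIG_FIELDS: list[dict[str, Any]] = [
    {"name": "base_url", "type": "text", "label": "Base URL", "required": True},
    {
        "name": "model",
        "type": "select",
        "label": "Model",
        "options": ["model-a", "model-b"],
        "default": "model-a",
    },
]


def apply_config_fields(
    fields: list[dict[str, Any]], values: dict[str, Any]
) -> dict[str, Any]:
    """Merge user values with field defaults and run basic checks."""
    resolved: dict[str, Any] = {}
    for field in fields:
        name = field["name"]
        if name in values:
            value = values[name]
        elif "default" in field:
            value = field["default"]
        elif field.get("required", False):
            raise ValueError(f"missing required config field: {name}")
        else:
            continue  # optional field with no default: leave it out
        if field.get("type") == "select" and value not in field.get("options", []):
            raise ValueError(f"{name} must be one of {field.get('options', [])}")
        resolved[name] = value
    return resolved
```

With the fields above, `apply_config_fields(CONFIG_FIELDS, {"base_url": "https://api.example.com"})` fills in `model` from its default.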

Secret Fields

Mark fields that contain secrets (encrypted at rest):

python
@classmethod
def get_secret_fields(cls) -> list[dict[str, Any]]:
    return [
        {"name": "api_key", "label": "API Key", "required": True},
    ]

Environment Variables

Use the _env suffix pattern for secrets:

python
# In the config schema, the user enters the env var name
user_config = {"api_key_env": "OPENAI_API_KEY"}

# At runtime, semantik resolves it
config = {"api_key": "sk-actual-key-value"}  # Resolved
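The `_env` suffix pattern can be pictured as a small lookup step. The function below is only a sketch of that idea under stated assumptions: `resolve_env_secrets` is a hypothetical name, and Semantik's real resolver may handle missing variables or other keys differently.

```python
import os
from typing import Any, Mapping


def resolve_env_secrets(
    config: Mapping[str, Any], environ: Mapping[str, str] = os.environ
) -> dict[str, Any]:
    """Replace each "<name>_env" key with "<name>" read from the environment."""
    resolved: dict[str, Any] = {}
    for key, value in config.items():
        if key.endswith("_env") and isinstance(value, str):
            if value not in environ:
                raise KeyError(f"environment variable {value!r} is not set")
            # Strip the suffix: "api_key_env" becomes "api_key".
            resolved[key[: -len("_env")]] = environ[value]
        else:
            resolved[key] = value
    return resolved
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.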

Testing

Manual Verification

bash
pip install -e .
python -c "
from my_plugin import MyConnector
print(f'ID: {MyConnector.PLUGIN_ID}')
print(f'Type: {MyConnector.PLUGIN_TYPE}')
print(f'Manifest: {MyConnector.get_manifest()}')
"

Protocol Validation

python
import pytest

from my_plugin import MyPlugin  # adjust to your package and class name

class TestMyPlugin:
    def test_has_required_attributes(self):
        assert hasattr(MyPlugin, "PLUGIN_ID")
        assert hasattr(MyPlugin, "PLUGIN_TYPE")
        assert hasattr(MyPlugin, "PLUGIN_VERSION")
        assert MyPlugin.PLUGIN_TYPE == "connector"

    def test_manifest_format(self):
        manifest = MyPlugin.get_manifest()
        assert "id" in manifest
        assert "type" in manifest
        assert "display_name" in manifest

    @pytest.mark.asyncio
    async def test_core_functionality(self):
        plugin = MyPlugin(config={})
        # Test plugin-specific methods

With Semantik Test Mixins

If semantik is installed:

python
from shared.plugins.testing.contracts import ConnectorProtocolTestMixin

from my_plugin import MyConnector  # adjust to your package

class TestMyConnector(ConnectorProtocolTestMixin):
    plugin_class = MyConnector

Packaging

pyproject.toml

toml
[project]
name = "semantik-plugin-myconnector"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = []  # Your dependencies only

[project.entry-points."semantik.plugins"]
my-connector = "my_plugin.connector:MyConnector"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

See templates/pyproject.toml for a complete template.

Entry Point Format

plugin-id = "module.path:ClassName"
  • plugin-id: Should match PLUGIN_ID
  • module.path: Python import path
  • ClassName: Your plugin class
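The `module.path:ClassName` value can be resolved by hand, which is handy for checking a path before publishing. This is a simplified sketch: `load_entry_point_target` is a hypothetical helper, and the standard-library target `json.decoder:JSONDecoder` stands in for a real plugin path such as `my_plugin.connector:MyConnector`.

```python
import importlib


def load_entry_point_target(value: str) -> type:
    """Import the object named by a "module.path:ClassName" string."""
    module_path, sep, class_name = value.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {value!r}")
    module = importlib.import_module(module_path)
    # Raises AttributeError if the class is not defined in that module.
    return getattr(module, class_name)


# Stand-in target from the standard library, used here for demonstration.
decoder_cls = load_entry_point_target("json.decoder:JSONDecoder")
```

An `ImportError` from this call points at a wrong module path, while an `AttributeError` points at a wrong class name.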

Installation

bash
# Development
pip install -e .

# From git
pip install git+https://github.com/you/semantik-plugin-myconnector.git

# Via Semantik API
POST /api/v2/plugins/install
{"install_command": "git+https://github.com/..."}

Common Issues

Plugin Not Loading

  1. Check entry point is registered:

    bash
    pip show semantik-plugin-myconnector
  2. Verify PLUGIN_TYPE is valid:

    python
    assert MyConnector.PLUGIN_TYPE in ["connector", "embedding", "chunking", "reranker", "extractor", "agent"]
  3. Check for import errors:

    python
    try:
        from my_plugin import MyConnector
    except ImportError as e:
        print(f"Error: {e}")

Validation Errors

| Error | Fix |
| --- | --- |
| missing required keys: {'content'} | Add all required fields to returned dict |
| Invalid role: 'xyz' | Use valid string from MESSAGE_ROLES |
| content_hash must be 64 characters | Use hashlib.sha256(text.encode()).hexdigest() |

Async Issues

All I/O methods must be async:

python
from typing import AsyncIterator

# Wrong
def load_documents(self):
    yield {"content": "..."}

# Right
async def load_documents(self) -> AsyncIterator[dict]:
    yield {"content": "..."}

Templates

Ready-to-use templates in templates/:

| File | Description |
| --- | --- |
| connector.py | Document source connector |
| embedding.py | Embedding model provider |
| chunking.py | Text chunking strategy |
| reranker.py | Search result reranker |
| extractor.py | Entity/metadata extractor |
| agent.py | LLM-powered agent |
| pyproject.toml | Package configuration |

Copy a template and modify:

bash
cp templates/connector.py my_connector.py
# Edit PLUGIN_ID, PLUGIN_VERSION, and implement methods

Data Format Reference

Connector Documents (IngestedDocumentDict)

python
{
    "content": str,           # Full text (required)
    "unique_id": str,         # Unique identifier (required)
    "source_type": str,       # Your PLUGIN_ID (required)
    "metadata": dict,         # Source metadata (required)
    "content_hash": str,      # SHA-256, 64 hex chars (required)
    "file_path": str | None,  # Local path (optional)
}

Chunk Format (ChunkDict)

python
{
    "content": str,                   # Chunk text (required)
    "metadata": {                     # Chunk metadata (required)
        "chunk_index": int,
        "start_offset": int,
        "end_offset": int,
    },
    "chunk_id": str | None,           # Unique ID (optional)
    "embedding": list[float] | None,  # Pre-computed (optional)
}
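As an illustration of this format, here is a naive fixed-size chunker that emits dicts in the shape above. It is a sketch, not the template's chunking strategy: `chunk_text` is a hypothetical function, character offsets are assumed, and `end_offset` is treated as exclusive, which the format description does not specify.

```python
from typing import Any


def chunk_text(text: str, size: int = 200, overlap: int = 0) -> list[dict[str, Any]]:
    """Split text into character windows of `size` with optional overlap."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks: list[dict[str, Any]] = []
    step = size - overlap
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(
            {
                "content": text[start:end],
                "metadata": {
                    "chunk_index": len(chunks),
                    "start_offset": start,
                    "end_offset": end,  # assumed exclusive
                },
                "chunk_id": None,
                "embedding": None,
            }
        )
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij", size=4, overlap=1)` produces three chunks starting at offsets 0, 3, and 6.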

Rerank Result (RerankResultDict)

python
{
    "index": int,             # Original document index (required)
    "score": float,           # Relevance score (required)
    "text": str | None,       # Document text (optional)
    "metadata": dict | None,  # Metadata (optional)
}

Agent Message (AgentMessageDict)

python
{
    "id": str,               # Unique ID (required)
    "role": str,             # user, assistant, system, tool_call, tool_result, error
    "type": str,             # text, thinking, tool_use, tool_output, partial, final, error
    "content": str,          # Message content (required)
    "timestamp": str,        # ISO 8601 (required)
    "is_partial": bool,      # Streaming partial (optional)
    "sequence_number": int,  # Message order (optional)
}

Getting Help

  • Semantik docs: See semantik/docs/external-plugins.md for protocol details
  • Protocol reference: See semantik/docs/plugin-protocols.md for full specifications
  • Examples: Check semantik/packages/shared/plugins/builtins/ for built-in plugins
