What is semantik-plugin-development?

semantik-plugin-development is a skill for creating plugins for Semantik, a self-hosted semantic search engine. It suits AI agents that need custom search integration, such as extending document ingestion, embedding, or reranking.

How do I install semantik-plugin-development?

Run `npx killer-skills add jbmiller10/semantik-plugin-template/embedding-guide.md`. The skill works with Cursor, Windsurf, VS Code, Claude Code, and 15+ other IDEs.

What are the use cases for semantik-plugin-development?

Key use cases include extending document ingestion for specialized data formats, creating custom embedding models for improved search results, and developing plugins that rerank search results based on specific criteria.

Which IDEs are compatible with semantik-plugin-development?

This skill is compatible with Cursor, Windsurf, VS Code, Claude Code, GitHub Copilot, JetBrains, Cline, Roo Code, and many more. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for semantik-plugin-development?

Plugins run in-process with the main Semantik application, so they require careful security review. Compatibility depends on adherence to the Semantik protocol interface.

Semantik Plugin Development

Name: semantik-plugin-development
Author: jbmiller10

This skill helps you create plugins for Semantik, a self-hosted semantic search engine. Plugins extend Semantik's capabilities for document ingestion, embedding, chunking, reranking, extraction, and AI agents.

Protocol Version

Current Version: 1.0.0

Breaking changes to protocols increment the major version. Your plugins continue to work as long as they satisfy the protocol interface.

Security Note

Plugins run in-process with the main Semantik application (no sandboxing). Only install plugins you trust. See Security Guide for details.

Quick Start

Create a minimal connector plugin in 5 minutes:

python
# my_connector.py
from typing import ClassVar, Any, AsyncIterator
import hashlib

class MyConnector:
    PLUGIN_ID: ClassVar[str] = "my-connector"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    async def authenticate(self) -> bool:
        return True

    async def load_documents(self, source_id: int | None = None) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }

    @classmethod
    def get_config_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_secret_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION,
                "display_name": "My Connector", "description": "Custom connector"}
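To see what a connector like this yields, it can be driven by hand with asyncio. The sketch below is a minimal harness, not how Semantik itself loads or runs plugins. `DemoConnector` is a trimmed stand-in for the quick-start class, and `collect_documents` is a hypothetical helper introduced only for illustration.

```python
import asyncio
import hashlib
from typing import Any, AsyncIterator, ClassVar


class DemoConnector:
    """Trimmed stand-in for the quick-start connector (illustration only)."""

    PLUGIN_ID: ClassVar[str] = "demo-connector"

    async def load_documents(self) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }


async def collect_documents(connector: DemoConnector) -> list[dict[str, Any]]:
    """Drain the async generator into a list (hypothetical helper)."""
    return [doc async for doc in connector.load_documents()]


docs = asyncio.run(collect_documents(DemoConnector()))
print(docs[0]["unique_id"], len(docs[0]["content_hash"]))  # doc-1 64
```

Because `load_documents()` is an async generator, it has to be consumed with `async for`; calling it alone only creates the generator object.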

Plugin Types

| Type | Purpose | Key Method | Template |
|------|---------|------------|----------|
| `connector` | Ingest documents from sources | `load_documents()` | connector.py |
| `embedding` | Convert text to vectors | `embed_texts()` | embedding.py |
| `chunking` | Split documents into chunks | `chunk()` | chunking.py |
| `reranker` | Reorder search results | `rerank()` | reranker.py |
| `extractor` | Extract entities/metadata | `extract()` | extractor.py |
| `agent` | LLM-powered capabilities | `execute()` | agent.py |

Type-specific guides:

Connector Guide - Document sources, async iterators
Embedding Guide - Query/document modes, dimensions
Chunking Guide - Text segmentation strategies
Reranker Guide - Cross-encoder reranking
Extractor Guide - Entity and metadata extraction
Agent Guide - LLM agents, streaming, context

Cross-cutting guides:

Testing Guide - Contract tests, mocks, fixtures
Security Guide - Trust model, best practices
Advanced Guide - Health checks, dependencies, migration

Development Approach

Protocol-Based (Recommended)

Use plain Python classes with no semantik imports. Plugins are validated by structural typing (duck typing):

python
from typing import ClassVar

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"  # or embedding, chunking, etc.
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"
    # ... implement required methods
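Structural typing means a class qualifies by having the right members, not by inheriting from anything. The sketch below illustrates the idea with the standard library's `typing.Protocol`. `ManifestProvider` is a hypothetical protocol written for this example; Semantik's actual validation code is not shown in this guide and may check more than this.

```python
from typing import Any, ClassVar, Protocol, runtime_checkable


@runtime_checkable
class ManifestProvider(Protocol):
    """Hypothetical protocol: anything exposing get_manifest() matches."""

    def get_manifest(self) -> dict[str, Any]: ...


class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {
            "id": cls.PLUGIN_ID,
            "type": cls.PLUGIN_TYPE,
            "version": cls.PLUGIN_VERSION,
            "display_name": "My Plugin",
            "description": "What the plugin does",
        }


class NotAPlugin:
    pass


# isinstance() against a runtime_checkable Protocol only checks that the
# named members exist; it does not verify signatures or return types.
print(isinstance(MyPlugin(), ManifestProvider))    # True
print(isinstance(NotAPlugin(), ManifestProvider))  # False
```

Note that `MyPlugin` never imports or subclasses `ManifestProvider`, which is what lets a plugin live in its own repository with no dependency on the host.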

Benefits:

Zero dependencies on semantik
Develop in separate repository
Distribute via PyPI or git
No version conflicts

ABC-Based (Advanced)

Inherit from semantik base classes when you need access to internal utilities:

python
from shared.connectors.base import BaseConnector

class MyConnector(BaseConnector):
    # ... inherit helper methods
    ...

Use when:

Building embedding plugins with GPU management
Accessing shared utilities
Developing internal/builtin plugins

Required Class Variables

Every plugin must define:

python
from typing import ClassVar, Any

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"      # Unique ID (lowercase, hyphens)
    PLUGIN_TYPE: ClassVar[str] = "connector"    # One of 6 types
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"     # Semantic version

Some plugin types require additional class variables:

| Type | Additional Variables |
|------|----------------------|
| `connector` | `METADATA` (dict with name, description, icon) |
| `embedding` | `INTERNAL_NAME`, `API_ID`, `PROVIDER_TYPE`, `METADATA` |
| `chunking` | (none) |
| `reranker` | (none) |
| `extractor` | (none) |
| `agent` | (none) |

Manifest Method

All plugins must implement get_manifest():

python
@classmethod
def get_manifest(cls) -> dict[str, Any]:
    return {
        "id": cls.PLUGIN_ID,
        "type": cls.PLUGIN_TYPE,
        "version": cls.PLUGIN_VERSION,
        "display_name": "My Plugin",
        "description": "What the plugin does",
        # Optional fields:
        "author": "Your Name",
        "license": "MIT",
        "homepage": "https://github.com/...",
        "requires": ["other-plugin"],  # Dependencies
        "capabilities": {},  # Plugin-specific capabilities
    }
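A quick self-check can catch an incomplete manifest before the plugin is installed. The sketch below assumes that the five keys listed before the "Optional fields" comment are the required ones; that reading comes from the example layout, not from a published schema. `missing_manifest_keys` is a hypothetical helper.

```python
from typing import Any

# Assumption: the keys shown before "# Optional fields:" are required.
REQUIRED_MANIFEST_KEYS = {"id", "type", "version", "display_name", "description"}


def missing_manifest_keys(manifest: dict[str, Any]) -> set[str]:
    """Return the required keys that the manifest does not define."""
    return REQUIRED_MANIFEST_KEYS - set(manifest)


incomplete = {"id": "my-plugin", "type": "connector"}
print(sorted(missing_manifest_keys(incomplete)))
# ['description', 'display_name', 'version']
```

A check like this fits naturally into a unit test: `assert not missing_manifest_keys(MyPlugin.get_manifest())`.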

Configuration

Config Fields (UI)

Define configuration fields for the Semantik UI:

python
@classmethod
def get_config_fields(cls) -> list[dict[str, Any]]:
    return [
        {
            "name": "base_url",
            "type": "text",        # text, password, number, boolean, select
            "label": "Base URL",
            "description": "API endpoint",
            "required": True,
            "placeholder": "https://api.example.com",
        },
        {
            "name": "model",
            "type": "select",
            "label": "Model",
            "options": ["model-a", "model-b"],
            "default": "model-a",
        },
    ]
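Inside a plugin it can be useful to reconcile the config passed to `__init__` with the declared fields. The sketch below shows one way to fill in defaults and reject missing required values. `apply_config_fields` is a hypothetical helper, and whether Semantik already performs this step before constructing the plugin is not stated in this guide.

```python
from typing import Any

FIELDS: list[dict[str, Any]] = [
    {"name": "base_url", "type": "text", "label": "Base URL", "required": True},
    {"name": "model", "type": "select", "label": "Model",
     "options": ["model-a", "model-b"], "default": "model-a"},
]


def apply_config_fields(
    fields: list[dict[str, Any]], user_config: dict[str, Any]
) -> dict[str, Any]:
    """Merge user values with field defaults; raise if a required field is absent."""
    resolved: dict[str, Any] = {}
    missing: list[str] = []
    for field in fields:
        name = field["name"]
        if name in user_config:
            resolved[name] = user_config[name]
        elif "default" in field:
            resolved[name] = field["default"]
        elif field.get("required", False):
            missing.append(name)
    if missing:
        raise ValueError(f"missing required config fields: {missing}")
    return resolved


print(apply_config_fields(FIELDS, {"base_url": "https://api.example.com"}))
# {'base_url': 'https://api.example.com', 'model': 'model-a'}
```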

Secret Fields

Mark fields that contain secrets (encrypted at rest):

python
@classmethod
def get_secret_fields(cls) -> list[dict[str, Any]]:
    return [
        {"name": "api_key", "label": "API Key", "required": True},
    ]

Environment Variables

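Use the _env suffix pattern for secrets. The user enters the name of an environment variable, and Semantik resolves it to the actual value at runtime:

python
# In the config schema, the user enters the env var name
stored_config = {"api_key_env": "OPENAI_API_KEY"}

# At runtime, Semantik resolves it
config = {"api_key": "sk-actual-key-value"}  # Resolved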
Testing

Manual Verification

bash
pip install -e .
python -c "
from my_plugin import MyConnector
print(f'ID: {MyConnector.PLUGIN_ID}')
print(f'Type: {MyConnector.PLUGIN_TYPE}')
print(f'Manifest: {MyConnector.get_manifest()}')
"

Protocol Validation

python
import pytest

from my_plugin import MyPlugin  # adjust to your package's import path


class TestMyPlugin:
    def test_has_required_attributes(self):
        assert hasattr(MyPlugin, "PLUGIN_ID")
        assert hasattr(MyPlugin, "PLUGIN_TYPE")
        assert hasattr(MyPlugin, "PLUGIN_VERSION")
        assert MyPlugin.PLUGIN_TYPE == "connector"

    def test_manifest_format(self):
        manifest = MyPlugin.get_manifest()
        assert "id" in manifest
        assert "type" in manifest
        assert "display_name" in manifest

    @pytest.mark.asyncio  # requires the pytest-asyncio plugin
    async def test_core_functionality(self):
        plugin = MyPlugin(config={})
        # Test plugin-specific methods

With Semantik Test Mixins

If semantik is installed:

python
from shared.plugins.testing.contracts import ConnectorProtocolTestMixin

from my_plugin import MyConnector  # adjust to your package's import path


class TestMyConnector(ConnectorProtocolTestMixin):
    plugin_class = MyConnector

Packaging

pyproject.toml

toml
[project]
name = "semantik-plugin-myconnector"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = []  # Your dependencies only

[project.entry-points."semantik.plugins"]
my-connector = "my_plugin.connector:MyConnector"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

See templates/pyproject.toml for a complete template.

Entry Point Format

plugin-id = "module.path:ClassName"

plugin-id: Should match PLUGIN_ID
module.path: Python import path
ClassName: Your plugin class
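The right-hand side of an entry point is just an import path and an attribute name separated by a colon. The sketch below shows how such a string can be resolved by hand with `importlib`, which is handy for checking that the path is spelled correctly. `load_entry_point_target` is a hypothetical helper, and the standard-library target `collections:OrderedDict` is used only so the example runs without a plugin installed.

```python
import importlib
from typing import Any


def load_entry_point_target(target: str) -> Any:
    """Resolve a "module.path:ClassName" string to the object it names."""
    module_path, sep, class_name = target.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {target!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = load_entry_point_target("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

For a real plugin the call would look like `load_entry_point_target("my_plugin.connector:MyConnector")`, which raises `ModuleNotFoundError` or `AttributeError` if either half of the path is wrong.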

Installation

bash
# Development
pip install -e .

# From git
pip install git+https://github.com/you/semantik-plugin-myconnector.git

# Via Semantik API (an HTTP request, not a shell command):
#   POST /api/v2/plugins/install
#   {"install_command": "git+https://github.com/..."}

Common Issues

Plugin Not Loading

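Check that the package is installed and its entry point is registered (the --verbose flag adds an Entry-points section to the output):

bash
pip show --verbose semantik-plugin-myconnector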

Verify PLUGIN_TYPE is valid:

python
from my_plugin import MyConnector
assert MyConnector.PLUGIN_TYPE in ["connector", "embedding", "chunking", "reranker", "extractor", "agent"]

Check for import errors:

python
try:
    from my_plugin import MyConnector
except ImportError as e:
    print(f"Error: {e}")

Validation Errors

| Error | Fix |
|-------|-----|
| `missing required keys: {'content'}` | Add all required fields to returned dict |
| `Invalid role: 'xyz'` | Use valid string from MESSAGE_ROLES |
| `content_hash must be 64 characters` | Use `hashlib.sha256(text.encode()).hexdigest()` |
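Most of these errors can be caught locally before the documents ever reach Semantik. The sketch below checks a connector document against the required keys listed in the Data Format Reference and against the 64-character hash rule. `validate_document` is a hypothetical helper, and its messages only imitate the wording in the table; they are not Semantik's own.

```python
import hashlib
import string
from typing import Any

REQUIRED_DOC_KEYS = {"content", "unique_id", "source_type", "metadata", "content_hash"}


def validate_document(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a connector document (empty if none)."""
    errors: list[str] = []
    missing = REQUIRED_DOC_KEYS - set(doc)
    if missing:
        errors.append(f"missing required keys: {sorted(missing)}")
    if "content_hash" in doc:
        value = doc["content_hash"]
        is_hex = isinstance(value, str) and all(c in string.hexdigits for c in value)
        if not is_hex or len(value) != 64:
            errors.append("content_hash must be 64 hex characters")
    return errors


text = "Document content..."
good = {
    "content": text,
    "unique_id": "doc-1",
    "source_type": "my-connector",
    "metadata": {},
    "content_hash": hashlib.sha256(text.encode()).hexdigest(),
}
print(validate_document(good))                                  # []
print(validate_document({"unique_id": "doc-2", "content_hash": "abc"}))
```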

Async Issues

All I/O methods must be async:

python
from typing import AsyncIterator

# Wrong: a plain generator cannot be consumed with `async for`
def load_documents(self):
    yield {"content": "..."}

# Right: an async generator
async def load_documents(self) -> AsyncIterator[dict]:
    yield {"content": "..."}
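Declaring a method `async` does not make blocking calls inside it non-blocking. When the underlying client is synchronous, one option is to push each call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below assumes a hypothetical blocking function `read_blocking`, and `load_documents` is written as a free function for brevity.

```python
import asyncio
import hashlib
from typing import Any, AsyncIterator


def read_blocking(name: str) -> str:
    """Stand-in for a blocking call (file read, synchronous HTTP client, ...)."""
    return f"contents of {name}"


async def load_documents(names: list[str]) -> AsyncIterator[dict[str, Any]]:
    for name in names:
        # Run the blocking call in a worker thread so the event loop stays free.
        content = await asyncio.to_thread(read_blocking, name)
        yield {
            "content": content,
            "unique_id": name,
            "source_type": "my-connector",
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }


async def collect(names: list[str]) -> list[dict[str, Any]]:
    return [doc async for doc in load_documents(names)]


docs = asyncio.run(collect(["a.txt", "b.txt"]))
print([d["unique_id"] for d in docs])  # ['a.txt', 'b.txt']
```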

Templates

Ready-to-use templates in templates/:

| File | Description |
|------|-------------|
| `connector.py` | Document source connector |
| `embedding.py` | Embedding model provider |
| `chunking.py` | Text chunking strategy |
| `reranker.py` | Search result reranker |
| `extractor.py` | Entity/metadata extractor |
| `agent.py` | LLM-powered agent |
| `pyproject.toml` | Package configuration |

Copy a template and modify:

bash
cp templates/connector.py my_connector.py
# Edit PLUGIN_ID, PLUGIN_VERSION, and implement methods

Data Format Reference

Connector Documents (IngestedDocumentDict)

python
{
    "content": str,              # Full text (required)
    "unique_id": str,            # Unique identifier (required)
    "source_type": str,          # Your PLUGIN_ID (required)
    "metadata": dict,            # Source metadata (required)
    "content_hash": str,         # SHA-256, 64 hex chars (required)
    "file_path": str | None,     # Local path (optional)
}

Chunk Format (ChunkDict)

python
{
    "content": str,              # Chunk text (required)
    "metadata": {                # Chunk metadata (required)
        "chunk_index": int,
        "start_offset": int,
        "end_offset": int,
    },
    "chunk_id": str | None,      # Unique ID (optional)
    "embedding": list[float] | None,  # Pre-computed (optional)
}
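As an illustration of the chunk format, the sketch below splits text into fixed-size character windows and fills in the metadata fields. It is a toy strategy, not one of Semantik's built-in chunkers. `chunk_fixed` is a hypothetical function, and treating `end_offset` as exclusive (Python slice style) is an assumption, since the guide does not say which convention applies.

```python
from typing import Any


def chunk_fixed(text: str, size: int = 200, overlap: int = 0) -> list[dict[str, Any]]:
    """Split text into windows of `size` characters, each sharing `overlap` with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: list[dict[str, Any]] = []
    step = size - overlap
    for index, start in enumerate(range(0, len(text), step)):
        end = min(start + size, len(text))
        chunks.append({
            "content": text[start:end],
            "metadata": {
                "chunk_index": index,
                "start_offset": start,
                "end_offset": end,  # assumption: exclusive end, so text[start:end]
            },
        })
        if end == len(text):
            break  # the last window already reached the end of the text
    return chunks


for chunk in chunk_fixed("abcdefghij", size=4, overlap=1):
    print(chunk["content"], chunk["metadata"])
```

With `size=4` and `overlap=1` the ten-character input yields `abcd`, `defg`, and `ghij`.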

Rerank Result (RerankResultDict)

python
{
    "index": int,                # Original document index (required)
    "score": float,              # Relevance score (required)
    "text": str | None,          # Document text (optional)
    "metadata": dict | None,     # Metadata (optional)
}

Agent Message (AgentMessageDict)

python
{
    "id": str,                   # Unique ID (required)
    "role": str,                 # user, assistant, system, tool_call, tool_result, error
    "type": str,                 # text, thinking, tool_use, tool_output, partial, final, error
    "content": str,              # Message content (required)
    "timestamp": str,            # ISO 8601 (required)
    "is_partial": bool,          # Streaming partial (optional)
    "sequence_number": int,      # Message order (optional)
}
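A small factory function keeps agent messages consistent with this format. The sketch below generates the ID and timestamp and rejects unknown roles. `make_message` is a hypothetical helper, and the local `MESSAGE_ROLES` set is copied from the role comment above; the guide does not say where Semantik's own `MESSAGE_ROLES` constant is imported from.

```python
import uuid
from datetime import datetime, timezone
from typing import Any

# Local copy of the roles listed in the format above (assumption: complete list).
MESSAGE_ROLES = {"user", "assistant", "system", "tool_call", "tool_result", "error"}


def make_message(role: str, content: str, msg_type: str = "text") -> dict[str, Any]:
    """Build a message dict with a fresh UUID and a UTC ISO 8601 timestamp."""
    if role not in MESSAGE_ROLES:
        raise ValueError(f"Invalid role: {role!r}")
    return {
        "id": str(uuid.uuid4()),
        "role": role,
        "type": msg_type,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


message = make_message("assistant", "Hello")
print(message["role"], message["type"])  # assistant text
```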

Getting Help

Semantik docs: See semantik/docs/external-plugins.md for protocol details
Protocol reference: See semantik/docs/plugin-protocols.md for full specifications
Examples: Check semantik/packages/shared/plugins/builtins/ for built-in plugins
