
reference-indexer (Community)

v1.0.0

About this Skill

For knowledge-retrieval agents that need document indexing and search over PDF and Word files. Real-time alert system for the volatility of B3 assets.

arbgjr
Updated: 3/5/2026

Installation
> npx killer-skills add arbgjr/satra/reference-indexer

Agent Capability Analysis

The reference-indexer MCP Server by arbgjr is an open-source community integration for Claude and other AI agents. It indexes external reference documents so that agents can search them through a RAG corpus.

Ideal Agent Persona

Perfect for Knowledge Retrieval Agents needing advanced document indexing and search capabilities for PDF and Word files.

Core Value

Lets agents manage external reference documents: it extracts text, creates automatic summaries, and updates the RAG corpus through commands such as /ref-add and /ref-search. Supported formats include PDF and Word.

Capabilities Granted for reference-indexer MCP Server

Indexing legal documents for quick reference
Searching through external knowledge bases for specific queries
Automating the addition of new reference materials to the RAG corpus

Prerequisites & Limits

  • Requires filesystem access to read and process documents
  • Text extraction covers PDF, Word (.docx), and HTML; Markdown and plain-text files are read directly
  • Dependent on the RAG corpus for indexing and search functionality

Reference Indexer Skill

Purpose

This skill manages external reference documents, indexing them for use in RAG.

Commands

/ref-add {path}

Adds a document to the reference index:

bash
/ref-add .agentic_sdlc/references/legal/lei-13775-2018.pdf

Actions:

  1. Validates the file
  2. Extracts text (if PDF/Word)
  3. Creates an automatic summary
  4. Adds it to the RAG corpus
  5. Updates the index

/ref-search {query}

Searches the reference documents:

bash
/ref-search "prazo de aceite duplicata"

Returns:

  • Relevant documents
  • Excerpts with context
  • Relevance score
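
The source does not say how relevance is scored, so the following is only a minimal sketch of the three return values listed above. It swaps in plain query-term overlap in place of whatever the RAG actually uses; `SearchHit`, `ref_search`, and the scoring rule are all hypothetical names and choices.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchHit:
    doc_id: str    # document the excerpt came from
    excerpt: str   # matching chunk, returned whole as context
    score: float   # fraction of query terms found in the chunk


def _terms(text: str) -> set[str]:
    """Lowercased word tokens; a stand-in for real tokenization."""
    return set(re.findall(r"\w+", text.lower()))


def ref_search(query: str, chunks: list[tuple[str, str]], top_k: int = 3) -> list[SearchHit]:
    """Rank (doc_id, text) chunks by query-term overlap (assumed scoring, not the skill's)."""
    query_terms = _terms(query)
    if not query_terms:
        return []
    hits = []
    for doc_id, text in chunks:
        overlap = query_terms & _terms(text)
        if overlap:
            hits.append(SearchHit(doc_id, text, len(overlap) / len(query_terms)))
    # Highest score first, truncated to the requested number of results.
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
```

An embedding-based search would replace the overlap calculation but could keep the same result shape.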

/ref-list

Lists all indexed documents:

bash
/ref-list

Shows:

  • Documents by category
  • Indexing status
  • Date added
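
As an illustration of that listing, here is a small sketch that groups index entries by category and prints status and date for each. It assumes the entries are already loaded as dictionaries with the field names used in `_index.yml`; `list_by_category` and the display format are hypothetical.

```python
from collections import defaultdict


def list_by_category(documents: list[dict]) -> dict[str, list[str]]:
    """Group index entries by category, one display line per document."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc in documents:
        # "pending" is an assumed label for entries whose indexed flag is false or absent.
        status = "indexed" if doc.get("indexed") else "pending"
        line = f'{doc["path"]} [{status}] added {doc.get("added_at", "unknown")}'
        grouped[doc.get("category", "uncategorized")].append(line)
    return dict(grouped)
```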

/ref-remove {path}

Removes a document from the index:

bash
/ref-remove .agentic_sdlc/references/legal/documento-antigo.pdf

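Supported Formats

| Format   | Extension | Extraction method    |
|----------|-----------|----------------------|
| PDF      | .pdf      | pdftotext / PyPDF2   |
| Word     | .docx     | python-docx          |
| Markdown | .md       | Direct               |
| Text     | .txt      | Direct               |
| HTML     | .html     | BeautifulSoup        |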
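
One way to apply this table in code is a lookup from file extension to extraction method. This is a sketch only: the mapping copies the table, while `EXTRACTION_METHODS`, `extraction_method`, and the error behavior for unknown extensions are assumptions.

```python
from pathlib import Path

# Extension -> extraction method, copied from the supported-formats table.
EXTRACTION_METHODS = {
    ".pdf": "pdftotext / PyPDF2",
    ".docx": "python-docx",
    ".md": "direct",
    ".txt": "direct",
    ".html": "BeautifulSoup",
}


def extraction_method(path: str) -> str:
    """Return the extraction method for a file, or raise for unsupported types."""
    suffix = Path(path).suffix.lower()  # case-insensitive match is an assumption
    try:
        return EXTRACTION_METHODS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix or '(none)'}") from None
```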

Reference Structure

.agentic_sdlc/references/
├── legal/              # Laws, regulations, standards
├── technical/          # RFCs, technical specifications
├── business/           # Business rules, manuals
├── internal/           # Internal documents
└── _index.yml          # Document index

Document Index

File _index.yml:

yaml
index:
  version: 1
  updated_at: "2026-01-12T..."

documents:
  - id: "ref-001"
    path: "legal/lei-13775-2018.pdf"
    title: "Lei 13.775/2018 - Duplicatas Eletrônicas"
    category: legal
    added_at: "2026-01-12T..."
    indexed: true
    summary: "Lei que regulamenta as duplicatas escriturais..."
    keywords:
      - duplicata
      - escritural
      - eletronica
    page_count: 5

  - id: "ref-002"
    path: "technical/icp-brasil.pdf"
    title: "Padrões ICP-Brasil"
    category: technical
    added_at: "2026-01-12T..."
    indexed: true
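
To illustrate how a new entry might be appended to this index, the sketch below works on the index as an in-memory dictionary. Two things are assumptions inferred from the example rather than stated by the source: that ids follow a sequential `ref-NNN` pattern, and that the category equals the top-level folder of the path. `next_ref_id` and `add_document` are hypothetical helpers.

```python
def next_ref_id(documents: list[dict]) -> str:
    """Return the next id in the assumed ref-NNN sequence."""
    numbers = [int(d["id"].split("-")[1]) for d in documents if d["id"].startswith("ref-")]
    return f"ref-{max(numbers, default=0) + 1:03d}"


def add_document(index: dict, path: str, title: str, added_at: str) -> dict:
    """Append a new, not-yet-indexed entry and bump the index timestamp."""
    entry = {
        "id": next_ref_id(index["documents"]),
        "path": path,
        "title": title,
        "category": path.split("/")[0],  # assumption: folder name is the category
        "added_at": added_at,
        "indexed": False,  # flipped to true once the RAG step succeeds
    }
    index["documents"].append(entry)
    index["index"]["updated_at"] = added_at
    return entry
```

Writing the dictionary back to `_index.yml` would need a YAML library, which is left out to keep the sketch dependency-free.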

Text Extraction

PDF

bash
# Using pdftotext (poppler-utils)
pdftotext -layout input.pdf output.txt

# Using Python
python3 << 'EOF'
import PyPDF2

with open('input.pdf', 'rb') as f:
    reader = PyPDF2.PdfReader(f)
    text = ''
    for page in reader.pages:
        # Guard against pages that yield no text (for example, scanned images)
        text += (page.extract_text() or '') + '\n'
    print(text)
EOF

Word (docx)

python
from docx import Document

# Join the text of every paragraph in the document body
doc = Document('input.docx')
text = '\n'.join([p.text for p in doc.paragraphs])
print(text)

RAG Integration

Indexed documents are added to the RAG corpus:

yaml
corpus_entry:
  id: "ref-001"
  source: "references/legal/lei-13775-2018.pdf"
  type: "reference"
  category: "legal"
  content: "{extracted text}"
  embeddings: [...] # Generated by the RAG
  metadata:
    title: "Lei 13.775/2018"
    page: 1
    section: "Art. 1"
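
A corpus entry of this shape could be assembled from an index entry plus one piece of extracted text, roughly as follows. `make_corpus_entry` is a hypothetical helper, and the `references/` prefix on `source` is inferred from the example; embeddings are left empty because the source says the RAG generates them.

```python
def make_corpus_entry(doc: dict, content: str, page: int, section: str) -> dict:
    """Build one corpus entry from an index entry and a piece of extracted text."""
    return {
        "id": doc["id"],
        "source": f'references/{doc["path"]}',  # prefix inferred from the example entry
        "type": "reference",
        "category": doc["category"],
        "content": content,
        "embeddings": None,  # filled in later by the RAG, not by this sketch
        "metadata": {"title": doc["title"], "page": page, "section": section},
    }
```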

Indexing Workflow

yaml
indexing_workflow:
  1_validate:
    - Check that the format is supported
    - Check size (max 50MB)
    - Check permissions

  2_extract:
    - Extract text from the document
    - Clean up formatting
    - Split into chunks

  3_analyze:
    - Generate automatic summary
    - Extract keywords
    - Classify category

  4_index:
    - Add to the RAG corpus
    - Generate embeddings
    - Update the index

  5_verify:
    - Test search
    - Check quality
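
The first workflow step (format, size, permissions) can be sketched with the standard library alone. This is an illustration under assumptions: `validate_reference` is a hypothetical function, the extension set comes from the supported-formats table, and "50MB" is read as 50 × 1024 × 1024 bytes.

```python
import os
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".docx", ".md", ".txt", ".html"}
MAX_SIZE_BYTES = 50 * 1024 * 1024  # "max 50MB" from the workflow, read as MiB


def validate_reference(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file passed validation."""
    p = Path(path)
    if not p.is_file():
        return [f"not a file: {path}"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix}")
    if p.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file exceeds 50 MB")
    if not os.access(p, os.R_OK):
        problems.append("file is not readable")
    return problems
```

Returning every problem at once, instead of raising on the first, lets a caller report all failures in a single pass.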

Configuration

In settings.json:

json
{
  "memory": {
    "rag_corpus": ".agentic_sdlc/corpus",
    "max_document_size_mb": 50,
    "chunk_size": 1000,
    "chunk_overlap": 200
  }
}
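
To show how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window splitter. The source does not state whether these settings count characters or tokens; this sketch assumes characters, and `chunk_text` is a hypothetical function name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults above, each new chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.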

Best Practices

  1. Name files descriptively: lei-13775-2018-duplicatas.pdf
  2. Organize by category: legal, technical, business
  3. Keep versions: do not overwrite; add a new version instead
  4. Document the source: record where each file came from
  5. Summarize long documents: create summaries for large PDFs
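
The "do not overwrite" practice could be enforced with a helper that picks a free file name before copying a document into the references folder. The `-vN` suffix scheme and the `versioned_path` name are assumptions for illustration; the source does not prescribe a versioning format.

```python
from pathlib import Path


def versioned_path(path: str) -> Path:
    """Return path unchanged if it is free, else the first free name with a -vN suffix."""
    original = Path(path)
    candidate = original
    version = 2
    while candidate.exists():
        # Hypothetical naming scheme: report.pdf -> report-v2.pdf -> report-v3.pdf
        candidate = original.with_name(f"{original.stem}-v{version}{original.suffix}")
        version += 1
    return candidate
```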

Troubleshooting

PDF yields no text

Some PDFs are scanned images. Use OCR:

bash
ocrmypdf input.pdf output.pdf
pdftotext output.pdf -
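
Deciding when to fall back to OCR can be automated with a rough heuristic on the text already extracted per page. This is a sketch, not part of the skill: `needs_ocr` and the 20-character threshold are arbitrary assumptions that would need tuning on real documents.

```python
def needs_ocr(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess that a PDF is scanned when its pages yield almost no extractable text."""
    if not page_texts:
        return True  # nothing was extracted at all
    total = sum(len(t.strip()) for t in page_texts)
    return total / len(page_texts) < min_chars_per_page
```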

Document too large

Split it into smaller parts or increase max_document_size_mb.

Incorrect encoding

Force UTF-8 during extraction:

bash
pdftotext -enc UTF-8 input.pdf output.txt
