doc-scraper

v1.0.0

About this Skill

Suited to data analysis agents that need Snowflake documentation available locally as Markdown. doc-scraper is a Python-based tool that scrapes Snowflake documentation into Markdown and caches the scraped pages in SQLite.

Features

Scrapes docs.snowflake.com sections to Markdown format
Utilizes SQLite caching with a 7-day expiration policy
Supports customizable output directories via the --output-dir command option
Allows for adjustable spider depth using the --spider-depth command option
Auto-installs required dependencies, including uv and doc-scraper, on first-time setup
After first-time setup, runs directly through the doc-scraper command

sfc-gh-dflippo
Updated: 3/6/2026

Installation

```bash
npx killer-skills add sfc-gh-dflippo/snowflake-dbt-demo/doc-scraper
```

Agent Capability Analysis

The doc-scraper MCP Server by sfc-gh-dflippo is an open-source community integration for Claude and other AI agents. It automates scraping Snowflake documentation into a local Markdown copy.

Core Value

Scrapes Snowflake documentation from docs.snowflake.com to Markdown and caches it in SQLite, so agents working on Snowflake Data Cloud and dbt projects can consult the documentation locally.

Capabilities Granted for doc-scraper MCP Server

Scraping Snowflake documentation for offline access
Caching SQL reference guides for faster lookup
Converting documentation to Markdown for easier integration with dbt projects

Prerequisites & Limits

  • Requires Python 3; uv is installed automatically on first run
  • Cached pages expire after 7 days
  • Limited to scraping docs.snowflake.com sections

Snowflake Documentation Scraper

Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
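
As a rough illustration of what a 7-day SQLite cache can look like, here is a minimal sketch using Python's standard `sqlite3` module. The table name, column names, and the `open_cache`, `put`, and `get` helpers are all hypothetical. They are not taken from doc-scraper's source, whose actual schema is not shown here.

```python
import sqlite3
import time

EXPIRATION_DAYS = 7  # matches the 7-day expiration described above
SECONDS_PER_DAY = 86_400


def open_cache(path=":memory:"):
    """Open (or create) a cache database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def put(conn, url, body, now=None):
    """Store or refresh a page together with its fetch timestamp."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, now),
    )
    conn.commit()


def get(conn, url, now=None):
    """Return the cached body, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    body, fetched_at = row
    if now - fetched_at > EXPIRATION_DAYS * SECONDS_PER_DAY:
        return None  # stale: the caller should re-fetch the page
    return body
```

Passing `now` explicitly keeps the expiry logic easy to exercise without waiting a week; a caller would normally omit it and let the helpers use the current time.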

Usage

First time setup (auto-installs uv and doc-scraper):

```bash
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
```

Subsequent runs:

```bash
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
```

Command Options

| Option | Default | Description |
| --- | --- | --- |
| `--output-dir` | Required | Output directory for scraped docs |
| `--base-path` | `/en/migrations/` | URL section to scrape |
| `--spider-depth` | `1` | Link depth: 0 = seeds, 1 = + links, 2 = + 2nd-level links |
| `--limit` | None | Cap URLs (for testing) |
| `--dry-run` | - | Preview without writing |
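
To make the `--spider-depth` semantics concrete, the sketch below walks a small in-memory link graph breadth-first and stops following links past a given depth. It is an illustration of depth-limited crawling in general, not doc-scraper's actual implementation; the `crawl` function and the `links` mapping (which stands in for real HTTP fetching) are hypothetical, and the `limit` parameter only mimics what `--limit` is described as doing.

```python
from collections import deque


def crawl(seeds, links, max_depth, limit=None):
    """Breadth-first walk over a link graph, stopping at max_depth.

    `links` maps a URL to the URLs it links to; it replaces real
    fetching so the example stays self-contained.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    visited = []
    while queue:
        if limit is not None and len(visited) >= limit:
            break  # cap reached, similar in spirit to --limit
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical link graph: a -> b -> c -> d
links = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(crawl(["a"], links, max_depth=0))  # seeds only
print(crawl(["a"], links, max_depth=1))  # seeds plus directly linked pages
print(crawl(["a"], links, max_depth=2))  # plus second-level links
```

Under this model each extra level of depth can multiply the number of pages visited, which is why lowering the depth is the first remedy when a run pulls in too many pages.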

Output

```text
output-dir/
├── SKILL.md              # Auto-generated index
├── scraper_config.yaml   # Editable config (auto-created)
├── .cache/               # SQLite cache (auto-managed)
└── en/migrations/*.md    # Scraped pages with frontmatter
```

Configuration

Auto-created at {output-dir}/scraper_config.yaml:

```yaml
rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
```
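
One plausible reading of the `spider` settings is that `allowed_paths` restricts crawling to URLs whose path starts with a listed prefix and `max_pages` caps the total. The sketch below shows that interpretation only; how doc-scraper actually applies these keys is an assumption, and `select_urls` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

# Mirrors the spider section of the default config shown above.
spider_cfg = {"max_pages": 1000, "allowed_paths": ["/en/"]}


def select_urls(urls, cfg):
    """Keep URLs under an allowed path prefix, then cap at max_pages."""
    allowed = tuple(cfg["allowed_paths"])
    kept = [u for u in urls if urlparse(u).path.startswith(allowed)]
    return kept[: cfg["max_pages"]]


# Hypothetical candidate URLs discovered during a crawl.
candidates = [
    "https://docs.snowflake.com/en/migrations/x",
    "https://docs.snowflake.com/fr/y",
    "https://docs.snowflake.com/en/sql-reference/z",
]
print(select_urls(candidates, spider_cfg))
```

Under this reading, tightening `allowed_paths` or lowering `max_pages` is what "edit config" would mean when a run scrapes more than intended.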

Troubleshooting

| Issue | Solution |
| --- | --- |
| Too many pages | Lower `--spider-depth` or edit config |
| Missing pages | Increase `--spider-depth` |
| Cache corruption | Delete `{output-dir}/.cache/` (rare) |
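
If the cache does need clearing, the deletion can be scripted. This is a small sketch, assuming the cache lives at `.cache/` under the output directory as shown in the Output section; the `clear_cache` helper is hypothetical and not part of doc-scraper.

```python
import shutil
from pathlib import Path


def clear_cache(output_dir):
    """Delete {output-dir}/.cache/ if present; return True if removed."""
    cache_dir = Path(output_dir) / ".cache"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False  # nothing to delete
```

For example, `clear_cache("./snowflake-docs")` removes only the cache directory and leaves the scraped Markdown files and `scraper_config.yaml` in place.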
