# Snowflake Documentation Scraper

Scrapes docs.snowflake.com sections to Markdown, with SQLite caching (7-day expiration).
## Usage
First-time setup (auto-installs `uv` and `doc-scraper`):

```bash
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
```
Subsequent runs:

```bash
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
```
## Command Options
| Option | Default | Description |
|---|---|---|
| `--output-dir` | (required) | Output directory for scraped docs |
| `--base-path` | `/en/migrations/` | URL section to scrape |
| `--spider-depth` | `1` | Link-following depth: `0` = seed URLs only, `1` = seeds plus linked pages, `2` = one further level of links |
| `--limit` | None | Cap the number of URLs (useful for testing) |
| `--dry-run` | off | Preview what would be scraped without writing files |
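The options above could be wired up with `argparse` roughly like this. This is an illustrative sketch of the documented flags and defaults, not the tool's actual implementation:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented doc-scraper options."""
    parser = argparse.ArgumentParser(prog="doc-scraper")
    parser.add_argument("--output-dir", required=True,
                        help="Output directory for scraped docs")
    parser.add_argument("--base-path", default="/en/migrations/",
                        help="URL section to scrape")
    parser.add_argument("--spider-depth", type=int, default=1,
                        help="Link-following depth (0, 1, or 2)")
    parser.add_argument("--limit", type=int, default=None,
                        help="Cap the number of URLs (for testing)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview without writing files")
    return parser
```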
## Output
```
output-dir/
├── SKILL.md              # Auto-generated index
├── scraper_config.yaml   # Editable config (auto-created)
├── .cache/               # SQLite cache (auto-managed)
└── en/migrations/*.md    # Scraped pages with frontmatter
```
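Each scraped page is Markdown with a `---`-delimited frontmatter block. A minimal reader might look like this; it is a sketch that parses only flat `key: value` lines, and the field names in the example are assumptions, since the exact frontmatter fields are whatever the scraper emits:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split '---'-delimited frontmatter from a Markdown document.

    Returns (metadata, body). Handles only simple 'key: value' lines,
    which is enough for flat frontmatter such as a title and source URL.
    """
    meta: dict[str, str] = {}
    if not text.startswith("---\n"):
        return meta, text  # no frontmatter block present
    header, _, body = text[4:].partition("\n---\n")
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, body.lstrip("\n")
```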
## Configuration
Auto-created at `{output-dir}/scraper_config.yaml`:

```yaml
rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
```
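Overlaying user edits from `scraper_config.yaml` onto these defaults could be done section by section, roughly as below. The `merge_config` helper is hypothetical; it takes an already-parsed dict so the sketch stays standard-library-only:

```python
# Documented defaults from scraper_config.yaml
DEFAULTS = {
    "rate_limiting": {"max_concurrent_threads": 4},
    "spider": {"max_pages": 1000, "allowed_paths": ["/en/"]},
    "scraped_pages": {"expiration_days": 7},
}


def merge_config(user: dict) -> dict:
    """Overlay user-supplied values (e.g. parsed from scraper_config.yaml)
    onto the defaults, one section at a time."""
    return {k: {**v, **user.get(k, {})} for k, v in DEFAULTS.items()}
```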
## Troubleshooting
| Issue | Solution |
|---|---|
| Too many pages | Lower `--spider-depth`, or tighten `allowed_paths` / `max_pages` in the config |
| Missing pages | Increase `--spider-depth` |
| Cache corruption (rare) | Delete `{output-dir}/.cache/` |
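Clearing a corrupted cache is just removing the `.cache/` directory; the scraper recreates it on the next run. An illustrative standard-library snippet (the `clear_cache` helper is hypothetical, not part of the tool):

```python
import shutil
from pathlib import Path


def clear_cache(output_dir: str) -> bool:
    """Delete {output-dir}/.cache if present; return True if removed."""
    cache = Path(output_dir) / ".cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```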