dribl-crawling

v1.0.0

About this Skill

dribl-crawling is a technique that uses playwright-core to drive a real browser and extract data from fv.dribl.com, a Cloudflare-protected site. It is ideal for web scraping agents that need reliable access to such sites.

Features

Extracts clubs and fixtures data from https://fv.dribl.com/fixtures/ using playwright-core
Uses a two-phase workflow: extraction and transformation of raw API data
Stores raw data in data/external/fixtures/{team}/ format
Transforms data into validated, merged format in data/matches/
Works around Cloudflare protection by driving a real browser rather than making bare HTTP requests
Powers static websites built with Next.js and Sanity.io

Author: dejanvasic85
Updated: 3/11/2026
Installation

Universal install (auto-detect):

> npx killer-skills add dejanvasic85/williamstownsc/dribl-crawling

Supports 18+ platforms, including Cursor, Windsurf, VS Code, Trae, Claude, and OpenClaw.

Agent Capability Analysis

The dribl-crawling MCP Server by dejanvasic85 is an open-source community integration for Claude and other AI agents, enabling task automation and capability expansion.

Ideal Agent Persona

Perfect for Web Scraping Agents needing to extract data from Cloudflare-protected websites like fv.dribl.com

Core Value

Empowers agents to crawl dribl.com with real browser automation via playwright-core, extracting clubs and fixtures data through a two-phase workflow: extraction of raw API data, then transformation into validated, merged data.

Capabilities Granted for dribl-crawling MCP Server

Automating data extraction from fv.dribl.com
Generating up-to-date clubs and fixtures data for sports websites
Transforming raw API data into validated, merged data for Williamstown SC website

! Prerequisites & Limits

  • Requires playwright-core for browser automation
  • Limited to fv.dribl.com and similar Cloudflare-protected websites
  • Needs to handle potential changes in dribl.com's API or website structure
Project

  • SKILL.md (10.4 KB)
  • .cursorrules (1.2 KB)
  • package.json (240 B)

SKILL.md

Dribl Crawling

Overview

Extract clubs and fixtures data from https://fv.dribl.com/fixtures/ (SPA with Cloudflare protection) using real browser automation with playwright-core. Two-phase workflow: extraction (raw API data) → transformation (validated, merged data).

Purpose: Crawl dribl.com to maintain up-to-date clubs and fixtures data for Williamstown SC website.

Architecture

Data flow:

dribl API → data/external/fixtures/{team}/ (raw) → transform → data/matches/ (validated)
dribl API → data/external/clubs/ (raw) → transform → data/clubs/ (validated)

Two-phase pattern:

  1. Extraction: Playwright intercepts API requests, saves raw JSON
  2. Transformation: Read raw data, validate with Zod, transform, deduplicate, save
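
The two phases above can be sketched as a small orchestrator. `extract` and `transform` here are hypothetical stand-ins for the real crawl and sync scripts; the point is that each phase is an independent, injectable step:

```typescript
// Minimal sketch of the two-phase workflow: phase 1 produces raw data,
// phase 2 validates/transforms it. Phase implementations are injected so
// each can be developed and tested in isolation.
type Phase<I, O> = (input: I) => Promise<O>;

async function runTwoPhase<Raw, Clean>(
  extract: Phase<void, Raw>,    // e.g. Playwright crawl intercepting API calls
  transform: Phase<Raw, Clean>  // e.g. Zod-validated transformation
): Promise<Clean> {
  const raw = await extract();  // phase 1: raw API data
  return transform(raw);        // phase 2: validated, merged data
}
```

In the real project the phases run as separate CLI scripts with the raw JSON on disk in between; the sketch only shows the control flow.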

Key technologies:

  • playwright-core (real Chrome browser)
  • Zod validation schemas
  • TypeScript with tsx runner

Clubs Extraction

Reference: bin/crawlClubs.ts

Pattern:

```typescript
// Launch browser
const browser = await chromium.launch({
	headless: false,
	channel: 'chrome'
});

// Custom user agent (bypass detection)
const context = await browser.newContext({
	userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
	viewport: { width: 1280, height: 720 }
});

// Intercept API request
const [clubsResponse] = await Promise.all([
	page.waitForResponse((response) => response.url().startsWith(clubsApiUrl) && response.ok(), {
		timeout: 60_000
	}),
	page.goto(url, { waitUntil: 'domcontentloaded' })
]);

// Validate and save
const rawData = await clubsResponse.json();
const validated = externalApiResponseSchema.parse(rawData);
writeFileSync(outputPath, JSON.stringify(validated, null, '\t') + '\n');
```

API endpoint:

  • URL: https://mc-api.dribl.com/api/list/clubs?disable_paging=true
  • Response: JSON with data array of club objects
  • Validation: externalApiResponseSchema (src/types/matches.ts)

Output:

  • Path: data/external/clubs/clubs.json
  • Format: Single JSON file with all clubs

CLI args:

  • --url <fixtures-page-url> (optional, defaults to standard fixtures page)

Fixtures Extraction

Pattern (implemented in bin/crawlFixtures.ts):

Steps:

  1. Navigate to https://fv.dribl.com/fixtures/
  2. Wait for SPA to load (waitUntil: 'domcontentloaded')
  3. Apply filters (REQUIRED):
    • Season (e.g., "2025")
    • Competition (e.g., "FFV")
    • League (e.g., "state-league-2-men-s-north-west")
  4. Intercept /api/fixtures responses
  5. Handle pagination:
    • Detect "Load more" button in DOM
    • Click button to load next chunk
    • Wait for new API response
    • Repeat until no more data
  6. Save each chunk as chunk-{index}.json
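
The pagination loop in steps 4–5 can be sketched generically. `loadMoreVisible`, `clickLoadMore`, and `waitForChunk` are hypothetical stand-ins for the Playwright DOM check, button click, and response interception:

```typescript
// Generic "Load more" pagination loop: collect one chunk per click until the
// button disappears. Browser interactions are injected as async callbacks so
// the control flow can be exercised without a real browser.
async function collectChunks<T>(
  firstChunk: T,
  loadMoreVisible: () => Promise<boolean>, // is the "Load more" button in the DOM?
  clickLoadMore: () => Promise<void>,      // click it
  waitForChunk: () => Promise<T>           // wait for the next /api/fixtures response
): Promise<T[]> {
  const chunks: T[] = [firstChunk];
  while (await loadMoreVisible()) {
    // Start waiting for the response before clicking, mirroring the
    // Promise.all pattern used in the clubs extraction above.
    const [chunk] = await Promise.all([waitForChunk(), clickLoadMore()]);
    chunks.push(chunk);
  }
  return chunks;
}
```

Each returned chunk would then be written out as chunk-{index}.json.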

API endpoint:

  • URL: https://mc-api.dribl.com/api/fixtures
  • Query params: season, competition, league (from filters)
  • Response: JSON with data array, links (next/prev), meta (cursors)
  • Validation: externalFixturesApiResponseSchema

Output:

  • Path: data/external/fixtures/{team}/chunk-0.json, chunk-1.json, etc.
  • Format: Multiple JSON files (one per "Load more" click)
  • Naming: chunk-{index}.json where index starts at 0

CLI args:

  • --team <slug> (required) - Team slug for the output folder (e.g., "state-league-2-men-s-north-west")
  • --league <name> (required) - League name used for the filter (e.g., "State League 2 Men's - North-West")
  • --season <year> (optional, defaults to the current year)
  • --competition <id> (optional, defaults to FFV)

Clubs Transformation

Reference: bin/syncClubs.ts

Pattern:

```typescript
// Load external data
const externalResponse = loadExternalData(); // from data/external/clubs/
const validated = externalApiResponseSchema.parse(externalResponse);

// Transform to internal format
const apiClubs = externalResponse.data.map((externalClub) => transformExternalClub(externalClub));

// Load existing clubs
const existingClubs = loadExistingClubs(); // from data/clubs/

// Merge (deduplicate by externalId)
const clubsMap = new Map<string, Club>();
for (const club of existingClubs) {
	clubsMap.set(club.externalId, club);
}
for (const apiClub of apiClubs) {
	clubsMap.set(apiClub.externalId, apiClub); // update or add
}

// Sort by name
const mergedClubs = Array.from(clubsMap.values()).sort((a, b) => a.name.localeCompare(b.name));

// Save
writeFileSync(CLUBS_FILE_PATH, JSON.stringify({ clubs: mergedClubs }, null, '\t'));
```

Transform service: src/lib/clubService.ts

  • transformExternalClub(): Converts external club format to internal format
  • Maps fields: id→externalId, attributes.name→name/displayName, etc.
  • Normalizes address (combines address_line_1 + address_line_2)
  • Maps socials array (name→platform, value→url)
  • Validates output with clubSchema
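
Based on the field mapping described above, the transformation could look roughly like this. The exact external shape is an assumption inferred from the mapping (id → externalId, attributes.name → name/displayName, address lines combined, socials remapped), not the actual types in src/types/matches.ts:

```typescript
// Hypothetical sketch of transformExternalClub(), assuming a simplified
// external shape. Validation with clubSchema is omitted here.
interface ExternalClub {
  id: string;
  attributes: {
    name: string;
    address_line_1?: string;
    address_line_2?: string;
    socials?: { name: string; value: string }[];
  };
}

interface Club {
  externalId: string;
  name: string;
  displayName: string;
  address: string;
  socials: { platform: string; url: string }[];
}

function transformExternalClubSketch(external: ExternalClub): Club {
  const { attributes } = external;
  return {
    externalId: external.id,
    name: attributes.name,
    displayName: attributes.name,
    // Normalize address: combine both lines, skipping empty parts
    address: [attributes.address_line_1, attributes.address_line_2].filter(Boolean).join(', '),
    // Map socials: name -> platform, value -> url
    socials: (attributes.socials ?? []).map((s) => ({ platform: s.name, url: s.value }))
  };
}
```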

Output:

  • Path: data/clubs/clubs.json
  • Format: { clubs: Club[] }

Fixtures Transformation

Reference: bin/syncFixtures.ts

Pattern:

```typescript
// Read all chunk files
const teamDir = path.join(EXTERNAL_DIR, team);
const files = await fs.readdir(teamDir);
const chunkFiles = files
	.filter((f) => f.match(/^chunk-\d+\.json$/))
	// Natural number sort (a plain .sort() would put chunk-10 before chunk-2)
	.sort((a, b) => parseInt(a.match(/\d+/)![0], 10) - parseInt(b.match(/\d+/)![0], 10));

// Load and validate each chunk
const responses: ExternalFixturesApiResponse[] = [];
for (const file of chunkFiles) {
	const content = await fs.readFile(path.join(teamDir, file), 'utf-8');
	const validated = externalFixturesApiResponseSchema.parse(JSON.parse(content));
	responses.push(validated);
}

// Transform all fixtures
const allFixtures = [];
for (const response of responses) {
	for (const externalFixture of response.data) {
		const fixture = transformExternalFixture(externalFixture);
		allFixtures.push(fixture);
	}
}

// Deduplicate (by round + homeTeamId + awayTeamId)
const seen = new Set<string>();
const deduplicated = allFixtures.filter((f) => {
	const key = `${f.round}-${f.homeTeamId}-${f.awayTeamId}`;
	if (seen.has(key)) return false;
	seen.add(key);
	return true;
});

// Sort by round, then date
const sorted = deduplicated.sort((a, b) => {
	if (a.round !== b.round) return a.round - b.round;
	return a.date.localeCompare(b.date);
});

// Calculate metadata
const totalRounds = Math.max(...sorted.map((f) => f.round), 0);

// Save
const fixtureData = {
	competition: 'FFV',
	season: 2025,
	totalFixtures: sorted.length,
	totalRounds,
	fixtures: sorted
};
writeFileSync(outputPath, JSON.stringify(fixtureData, null, '\t'));
```

Transform service: src/lib/matches/fixtureTransformService.ts

  • transformExternalFixture(): Converts external fixture format to internal format
  • Parses round number (e.g., "R1" → 1)
  • Formats date/time/day strings (ISO date, 24h time, weekday name)
  • Combines ground + field names for address
  • Finds club external IDs by matching team names/logos
  • Validates output with fixtureSchema
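
The round-parsing and date/time formatting steps could be sketched as small helpers. These are hypothetical implementations of the behavior described above, not the actual code in fixtureTransformService.ts:

```typescript
// parseRound: "R1" -> 1; throws on labels with no digits.
function parseRound(label: string): number {
  const match = label.match(/(\d+)/);
  if (!match) throw new Error(`Unparseable round label: ${label}`);
  return parseInt(match[1], 10);
}

// formatKickoff: UTC ISO timestamp -> ISO date, 24h time, weekday name.
function formatKickoff(iso: string): { date: string; time: string; day: string } {
  const d = new Date(iso);
  return {
    date: iso.slice(0, 10),  // ISO date, e.g. 2025-04-12
    time: iso.slice(11, 16), // 24h time, e.g. 14:30
    day: d.toLocaleDateString('en-AU', { weekday: 'long', timeZone: 'UTC' }) // weekday name
  };
}
```

The real service may apply a local timezone; the sketch stays in UTC for determinism.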

Output:

  • Path: data/matches/{team}.json
  • Format: { competition, season, totalFixtures, totalRounds, fixtures: Fixture[] }

CLI args:

  • --team <slug> (required) - Team slug to sync (e.g., "state-league-2-men-s-north-west")

Validation Schemas

Reference: src/types/matches.ts

External schemas (API responses):

  • externalApiResponseSchema: Clubs API response
  • externalClubSchema: Single club object
  • externalFixturesApiResponseSchema: Fixtures API response
  • externalFixtureSchema: Single fixture object

Internal schemas (transformed data):

  • clubSchema: Single club
  • clubsSchema: Clubs file ({ clubs: Club[] })
  • fixtureSchema: Single fixture
  • fixtureDataSchema: Fixtures file ({ competition, season, totalFixtures, totalRounds, fixtures })

Pattern: Always validate at boundaries (API → external schema, transform → internal schema)
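
Without pulling in Zod, the same boundary pattern looks like the sketch below: a parse function that either returns typed data or throws with a descriptive message. In the real project this role is played by `schema.parse()` from Zod; the interface here is a simplified stand-in:

```typescript
// Hand-rolled boundary validator, illustrating the "parse, don't assume"
// pattern that the Zod schemas implement for real.
interface ClubsFile {
  clubs: { externalId: string; name: string }[];
}

function parseClubsFile(raw: unknown): ClubsFile {
  if (typeof raw !== 'object' || raw === null || !Array.isArray((raw as any).clubs)) {
    throw new Error('clubs file: expected { clubs: [...] }');
  }
  for (const [i, club] of (raw as any).clubs.entries()) {
    if (typeof club?.externalId !== 'string' || typeof club?.name !== 'string') {
      throw new Error(`clubs file: invalid club at index ${i}`);
    }
  }
  return raw as ClubsFile;
}
```

Calling a parser like this immediately after `JSON.parse` (API response in, transformed data out) keeps invalid shapes from propagating past the boundary.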

CI Integration

Reference: .github/workflows/crawl-clubs.yml

Linux setup (GitHub Actions):

```yaml
- name: Install Chrome
  run: npx playwright install --with-deps chrome

- name: Crawl clubs
  run: npm run crawl:clubs:ci -- ${{ inputs.url && format('--url "{0}"', inputs.url) || '' }}
```

Key points:

  • Use an xvfb-run prefix on Linux so Chrome launched with headless: false can run without a display (e.g., xvfb-run npm run crawl:clubs)
  • Install with --with-deps flag to get system dependencies
  • Set appropriate timeout (5 min for clubs, may need more for fixtures)
  • Upload artifacts for data files

Package.json scripts pattern:

```json
{
	"scripts": {
		"crawl:clubs": "tsx bin/crawlClubs.ts",
		"crawl:clubs:ci": "xvfb-run tsx bin/crawlClubs.ts",
		"sync:clubs": "tsx bin/syncClubs.ts",
		"sync:fixtures": "tsx bin/syncFixtures.ts"
	}
}
```

Best Practices

Logging:

  • Use emoji logging for clarity:
    • ✓ / ✅ - Success
    • ❌ - Error
    • 📂 - File operations
    • 🔄 - Processing/transformation
  • Log counts and progress for large operations

Error handling:

  • Try/catch at top level
  • Special handling for ZodError (print issues)
  • Exit with code 1 on failure
  • Close browser in finally block

File operations:

  • Always use mkdirSync(path, { recursive: true }) before writing
  • Format JSON with tabs: JSON.stringify(data, null, '\t')
  • Add newline at end of file: content + '\n'
  • Use absolute paths with resolve(__dirname, '../relative/path')

Data separation:

  • Keep raw external data in data/external/ (gitignored)
  • Keep transformed data in data/ (committed)
  • Never commit external API responses directly

Validation:

  • Validate immediately after receiving API data
  • Validate before writing transformed data
  • Use descriptive error messages with file paths

CLI arguments:

  • Use Commander library for consistent CLI parsing
  • Define options with .option() or .requiredOption()
  • Provide defaults for optional args
  • Commander auto-generates help text and validates required args
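
The project uses Commander, but the same required/optional handling can be sketched dependency-free with Node's built-in util.parseArgs (available since Node 18). The option names mirror the fixtures CLI described above; the hard-coded defaults are placeholders:

```typescript
import { parseArgs } from 'node:util';

// Dependency-free stand-in for the Commander setup: one required option,
// two optional options with defaults, and an explicit required-arg check
// (Commander's .requiredOption() does this automatically).
function parseCrawlArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      team: { type: 'string' },                    // required
      season: { type: 'string', default: '2025' }, // real script defaults to current year
      competition: { type: 'string', default: 'FFV' }
    }
  });
  if (!values.team) throw new Error('--team <slug> is required');
  return values as { team: string; season: string; competition: string };
}
```

Commander remains the better choice in the project itself, since it also generates help text.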

Common Patterns

Reading chunks:

```typescript
const files = await fs.readdir(dir);
const chunks = files
	.filter((f) => f.match(/^chunk-\d+\.json$/))
	.sort((a, b) => {
		const numA = parseInt(a.match(/\d+/)?.[0] || '0', 10);
		const numB = parseInt(b.match(/\d+/)?.[0] || '0', 10);
		return numA - numB;
	});
```

Deduplication:

```typescript
const seen = new Set<string>();
const unique = items.filter((item) => {
	const key = computeKey(item);
	if (seen.has(key)) return false;
	seen.add(key);
	return true;
});
```

Merge with existing:

```typescript
const map = new Map<string, T>();
existing.forEach((item) => map.set(item.id, item));
incoming.forEach((item) => map.set(item.id, item)); // update or add
const merged = Array.from(map.values());
```

Browser cleanup:

```typescript
let browser: Browser | undefined;
try {
	browser = await chromium.launch(...);
	// work
} finally {
	if (browser) await browser.close();
}
```
