dribl-crawling

v1.0.0

About this Skill

dribl-crawling is a technique that uses playwright-core to drive a real browser and extract data from fv.dribl.com, a Cloudflare-protected site. It is ideal for web scraping agents that need reliable access to such sites.

Features

Extracts clubs and fixtures data from https://fv.dribl.com/fixtures/ using playwright-core
Uses a two-phase workflow: extraction and transformation of raw API data
Stores raw data in data/external/fixtures/{team}/ format
Transforms data into validated, merged format in data/matches/
Works around Cloudflare protection by driving a real browser rather than making bare HTTP requests
Powers static websites built with Next.js and Sanity.io

Author: dejanvasic85
Updated: 3/11/2026
Installation

Universal install (auto-detect):

> npx killer-skills add dejanvasic85/williamstownsc/dribl-crawling

Supports 18+ platforms, including Cursor, Windsurf, VS Code, Trae, Claude, and OpenClaw.

Agent Capability Analysis

The dribl-crawling MCP Server by dejanvasic85 is an open-source community integration for Claude and other AI agents, enabling task automation and capability expansion.

Ideal Agent Persona

Perfect for Web Scraping Agents needing to extract data from Cloudflare-protected websites like fv.dribl.com

Core Value

Empowers agents to crawl dribl.com with real browser automation via playwright-core, extracting clubs and fixtures data through a two-phase workflow: extraction of raw API data, then transformation into validated, merged data.

Capabilities Granted for dribl-crawling MCP Server

Automating data extraction from fv.dribl.com
Generating up-to-date clubs and fixtures data for sports websites
Transforming raw API data into validated, merged data for Williamstown SC website

! Prerequisites & Limits

  • Requires playwright-core for browser automation
  • Limited to fv.dribl.com and similar Cloudflare-protected websites
  • Needs to handle potential changes in dribl.com's API or website structure
Project

  • SKILL.md (10.4 KB)
  • .cursorrules (1.2 KB)
  • package.json (240 B)

SKILL.md

Dribl Crawling

Overview

Extract clubs and fixtures data from https://fv.dribl.com/fixtures/ (SPA with Cloudflare protection) using real browser automation with playwright-core. Two-phase workflow: extraction (raw API data) → transformation (validated, merged data).

Purpose: Crawl dribl.com to maintain up-to-date clubs and fixtures data for Williamstown SC website.

Architecture

Data flow:

dribl API → data/external/fixtures/{team}/ (raw) → transform → data/matches/ (validated)
dribl API → data/external/clubs/ (raw) → transform → data/clubs/ (validated)

Two-phase pattern:

  1. Extraction: Playwright intercepts API requests, saves raw JSON
  2. Transformation: Read raw data, validate with Zod, transform, deduplicate, save
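
The two phases above can be sketched as a small orchestrator. `extract` and `transform` here are hypothetical stand-ins for the real crawl and sync scripts; the point is that each phase is an independent, injectable step:

```typescript
// Minimal sketch of the two-phase workflow: phase 1 produces raw data,
// phase 2 validates/transforms it. Phase implementations are injected so
// each can be developed and tested in isolation.
type Phase<I, O> = (input: I) => Promise<O>;

async function runTwoPhase<Raw, Clean>(
  extract: Phase<void, Raw>,    // e.g. Playwright crawl intercepting API calls
  transform: Phase<Raw, Clean>  // e.g. Zod-validated transformation
): Promise<Clean> {
  const raw = await extract();  // phase 1: raw API data
  return transform(raw);        // phase 2: validated, merged data
}
```

In the real project the phases run as separate CLI scripts with the raw JSON on disk in between; the sketch only shows the control flow.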

Key technologies:

  • playwright-core (real Chrome browser)
  • Zod validation schemas
  • TypeScript with tsx runner

Clubs Extraction

Reference: bin/crawlClubs.ts

Pattern:

```typescript
// Launch browser
const browser = await chromium.launch({
	headless: false,
	channel: 'chrome'
});

// Custom user agent (bypass detection)
const context = await browser.newContext({
	userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
	viewport: { width: 1280, height: 720 }
});

// Intercept API request
const [clubsResponse] = await Promise.all([
	page.waitForResponse((response) => response.url().startsWith(clubsApiUrl) && response.ok(), {
		timeout: 60_000
	}),
	page.goto(url, { waitUntil: 'domcontentloaded' })
]);

// Validate and save
const rawData = await clubsResponse.json();
const validated = externalApiResponseSchema.parse(rawData);
writeFileSync(outputPath, JSON.stringify(validated, null, '\t') + '\n');
```

API endpoint:

  • URL: https://mc-api.dribl.com/api/list/clubs?disable_paging=true
  • Response: JSON with data array of club objects
  • Validation: externalApiResponseSchema (src/types/matches.ts)

Output:

  • Path: data/external/clubs/clubs.json
  • Format: Single JSON file with all clubs

CLI args:

  • --url <fixtures-page-url> (optional, defaults to standard fixtures page)

Fixtures Extraction

Pattern (implemented in bin/crawlFixtures.ts):

Steps:

  1. Navigate to https://fv.dribl.com/fixtures/
  2. Wait for SPA to load (waitUntil: 'domcontentloaded')
  3. Apply filters (REQUIRED):
    • Season (e.g., "2025")
    • Competition (e.g., "FFV")
    • League (e.g., "state-league-2-men-s-north-west")
  4. Intercept /api/fixtures responses
  5. Handle pagination:
    • Detect "Load more" button in DOM
    • Click button to load next chunk
    • Wait for new API response
    • Repeat until no more data
  6. Save each chunk as chunk-{index}.json
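
The pagination loop in steps 4–5 can be sketched generically. `loadMoreVisible`, `clickLoadMore`, and `waitForChunk` are hypothetical stand-ins for the Playwright DOM check, button click, and response interception:

```typescript
// Generic "Load more" pagination loop: collect one chunk per click until the
// button disappears. Browser interactions are injected as async callbacks so
// the control flow can be exercised without a real browser.
async function collectChunks<T>(
  firstChunk: T,
  loadMoreVisible: () => Promise<boolean>, // is the "Load more" button in the DOM?
  clickLoadMore: () => Promise<void>,      // click it
  waitForChunk: () => Promise<T>           // wait for the next /api/fixtures response
): Promise<T[]> {
  const chunks: T[] = [firstChunk];
  while (await loadMoreVisible()) {
    // Start waiting for the response before clicking, mirroring the
    // Promise.all pattern used in the clubs extraction above.
    const [chunk] = await Promise.all([waitForChunk(), clickLoadMore()]);
    chunks.push(chunk);
  }
  return chunks;
}
```

Each returned chunk would then be written out as chunk-{index}.json.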

API endpoint:

  • URL: https://mc-api.dribl.com/api/fixtures
  • Query params: season, competition, league (from filters)
  • Response: JSON with data array, links (next/prev), meta (cursors)
  • Validation: externalFixturesApiResponseSchema

Output:

  • Path: data/external/fixtures/{team}/chunk-0.json, chunk-1.json, etc.
  • Format: Multiple JSON files (one per "Load more" click)
  • Naming: chunk-{index}.json where index starts at 0

CLI args:

  • --team <slug> (required) - Team slug for the output folder (e.g., "state-league-2-men-s-north-west")
  • --league <name> (required) - League name used for the filter (e.g., "State League 2 Men's - North-West")
  • --season <year> (optional, defaults to the current year)
  • --competition <id> (optional, defaults to FFV)

Clubs Transformation

Reference: bin/syncClubs.ts

Pattern:

```typescript
// Load external data
const externalResponse = loadExternalData(); // from data/external/clubs/
const validated = externalApiResponseSchema.parse(externalResponse);

// Transform to internal format
const apiClubs = externalResponse.data.map((externalClub) => transformExternalClub(externalClub));

// Load existing clubs
const existingClubs = loadExistingClubs(); // from data/clubs/

// Merge (deduplicate by externalId)
const clubsMap = new Map<string, Club>();
for (const club of existingClubs) {
	clubsMap.set(club.externalId, club);
}
for (const apiClub of apiClubs) {
	clubsMap.set(apiClub.externalId, apiClub); // update or add
}

// Sort by name
const mergedClubs = Array.from(clubsMap.values()).sort((a, b) => a.name.localeCompare(b.name));

// Save
writeFileSync(CLUBS_FILE_PATH, JSON.stringify({ clubs: mergedClubs }, null, '\t'));
```

Transform service: src/lib/clubService.ts

  • transformExternalClub(): Converts external club format to internal format
  • Maps fields: id→externalId, attributes.name→name/displayName, etc.
  • Normalizes address (combines address_line_1 + address_line_2)
  • Maps socials array (name→platform, value→url)
  • Validates output with clubSchema
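
Based on the field mapping described above, the transformation could look roughly like this. The exact external shape is an assumption inferred from the mapping (id → externalId, attributes.name → name/displayName, address lines combined, socials remapped), not the actual types in src/types/matches.ts:

```typescript
// Hypothetical sketch of transformExternalClub(), assuming a simplified
// external shape. Validation with clubSchema is omitted here.
interface ExternalClub {
  id: string;
  attributes: {
    name: string;
    address_line_1?: string;
    address_line_2?: string;
    socials?: { name: string; value: string }[];
  };
}

interface Club {
  externalId: string;
  name: string;
  displayName: string;
  address: string;
  socials: { platform: string; url: string }[];
}

function transformExternalClubSketch(external: ExternalClub): Club {
  const { attributes } = external;
  return {
    externalId: external.id,
    name: attributes.name,
    displayName: attributes.name,
    // Normalize address: combine both lines, skipping empty parts
    address: [attributes.address_line_1, attributes.address_line_2].filter(Boolean).join(', '),
    // Map socials: name -> platform, value -> url
    socials: (attributes.socials ?? []).map((s) => ({ platform: s.name, url: s.value }))
  };
}
```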

Output:

  • Path: data/clubs/clubs.json
  • Format: { clubs: Club[] }

Fixtures Transformation

Reference: bin/syncFixtures.ts

Pattern:

```typescript
// Read all chunk files
const teamDir = path.join(EXTERNAL_DIR, team);
const files = await fs.readdir(teamDir);
const chunkFiles = files
	.filter((f) => f.match(/^chunk-\d+\.json$/))
	// Natural number sort (a plain .sort() would put chunk-10 before chunk-2)
	.sort((a, b) => parseInt(a.match(/\d+/)![0], 10) - parseInt(b.match(/\d+/)![0], 10));

// Load and validate each chunk
const responses: ExternalFixturesApiResponse[] = [];
for (const file of chunkFiles) {
	const content = await fs.readFile(path.join(teamDir, file), 'utf-8');
	const validated = externalFixturesApiResponseSchema.parse(JSON.parse(content));
	responses.push(validated);
}

// Transform all fixtures
const allFixtures = [];
for (const response of responses) {
	for (const externalFixture of response.data) {
		const fixture = transformExternalFixture(externalFixture);
		allFixtures.push(fixture);
	}
}

// Deduplicate (by round + homeTeamId + awayTeamId)
const seen = new Set<string>();
const deduplicated = allFixtures.filter((f) => {
	const key = `${f.round}-${f.homeTeamId}-${f.awayTeamId}`;
	if (seen.has(key)) return false;
	seen.add(key);
	return true;
});

// Sort by round, then date
const sorted = deduplicated.sort((a, b) => {
	if (a.round !== b.round) return a.round - b.round;
	return a.date.localeCompare(b.date);
});

// Calculate metadata
const totalRounds = Math.max(...sorted.map((f) => f.round), 0);

// Save
const fixtureData = {
	competition: 'FFV',
	season: 2025,
	totalFixtures: sorted.length,
	totalRounds,
	fixtures: sorted
};
writeFileSync(outputPath, JSON.stringify(fixtureData, null, '\t'));
```

Transform service: src/lib/matches/fixtureTransformService.ts

  • transformExternalFixture(): Converts external fixture format to internal format
  • Parses round number (e.g., "R1" → 1)
  • Formats date/time/day strings (ISO date, 24h time, weekday name)
  • Combines ground + field names for address
  • Finds club external IDs by matching team names/logos
  • Validates output with fixtureSchema
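
The round-parsing and date/time formatting steps could be sketched as small helpers. These are hypothetical implementations of the behavior described above, not the actual code in fixtureTransformService.ts:

```typescript
// parseRound: "R1" -> 1; throws on labels with no digits.
function parseRound(label: string): number {
  const match = label.match(/(\d+)/);
  if (!match) throw new Error(`Unparseable round label: ${label}`);
  return parseInt(match[1], 10);
}

// formatKickoff: UTC ISO timestamp -> ISO date, 24h time, weekday name.
function formatKickoff(iso: string): { date: string; time: string; day: string } {
  const d = new Date(iso);
  return {
    date: iso.slice(0, 10),  // ISO date, e.g. 2025-04-12
    time: iso.slice(11, 16), // 24h time, e.g. 14:30
    day: d.toLocaleDateString('en-AU', { weekday: 'long', timeZone: 'UTC' }) // weekday name
  };
}
```

The real service may apply a local timezone; the sketch stays in UTC for determinism.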

Output:

  • Path: data/matches/{team}.json
  • Format: { competition, season, totalFixtures, totalRounds, fixtures: Fixture[] }

CLI args:

  • --team <slug> (required) - Team slug to sync (e.g., "state-league-2-men-s-north-west")

Validation Schemas

Reference: src/types/matches.ts

External schemas (API responses):

  • externalApiResponseSchema: Clubs API response
  • externalClubSchema: Single club object
  • externalFixturesApiResponseSchema: Fixtures API response
  • externalFixtureSchema: Single fixture object

Internal schemas (transformed data):

  • clubSchema: Single club
  • clubsSchema: Clubs file ({ clubs: Club[] })
  • fixtureSchema: Single fixture
  • fixtureDataSchema: Fixtures file ({ competition, season, totalFixtures, totalRounds, fixtures })

Pattern: Always validate at boundaries (API → external schema, transform → internal schema)
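
Without pulling in Zod, the same boundary pattern looks like the sketch below: a parse function that either returns typed data or throws with a descriptive message. In the real project this role is played by `schema.parse()` from Zod; the interface here is a simplified stand-in:

```typescript
// Hand-rolled boundary validator, illustrating the "parse, don't assume"
// pattern that the Zod schemas implement for real.
interface ClubsFile {
  clubs: { externalId: string; name: string }[];
}

function parseClubsFile(raw: unknown): ClubsFile {
  if (typeof raw !== 'object' || raw === null || !Array.isArray((raw as any).clubs)) {
    throw new Error('clubs file: expected { clubs: [...] }');
  }
  for (const [i, club] of (raw as any).clubs.entries()) {
    if (typeof club?.externalId !== 'string' || typeof club?.name !== 'string') {
      throw new Error(`clubs file: invalid club at index ${i}`);
    }
  }
  return raw as ClubsFile;
}
```

Calling a parser like this immediately after `JSON.parse` (API response in, transformed data out) keeps invalid shapes from propagating past the boundary.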

CI Integration

Reference: .github/workflows/crawl-clubs.yml

Linux setup (GitHub Actions):

```yaml
- name: Install Chrome
  run: npx playwright install --with-deps chrome

- name: Crawl clubs
  run: npm run crawl:clubs:ci -- ${{ inputs.url && format('--url "{0}"', inputs.url) || '' }}
```

Key points:

  • Use an xvfb-run prefix on Linux so Chrome launched with headless: false can run without a display (e.g., xvfb-run npm run crawl:clubs)
  • Install with --with-deps flag to get system dependencies
  • Set appropriate timeout (5 min for clubs, may need more for fixtures)
  • Upload artifacts for data files

Package.json scripts pattern:

```json
{
	"scripts": {
		"crawl:clubs": "tsx bin/crawlClubs.ts",
		"crawl:clubs:ci": "xvfb-run tsx bin/crawlClubs.ts",
		"sync:clubs": "tsx bin/syncClubs.ts",
		"sync:fixtures": "tsx bin/syncFixtures.ts"
	}
}
```

Best Practices

Logging:

  • Use emoji logging for clarity:
    • ✓ / ✅ - Success
    • ❌ - Error
    • 📂 - File operations
    • 🔄 - Processing/transformation
  • Log counts and progress for large operations

Error handling:

  • Try/catch at top level
  • Special handling for ZodError (print issues)
  • Exit with code 1 on failure
  • Close browser in finally block

File operations:

  • Always use mkdirSync(path, { recursive: true }) before writing
  • Format JSON with tabs: JSON.stringify(data, null, '\t')
  • Add newline at end of file: content + '\n'
  • Use absolute paths with resolve(__dirname, '../relative/path')

Data separation:

  • Keep raw external data in data/external/ (gitignored)
  • Keep transformed data in data/ (committed)
  • Never commit external API responses directly

Validation:

  • Validate immediately after receiving API data
  • Validate before writing transformed data
  • Use descriptive error messages with file paths

CLI arguments:

  • Use Commander library for consistent CLI parsing
  • Define options with .option() or .requiredOption()
  • Provide defaults for optional args
  • Commander auto-generates help text and validates required args
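
The project uses Commander, but the same required/optional handling can be sketched dependency-free with Node's built-in util.parseArgs (available since Node 18). The option names mirror the fixtures CLI described above; the hard-coded defaults are placeholders:

```typescript
import { parseArgs } from 'node:util';

// Dependency-free stand-in for the Commander setup: one required option,
// two optional options with defaults, and an explicit required-arg check
// (Commander's .requiredOption() does this automatically).
function parseCrawlArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      team: { type: 'string' },                    // required
      season: { type: 'string', default: '2025' }, // real script defaults to current year
      competition: { type: 'string', default: 'FFV' }
    }
  });
  if (!values.team) throw new Error('--team <slug> is required');
  return values as { team: string; season: string; competition: string };
}
```

Commander remains the better choice in the project itself, since it also generates help text.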

Common Patterns

Reading chunks:

```typescript
const files = await fs.readdir(dir);
const chunks = files
	.filter((f) => f.match(/^chunk-\d+\.json$/))
	.sort((a, b) => {
		const numA = parseInt(a.match(/\d+/)?.[0] || '0', 10);
		const numB = parseInt(b.match(/\d+/)?.[0] || '0', 10);
		return numA - numB;
	});
```

Deduplication:

```typescript
const seen = new Set<string>();
const unique = items.filter((item) => {
	const key = computeKey(item);
	if (seen.has(key)) return false;
	seen.add(key);
	return true;
});
```

Merge with existing:

```typescript
const map = new Map<string, T>();
existing.forEach((item) => map.set(item.id, item));
incoming.forEach((item) => map.set(item.id, item)); // update or add
const merged = Array.from(map.values());
```

Browser cleanup:

```typescript
let browser: Browser | undefined;
try {
	browser = await chromium.launch(...);
	// work
} finally {
	if (browser) await browser.close();
}
```
