clickhouse-io

v1.0.0 · Verified · GitHub
About this Skill

clickhouse-io is a skill that provides expert guidance on ClickHouse database patterns, query optimization, and large-scale data ingestion for high-performance analytics workloads. It is essential for data engineering agents specializing in analytical database operations.

Features

Designs ClickHouse table schemas with appropriate MergeTree engine selection
Writes optimized analytical queries with aggregations, window functions, and joins
Optimizes query performance using partition pruning, projections, and materialized views
Ingests large volumes of data using batch inserts and Kafka integration
Migrates from PostgreSQL/MySQL to ClickHouse for analytics

Author: affaan-m
Updated: 3/6/2026
Installation
> npx killer-skills add affaan-m/everything-claude-code/clickhouse-io

Agent Capability Analysis

The clickhouse-io MCP Server by affaan-m is an open-source integration for Claude and other AI agents, enabling seamless task automation and capability expansion.

Ideal Agent Persona

Essential for Data Engineering Agents specializing in high-performance analytical database operations.

Core Value

Enables agents to design optimized MergeTree table schemas, implement query performance optimizations through partition pruning and projections, and manage large-scale data ingestion via batch inserts and Kafka integration. Provides specialized expertise for migrating analytical workloads from PostgreSQL/MySQL to ClickHouse.

Capabilities Granted for clickhouse-io MCP Server

Optimizing analytical query performance with partition pruning
Designing MergeTree engine table schemas for time-series data
Implementing Kafka integration for high-volume data ingestion
Creating materialized views for pre-aggregated analytics
Migrating PostgreSQL/MySQL analytical workloads to ClickHouse

Prerequisites & Limits

  • Specialized for ClickHouse database only
  • Requires understanding of analytical query patterns
  • Optimized for batch processing rather than transactional workloads
Project Files

  • SKILL.md (10.0 KB)
  • .cursorrules (1.2 KB)
  • package.json (240 B)

SKILL.md

ClickHouse Analytics Patterns

ClickHouse-specific patterns for high-performance analytics and data engineering.

When to Activate

  • Designing ClickHouse table schemas (MergeTree engine selection)
  • Writing analytical queries (aggregations, window functions, joins)
  • Optimizing query performance (partition pruning, projections, materialized views)
  • Ingesting large volumes of data (batch inserts, Kafka integration)
  • Migrating from PostgreSQL/MySQL to ClickHouse for analytics
  • Implementing real-time dashboards or time-series analytics

Overview

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). It's optimized for fast analytical queries on large datasets.

Key Features:

  • Column-oriented storage
  • Data compression
  • Parallel query execution
  • Distributed queries
  • Real-time analytics

Table Design Patterns

MergeTree Engine (Most Common)

```sql
CREATE TABLE markets_analytics (
    date Date,
    market_id String,
    market_name String,
    volume UInt64,
    trades UInt32,
    unique_traders UInt32,
    avg_trade_size Float64,
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
```

ReplacingMergeTree (Deduplication)

```sql
-- For data that may have duplicates (e.g., from multiple sources)
CREATE TABLE user_events (
    event_id String,
    user_id String,
    event_type String,
    timestamp DateTime,
    properties String
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, event_id, timestamp)
PRIMARY KEY (user_id, event_id);
```

AggregatingMergeTree (Pre-aggregation)

```sql
-- For maintaining aggregated metrics
CREATE TABLE market_stats_hourly (
    hour DateTime,
    market_id String,
    total_volume AggregateFunction(sum, UInt64),
    total_trades AggregateFunction(count),
    unique_users AggregateFunction(uniq, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, market_id);

-- Query aggregated data with the -Merge combinators
SELECT
    hour,
    market_id,
    sumMerge(total_volume) AS volume,
    countMerge(total_trades) AS trades,
    uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)
GROUP BY hour, market_id
ORDER BY hour DESC;
```

Query Optimization Patterns

Efficient Filtering

```sql
-- ✅ GOOD: Filter on indexed (partition/ordering key) columns first
SELECT *
FROM markets_analytics
WHERE date >= '2025-01-01'
  AND market_id = 'market-123'
  AND volume > 1000
ORDER BY date DESC
LIMIT 100;

-- ❌ BAD: Filter on non-indexed columns first
SELECT *
FROM markets_analytics
WHERE volume > 1000
  AND market_name LIKE '%election%'
  AND date >= '2025-01-01';
```

Aggregations

```sql
-- ✅ GOOD: Use ClickHouse-specific aggregation functions
SELECT
    toStartOfDay(created_at) AS day,
    market_id,
    sum(volume) AS total_volume,
    count() AS total_trades,
    uniq(trader_id) AS unique_traders,
    avg(trade_size) AS avg_size
FROM trades
WHERE created_at >= today() - INTERVAL 7 DAY
GROUP BY day, market_id
ORDER BY day DESC, total_volume DESC;

-- ✅ quantile is approximate and fast; use quantileExact when precision matters
SELECT
    quantile(0.50)(trade_size) AS median,
    quantile(0.95)(trade_size) AS p95,
    quantile(0.99)(trade_size) AS p99
FROM trades
WHERE created_at >= now() - INTERVAL 1 HOUR;
```

Window Functions

```sql
-- Calculate running totals
SELECT
    date,
    market_id,
    volume,
    sum(volume) OVER (
        PARTITION BY market_id
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_volume
FROM markets_analytics
WHERE date >= today() - INTERVAL 30 DAY
ORDER BY market_id, date;
```

Data Insertion Patterns

Bulk Insert (Recommended)

```typescript
import { ClickHouse } from 'clickhouse'

const clickhouse = new ClickHouse({
  url: process.env.CLICKHOUSE_URL,
  port: 8123,
  basicAuth: {
    username: process.env.CLICKHOUSE_USER,
    password: process.env.CLICKHOUSE_PASSWORD
  }
})

// ✅ Batch insert (efficient)
// Note: values are interpolated inline for brevity; escape or parameterize
// untrusted input in production.
async function bulkInsertTrades(trades: Trade[]) {
  const values = trades.map(trade => `(
    '${trade.id}',
    '${trade.market_id}',
    '${trade.user_id}',
    ${trade.amount},
    '${trade.timestamp.toISOString()}'
  )`).join(',')

  await clickhouse.query(`
    INSERT INTO trades (id, market_id, user_id, amount, timestamp)
    VALUES ${values}
  `).toPromise()
}

// ❌ Individual inserts (slow)
async function insertTrade(trade: Trade) {
  // Don't do this in a loop!
  await clickhouse.query(`
    INSERT INTO trades VALUES ('${trade.id}', ...)
  `).toPromise()
}
```

Streaming Insert

```typescript
// For continuous data ingestion (reuses the `clickhouse` client from above)
async function streamInserts() {
  const stream = clickhouse.insert('trades').stream()

  for await (const batch of dataSource) {
    stream.write(batch)
  }

  await stream.end()
}
```

Materialized Views

Real-time Aggregations

```sql
-- Create materialized view for hourly stats
CREATE MATERIALIZED VIEW market_stats_hourly_mv
TO market_stats_hourly
AS SELECT
    toStartOfHour(timestamp) AS hour,
    market_id,
    sumState(amount) AS total_volume,
    countState() AS total_trades,
    uniqState(user_id) AS unique_users
FROM trades
GROUP BY hour, market_id;

-- Query the materialized view's target table
SELECT
    hour,
    market_id,
    sumMerge(total_volume) AS volume,
    countMerge(total_trades) AS trades,
    uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, market_id;
```

Performance Monitoring

Query Performance

```sql
-- Check slow queries
SELECT
    query_id,
    user,
    query,
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```

Table Statistics

```sql
-- Check table sizes
SELECT
    database,
    table,
    formatReadableSize(sum(bytes)) AS size,
    sum(rows) AS rows,
    max(modification_time) AS latest_modification
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes) DESC;
```

Common Analytics Queries

Time Series Analysis

```sql
-- Daily active users
SELECT
    toDate(timestamp) AS date,
    uniq(user_id) AS daily_active_users
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;

-- Retention analysis
-- (signup_date is computed per user with a window function, so it is not
-- collapsed to the activity date by the per-day grouping)
SELECT
    signup_date,
    countIf(days_since_signup = 0) AS day_0,
    countIf(days_since_signup = 1) AS day_1,
    countIf(days_since_signup = 7) AS day_7,
    countIf(days_since_signup = 30) AS day_30
FROM (
    SELECT DISTINCT
        user_id,
        min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date,
        toDate(timestamp) AS activity_date,
        dateDiff('day', signup_date, activity_date) AS days_since_signup
    FROM events
)
GROUP BY signup_date
ORDER BY signup_date DESC;
```

Funnel Analysis

```sql
-- Conversion funnel (site-wide, for today)
SELECT
    countIf(step = 'viewed_market') AS viewed,
    countIf(step = 'clicked_trade') AS clicked,
    countIf(step = 'completed_trade') AS completed,
    round(clicked / viewed * 100, 2) AS view_to_click_rate,
    round(completed / clicked * 100, 2) AS click_to_completion_rate
FROM (
    SELECT
        user_id,
        session_id,
        event_type AS step
    FROM events
    WHERE event_date = today()
);
```

Cohort Analysis

```sql
-- User cohorts by signup month
SELECT
    toStartOfMonth(signup_date) AS cohort,
    toStartOfMonth(activity_date) AS month,
    dateDiff('month', cohort, month) AS months_since_signup,
    count(DISTINCT user_id) AS active_users
FROM (
    SELECT
        user_id,
        min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date,
        toDate(timestamp) AS activity_date
    FROM events
)
GROUP BY cohort, month, months_since_signup
ORDER BY cohort, months_since_signup;
```

Data Pipeline Patterns

ETL Pattern

```typescript
// Extract, Transform, Load
async function etlPipeline() {
  // 1. Extract from source
  const rawData = await extractFromPostgres()

  // 2. Transform
  const transformed = rawData.map(row => ({
    date: new Date(row.created_at).toISOString().split('T')[0],
    market_id: row.market_slug,
    volume: parseFloat(row.total_volume),
    trades: parseInt(row.trade_count, 10)
  }))

  // 3. Load to ClickHouse
  await bulkInsertToClickHouse(transformed)
}

// Run periodically
setInterval(etlPipeline, 60 * 60 * 1000) // Every hour
```

Change Data Capture (CDC)

```typescript
// Listen to PostgreSQL changes and sync to ClickHouse
import { Client } from 'pg'

const pgClient = new Client({ connectionString: process.env.DATABASE_URL })

await pgClient.connect()
await pgClient.query('LISTEN market_updates')

pgClient.on('notification', async (msg) => {
  const update = JSON.parse(msg.payload)

  await clickhouse.insert('market_updates', [
    {
      market_id: update.id,
      event_type: update.operation, // INSERT, UPDATE, DELETE
      timestamp: new Date(),
      data: JSON.stringify(update.new_data)
    }
  ])
})
```

Best Practices

1. Partitioning Strategy

  • Partition by time (usually month or day)
  • Avoid too many partitions (performance impact)
  • Use DATE type for partition key
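A minimal sketch of the partitioning guidance above, using an illustrative table (the names are hypothetical, not part of this skill):

```sql
-- Monthly partitions on a Date column keep the partition count
-- manageable even for multi-year data
CREATE TABLE page_views (
    date Date,
    page_id String,
    views UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)   -- one partition per month
ORDER BY (date, page_id);
```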

2. Ordering Key

  • Put most frequently filtered columns first
  • Order columns from low to high cardinality (low cardinality first)
  • Order impacts compression
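The bullets above can be sketched as follows (column names are illustrative): putting the low-cardinality column first lets the sparse primary index skip large ranges and improves compression of the sorted data.

```sql
-- Ordering key sketch: low-cardinality columns before high-cardinality ones
CREATE TABLE events_ordered (
    event_type LowCardinality(String),  -- a handful of distinct values
    user_id String,                     -- millions of distinct values
    timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (event_type, user_id, timestamp);
```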

3. Data Types

  • Use smallest appropriate type (UInt32 vs UInt64)
  • Use LowCardinality for repeated strings
  • Use Enum for categorical data
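As a sketch of the type guidance above (table and column names are illustrative):

```sql
CREATE TABLE trades_typed (
    trade_id UInt64,                     -- smallest integer type that fits
    market_name LowCardinality(String),  -- repeated string values
    side Enum8('buy' = 1, 'sell' = 2),   -- categorical data
    amount Float64
) ENGINE = MergeTree()
ORDER BY trade_id;
```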

4. Avoid

  • SELECT * (specify columns)
  • FINAL (merge data before query instead)
  • Too many JOINs (denormalize for analytics)
  • Small frequent inserts (batch instead)
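To illustrate the FINAL point: rather than forcing a read-time merge on a ReplacingMergeTree, the latest row version can usually be selected with an aggregate. A sketch against the user_events table defined earlier:

```sql
-- ❌ FINAL forces a merge at read time
SELECT * FROM user_events FINAL WHERE user_id = 'u1';

-- ✅ Deduplicate at query time with argMax over the timestamp instead
SELECT
    event_id,
    argMax(event_type, timestamp) AS event_type,
    max(timestamp) AS latest_timestamp
FROM user_events
WHERE user_id = 'u1'
GROUP BY event_id;
```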

5. Monitoring

  • Track query performance
  • Monitor disk usage
  • Check merge operations
  • Review slow query log

Remember: ClickHouse excels at analytical workloads. Design tables for your query patterns, batch inserts, and leverage materialized views for real-time aggregations.

Related Skills

Looking for an alternative to clickhouse-io, or building an AI agent? These related open-source skills are also available:

  • flags (facebook) — a feature flag management system that enables developers to check flag states, compare channels, and debug feature behavior differences across release channels.
  • extract-errors (facebook) — assists in extracting and managing error codes in React applications using the yarn extract-errors command.
  • fix (facebook) — resolves lint errors and formatting issues, and ensures code quality in declarative, frontend, and UI projects.
  • flow (facebook) — a type checking system for JavaScript, used to validate React code and ensure consistency across applications.