Codebase Onboarding
Systematically analyze an unfamiliar codebase and produce a structured onboarding guide. Designed for developers joining a new project or setting up Claude Code in an existing repo for the first time.
When to Use
- First time opening a project with Claude Code
- Joining a new team or repository
- User asks "help me understand this codebase"
- User asks to generate a CLAUDE.md for a project
- User says "onboard me" or "walk me through this repo"
How It Works
Phase 1: Reconnaissance
Gather raw signals about the project without reading every file. Run these checks in parallel:
1. Package manifest detection
→ package.json, go.mod, Cargo.toml, pyproject.toml, pom.xml, build.gradle,
Gemfile, composer.json, mix.exs, pubspec.yaml
2. Framework fingerprinting
→ next.config.*, nuxt.config.*, angular.json, vite.config.*,
django settings, flask app factory, fastapi main, rails config
3. Entry point identification
→ main.*, index.*, app.*, server.*, cmd/, src/main/
4. Directory structure snapshot
→ Top 2 levels of the directory tree, ignoring node_modules, vendor,
.git, dist, build, __pycache__, .next
5. Config and tooling detection
→ .eslintrc*, .prettierrc*, tsconfig.json, Makefile, Dockerfile,
docker-compose*, .github/workflows/, .env.example, CI configs
6. Test structure detection
→ tests/, test/, __tests__/, *_test.go, *.spec.ts, *.test.js,
pytest.ini, jest.config.*, vitest.config.*
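Checks 1 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the manifest names and the ignore list come from the checks above, while the ecosystem labels and the function names `detect_manifests` and `snapshot_tree` are assumptions made for the example.

```python
from pathlib import Path

# Manifest names are taken from check 1; the ecosystem labels are illustrative.
MANIFESTS = {
    "package.json": "Node.js",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pyproject.toml": "Python",
    "pom.xml": "Java (Maven)",
    "build.gradle": "Java/Kotlin (Gradle)",
    "Gemfile": "Ruby",
    "composer.json": "PHP",
    "mix.exs": "Elixir",
    "pubspec.yaml": "Dart/Flutter",
}
# Directories skipped by check 4.
IGNORED = {"node_modules", "vendor", ".git", "dist", "build", "__pycache__", ".next"}


def detect_manifests(root: Path) -> dict[str, str]:
    """Return {manifest filename: ecosystem} for manifests found at the repo root."""
    return {name: eco for name, eco in MANIFESTS.items() if (root / name).is_file()}


def snapshot_tree(root: Path, max_depth: int = 2) -> list[str]:
    """List paths up to max_depth levels below root, skipping ignored directories."""
    entries: list[str] = []

    def walk(directory: Path, depth: int) -> None:
        for child in sorted(directory.iterdir()):
            if child.name in IGNORED:
                continue
            rel = child.relative_to(root).as_posix()
            entries.append(rel + "/" if child.is_dir() else rel)
            if child.is_dir() and depth < max_depth:
                walk(child, depth + 1)

    walk(root, 1)
    return entries
```

Calling `detect_manifests(Path("."))` and `snapshot_tree(Path("."))` from the repository root gives the raw signals for Phase 2 without opening any source file.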
Phase 2: Architecture Mapping
From the reconnaissance data, identify:
Tech Stack
- Language(s) and version constraints
- Framework(s) and major libraries
- Database(s) and ORMs
- Build tools and bundlers
- CI/CD platform
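For a Node.js project, part of the stack can be read straight from the manifest. The sketch below assumes a `package.json` was found in Phase 1; the lookup table `KNOWN_PACKAGES` and the function name `detect_stack` are hypothetical and cover only a handful of packages for illustration.

```python
import json

# Hypothetical lookup table: npm package name -> (layer, display name).
KNOWN_PACKAGES = {
    "next": ("Framework", "Next.js"),
    "react": ("Framework", "React"),
    "express": ("Framework", "Express"),
    "prisma": ("ORM", "Prisma"),
    "typescript": ("Language", "TypeScript"),
    "jest": ("Testing", "Jest"),
    "vite": ("Build tool", "Vite"),
}


def detect_stack(package_json_text: str) -> list[tuple[str, str, str]]:
    """Return sorted (layer, technology, version constraint) rows from package.json text."""
    manifest = json.loads(package_json_text)
    # Merge runtime and dev dependencies; both shape how code is written.
    declared = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    rows = [
        (layer, name, declared[pkg])
        for pkg, (layer, name) in KNOWN_PACKAGES.items()
        if pkg in declared
    ]
    return sorted(rows)
```

The version column holds the declared constraint (for example `^14.1.0`), not the installed version; a lockfile would be needed for the latter.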
Architecture Pattern
- Monolith, monorepo, microservices, or serverless
- Frontend/backend split or full-stack
- API style: REST, GraphQL, gRPC, tRPC
Key Directories
Map the top-level directories to their purpose:
<!-- Example for a React project; replace with detected directories -->
src/components/ → React UI components
src/api/ → API route handlers
src/lib/ → Shared utilities
src/db/ → Database models and migrations
tests/ → Test suites
scripts/ → Build and deployment scripts
Data Flow
Trace one request from entry to response:
- Where does a request enter? (router, handler, controller)
- How is it validated? (middleware, schemas, guards)
- Where is business logic? (services, models, use cases)
- How does it reach the database? (ORM, raw queries, repositories)
Phase 3: Convention Detection
Identify patterns the codebase already follows:
Naming Conventions
- File naming: kebab-case, camelCase, PascalCase, snake_case
- Component/class naming patterns
- Test file naming: *.test.ts, *.spec.ts, *_test.go
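File naming style can be estimated by classifying file stems and taking the majority. The following is a rough sketch under the assumption that regular expressions on the stem are good enough; the names `classify` and `dominant_style` are made up for this example, and single-word names such as `index` are treated as ambiguous because they fit several styles.

```python
import re
from collections import Counter
from pathlib import PurePath

# Checked in order; a single lowercase word (e.g. "index") matches none of them.
STYLES = [
    ("kebab-case", re.compile(r"[a-z0-9]+(-[a-z0-9]+)+")),
    ("snake_case", re.compile(r"[a-z0-9]+(_[a-z0-9]+)+")),
    ("PascalCase", re.compile(r"([A-Z][a-z0-9]+)+")),
    ("camelCase", re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]+)+")),
]


def classify(filename: str) -> str:
    """Classify one file by the part of its name before the first dot."""
    stem = PurePath(filename).name.split(".")[0]
    for style, pattern in STYLES:
        if pattern.fullmatch(stem):
            return style
    return "ambiguous"


def dominant_style(filenames: list[str]) -> str:
    """Return the most common style, or an explicit 'unknown' message."""
    counts = Counter(classify(name) for name in filenames)
    counts.pop("ambiguous", None)
    if not counts:
        return "Could not determine file naming convention"
    return counts.most_common(1)[0][0]
```

Returning an explicit message when nothing matches keeps the result honest when the sample is too small or too mixed to call.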
Code Patterns
- Error handling style: try/catch, Result types, error codes
- Dependency injection or direct imports
- State management approach
- Async patterns: callbacks, promises, async/await, channels
Git Conventions
- Branch naming from recent branches
- Commit message style from recent commits
- PR workflow (squash, merge, rebase)
- If the repo has no commits yet or only a shallow history (e.g. git clone --depth 1), skip this section and note "Git history unavailable or too shallow to detect conventions"
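Commit message style and the shallow-history fallback could be checked along these lines. This sketch tests only for Conventional Commits, which is one possible style among many; the `MIN_COMMITS` threshold, the 80% cutoff, and all function names are assumptions chosen for illustration.

```python
import re
import subprocess

# Conventional Commits subject, e.g. "feat(api): add endpoint" or "fix!: handle null".
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: .+"
)
MIN_COMMITS = 10  # assumed threshold below which history is treated as too shallow


def is_shallow(repo: str) -> bool:
    """True if the repository is a shallow clone (requires Git 2.15 or newer)."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--is-shallow-repository"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() == "true"


def recent_subjects(repo: str, limit: int = 50) -> list[str]:
    """Return recent commit subjects, or [] if git log fails (e.g. no commits yet)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={limit}", "--pretty=format:%s"],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines() if result.returncode == 0 else []


def commit_style(subjects: list[str]) -> str:
    """Name the commit style, or say explicitly that it could not be detected."""
    if len(subjects) < MIN_COMMITS:
        return "Git history unavailable or too shallow to detect conventions"
    share = sum(bool(CONVENTIONAL.match(s)) for s in subjects) / len(subjects)
    if share >= 0.8:
        return "Conventional Commits"
    return "No dominant commit message style detected"
```

A caller would typically skip `commit_style` when `is_shallow(repo)` is true and otherwise pass it the output of `recent_subjects(repo)`.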
Phase 4: Generate Onboarding Artifacts
Produce two outputs:
Output 1: Onboarding Guide
```markdown
# Onboarding Guide: [Project Name]

## Overview
[2-3 sentences: what this project does and who it serves]

## Tech Stack
<!-- Example for a Next.js project; replace with detected stack -->
| Layer | Technology | Version |
|-------|-----------|---------|
| Language | TypeScript | 5.x |
| Framework | Next.js | 14.x |
| Database | PostgreSQL | 16 |
| ORM | Prisma | 5.x |
| Testing | Jest + Playwright | - |

## Architecture
[Diagram or description of how components connect]

## Key Entry Points
<!-- Example for a Next.js project; replace with detected paths -->
- **API routes**: `src/app/api/` - Next.js route handlers
- **UI pages**: `src/app/(dashboard)/` - authenticated pages
- **Database**: `prisma/schema.prisma` - data model source of truth
- **Config**: `next.config.ts` - build and runtime config

## Directory Map
[Top-level directory → purpose mapping]

## Request Lifecycle
[Trace one API request from entry to response]

## Conventions
- [File naming pattern]
- [Error handling approach]
- [Testing patterns]
- [Git workflow]

## Common Tasks
<!-- Example for a Node.js project; replace with detected commands -->
- **Run dev server**: `npm run dev`
- **Run tests**: `npm test`
- **Run linter**: `npm run lint`
- **Database migrations**: `npx prisma migrate dev`
- **Build for production**: `npm run build`

## Where to Look
<!-- Example for a Next.js project; replace with detected paths -->
| I want to... | Look at... |
|--------------|-----------|
| Add an API endpoint | `src/app/api/` |
| Add a UI page | `src/app/(dashboard)/` |
| Add a database table | `prisma/schema.prisma` |
| Add a test | `tests/` matching the source path |
| Change build config | `next.config.ts` |
```
Output 2: Starter CLAUDE.md
Generate or update a project-specific CLAUDE.md based on detected conventions. If CLAUDE.md already exists, read it first and enhance it — preserve existing project-specific instructions and clearly call out what was added or changed.
```markdown
# Project Instructions

## Tech Stack
[Detected stack summary]

## Code Style
- [Detected naming conventions]
- [Detected patterns to follow]

## Testing
- Run tests: `[detected test command]`
- Test pattern: [detected test file convention]
- Coverage: [if configured, the coverage command]

## Build & Run
- Dev: `[detected dev command]`
- Build: `[detected build command]`
- Lint: `[detected lint command]`

## Project Structure
[Key directory → purpose map]

## Conventions
- [Commit style if detectable]
- [PR workflow if detectable]
- [Error handling patterns]
```
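Filling the starter template from detected values might look like the sketch below. It renders only a subset of the sections to stay short, and the dictionary keys (`stack`, `test_command`, and so on) and the function name `render_claude_md` are hypothetical. Missing values are flagged rather than guessed.

```python
UNKNOWN = "Could not determine"


def render_claude_md(detected: dict[str, str]) -> str:
    """Render a partial starter CLAUDE.md; absent or empty values are flagged."""

    def get(key: str) -> str:
        return detected.get(key) or UNKNOWN

    lines = [
        "# Project Instructions",
        "",
        "## Tech Stack",
        get("stack"),
        "",
        "## Testing",
        f"- Run tests: `{get('test_command')}`",
        f"- Test pattern: {get('test_pattern')}",
        "",
        "## Build & Run",
        f"- Dev: `{get('dev_command')}`",
        f"- Build: `{get('build_command')}`",
        f"- Lint: `{get('lint_command')}`",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_claude_md({"stack": "TypeScript, Next.js", "test_command": "npm test"})` fills in the two known values and marks the dev, build, and lint commands as undetermined.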
Best Practices
- Don't read everything — reconnaissance should use Glob and Grep, not Read on every file. Read selectively only for ambiguous signals.
- Verify, don't guess — if a framework is detected from config but the actual code uses something different, trust the code.
- Respect existing CLAUDE.md — if one already exists, enhance it rather than replacing it. Call out what's new vs existing.
- Stay concise — the onboarding guide should be scannable in 2 minutes. Details belong in the code, not the guide.
- Flag unknowns — if a convention can't be confidently detected, say so rather than guessing. "Could not determine test runner" is better than a wrong answer.
Anti-Patterns to Avoid
- Generating a CLAUDE.md that's longer than 100 lines — keep it focused
- Listing every dependency — highlight only the ones that shape how you write code
- Describing obvious directory names: src/ doesn't need an explanation
- Copying the README: the onboarding guide adds structural insight the README lacks
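Two of these anti-patterns lend themselves to a mechanical check before the file is written. The sketch below is one possible guard: the 100-line limit comes from the list above, while the 50% overlap cutoff for "copying the README" and the function name `check_claude_md` are assumptions.

```python
MAX_LINES = 100  # limit from the anti-pattern list
MAX_README_OVERLAP = 0.5  # assumed cutoff for "copied from the README"


def check_claude_md(text: str, readme: str = "") -> list[str]:
    """Return warnings for an over-long CLAUDE.md or one that mostly repeats the README."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        warnings.append(
            f"CLAUDE.md has {len(lines)} lines; keep it at or under {MAX_LINES}"
        )
    # Compare non-empty lines verbatim (ignoring surrounding whitespace).
    readme_lines = {line.strip() for line in readme.splitlines() if line.strip()}
    content = [line.strip() for line in lines if line.strip()]
    copied = sum(1 for line in content if line in readme_lines)
    if content and copied / len(content) > MAX_README_OVERLAP:
        warnings.append("More than half of the lines are copied from the README")
    return warnings
```

An empty list means neither check fired; it does not mean the file is good, only that it avoids these two specific problems.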
Examples
Example 1: First time in a new repo
User: "Onboard me to this codebase"
Action: Run full 4-phase workflow → produce Onboarding Guide + Starter CLAUDE.md
Output: Onboarding Guide printed directly to the conversation, plus a CLAUDE.md written to the project root
Example 2: Generate CLAUDE.md for existing project
User: "Generate a CLAUDE.md for this project"
Action: Run Phases 1-3, skip Onboarding Guide, produce only CLAUDE.md
Output: Project-specific CLAUDE.md with detected conventions
Example 3: Enhance existing CLAUDE.md
User: "Update the CLAUDE.md with current project conventions"
Action: Read existing CLAUDE.md, run Phases 1-3, merge new findings
Output: Updated CLAUDE.md with additions clearly marked
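The merge step in Example 3 can be approximated by appending only the sections that the existing file lacks. This is a simplified sketch that compares `##` headings and never rewrites existing text; the function name `merge_sections` and the `<!-- added by onboarding -->` marker are hypothetical choices for marking additions.

```python
import re


def merge_sections(existing: str, detected: dict[str, str]) -> tuple[str, list[str]]:
    """Append detected sections whose headings are missing from the existing file.

    Returns the merged document and the list of headings that were added.
    """
    present = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^## (.+)$", existing, re.MULTILINE)
    }
    parts = [existing.rstrip("\n")] if existing.strip() else []
    added = []
    for heading, body in detected.items():
        if heading.lower() in present:
            continue  # existing project-specific instructions are preserved as-is
        parts.append(f"## {heading}\n<!-- added by onboarding -->\n{body.strip()}")
        added.append(heading)
    return "\n\n".join(parts) + "\n", added
```

The returned list of added headings can be reported back to the user so that new material is clearly distinguished from what was already there. Conflicts inside a section that already exists are left for a human to resolve.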