Perfect for Academic Analysis Agents needing advanced PDF processing capabilities. Tools for coding, teaching, and presentations with AI assistance

How do I install split-pdf?

Run the command: npx killer-skills add scunning1975/MixtapeTools/split-pdf. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for split-pdf?

Key use cases include automating PDF splitting for efficient analysis, retrieving academic papers from specific search queries or titles, and generating summaries of large documents by processing split PDF files.

Which IDEs are compatible with split-pdf?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for split-pdf?

Requires a Python environment with the PyPDF2 library. Needs WebSearch and WebFetch capabilities for paper retrieval. Processes PDF files only, not other document formats.

Split-PDF: Download, Split, and Deep-Read Academic Papers

Name: split-pdf
Author: scunning1975

CRITICAL RULE: Never read a full PDF. Never. Only read the 4-page split files, and only 3 splits at a time (~12 pages). Reading a full PDF will either crash the session with an unrecoverable "prompt too long" error — destroying all context — or produce shallow, hallucinated output. There are no exceptions.

When This Skill Is Invoked

The user wants you to read, review, or summarize an academic paper. The input is either:

A file path to a local PDF (e.g., ~/Documents/papers/smith_2024.pdf)
A search query or paper title (e.g., "Gentzkow Shapiro Sinkinson 2014 competition newspapers")

Important: You cannot search for a paper you don't know exists. The user MUST provide either a file path or a specific search query — an author name, a title, keywords, a year, or some combination that identifies the paper. If the user invokes this skill without specifying what paper to read, ask them. Do not guess.
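
The input rule above can be sketched as a small classifier. This is only an illustration: the function name `classify_input` and its return labels are hypothetical, and treating anything that ends in `.pdf` as a path is an assumption, not something the skill specifies.

```python
import os

def classify_input(user_input):
    """Classify the user's input as a local PDF path, a search query, or nothing.

    Returns a (kind, value) tuple where kind is one of
    "path", "missing", "query", or "ask" (hypothetical labels).
    """
    text = (user_input or "").strip()
    if not text:
        # Nothing identifies a paper: ask the user instead of guessing.
        return ("ask", None)
    if text.lower().endswith(".pdf"):
        candidate = os.path.abspath(os.path.expanduser(text))
        if os.path.isfile(candidate):
            return ("path", candidate)
        # Looks like a file path, but no such file exists.
        return ("missing", candidate)
    # Anything else is treated as a search query or paper title.
    return ("query", text)
```

An empty string yields `("ask", None)`, which corresponds to asking the user which paper they mean.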

Step 1: Acquire the PDF

If a local file path is provided:

Verify the file exists
Use the PDF in place. The working directory is the folder containing the PDF.
Proceed to Step 2

If a search query or paper title is provided:

Use WebSearch to find the paper
Use WebFetch or Bash (curl/wget) to download the PDF
Save it to the current working directory (create the directory if needed)
Proceed to Step 2
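
For the download branch, the skill itself calls for WebFetch or curl/wget. As an alternative, here is a minimal sketch using only the Python standard library. The helper names `looks_like_pdf` and `download_pdf` are hypothetical, and the `%PDF-` header check is a heuristic that assumes a well-formed file. It catches the common case where a "PDF" link returns an HTML landing page.

```python
import os
import shutil
import urllib.request

def looks_like_pdf(path):
    """Heuristic check: PDF files normally begin with the bytes %PDF-."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

def download_pdf(url, dest_dir, filename):
    """Download url to dest_dir/filename without overwriting an existing file."""
    os.makedirs(dest_dir, exist_ok=True)  # create the directory if needed
    dest = os.path.join(dest_dir, filename)
    if os.path.exists(dest):
        # A file is already there; keep it rather than overwrite it.
        return dest
    tmp = dest + ".part"
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    if not looks_like_pdf(tmp):
        os.remove(tmp)
        raise ValueError("Downloaded file does not look like a PDF")
    os.replace(tmp, dest)  # move the finished download into place
    return dest
```

Writing to a temporary `.part` file first means a failed or non-PDF download never leaves a misleading `.pdf` file in the working directory.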

CRITICAL: Always preserve the original PDF. The source PDF must NEVER be deleted, moved, or overwritten at any point in this workflow. The split files are derivatives; the original is the permanent artifact. Do not clean up, do not remove, do not tidy. The original stays.

Step 2: Split the PDF

Before splitting, check for an existing extract. Look for <basename>_text.md in the same folder as the PDF.

If found, ask:

"An extract from a previous deep-read exists (<basename>_text.md). Use it for this request, or re-read the PDF from scratch?"

Use extract: read <basename>_text.md and use it as the source notes — skip the rest of Steps 2 and 3 entirely
Re-read: proceed with splitting below

This prevents redundant re-reading of papers you have already processed. The _text.md file is a structured plain-text extraction that is far cheaper to read than re-processing the PDF page images.
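
The extract check can be sketched as follows, assuming only the naming rule stated above (`<basename>_text.md` next to the PDF). The function name `find_extract` is hypothetical.

```python
import os

def find_extract(pdf_path):
    """Return the path of <basename>_text.md beside the PDF, or None if absent."""
    folder = os.path.dirname(os.path.abspath(pdf_path))
    basename = os.path.splitext(os.path.basename(pdf_path))[0]
    extract = os.path.join(folder, basename + "_text.md")
    return extract if os.path.isfile(extract) else None
```

A non-None result is the cue to ask the user whether to reuse the extract or re-read the PDF.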

If no extract exists, check for existing splits. Determine the build directory:

python
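import os

# pdf_path is the path to the source PDF (from Step 1)
folder_path = os.path.dirname(os.path.abspath(pdf_path))      # folder containing the PDF
foldername = os.path.basename(folder_path)                    # that folder's own name
pdf_basename = os.path.splitext(os.path.basename(pdf_path))[0]  # file name without .pdf
build_dir = os.path.join(folder_path, foldername + '_build')  # <foldername>_build/
split_dir = os.path.join(build_dir, 'split_' + pdf_basename)  # split_<pdf-basename>/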

If split_dir already exists and contains .pdf files, ask:

"Splits already exist for <pdf-basename> (N chunks in <foldername>_build/split_<pdf-basename>/). Reuse existing splits, or re-split from scratch?"

Reuse: skip splitting, proceed to Step 3 using the existing files in split_dir
Re-split: delete the existing split folder, then proceed with splitting below
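
The reuse-or-re-split check might look like the sketch below. The helper names `existing_splits` and `reset_split_dir` are hypothetical. Note that deleting the split folder also deletes any notes.md inside it, so the sketch leaves the decision to call `reset_split_dir` to the caller.

```python
import os
import shutil

def existing_splits(split_dir):
    """Return the sorted names of .pdf files already in split_dir (empty if none)."""
    if not os.path.isdir(split_dir):
        return []
    return sorted(n for n in os.listdir(split_dir) if n.lower().endswith(".pdf"))

def reset_split_dir(split_dir):
    """Delete the split folder and recreate it empty.

    Only the split folder is touched; the original PDF lives outside it.
    Anything else stored in the folder (such as notes.md) is removed too.
    """
    if os.path.isdir(split_dir):
        shutil.rmtree(split_dir)
    os.makedirs(split_dir)
```

`len(existing_splits(split_dir))` gives the N to report in the prompt to the user.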

Create splits in <foldername>_build/split_<pdf-basename>/ and run the splitting script:

python
from PyPDF2 import PdfReader, PdfWriter
import os, sys

def split_pdf(input_path, output_dir, pages_per_chunk=4):
    """Write input_path into output_dir as consecutive chunks of pages_per_chunk pages."""
    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_path)
    total = len(reader.pages)
    prefix = os.path.splitext(os.path.basename(input_path))[0]

    for start in range(0, total, pages_per_chunk):
        end = min(start + pages_per_chunk, total)  # the last chunk may be shorter
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])

        # File names use 1-based, inclusive page ranges, e.g. smith_2024_pp1-4.pdf
        out_name = f"{prefix}_pp{start+1}-{end}.pdf"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, "wb") as f:
            writer.write(f)

    # -(-a // b) is ceiling division: the number of chunks written
    print(f"Split {total} pages into {-(-total // pages_per_chunk)} chunks in {output_dir}")

if __name__ == "__main__":
    # Usage: python <script> <input.pdf> <output_dir>
    split_pdf(sys.argv[1], sys.argv[2])
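
To see what the splitting loop produces without needing a real PDF, the same arithmetic can be run on a page count alone. This is a sketch: `chunk_names` is a hypothetical helper that mirrors the naming pattern of the script and does not touch any files.

```python
def chunk_names(prefix, total_pages, pages_per_chunk=4):
    """List the split file names the script would write for a PDF of total_pages pages."""
    names = []
    for start in range(0, total_pages, pages_per_chunk):
        end = min(start + pages_per_chunk, total_pages)
        names.append(f"{prefix}_pp{start+1}-{end}.pdf")
    return names

names = chunk_names("smith_2024", 35)
print(len(names))   # 9
print(names[0])     # smith_2024_pp1-4.pdf
print(names[-1])    # smith_2024_pp33-35.pdf
```

A 35-page paper yields 9 chunks, the last of which holds only three pages.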

Directory convention:

articles/                             # any working folder
├── smith_2024.pdf                    # original PDF — NEVER DELETE THIS
├── smith_2024_text.md                # structured extract — created after deep-read
└── articles_build/                   # <foldername>_build/ — shared build folder
    └── split_smith_2024/             # split_<pdf-basename>/
        ├── smith_2024_pp1-4.pdf
        ├── smith_2024_pp5-8.pdf
        ├── smith_2024_pp9-12.pdf
        ├── notes.md                  # working copy — source for _text.md
        └── ...

The build directory convention (<foldername>_build/) keeps split artifacts, compilation intermediates, and other working files separate from the source material and finished outputs. Multiple PDFs in the same folder share one build directory, each with its own split_<basename>/ subdirectory inside it.

The original PDF remains permanently. The splits are working copies. If anything goes wrong, you can always re-split from the original.

If PyPDF2 is not installed, install it: pip install PyPDF2

Step 3: Read in Batches of 3 Splits

Read exactly 3 split files at a time (~12 pages). After each batch:

Read the 3 split PDFs using the Read tool
Update the running notes file (notes.md in the split subdirectory)
Pause and tell the user:

"I have finished reading splits [X-Y] and updated the notes. I have [N] more splits remaining. Would you like me to continue with the next 3?"

Wait for the user to confirm before reading the next batch

Do NOT read ahead. Do NOT read all splits at once. The pause-and-confirm protocol is mandatory.
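
Grouping the splits into batches of 3 can be sketched as below. The helper names are hypothetical. One detail matters here: a plain alphabetical sort puts `pp13-16` before `pp5-8`, so the sketch orders files by their starting page number instead.

```python
import re

def start_page(name):
    """Extract the starting page number from a name like smith_2024_pp5-8.pdf."""
    m = re.search(r"_pp(\d+)-(\d+)\.pdf$", name)
    if m is None:
        raise ValueError(f"Not a split file name: {name}")
    return int(m.group(1))

def batches(split_names, batch_size=3):
    """Order split files by page number and group them into reading batches."""
    ordered = sorted(split_names, key=start_page)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Each inner list is one batch to read before pausing; the number of remaining splits to report is the total length of the batches not yet read.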

Step 4: Structured Extraction

As you read, collect information along these dimensions and write them into notes.md:

Research question — What is the paper asking and why does it matter?
Audience — Which sub-community of researchers cares about this?
Method — How do they answer the question? What is the identification strategy?
Data — What data do they use? Where precisely did they find it? What is the unit of observation? Sample size? Time period?
Statistical methods — What econometric or statistical techniques do they use? What are the key specifications?
Findings — What are the main results? Key coefficient estimates and standard errors?
Contributions — What is learned from this exercise that we didn't know before?
Replication feasibility — Is the data publicly available? Is there a replication archive? A data appendix? URLs for the underlying data?

These questions extract what a researcher needs to build on or replicate the work — a structured extraction more detailed and specific than a typical summary.

The Notes File

The working notes file is notes.md in the split subdirectory, updated incrementally after each batch. Structure it with clear headers for each of the 8 dimensions. After each batch, update whichever dimensions have new information — do not rewrite from scratch.

By the time all splits are read, the notes should contain specific data sources, variable names, equation references, sample sizes, coefficient estimates, and standard errors. Not a summary — a structured extraction.
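
One way to keep the notes incremental is to create the eight headers once and then append bullets under the relevant header after each batch. The sketch below assumes `## ` markdown headers and a bullet-per-finding layout; both are illustrative choices, and `init_notes` and `add_note` are hypothetical helpers.

```python
import os

DIMENSIONS = [
    "Research question", "Audience", "Method", "Data",
    "Statistical methods", "Findings", "Contributions",
    "Replication feasibility",
]

def init_notes(notes_path, title):
    """Create notes.md with one header per dimension; leave an existing file alone."""
    if os.path.exists(notes_path):
        return
    lines = [f"# Notes: {title}", ""]
    for dimension in DIMENSIONS:
        lines += [f"## {dimension}", ""]
    with open(notes_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

def add_note(notes_path, dimension, text):
    """Append one bullet to the end of a dimension's section without rewriting the rest."""
    header = f"## {dimension}"
    with open(notes_path, encoding="utf-8") as f:
        lines = f.read().split("\n")
    i = lines.index(header)  # raises ValueError for an unknown dimension
    j = i + 1
    while j < len(lines) and not lines[j].startswith("## "):
        j += 1  # j stops at the next header or the end of the file
    while j > i + 1 and lines[j - 1] == "":
        j -= 1  # back up over trailing blank lines in the section
    lines.insert(j, f"- {text}")
    with open(notes_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```

Because `init_notes` returns early when the file exists, calling it at the start of every batch is safe and never discards earlier notes.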

After all batches are complete, write the final notes to <basename>_text.md in the same folder as the source PDF:

articles/smith_2024_text.md

Then notify the user:

"Extract saved to smith_2024_text.md alongside the source PDF. Future requests on this paper can reuse it without re-reading."

This file is the persistent, reusable artifact. The notes.md in the build directory is the working copy. Both are kept — never delete either.
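
The persist step amounts to a copy, not a move, so that both files survive. A minimal sketch, with `persist_extract` as a hypothetical helper name:

```python
import os
import shutil

def persist_extract(pdf_path, notes_path):
    """Copy the working notes to <basename>_text.md beside the source PDF."""
    folder = os.path.dirname(os.path.abspath(pdf_path))
    basename = os.path.splitext(os.path.basename(pdf_path))[0]
    text_path = os.path.join(folder, basename + "_text.md")
    shutil.copyfile(notes_path, text_path)  # copy, so notes.md stays in place
    return text_path
```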

Agent Isolation Protocol

When split-pdf is invoked by another skill or workflow (any process that continues working after the PDF has been read), the PDF reading MUST run inside a subagent to prevent context bloat in the parent conversation.

Why: Each PDF page rendered by the Read tool produces image data in the conversation context. A 35-page PDF (9 chunks) can add 10-20MB of image data that accumulates permanently. After reading one or two large PDFs on top of prior work, the conversation hits the API request size limit and becomes unrecoverable: no subsequent Read calls succeed, and rewinding does not free sufficient space.

Pattern:

The parent skill handles splitting (Step 2's Python script) in its own context; this is lightweight. Then it launches an Agent to perform all the reading:

Read PDF split files and produce structured extraction notes.

Split directory: <split_dir>
Files (read in this order, 3 at a time): <file_list>
Notes output: <notes_path>
Text output: <text_path>

Process:
1. Read 3 PDF files at a time using the Read tool
2. After each batch, update the notes file with extracted content
3. Extract: research question, audience, method, data (sources, sample size, time period),
   statistical methods, findings, contributions, replication feasibility
4. Write the final structured extraction to the text output path

Report when done: pages read, figures/tables found, one-sentence content summary.

After the agent returns, the parent reads the output files (plain markdown, not PDF images) and continues its workflow.
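
A parent skill could fill in the agent prompt programmatically. The sketch below only assembles the text of the template shown above; `build_agent_prompt` is a hypothetical helper, and how the prompt is handed to an agent depends on the host environment and is not shown.

```python
def build_agent_prompt(split_dir, files, notes_path, text_path):
    """Fill the reading-agent template with concrete paths."""
    return "\n".join([
        "Read PDF split files and produce structured extraction notes.",
        "",
        f"Split directory: {split_dir}",
        f"Files (read in this order, 3 at a time): {', '.join(files)}",
        f"Notes output: {notes_path}",
        f"Text output: {text_path}",
        "",
        "Process:",
        "1. Read 3 PDF files at a time using the Read tool",
        "2. After each batch, update the notes file with extracted content",
        "3. Extract: research question, audience, method, data (sources, sample size, time period),",
        "   statistical methods, findings, contributions, replication feasibility",
        "4. Write the final structured extraction to the text output path",
        "",
        "Report when done: pages read, figures/tables found, one-sentence content summary.",
    ])
```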

Standalone invocations (user calls /split-pdf directly) use the interactive protocol above with reads in the main conversation and the pause-and-confirm protocol.

When NOT to Split

Papers shorter than ~15 pages: read directly (still use the Read tool, not Bash)
Policy briefs or non-technical documents: a rough summary is fine
Triage only: read just the first split (pages 1-4) for abstract and introduction

Quick Reference

| Step | Action |
| --- | --- |
| Acquire | Download to the current working directory or use existing local file in place |
| Check | Look for existing `_text.md` extract or existing splits, and offer to reuse |
| Split | 4-page chunks into `<foldername>_build/split_<pdf-basename>/` |
| Read | 3 splits at a time, pause after each batch |
| Write | Update `notes.md` with structured extraction |
| Persist | Save final extraction to `<basename>_text.md` alongside the source PDF |
| Confirm | Ask user before continuing to next batch |

Acknowledgments

The in-place PDF handling, persistent _text.md extraction, split reuse, build directory convention, and agent isolation protocol were inspired by improvements identified by Ben Bentzin (Associate Professor of Instruction, McCombs School of Business, University of Texas at Austin), who adapted the original skill for his own workflows and shared his findings (April 2026). His version demonstrated that subagent isolation prevents context bloat when reading multiple large PDFs in a single session — a critical reliability improvement. The implementation here is independently written but the ideas are his.

For a detailed explanation of why the batched-reading method works, see methodology.md.
