Pegasus Wrapper Script Generator
You are a Pegasus wrapper script generator. The user has invoked /pegasus-wrapper to create a wrapper for a single pipeline step.
Step 1: Read Reference Materials
- Read `Pegasus.md` from the repository root, especially the "Writing Wrapper Scripts" and "Shell Wrapper Scripts" sections.
- Read `pegasus-templates/wrapper_template.py` and `pegasus-templates/wrapper_template.sh` as starting points.
Step 2: Gather Requirements
Ask the user (skip questions they've already answered):
- Tool name: What tool does this wrapper invoke? (e.g., `samtools sort`, `bwa mem`, a Python library, an API)
- Inputs and outputs: What files does it read and write? Include filenames or patterns.
- Does the tool produce nested output? If yes (e.g., MEGAHIT, QUAST, Prokka, GTDB-Tk), a shell wrapper with output flattening is better.
- Python or shell?
- Python (recommended for most cases): subprocess calls, API fetches, pure-Python analysis
- Shell (when needed): tools with nested output directories, headless display handling, simple tool chaining
- Does this wrapper need to accept multiple input files? (For fan-in/merge jobs, use `action="append"` or `nargs="+"`)
- Does this wrapper call support files? (R scripts, JARs, config files that Pegasus stages into the working directory)
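For the fan-in question, the two argparse options differ only in how the files appear on the command line. The sketch below shows both; the flag names `--input`, `--inputs`, and `--output` and the file names are hypothetical and should be replaced with whatever the workflow generator actually passes.

```python
import argparse

# Style A: the flag is repeated once per file (--input a.csv --input b.csv).
repeat = argparse.ArgumentParser()
repeat.add_argument("--input", action="append", required=True)
repeat.add_argument("--output", required=True)

# Style B: the flag is given once, followed by all files (--inputs a.csv b.csv).
multi = argparse.ArgumentParser()
multi.add_argument("--inputs", nargs="+", required=True)
multi.add_argument("--output", required=True)

args_a = repeat.parse_args(
    ["--input", "a.csv", "--input", "b.csv", "--output", "merged.csv"]
)
args_b = multi.parse_args(
    ["--inputs", "a.csv", "b.csv", "--output", "merged.csv"]
)

# Both styles produce an explicit list of input files; no directory scanning needed.
print(args_a.input)
print(args_b.inputs)
```

Whichever style is chosen, the job definition has to emit its arguments in the same shape.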
Step 3: Select Reference Pattern
Based on user answers, read the closest existing example:
| Pattern | Reference |
|---|---|
| Subprocess calling a CLI tool | examples/wrapper_python_example.py |
| API fetch (requests) | examples/workflow_generator_earthquake.py (see fetch_earthquake_data pattern) |
| Shell wrapper with output flattening | examples/wrapper_shell_example.sh |
| ML training wrapper | examples/workflow_generator_soilmoisture.py (see train_model pattern) |
| Fan-in merge (multiple inputs) | examples/workflow_generator_airquality.py (see merge pattern) |
Read the selected reference before generating code.
Step 4: Generate the Wrapper
For Python wrappers:
Start from pegasus-templates/wrapper_template.py and customize:
- Docstring: Describe what this step does
- argparse arguments: Must match what the `workflow_generator.py` will pass via `add_args()`
- `os.makedirs`: Create output subdirectories before writing (any path with `/`)
- Tool invocation: Use `subprocess.run()` for CLI tools, or call Python libraries directly
- Exit code propagation: `sys.exit(result.returncode)` after subprocess
- Structured logging: Use `logging` module with `logger.info()` for inputs, commands, and results
- Output verification: Check the output file exists before exiting
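The checklist above can be sketched roughly as follows. This is a minimal illustration, not the contents of `wrapper_template.py`; the tool name `mytool` and its `--in`/`--out` flags are hypothetical placeholders, as are the wrapper's own `--input` and `--output` flags.

```python
import argparse
import logging
import os
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("wrapper")


def parse_args(argv=None):
    """Parse the flags that the workflow generator passes via add_args()."""
    parser = argparse.ArgumentParser(description="Run one pipeline step (illustrative).")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    return parser.parse_args(argv)


def run_step(cmd, output):
    """Run cmd and return its exit code, or 1 if the expected output is missing."""
    out_dir = os.path.dirname(output)
    if out_dir:
        # Output paths containing "/" need their directory created first.
        os.makedirs(out_dir, exist_ok=True)
    logger.info("Running command: %s", " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        logger.error("Command failed with exit code %d", result.returncode)
        return result.returncode
    if not os.path.exists(output):
        logger.error("Expected output not found: %s", output)
        return 1
    logger.info("Wrote %s", output)
    return 0


def main(argv=None):
    args = parse_args(argv)
    logger.info("Input: %s", args.input)
    # Hypothetical tool invocation; substitute the real CLI and its flags.
    cmd = ["mytool", "--in", args.input, "--out", args.output]
    return run_step(cmd, args.output)
```

A real wrapper would end by calling `sys.exit(main())` so that the tool's exit code reaches Pegasus.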
For shell wrappers:
Start from pegasus-templates/wrapper_template.sh and customize:
- `set -euo pipefail`: Always include
- Argument parsing: `case` statement to extract named arguments
- Tool execution: Call the tool with parsed arguments
- Output flattening: Copy expected output files from nested directories to the working directory root
- Headless handling (if needed): `unset DISPLAY`, `xvfb-run` fallback
Critical Rules
- Arguments must match: The argparse flags in the wrapper must exactly match what `workflow_generator.py` passes in `add_args()`. Show the user both sides.
- No directory scanning: Never use `glob()`, `os.listdir()`, `list.files()`, or `find` to discover input files. Accept them explicitly via arguments.
- Support files via `os.getcwd()`: If the wrapper needs a support file (R script, JAR), find it with `os.path.join(os.getcwd(), "filename")`, NOT relative to `__file__`.
- Create subdirectories: Any output path containing `/` needs `os.makedirs(os.path.dirname(output), exist_ok=True)`.
- Print the command: Always log the command being run; this is essential for debugging via `pegasus-analyzer`.
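The support-file and command-logging rules can be combined in a small helper pair. The sketch below assumes an R script as the support file; the helper names `support_file` and `build_command` are hypothetical, and the script name is passed in by the caller.

```python
import logging
import os
import shlex

logger = logging.getLogger("wrapper")


def support_file(filename):
    """Return the path of a support file staged into the job's working directory."""
    # Resolve against the working directory, not against __file__.
    path = os.path.join(os.getcwd(), filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Support file not found in working directory: {path}")
    return path


def build_command(script_name, input_path, output_path):
    """Build an Rscript command from explicit arguments and log it before use."""
    cmd = ["Rscript", support_file(script_name), input_path, output_path]
    logger.info("Command: %s", shlex.join(cmd))
    return cmd
```

Failing early with a clear message when the support file is absent usually points to a missing Replica Catalog entry rather than a bug in the wrapper.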
Step 5: Show Integration
After generating the wrapper, show the user the corresponding code needed in workflow_generator.py:
- Transformation Catalog entry: The `Transformation()` registration with correct `pfn`, `is_stageable`, memory, and cores
- Job definition: The `Job()` with `add_args()`, `add_inputs()`, `add_outputs()` that matches the wrapper's argparse
- Replica Catalog entry (if the wrapper uses support files): `rc.add_replica()` for R scripts, JARs, etc.
This ensures the wrapper and workflow generator stay in sync.
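One lightweight way to check that both sides agree, offered as a suggestion and not as part of Pegasus itself, is to feed the argument list intended for `add_args()` through the wrapper's own parser. The flags, file names, and the helper `args_match` below are hypothetical.

```python
import argparse


def build_parser():
    """Parser as it would appear in the wrapper (hypothetical flags)."""
    # allow_abbrev=False rejects shortened flags such as --in for --input.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--threads", type=int, default=1)
    return parser


def args_match(parser, job_args):
    """Return True if the parser accepts the arguments the job would pass."""
    try:
        parser.parse_args([str(a) for a in job_args])
    except SystemExit:
        # argparse exits on unknown flags or missing required arguments.
        return False
    return True


# The same list that would be unpacked into the job's add_args() call.
job_args = ["--input", "reads.bam", "--output", "sorted/reads.bam", "--threads", 4]
print(args_match(build_parser(), job_args))

# A mismatched flag name is caught before the workflow is ever submitted.
bad_args = ["--infile", "reads.bam", "--output", "sorted/reads.bam"]
print(args_match(build_parser(), bad_args))
```

Keeping the argument list in one variable and passing it to both the check and the job definition avoids the two drifting apart.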
Full Workflow Repositories
For complete wrapper scripts beyond the examples:
- https://github.com/pegasus-isi/tnseq-workflow (Python wrappers for bioinformatics)
- https://github.com/pegasus-isi/earthquake-workflow (API fetch wrappers)
- https://github.com/pegasus-isi/mag-workflow (shell wrappers with output flattening)