SMF Works — AI Solutions for Small Business

PDF Toolkit

A comprehensive PDF manipulation toolkit for everyday document needs. Merge multiple PDFs into one, split large documents into chapters, extract specific pages, compress files for email, and rotate pages to the correct orientation.

Key Features

  • Merge multiple PDFs into one
  • Split a PDF into one file per page
  • Extract a range of pages into a new PDF
  • Show PDF metadata, page count, and encryption status
  • Compress PDFs for smaller file sizes
  • Rotate every page of a document by 90, 180, or 270 degrees

Common Use Cases

  • Combine scanned documents into one file
  • Extract specific chapters from reports
  • Compress large PDFs for email attachments
  • Fix orientation on scanned documents

Custom Workflow Integration

This skill can be customized for your specific workflow as part of an SMF Works services engagement. Whether you need custom automation rules, integrations with your existing tools, or specialized configurations for your team, we can tailor this skill to fit your exact requirements.


Installation

# Install the skill (via TUI or CLI)

smfw install pdf-toolkit

# Get help

smfw run pdf-toolkit --help

💡 Tip: Install via the OpenClaw TUI skill manager for an interactive experience, or use the CLI command above.

Setup Guide

PDF Toolkit — Setup Guide

Estimated setup time: 5–10 minutes
Difficulty: Easy
Tier: Free — no subscription, no API keys required


What You'll Need

Requirement                | Details                                                             | Cost
Python 3.8+                | Provided by the Xcode Command Line Tools on macOS; python3 on Linux | Free
pip                        | Python package manager (usually comes with Python)                  | Free
PyPDF2                     | Python library for PDF manipulation                                 | Free
smfworks-skills repository | Cloned via git                                                      | Free
A PDF file                 | For testing during setup                                            | Free

Step 1 — Verify Python Is Installed

python3 --version

Expected output:

Python 3.11.4

Any version 3.8 or newer works. If Python is missing, download it from python.org or use your system's package manager.


Step 2 — Verify pip Is Available

pip --version

Expected output:

pip 23.1.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)

If pip is missing, install it with:

python3 -m ensurepip --upgrade

Step 3 — Install PyPDF2

This is the only external dependency PDF Toolkit needs.

pip install PyPDF2

Expected output:

Collecting PyPDF2
  Downloading PyPDF2-3.0.1-py3-none-any.whl (232 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 232.7/232.7 kB 2.4 MB/s eta 0:00:00
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1

If you see Successfully installed, PyPDF2 is ready.


Step 4 — Get the Skills Repository

If you haven't already cloned the smfworks-skills repository:

git clone https://github.com/smfworks/smfworks-skills ~/smfworks-skills

If you already have it, update:

cd ~/smfworks-skills && git pull

Step 5 — Navigate to the Skill

cd ~/smfworks-skills/skills/pdf-toolkit

List the files:

ls

You should see:

HOWTO.md   README.md   SETUP.md   main.py

Step 6 — Run the Skill to Verify

python3 main.py

Expected output:

Usage: python main.py <command> [options]

Commands:
  merge <output.pdf> <input1.pdf> <input2.pdf> ...   - Merge PDFs
  split <input.pdf> <output_dir>                    - Split all pages
  extract <input.pdf> <start> <end> <output.pdf>  - Extract page range
  info <input.pdf>                                   - Show PDF info
  compress <input.pdf> <output.pdf>                - Compress PDF
  rotate <input.pdf> <output.pdf> <degrees>        - Rotate PDF

Examples:
  python main.py merge combined.pdf doc1.pdf doc2.pdf
  python main.py split document.pdf ./pages/
  python main.py extract report.pdf 1 5 summary.pdf
  python main.py info contract.pdf

If you see this, setup is complete.


Verify Your Setup

Run a real test with any PDF on your system, for example one from your Downloads folder.

python3 main.py info ~/Downloads/any-file.pdf

Expected output (values will vary by file):

📄 PDF Information:
   Title: Sample Document
   Pages: 4
   Author: Unknown
   Subject:
   Size: 123,456 bytes
   Encrypted: False

If you see page count and file size, everything is working correctly.


Configuration Options

PDF Toolkit requires no configuration file or environment variables. All options are passed as command-line arguments at runtime. There is nothing to configure after installation.


Troubleshooting Setup Issues

pip: command not found
Use pip3 instead of pip, or run python3 -m pip install PyPDF2.

PyPDF2 not installed. Run: pip install PyPDF2
The package wasn't installed in the Python environment you're running. If you use virtual environments or conda, activate your environment first, then run pip install PyPDF2.

No such file or directory: main.py
You're not in the skill directory. Run cd ~/smfworks-skills/skills/pdf-toolkit first.

ModuleNotFoundError: No module named 'PyPDF2'
Same as above — pip and python3 may be pointing to different installations. Try python3 -m pip install PyPDF2 to be sure they're linked correctly.

Permission denied when installing PyPDF2
Try pip install --user PyPDF2 to install it into your user directory instead of system-wide.
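
The ModuleNotFoundError case above usually comes down to pip and python3 belonging to different installations. A minimal sketch for checking this from inside the interpreter you actually run, using only the standard library. The function name check_environment is hypothetical and is not part of PDF Toolkit.

```python
import importlib.util
import sys


def check_environment(module_name="PyPDF2", minimum=(3, 8)):
    """Report which interpreter is running and whether it can see the module."""
    return {
        "interpreter": sys.executable,
        "python_ok": sys.version_info >= minimum,
        # find_spec returns None when a top-level module cannot be found
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    report = check_environment()
    print(f"Interpreter:  {report['interpreter']}")
    print(f"Python 3.8+:  {report['python_ok']}")
    print(f"PyPDF2 found: {report['module_found']}")
    if not report["module_found"]:
        # Installing through this exact interpreter avoids the pip/python3 mismatch
        print(f"Try: {report['interpreter']} -m pip install PyPDF2")
```

Run it with the same python3 you use for main.py. If the interpreter path is not the one you expected, that is likely the source of the mismatch.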


Next Steps

Setup is complete. Head to HOWTO.md for walkthroughs:

  • How to merge PDFs
  • How to split a PDF into individual pages
  • How to extract specific pages
  • How to check PDF metadata
  • How to automate PDF tasks with cron

cat HOWTO.md

How-To Guide

PDF Toolkit — How-To Guide

Prerequisites: Setup complete (see SETUP.md). PyPDF2 installed.


Table of Contents

  1. How to Merge Multiple PDFs into One
  2. How to Split a PDF into Individual Pages
  3. How to Extract a Range of Pages
  4. How to Check a PDF's Info and Page Count
  5. How to Compress a Large PDF
  6. How to Rotate a Sideways-Scanned Document
  7. Automating with Cron
  8. Combining with Other Skills
  9. Troubleshooting Common Issues
  10. Tips & Best Practices

1. How to Merge Multiple PDFs into One

What this does: Combines two or more PDF files into a single file, in the order you list them.

When to use it: You have quarterly reports, chapters, or sections as separate files and need one consolidated document.

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — List your source PDFs to confirm they exist.

ls ~/Documents/reports/

Example output:

q1-2024.pdf  q2-2024.pdf  q3-2024.pdf  q4-2024.pdf

Step 3 — Run the merge command.
The output file comes first, then the input files in the order you want them merged.

python3 main.py merge ~/Documents/annual-2024.pdf ~/Documents/reports/q1-2024.pdf ~/Documents/reports/q2-2024.pdf ~/Documents/reports/q3-2024.pdf ~/Documents/reports/q4-2024.pdf

Expected output:

✅ Merged 4 PDFs into /home/user/Documents/annual-2024.pdf
   Output size: 1,847,296 bytes

Step 4 — Verify the merged file.

python3 main.py info ~/Documents/annual-2024.pdf

The page count should equal the total pages from all four source files.

Result: One PDF at ~/Documents/annual-2024.pdf containing all four quarterly reports in order.
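
If you call merge from a script, the argument order (output first, then inputs) is easy to get wrong. The sketch below builds the argument list in that order and rejects fewer than two inputs, mirroring the error message documented in the troubleshooting section. The helper build_merge_command is hypothetical. It only assembles the command and does not run it.

```python
import sys
from pathlib import Path


def build_merge_command(output, inputs, script="main.py"):
    """Return an argv list for merge: output path first, then inputs in order."""
    expanded = [str(Path(p).expanduser()) for p in inputs]
    if len(expanded) < 2:
        raise ValueError("Need at least 2 PDFs to merge")
    return [sys.executable, script, "merge", str(Path(output).expanduser()), *expanded]


if __name__ == "__main__":
    cmd = build_merge_command(
        "~/Documents/annual-2024.pdf",
        [f"~/Documents/reports/q{n}-2024.pdf" for n in range(1, 5)],
    )
    print(" ".join(cmd))
    # To execute it from the skill directory (assumed location):
    #   import subprocess
    #   subprocess.run(cmd, check=True,
    #                  cwd=Path("~/smfworks-skills/skills/pdf-toolkit").expanduser())
```

Passing a list to subprocess.run rather than a single string avoids shell quoting problems when file names contain spaces.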


2. How to Split a PDF into Individual Pages

What this does: Takes a multi-page PDF and creates one separate file per page. A 10-page PDF becomes 10 one-page PDFs.

When to use it: You need to distribute pages separately, or you want every page as its own file so you can pick which ones to share.

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — Check how many pages your PDF has.
This helps you know how many files to expect.

python3 main.py info ~/Downloads/presentation.pdf

Output:

📄 PDF Information:
   Title: Q3 Strategy Presentation
   Pages: 8
   Author: Marketing Team
   Subject:
   Size: 2,097,152 bytes
   Encrypted: False

Step 3 — Create a directory for the output files.
The skill can create the directory, but it's good practice to create it first.

mkdir -p ~/Downloads/presentation-pages

Step 4 — Run the split command.

python3 main.py split ~/Downloads/presentation.pdf ~/Downloads/presentation-pages/

Expected output:

✅ Split 8 pages into 8 files
   Output directory: /home/user/Downloads/presentation-pages/

Step 5 — Verify the output.

ls ~/Downloads/presentation-pages/

Example output:

presentation_page_1.pdf  presentation_page_3.pdf  presentation_page_5.pdf  presentation_page_7.pdf
presentation_page_2.pdf  presentation_page_4.pdf  presentation_page_6.pdf  presentation_page_8.pdf

Result: 8 individual PDF files, one per page of the original presentation.
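
For a scripted version of the verification step, the sketch below lists the files a split should have produced and reports any that are missing. The <name>_page_<n>.pdf pattern is inferred from the example listing above and is an assumption about how main.py names its output. Both function names are hypothetical.

```python
from pathlib import Path


def expected_split_files(input_pdf, output_dir, page_count):
    """Predict output paths, assuming the <stem>_page_<n>.pdf naming shown above."""
    stem = Path(input_pdf).stem
    out = Path(output_dir).expanduser()
    return [out / f"{stem}_page_{n}.pdf" for n in range(1, page_count + 1)]


def missing_split_files(input_pdf, output_dir, page_count):
    """Return the predicted files that do not exist on disk."""
    return [p for p in expected_split_files(input_pdf, output_dir, page_count)
            if not p.exists()]


if __name__ == "__main__":
    missing = missing_split_files(
        "~/Downloads/presentation.pdf", "~/Downloads/presentation-pages", 8
    )
    print("All pages present" if not missing else f"Missing: {[p.name for p in missing]}")
```

Note that a plain ls sorts names as text, so page_10 is listed before page_2 for longer documents. Generating the names numerically, as here, keeps them in page order.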


3. How to Extract a Range of Pages

What this does: Pulls a contiguous block of pages from a PDF into a new, smaller PDF.

When to use it: You have a large report and need to send only the executive summary (pages 1–4) or a specific section (pages 15–22).

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — Find out how many pages the PDF has and which pages you need.

python3 main.py info ~/Documents/full-report.pdf

Output:

📄 PDF Information:
   Title: Annual Performance Report 2024
   Pages: 48
   Author: Analytics Team
   Subject: Annual Report
   Size: 8,388,608 bytes
   Encrypted: False

Step 3 — Extract the pages you need.
Page numbers are 1-indexed (first page = 1).

python3 main.py extract ~/Documents/full-report.pdf 1 4 ~/Documents/exec-summary.pdf

Expected output:

✅ Extracted 4 pages to /home/user/Documents/exec-summary.pdf

Step 4 — Verify the extracted file.

python3 main.py info ~/Documents/exec-summary.pdf

Output:

📄 PDF Information:
   Title: Annual Performance Report 2024
   Pages: 4
   Author: Analytics Team
   Subject: Annual Report
   Size: 524,288 bytes
   Encrypted: False

Result: A 4-page PDF containing only the executive summary, ready to email or share.
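
Because page numbers are 1-indexed and the range is inclusive, pages 1 to 4 means four pages. The sketch below checks a range against the page count before you run extract and converts it to the 0-based indices most PDF libraries use internally. The function page_indices is hypothetical, and the exact rules main.py applies (for example, whether start may equal end) are an assumption.

```python
def page_indices(start, end, total_pages):
    """Validate a 1-indexed inclusive range and return 0-based page indices."""
    if not (1 <= start <= end <= total_pages):
        raise ValueError(
            f"Invalid page range. PDF has {total_pages} pages (1-{total_pages})."
        )
    # Page 1 is index 0, so the inclusive range start..end maps to start-1..end-1
    return list(range(start - 1, end))


if __name__ == "__main__":
    print(page_indices(1, 4, 48))         # executive summary, pages 1-4
    print(len(page_indices(15, 22, 48)))  # number of pages in the 15-22 section
```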


4. How to Check a PDF's Info and Page Count

What this does: Displays a PDF's metadata — title, author, page count, file size, and whether it's encrypted — without opening it in a PDF viewer.

When to use it: You received a PDF and want to know what's inside before opening it, or you're scripting PDF operations and need page counts programmatically.

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — Run the info command.

python3 main.py info ~/Downloads/contract.pdf

Output:

📄 PDF Information:
   Title: Master Service Agreement 2024
   Pages: 22
   Author: Legal Department
   Subject: Service Contract
   Size: 716,800 bytes
   Encrypted: False

Step 3 — Check for encrypted PDFs.
If Encrypted: True, you cannot use this skill to process the file until the password is removed.

Result: You know exactly what's in the file: 22 pages, not encrypted, authored by the Legal Department. You can now confidently run extract, split, or merge operations on it.
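
For the scripting case, one option is to capture the info output and parse it. The sketch below reads the text format shown in the examples above and returns typed values. It assumes that format stays exactly as printed here, and parse_info is a hypothetical helper, not something main.py provides.

```python
import re

FIELD = re.compile(r"\s*(Title|Pages|Author|Subject|Size|Encrypted):\s*(.*)$")


def parse_info(text):
    """Parse the text printed by the info command into a dictionary."""
    result = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            result[match.group(1).lower()] = match.group(2).strip()
    if "pages" in result:
        result["pages"] = int(result["pages"])
    if "size" in result:
        # "716,800 bytes" -> 716800
        result["size_bytes"] = int(result.pop("size").split()[0].replace(",", ""))
    if "encrypted" in result:
        result["encrypted"] = result["encrypted"] == "True"
    return result


if __name__ == "__main__":
    sample = """PDF Information:
   Title: Master Service Agreement 2024
   Pages: 22
   Author: Legal Department
   Subject: Service Contract
   Size: 716,800 bytes
   Encrypted: False
"""
    info = parse_info(sample)
    print(info["pages"], info["size_bytes"], info["encrypted"])
```

In a real script the text would come from something like subprocess.run([...], capture_output=True, text=True).stdout. Checking info["encrypted"] first lets you skip files the skill cannot process.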


5. How to Compress a Large PDF

What this does: Rewrites a PDF, which can reduce file size by removing redundant internal structures.

When to use it: You have a PDF that's too large to email (many email providers cap attachments at roughly 10–25 MB) and want to try reducing its size.

Important note: This compression works on internal PDF structure, not on embedded images. Heavily image-based PDFs may see minimal reduction. Results vary — always check the output size.

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — Check the original file size.

python3 main.py info ~/Documents/large-report.pdf

Output:

📄 PDF Information:
   Title: Technical Documentation
   Pages: 45
   Author: Engineering
   Subject:
   Size: 15,728,640 bytes
   Encrypted: False

Step 3 — Run compress.

python3 main.py compress ~/Documents/large-report.pdf ~/Documents/large-report-small.pdf

Output:

✅ Compressed PDF: 22.3% reduction
   Original: 15,728,640 bytes
   New: 12,218,956 bytes

Step 4 — Check the result meets your needs.
If the reduction wasn't enough, the file may be dominated by embedded images. In that case, consider using Ghostscript (gs) for deeper compression.

Result: A compressed copy of your PDF at a reduced file size, with the original unchanged.
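
The reduction figure is the size difference divided by the original size. The sketch below reproduces the arithmetic for the example above and checks the result against an attachment limit. Both helpers are hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, since providers count differently.

```python
def reduction_percent(original_bytes, new_bytes):
    """Percentage by which the file shrank relative to the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (original_bytes - new_bytes) / original_bytes * 100


def fits_attachment_limit(size_bytes, limit_mb=25):
    """True if the file is at or under the limit (1 MB taken as 1,048,576 bytes)."""
    return size_bytes <= limit_mb * 1024 * 1024


if __name__ == "__main__":
    original, compressed = 15_728_640, 12_218_956
    print(f"{reduction_percent(original, compressed):.1f}% reduction")
    print("Fits 25 MB limit:", fits_attachment_limit(compressed, 25))
    print("Fits 10 MB limit:", fits_attachment_limit(compressed, 10))
```

In this example the compressed file would pass a 25 MB limit but not a 10 MB one, which is the situation where Ghostscript is worth trying.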


6. How to Rotate a Sideways-Scanned Document

What this does: Rotates every page in a PDF by 90, 180, or 270 degrees, creating a correctly oriented copy.

When to use it: You scanned documents on a copier that saved them sideways, or received a PDF where all pages are rotated.

Steps

Step 1 — Navigate to the skill directory.

cd ~/smfworks-skills/skills/pdf-toolkit

Step 2 — Determine the rotation needed.

  • If pages are rotated 90° clockwise (tilted to the right), use 270 to correct
  • If pages are rotated 90° counterclockwise (tilted to the left), use 90 to correct
  • If pages are upside down, use 180

Step 3 — Run rotate.

python3 main.py rotate ~/Documents/scan-sideways.pdf ~/Documents/scan-fixed.pdf 270

Output:

✅ Rotated 12 pages by 270°
   Output: /home/user/Documents/scan-fixed.pdf

Step 4 — Open the output to verify orientation.
Use your system's PDF viewer to confirm all pages are now correctly oriented.

Result: A corrected PDF at the output path. Your original scan is unchanged.
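
The rule in Step 2 can be written as one formula: the correction is 360 minus the rotation you observe, measured clockwise. A minimal sketch of that arithmetic follows. It assumes the rotate command applies clockwise rotation, which is what the Step 2 bullets imply, and the function correction_angle is hypothetical.

```python
ALLOWED = (90, 180, 270)


def correction_angle(observed_clockwise):
    """Degrees to pass to rotate in order to undo an observed clockwise rotation.

    Counterclockwise rotations can be given as negative numbers, e.g. -90.
    """
    observed = observed_clockwise % 360      # -90 becomes 270
    correction = (360 - observed) % 360
    if correction not in ALLOWED:
        raise ValueError("Rotation must be one of: [90, 180, 270]")
    return correction


if __name__ == "__main__":
    print(correction_angle(90))    # pages tilted to the right
    print(correction_angle(-90))   # pages tilted to the left
    print(correction_angle(180))   # pages upside down
```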


7. Automating with Cron

You can schedule PDF Toolkit to run automatically, for example merging weekly reports into a monthly file on the first of each month.

Open the cron editor

crontab -e

Cron Expression Reference

Expression       | Meaning
0 9 1 * *        | First day of each month at 9 AM
0 8 * * 1        | Every Monday at 8 AM
0 22 * * *       | Every day at 10 PM
0 6 1 1,4,7,10 * | First day of each quarter at 6 AM

Example: Merge the weekly reports into a monthly file on the first of each month

0 9 1 * * cd /home/yourname/smfworks-skills/skills/pdf-toolkit && python3 main.py merge /home/yourname/Reports/monthly-$(date +\%Y-\%m).pdf /home/yourname/Reports/week1.pdf /home/yourname/Reports/week2.pdf /home/yourname/Reports/week3.pdf /home/yourname/Reports/week4.pdf >> /home/yourname/logs/pdf-toolkit.log 2>&1

Example: Compress a new report every night

0 22 * * * cd /home/yourname/smfworks-skills/skills/pdf-toolkit && python3 main.py compress /home/yourname/Reports/daily-report.pdf /home/yourname/Reports/daily-report-compressed.pdf >> /home/yourname/logs/pdf-toolkit.log 2>&1

Create the log directory

mkdir -p ~/logs

Check logs after a run

cat ~/logs/pdf-toolkit.log
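
In a crontab line, a percent sign has to be written as \% because cron otherwise treats it as a line break, which is why the merge example escapes the date format. One way to keep the crontab entry short is to build the paths in a small wrapper script instead. The sketch below computes the same monthly-YYYY-MM.pdf name and the week1 to week4 inputs used in the example. The function names and the idea of a wrapper script are assumptions, not part of PDF Toolkit.

```python
from datetime import date
from pathlib import Path


def monthly_output(reports_dir, today=None):
    """Path of the merged file, named like monthly-2024-03.pdf."""
    today = today or date.today()
    return Path(reports_dir) / f"monthly-{today:%Y-%m}.pdf"


def weekly_inputs(reports_dir, weeks=4):
    """Paths of week1.pdf through weekN.pdf, in merge order."""
    return [Path(reports_dir) / f"week{n}.pdf" for n in range(1, weeks + 1)]


if __name__ == "__main__":
    reports = Path("/home/yourname/Reports")  # absolute path, as cron requires
    print(monthly_output(reports))
    missing = [p.name for p in weekly_inputs(reports) if not p.exists()]
    if missing:
        print("Skipping merge, missing inputs:", ", ".join(missing))
```

Checking for missing inputs before merging gives a clearer log line than a failed merge does.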

8. Combining with Other Skills

PDF Toolkit + File Organizer: Use File Organizer to move all PDFs into one folder, then merge them:

# Step 1: Organize Downloads, moving PDFs to Documents subfolder
python3 ~/smfworks-skills/skills/file-organizer/main.py organize-type ~/Downloads

# Step 2: Merge all PDFs that were just organized
python3 ~/smfworks-skills/skills/pdf-toolkit/main.py merge ~/combined.pdf ~/Downloads/Documents/*.pdf
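
The shell expands *.pdf in sorted name order, and that becomes the merge order. If you need to inspect or adjust the order before merging, a sketch like the following collects the same files in Python. The function pdfs_in_merge_order is hypothetical, and Python's plain string sort may differ slightly from your shell's locale-aware ordering.

```python
from pathlib import Path


def pdfs_in_merge_order(folder):
    """Return the PDF files in a folder, sorted by name."""
    root = Path(folder).expanduser()
    return sorted(p for p in root.glob("*.pdf") if p.is_file())


if __name__ == "__main__":
    for pdf in pdfs_in_merge_order("~/Downloads/Documents"):
        print(pdf.name)
```

Names sort as text, so report10.pdf comes before report2.pdf. Zero-padded names (report02.pdf) avoid that.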

PDF Toolkit + Report Generator: Generate a report, then immediately compress it for email delivery:

# After generating a report:
python3 ~/smfworks-skills/skills/pdf-toolkit/main.py compress ~/Reports/generated-report.pdf ~/Reports/generated-report-email.pdf

9. Troubleshooting Common Issues

PyPDF2 not installed. Run: pip install PyPDF2

The package is missing from your Python environment.
Fix: pip install PyPDF2 — or if that doesn't work: python3 -m pip install PyPDF2


Invalid page range. PDF has 10 pages (1-10).

You specified a page number outside the document's actual range.
Fix: Run python3 main.py info your-file.pdf first to check the page count, then use valid numbers.


Need at least 2 PDFs to merge

You provided only one input file to the merge command.
Fix: The merge command format is merge output.pdf input1.pdf input2.pdf. Make sure you have at least two input files listed after the output.


File too large: X bytes (max: 104857600)

The input PDF is over 100 MB.
Fix: Split the file with another tool first, or use Ghostscript for large file handling: gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=smaller.pdf input.pdf
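
The limit in the error message, 104857600 bytes, is 100 × 1024 × 1024. In a batch script you can check file sizes up front rather than waiting for this error. The sketch below does that with the standard library. The helper check_size is hypothetical, and its error text only imitates the message above.

```python
import os

MAX_BYTES = 100 * 1024 * 1024  # 104857600, the limit quoted in the error message


def check_size(path, max_bytes=MAX_BYTES):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"File too large: {size} bytes (max: {max_bytes})")
    return size


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        try:
            print(f"{name}: {check_size(name):,} bytes, OK")
        except (OSError, ValueError) as err:
            print(f"{name}: {err}")
```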


Rotation must be one of: [90, 180, 270]

You used a rotation value other than 90, 180, or 270.
Fix: Use 270 for counterclockwise 90°. There is no -90 option.


The output PDF looks identical — compression didn't work

For image-heavy PDFs, PyPDF2-based compression has minimal effect.
Fix: Use Ghostscript for deeper compression: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.pdf


10. Tips & Best Practices

Always check page count with info before using extract or split. Running extract with page numbers outside the document's range will fail, so confirm the page count first.

For the merge command, the output file is listed FIRST. This is different from most tools. The format is: merge OUTPUT.pdf input1.pdf input2.pdf — not merge input1.pdf input2.pdf OUTPUT.pdf.

Keep your originals. PDF Toolkit never modifies input files, but it's still good practice to keep originals until you've verified the output looks correct.

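Compress before emailing, not before archiving. For long-term storage, keep the original PDF. Use the compressed version only for transmission. Because this compression only rewrites internal structure, re-compressing an already-compressed PDF yields little further reduction. Lossy options, such as Ghostscript with -dPDFSETTINGS=/ebook, downsample images and can reduce quality.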

For pages 1 through N, extract is easier than split + select. Use extract when you know the range. Use split when you need individual pages and will pick what you need from the results.

Test your cron jobs manually first. Before scheduling an automated merge, run the exact command manually to verify it produces the expected output. Only then add it to crontab.

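Use absolute paths in cron. Cron jobs run with a minimal environment and a different working directory than your interactive shell, so ~ and relative paths may not resolve the way you expect. Use full paths like /home/yourname/ instead of ~/ in crontab entries.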