Quick Start Guide
Get started with PyForge CLI in just 5 minutes! This guide will walk you through your first file conversion.
Step 1: Install PyForge CLI
If you haven't already installed PyForge CLI, install it before continuing.
Verify the installation by displaying the built-in help:
pyforge --help
Step 2: Get Sample Files
The easiest way to get started is with our curated sample datasets:
# Install all sample datasets
pyforge install sample-datasets
# Or install to a specific directory
pyforge install sample-datasets ./test-data
# List available releases
pyforge install sample-datasets --list-releases
# Install specific formats only
pyforge install sample-datasets --formats pdf,excel
This gives you 23 curated datasets across all supported formats!
You can also use your own files:
- Create a simple text file and save it as sample.txt.
- Use any Excel file you have, or create one with two sheets:
  - Sheet1: Some data with headers
  - Sheet2: More data
PyForge CLI works with:
- PDF files (.pdf)
- Excel files (.xlsx)
- Access databases (.mdb, .accdb)
- DBF files (.dbf)
- XML files (.xml)
- CSV files (.csv)
- MDF files (.mdf)
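To decide up front whether a file is worth passing to pyforge convert, a script can compare its extension against the list above. The following is a minimal Python sketch: the SUPPORTED_EXTENSIONS mapping and the detect_format helper are hypothetical names used for illustration, not part of PyForge CLI, and the format labels are simply taken from the list.

```python
from pathlib import Path

# Extensions copied from the supported-formats list in this guide.
# Illustrative only: this is not PyForge's internal format registry.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".xlsx": "Excel",
    ".mdb": "Access",
    ".accdb": "Access",
    ".dbf": "DBF",
    ".xml": "XML",
    ".csv": "CSV",
    ".mdf": "MDF",
}


def detect_format(path):
    """Return the format label for a path, or None if its extension is not listed."""
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())
```

The lookup is case-insensitive, so report.PDF and report.pdf are treated the same; whether PyForge itself treats extensions that way is not stated in this guide.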
Step 3: Your First Conversion
Let's start with the most common operations:
Convert PDF to Text
# Using sample datasets
pyforge convert sample-datasets/pdf/small/NIST-CSWP-04162018.pdf
# Convert entire PDF to text
pyforge convert document.pdf
# Convert with specific pages
pyforge convert document.pdf --pages "1-5"
# Convert with metadata
pyforge convert document.pdf --metadata
Example Output:
Converting document.pdf...
✓ Extracted text from 5 pages
✓ Saved to document.txt
📊 Conversion completed in 1.2 seconds
Convert Excel to Parquet
# Using sample datasets
pyforge convert sample-datasets/excel/small/financial-sample.xlsx
# Convert all sheets
pyforge convert spreadsheet.xlsx
# Convert specific sheets
pyforge convert spreadsheet.xlsx --sheets "Sheet1,Data"
# Interactive sheet selection
pyforge convert spreadsheet.xlsx --interactive
Example Output:
Converting spreadsheet.xlsx...
📋 Found 3 sheets: Sheet1, Sheet2, Summary
✓ Converted Sheet1 (1,250 rows)
✓ Converted Sheet2 (890 rows)
✓ Converted Summary (45 rows)
📊 Total: 2,185 rows converted
📁 Saved to spreadsheet_combined.parquet
Convert Database Files
# Using sample datasets
pyforge convert sample-datasets/access/small/Northwind_2007_VBNet.accdb
pyforge convert sample-datasets/dbf/small/census-tiger-sample.dbf
# Convert Access database
pyforge convert database.mdb
# Convert DBF file
pyforge convert data.dbf
Step 4: Explore Options
Get File Information
Before converting, check what's in your file:
# Show file metadata
pyforge info document.pdf
# Excel file details
pyforge info spreadsheet.xlsx
# Database file info
pyforge info database.mdb
Example Output:
📄 File: spreadsheet.xlsx
📊 Type: Excel Workbook
📏 Size: 2.4 MB
📋 Sheets: 3
┌─────────┬──────┬─────────┬────────────────┐
│ Sheet │ Rows │ Columns │ Sample Columns │
├─────────┼──────┼─────────┼────────────────┤
│ Sheet1 │ 1250 │ 8 │ ID, Name, Date │
│ Sheet2 │ 890 │ 12 │ Product, Price │
│ Summary │ 45 │ 5 │ Total, Count │
└─────────┴──────┴─────────┴────────────────┘
List Supported Formats
# Show all supported formats
pyforge formats
Validate Files
Check if a file can be processed:
pyforge validate document.pdf
Step 5: Common Options
Here are the most useful options for each converter:
PDF Options
# Page ranges
pyforge convert doc.pdf --pages "1-10" # Pages 1 to 10
pyforge convert doc.pdf --pages "5-" # Page 5 to end
pyforge convert doc.pdf --pages "-10" # First 10 pages
# Include metadata
pyforge convert doc.pdf --metadata
# Custom output
pyforge convert doc.pdf output.txt
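The three page-range forms shown above can be read as start to end, start to last page, and first page to end. The sketch below illustrates that reading in Python. It is not PyForge's own parser: parse_page_range is a hypothetical helper, it handles only the three documented forms, and clamping the end of the range to the document length is an assumption.

```python
def parse_page_range(spec, total_pages):
    """Expand a page-range string into a list of 1-based page numbers.

    Handles only the documented forms "1-10", "5-" and "-10".
    Clamping the end to total_pages is an assumption of this sketch.
    """
    start_text, sep, end_text = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"expected a range such as '1-10', got {spec!r}")
    # An empty side of the dash means "from the first page" or "to the last page".
    start = int(start_text) if start_text else 1
    end = int(end_text) if end_text else total_pages
    end = min(end, total_pages)
    if start < 1 or start > end:
        raise ValueError(f"invalid page range {spec!r} for {total_pages} pages")
    return list(range(start, end + 1))
```

For a 20-page document, "5-" expands to pages 5 through 20 and "-10" to pages 1 through 10.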
Excel Options
# Sheet selection
pyforge convert file.xlsx --sheets "Sheet1,Sheet3"
# Combine sheets with matching columns
pyforge convert file.xlsx --combine
# Keep sheets separate
pyforge convert file.xlsx --separate
# Compression
pyforge convert file.xlsx --compression gzip
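What counts as "matching columns" for --combine is not spelled out in this guide. As an illustration only, the sketch below assumes an exact match on column names and order and groups sheets accordingly. The group_sheets_by_columns helper is hypothetical, and PyForge's actual rule may differ.

```python
from collections import defaultdict


def group_sheets_by_columns(sheet_columns):
    """Group sheet names whose headers match exactly (same names, same order).

    sheet_columns maps a sheet name to its list of column headers.
    Exact matching is an assumption of this sketch, not PyForge's documented rule.
    """
    groups = defaultdict(list)
    for sheet, columns in sheet_columns.items():
        groups[tuple(columns)].append(sheet)
    return list(groups.values())
```

Under this assumption, sheets that land in the same group are candidates for a single combined output, while a sheet with different headers stays on its own.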
Database Options
# With encoding (for DBF files)
pyforge convert data.dbf --encoding cp1252
# Specific tables (for MDB files)
pyforge convert db.mdb --tables "customers,orders"
# Custom output directory
pyforge convert database.mdb output_folder/
MDF Files (Requires Tools)
For SQL Server MDF files, you first need to install the required tools:
# Step 1: Install MDF processing tools (one-time setup)
pyforge install mdf-tools
# Step 2: Verify installation
pyforge mdf-tools status
# Step 3: Convert MDF files (coming soon)
# pyforge convert database.mdf --format parquet
# Manage SQL Server container
pyforge mdf-tools start # Start when needed
pyforge mdf-tools stop # Stop when finished
pyforge mdf-tools test # Test connectivity
Step 6: Check Your Output
After conversion, you'll find the output files in the same directory as the input file, unless you specified a custom output path:
# List files
ls -la
# Check Parquet file (if you have pandas installed)
python -c "import pandas as pd; print(pd.read_parquet('output.parquet').head())"
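If you prefer a filtered listing over ls -la, the check can be scripted with the standard library alone. The sketch below is one possible approach: list_outputs is a hypothetical helper, and the default suffixes (.txt and .parquet) are assumed from the example outputs shown earlier in this guide.

```python
from pathlib import Path


def list_outputs(directory=".", suffixes=(".txt", ".parquet")):
    """Return (name, size_in_bytes) for each file in directory with an output suffix.

    The default suffixes are an assumption based on this guide's example outputs.
    """
    entries = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            entries.append((path.name, path.stat().st_size))
    return entries
```

Calling list_outputs() with no arguments inspects the current directory; a size of 0 bytes is a quick hint that a conversion produced an empty file.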
Common Workflows
Batch Processing
Convert multiple files at once:
# Convert all PDFs in a directory
for file in *.pdf; do
    pyforge convert "$file"
done
# Convert all Excel files
for file in *.xlsx; do
    pyforge convert "$file" --combine
done
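The same batch loops can be driven from Python when you want to collect failures instead of stopping to read the terminal. The following is a sketch under stated assumptions: build_commands and convert_all are hypothetical helpers, and the failure check assumes that pyforge exits with a non-zero status when a conversion fails, which this guide does not confirm.

```python
import subprocess
from pathlib import Path


def build_commands(files, extra_args=()):
    """Return one `pyforge convert` argument list per input file."""
    return [["pyforge", "convert", str(f), *extra_args] for f in files]


def convert_all(directory, pattern, extra_args=()):
    """Convert every file matching pattern; return the paths whose conversion failed."""
    files = sorted(Path(directory).glob(pattern))
    failed = []
    for command in build_commands(files, extra_args):
        # Assumption: pyforge exits with a non-zero status when a conversion fails.
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command[2])
    return failed
```

For example, convert_all(".", "*.xlsx", ["--combine"]) mirrors the second shell loop. One practical difference: when no file matches, Path.glob yields nothing, whereas a default bash loop runs once with the literal pattern as the file name.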
With Progress and Verbose Output
# Verbose mode for detailed output
pyforge convert large_file.xlsx --verbose
# Force overwrite existing files
pyforge convert file.pdf --force
What's Next?
Now that you've completed your first conversion:
- Explore Converters - Learn about each format in detail
- CLI Reference - Complete command documentation
- Tutorials - Real-world examples and workflows
- Troubleshooting - Solutions to common issues
Quick Reference Card
Task | Command |
---|---|
Install Datasets | pyforge install sample-datasets |
Convert PDF | pyforge convert document.pdf |
Convert Excel | pyforge convert spreadsheet.xlsx |
Convert Database | pyforge convert database.mdb |
Get File Info | pyforge info filename |
Show Help | pyforge --help |
List Formats | pyforge formats |
Validate File | pyforge validate filename |
Need Help?
Congratulations! You've successfully completed your first file conversion with PyForge CLI. 🎉