CLI Command Reference¶
Complete reference for all PyForge CLI commands, options, and usage patterns.
Version 1.0.9 - Enhanced with Databricks Serverless support and Unity Catalog volume handling.
Main Commands¶
pyforge install¶
Install prerequisites and sample datasets for PyForge CLI.
Available Components¶
pyforge install sample-datasets¶
Install curated test datasets for all supported formats.
Examples:
# Install all datasets to default location
pyforge install sample-datasets
# Install to custom directory
pyforge install sample-datasets ./test-data
# Install to Unity Catalog volume
pyforge install sample-datasets /Volumes/catalog/schema/volume/datasets
# Install specific formats only
pyforge install sample-datasets --formats pdf,excel,xml
# Install small datasets only
pyforge install sample-datasets --sizes small
# List available releases
pyforge install sample-datasets --list-releases
# Show installed datasets
pyforge install sample-datasets --list-installed
# Uninstall datasets
pyforge install sample-datasets --uninstall --force
Options:
Option | Type | Description |
---|---|---|
--version <version> | string | Specific release version (e.g., v1.0.0) |
--formats <list> | string | Comma-separated format list (pdf,excel,xml,access,dbf,mdf,csv) |
--sizes <list> | string | Size categories (small,medium,large) |
--list-releases | flag | List all available dataset releases |
--list-installed | flag | Show currently installed datasets |
--force | flag | Force overwrite existing datasets |
--uninstall | flag | Remove installed datasets |
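These flags compose. For example, to stage only the small PDF and XML fixtures into a project-local folder (the ./fixtures path is illustrative):
# Install small PDF and XML sample datasets into ./fixtures, overwriting any previous copy
pyforge install sample-datasets ./fixtures --formats pdf,xml --sizes small --force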
pyforge install mdf-tools¶
Install Docker Desktop and SQL Server Express for MDF file processing. See Installation Commands and MDF Tools Management below for options and examples.
pyforge convert¶
Convert files between different formats with enhanced support for Databricks Serverless environments and Unity Catalog volumes.
Examples¶
# Basic conversion
pyforge convert document.pdf
# Using sample datasets
pyforge convert sample-datasets/pdf/small/NIST-CSWP-04162018.pdf
pyforge convert sample-datasets/excel/small/financial-sample.xlsx
# With custom output
pyforge convert document.pdf extracted_text.txt
# PDF with page range
pyforge convert report.pdf --pages "1-10"
# Excel with specific sheets
pyforge convert data.xlsx --sheets "Sheet1,Summary"
# XML with intelligent flattening
pyforge convert api_response.xml --flatten-strategy aggressive
# Database conversion
pyforge convert database.mdb output_directory/
# Database with specific backend
pyforge convert database.mdb --backend subprocess --verbose
# Unity Catalog volume files
pyforge convert /Volumes/catalog/schema/volume/data.mdb --volume-path
# CSV with auto-detection
pyforge convert data.csv --compression gzip
Options¶
Option | Type | Description | Applies To |
---|---|---|---|
--pages <range> | string | Page range to convert (e.g., "1-10") | PDF |
--metadata | flag | Include file metadata in output | PDF |
--sheets <names> | string | Comma-separated sheet names | Excel |
--combine | flag | Combine sheets into single output | Excel |
--separate | flag | Keep sheets as separate files | Excel |
--interactive | flag | Interactive sheet selection | Excel |
--compression <type> | string | Compression type (gzip, snappy, lz4) | Parquet outputs |
--encoding <encoding> | string | Character encoding (e.g., cp1252) | DBF |
--tables <names> | string | Comma-separated table names | MDB/ACCDB |
--password <password> | string | Database password | MDB/ACCDB |
--flatten-strategy <strategy> | string | XML flattening: conservative, moderate, aggressive | XML |
--array-handling <mode> | string | XML array handling: expand, concatenate, json_string | XML |
--namespace-handling <mode> | string | XML namespace handling: preserve, strip, prefix | XML |
--preview-schema | flag | Preview XML structure before conversion | XML |
--backend <backend> | string | Backend to use: subprocess, shell | MDB/ACCDB |
--volume-path | flag | Enable Unity Catalog volume path handling | All |
--force | flag | Overwrite existing output files | All |
--verbose | flag | Enable detailed output | All |
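Several of these options combine in a single invocation. A sketch using only the flags documented above (file and directory names are illustrative):
# Convert two tables from an Access database into snappy-compressed Parquet
pyforge convert inventory.accdb ./out/ --tables "items,vendors" --compression snappy --force --verbose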
pyforge info¶
Display detailed information about a file.
Examples¶
# Basic file information
pyforge info document.pdf
# Detailed information
pyforge info spreadsheet.xlsx --verbose
# JSON output format
pyforge info database.mdb --format json
Options¶
Option | Type | Description |
---|---|---|
--format <type> | string | Output format: table, json, yaml |
--verbose | flag | Show detailed information |
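The json format suits scripting. For example, piping through jq (assuming jq is installed) pretty-prints the metadata:
pyforge info database.mdb --format json | jq .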
pyforge validate¶
Validate if a file can be processed by PyForge CLI.
Examples¶
# Validate PDF file
pyforge validate document.pdf
# Validate with detailed output
pyforge validate spreadsheet.xlsx --verbose
# Batch validate files
for file in *.xlsx; do pyforge validate "$file"; done
Options¶
Option | Type | Description |
---|---|---|
--verbose | flag | Show detailed validation information |
pyforge formats¶
List all supported input and output formats.
Examples¶
# List all formats
pyforge formats
# Show format details
pyforge formats --verbose
# Filter by input format
pyforge formats --input pdf
Options¶
Option | Type | Description |
---|---|---|
--input <format> | string | Filter by input format |
--output <format> | string | Filter by output format |
--verbose | flag | Show detailed format information |
Global Options¶
These options work with all commands:
Option | Description | Example |
---|---|---|
--help, -h | Show help message | pyforge --help |
--version | Show version information | pyforge --version |
--verbose, -v | Enable verbose output | pyforge convert file.pdf --verbose |
Environment Variables¶
PyForge CLI recognizes these environment variables:
Variable | Description | Default |
---|---|---|
PYFORGE_OUTPUT_DIR | Default output directory | Current directory |
PYFORGE_TEMP_DIR | Temporary file directory | System temp |
PYFORGE_MAX_MEMORY | Maximum memory usage (MB) | Auto-detect |
PYFORGE_COMPRESSION | Default compression for Parquet | snappy |
IS_SERVERLESS | Databricks Serverless environment flag | false |
SPARK_CONNECT_MODE_ENABLED | Databricks Spark Connect mode | false |
DATABRICKS_RUNTIME_VERSION | Databricks runtime version | none |
Exit Codes¶
Code | Meaning | Description |
---|---|---|
0 | Success | Operation completed successfully |
1 | General Error | Unknown or general error |
2 | File Not Found | Input file does not exist |
3 | Permission Error | Cannot read input or write output |
4 | Format Error | Unsupported or corrupted file format |
5 | Validation Error | File failed validation |
6 | Memory Error | Insufficient memory for operation |
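Scripts can branch on these codes directly. A minimal bash sketch:
pyforge convert report.pdf
rc=$?
case $rc in
  0) echo "converted successfully" ;;
  2) echo "input file not found" ;;
  4) echo "unsupported or corrupted format" ;;
  *) echo "failed with exit code $rc" ;;
esac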
Configuration File¶
PyForge CLI can use a configuration file for default settings:
Location: ~/.pyforge/config.yaml
# Default settings
defaults:
  compression: gzip
  output_dir: ~/conversions
  verbose: false

# PDF-specific settings
pdf:
  include_metadata: true

# Excel-specific settings
excel:
  combine_sheets: false
  compression: snappy

# Database settings
database:
  encoding: utf-8

# XML-specific settings
xml:
  flatten_strategy: conservative
  array_handling: expand
  namespace_handling: preserve
  preview_schema: false
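One way to bootstrap the file from the shell, assuming the location above (values are illustrative):
mkdir -p ~/.pyforge
cat > ~/.pyforge/config.yaml <<'EOF'
defaults:
  compression: gzip
  output_dir: ~/conversions
EOF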
Advanced Usage Patterns¶
Databricks Serverless Usage¶
# Unity Catalog volume file conversion
pyforge convert /Volumes/catalog/schema/volume/data.mdb --volume-path --verbose
# Large database with subprocess backend
pyforge convert /Volumes/catalog/schema/volume/large.accdb \
--backend subprocess --volume-path --tables "customers,orders"
# Batch processing in Databricks notebooks (shell cell)
%sh
for file in /Volumes/catalog/schema/volume/*.mdb; do
  pyforge convert "$file" --backend subprocess --volume-path --force
done
# Using dbutils for file operations (Python cell; dbutils is not available under %sh)
dbutils.fs.ls("/Volumes/catalog/schema/volume/")
# Convert in a shell cell, then list the outputs from Python again
%sh pyforge convert /Volumes/catalog/schema/volume/data.mdb --volume-path
dbutils.fs.ls("/Volumes/catalog/schema/volume/output/")
# Install and use in Databricks Serverless (Python cell)
%pip install /Volumes/catalog/schema/pkgs/pyforge_cli-1.0.9-py3-none-any.whl \
    --no-cache-dir --quiet --index-url https://pypi.org/simple/ --trusted-host pypi.org
dbutils.library.restartPython()
# Validate installation (shell cells)
%sh pyforge --version
%sh pyforge convert /Volumes/catalog/schema/volume/test.mdb --backend subprocess --verbose
Batch Processing¶
# Process all PDFs in directory
find . -name "*.pdf" -exec pyforge convert {} \;
# Convert with consistent naming
for file in *.xlsx; do
  pyforge convert "$file" "${file%.xlsx}.parquet"
done
# Parallel processing
ls *.pdf | xargs -P 4 -I {} pyforge convert {}
# Batch convert XML files with consistent strategy
for file in *.xml; do
  pyforge convert "$file" "${file%.xml}.parquet" --flatten-strategy moderate
done
# Process XML files with different strategies based on size
find . -name "*.xml" -size +10M -exec pyforge convert {} --flatten-strategy conservative \;
find . -name "*.xml" -size -10M -exec pyforge convert {} --flatten-strategy aggressive \;
Pipeline Integration¶
# Use in shell pipeline
pyforge info *.xlsx | grep "Sheets:" | wc -l
# With other tools
find /data -name "*.mdb" | while read file; do
  pyforge convert "$file" && echo "Converted: $file"
done
# Databricks Unity Catalog volume processing
find /Volumes/catalog/schema/volume -name "*.mdb" | while read file; do
  pyforge convert "$file" --volume-path --backend subprocess
done
# Databricks notebook integration
# Check available MDB files (Python cell)
dbutils.fs.ls("/Volumes/catalog/schema/volume/")
# Convert all MDB files with the proper backend (shell cell)
%sh
for file in /Volumes/catalog/schema/volume/*.mdb; do
  pyforge convert "$file" --backend subprocess --volume-path --verbose
done
# Verify outputs (Python cell)
dbutils.fs.ls("/Volumes/catalog/schema/volume/output/")
Error Handling¶
# Check exit code
if pyforge convert file.pdf; then
  echo "Conversion successful"
else
  echo "Conversion failed with code $?"
fi
# Conditional processing
pyforge validate file.xlsx && pyforge convert file.xlsx
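The same pattern scales to batches: validate first, then convert, recording failures. A minimal sketch (failures.log is an illustrative log file):
for f in *.xlsx; do
  if pyforge validate "$f"; then
    pyforge convert "$f" || echo "$f" >> failures.log
  else
    echo "$f (failed validation)" >> failures.log
  fi
done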
Format-Specific Examples¶
PDF Processing¶
# Extract specific pages
pyforge convert manual.pdf chapter1.txt --pages "1-25"
# Include metadata and page markers
pyforge convert report.pdf --metadata --pages "1-10"
# Process multiple page ranges
pyforge convert book.pdf intro.txt --pages "1-5"
pyforge convert book.pdf content.txt --pages "6-200"
pyforge convert book.pdf appendix.txt --pages "201-"
Excel Processing¶
# Interactive sheet selection
pyforge convert workbook.xlsx --interactive
# Specific sheets with compression
pyforge convert data.xlsx --sheets "Data,Summary" --compression gzip
# Combine all sheets
pyforge convert financial.xlsx combined.parquet --combine
# Separate files for each sheet
pyforge convert report.xlsx --separate
Database Processing¶
# Convert with password
pyforge convert secure.mdb --password "secret123"
# Specific tables only
pyforge convert database.mdb --tables "customers,orders,products"
# Custom output directory
pyforge convert large.accdb /output/database/
# Databricks Serverless with subprocess backend
pyforge convert database.mdb --backend subprocess
# Unity Catalog volume path
pyforge convert /Volumes/catalog/schema/volume/data.mdb --volume-path
# Combined Databricks options
pyforge convert /Volumes/catalog/schema/volume/secure.mdb \
--backend subprocess --volume-path --password "secret123"
XML Processing¶
# Conservative flattening (default)
pyforge convert api_response.xml --flatten-strategy conservative
# Aggressive flattening for analytics
pyforge convert catalog.xml --flatten-strategy aggressive
# Handle arrays as concatenated strings
pyforge convert orders.xml --array-handling concatenate
# Strip namespaces for cleaner columns
pyforge convert soap_response.xml --namespace-handling strip
# Preview structure before conversion
pyforge convert complex.xml --preview-schema
# Convert compressed XML files
pyforge convert data.xml.gz --verbose
# Combined options for data analysis
pyforge convert api_data.xml analysis.parquet \
--flatten-strategy aggressive \
--array-handling expand \
--namespace-handling strip \
--compression gzip
DBF Processing¶
# With specific encoding
pyforge convert legacy.dbf --encoding cp1252
# Retry a conversion, overwriting existing output
pyforge convert damaged.dbf --force
# Verbose output for debugging
pyforge convert complex.dbf --verbose
Troubleshooting Commands¶
Debug Information¶
# System information
pyforge --version
python --version
pip show pyforge-cli
# File analysis
pyforge info problematic_file.pdf --verbose
pyforge validate problematic_file.pdf --verbose
# Test with minimal options
pyforge convert test_file.pdf --verbose
# Databricks environment debugging
echo "IS_SERVERLESS: $IS_SERVERLESS"
echo "SPARK_CONNECT_MODE_ENABLED: $SPARK_CONNECT_MODE_ENABLED"
echo "DATABRICKS_RUNTIME_VERSION: $DATABRICKS_RUNTIME_VERSION"
# Test MDB backend availability
pyforge convert test.mdb --backend subprocess --verbose
Common Issues¶
# Permission problems
sudo chown $USER output_directory/
chmod 755 output_directory/
# Memory issues
PYFORGE_MAX_MEMORY=1024 pyforge convert large_file.xlsx
# Encoding problems
pyforge convert file.dbf --encoding utf-8 --verbose
# Databricks volume path issues
pyforge convert /Volumes/catalog/schema/volume/file.mdb --volume-path --verbose
# MDB backend selection
pyforge convert database.mdb --backend subprocess --verbose
# Unity Catalog volume permissions (dbutils runs in a notebook Python cell)
dbutils.fs.ls("/Volumes/catalog/schema/volume/")
Performance Monitoring¶
Timing Commands¶
# Time conversion
time pyforge convert large_file.xlsx
# Monitor memory usage
/usr/bin/time -v pyforge convert file.mdb
# Progress tracking
pyforge convert large_file.pdf --verbose
Optimization¶
# Use compression for large outputs
pyforge convert file.xlsx --compression gzip
# Process in chunks
pyforge convert large.pdf chunk1.txt --pages "1-100"
pyforge convert large.pdf chunk2.txt --pages "101-200"
# Parallel processing
ls *.dbf | xargs -P $(nproc) -I {} pyforge convert {}
Integration Examples¶
Makefile Integration¶
%.txt: %.pdf
	pyforge convert $< $@

%.parquet: %.xlsx
	pyforge convert $< $@ --combine

all-pdfs: $(patsubst %.pdf,%.txt,$(wildcard *.pdf))
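With those rules saved in a Makefile alongside the source files, conversions become ordinary make targets:
make all-pdfs        # convert every PDF in the directory to .txt
make report.parquet  # built from report.xlsx via the %.parquet rule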
Python Subprocess¶
import subprocess
import json

def convert_file(input_path, output_path=None, **options):
    """Run `pyforge convert`, mapping keyword arguments to CLI flags."""
    cmd = ["pyforge", "convert", input_path]
    if output_path:
        cmd.append(output_path)
    for key, value in options.items():
        if value is False or value is None:
            continue  # skip disabled flags
        cmd.append(f"--{key.replace('_', '-')}")
        if value is not True:
            cmd.append(str(value))  # flag with an argument
    return subprocess.run(cmd, capture_output=True, text=True)

def get_file_info(file_path):
    """Return `pyforge info` output parsed as JSON, or None on failure."""
    result = subprocess.run(
        ["pyforge", "info", file_path, "--format", "json"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout) if result.returncode == 0 else None
Installation Commands¶
pyforge install¶
Install prerequisites for specific file format converters.
Available Tools¶
pyforge install mdf-tools¶
Install Docker Desktop and SQL Server Express for MDF file processing.
Options:
- --password <password>: Custom SQL Server password (default: PyForge@2024!)
- --port <port>: Custom SQL Server port (default: 1433)
- --non-interactive: Run in non-interactive mode for automation
Examples:
# Default installation
pyforge install mdf-tools
# Custom password and port
pyforge install mdf-tools --password "MySecure123!" --port 1433
# Non-interactive mode (for scripts)
pyforge install mdf-tools --non-interactive
MDF Tools Management¶
pyforge mdf-tools¶
Manage SQL Server Express container for MDF file processing.
pyforge mdf-tools status¶
Check Docker and SQL Server status.
Sample Output:
MDF Tools Status
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Component ┃ Status ┃ Details ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Docker Installed │ ✓ OK │ Docker command available │
│ Docker Running │ ✓ OK │ Docker daemon responsive │
│ SQL Container Exists │ ✓ OK │ Container created │
│ SQL Container Running │ ✓ OK │ Container active │
│ SQL Server Responding │ ✓ OK │ Database accessible │
│ Configuration File │ ✓ OK │ Settings saved │
└───────────────────────┴────────┴──────────────────────────┘
✅ All systems operational - ready for MDF processing!
pyforge mdf-tools start¶
Start the SQL Server Express container.
pyforge mdf-tools stop¶
Stop the SQL Server Express container.
pyforge mdf-tools restart¶
Restart the SQL Server Express container.
pyforge mdf-tools logs¶
View SQL Server container logs.
Options:
- --lines N, -n N: Number of log lines to show (default: 50)
Examples:
# Show last 50 lines (default)
pyforge mdf-tools logs
# Show last 100 lines
pyforge mdf-tools logs --lines 100
# Show last 10 lines
pyforge mdf-tools logs -n 10
pyforge mdf-tools config¶
Display current MDF tools configuration.
pyforge mdf-tools test¶
Test SQL Server connectivity and responsiveness.
pyforge mdf-tools uninstall¶
Remove SQL Server container and clean up all data.
Warning: This command permanently removes the SQL Server container, all data volumes, and configuration files.
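In scripts, it is worth gating this behind an explicit confirmation. A minimal sketch:
read -r -p "Remove SQL Server container and all data? [y/N] " ans
[ "$ans" = "y" ] && pyforge mdf-tools uninstall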
MDF Tools Usage Examples¶
Complete MDF Processing Workflow¶
# Step 1: Install MDF processing tools (one-time setup)
pyforge install mdf-tools
# Step 2: Verify installation
pyforge mdf-tools status
# Step 3: Test connectivity
pyforge mdf-tools test
# Step 4: Convert MDF files (when converter is available)
# pyforge convert database.mdf --format parquet
# Container lifecycle management
pyforge mdf-tools start # Start SQL Server
pyforge mdf-tools stop # Stop SQL Server
pyforge mdf-tools restart # Restart SQL Server
pyforge mdf-tools logs # View logs
pyforge mdf-tools config # Show configuration
pyforge mdf-tools uninstall # Complete removal
Automation and Scripting¶
# Non-interactive installation for CI/CD
pyforge install mdf-tools --non-interactive
# Check if ready for processing
if pyforge mdf-tools status | grep -q "All systems operational"; then
  echo "Ready for MDF processing"
else
  echo "MDF tools not ready"
  exit 1
fi
# Automated container management
pyforge mdf-tools start && \
pyforge mdf-tools test && \
echo "SQL Server is ready for MDF processing"
Databricks Serverless Specific Features¶
Backend Selection for MDB/ACCDB Files¶
PyForge CLI v1.0.9 supports multiple backends for MDB/ACCDB conversion:
# Default backend (auto-selected)
pyforge convert database.mdb
# Force subprocess backend (recommended for Databricks Serverless)
pyforge convert database.mdb --backend subprocess
# Force shell backend (for local environments)
pyforge convert database.mdb --backend shell
Backend Selection Logic:
- Serverless environment: automatically uses the subprocess backend
- Local environment: prefers the shell backend, falls back to subprocess
- Manual override: use the --backend flag to force a specific backend (see the sketch below)
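The same selection can be made explicit in a script. A sketch keyed off the IS_SERVERLESS variable (value casing may vary by runtime, so both spellings are checked):
case "${IS_SERVERLESS:-false}" in
  TRUE|true) backend=subprocess ;;
  *)         backend=shell ;;
esac
pyforge convert database.mdb --backend "$backend" --verbose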
Unity Catalog Volume Path Handling¶
# Enable volume path handling for Unity Catalog
pyforge convert /Volumes/catalog/schema/volume/data.mdb --volume-path
# Without volume path flag (may fail on volumes)
pyforge convert /Volumes/catalog/schema/volume/data.mdb # Not recommended
# Volume path with other options
pyforge convert /Volumes/catalog/schema/volume/secure.mdb \
--volume-path --backend subprocess --password "secret" --verbose
Installation in Databricks Serverless¶
# Install PyForge CLI in Databricks notebook
%pip install /Volumes/catalog/schema/pkgs/pyforge_cli-1.0.9-py3-none-any.whl \
--no-cache-dir --quiet --index-url https://pypi.org/simple/ --trusted-host pypi.org
# Restart Python kernel
dbutils.library.restartPython()
# Verify installation
%sh pyforge --version
Environment Detection¶
PyForge CLI automatically detects Databricks Serverless environment using:
# Environment variables checked
IS_SERVERLESS=TRUE
SPARK_CONNECT_MODE_ENABLED=1
DATABRICKS_RUNTIME_VERSION=client.14.3.x-scala2.12
Troubleshooting Databricks Issues¶
# Check environment detection
echo "IS_SERVERLESS: $IS_SERVERLESS"
echo "SPARK_CONNECT_MODE_ENABLED: $SPARK_CONNECT_MODE_ENABLED"
# Test backend availability
pyforge convert test.mdb --backend subprocess --verbose
# Volume path testing (dbutils runs in a Python cell; pyforge in a shell cell)
dbutils.fs.ls("/Volumes/catalog/schema/volume/")
%sh pyforge convert /Volumes/catalog/schema/volume/test.mdb --volume-path --verbose
# Debug Java availability
java -version # Should work in Databricks runtime
See Also¶
- MDF Tools Installer - Complete MDF tools documentation
- Options Matrix - All options organized by converter
- Output Formats - Output format specifications
- Tutorials - Real-world usage examples
- Troubleshooting - Common issues and solutions
- Databricks Integration Guide - Comprehensive Databricks setup and usage