Binary File Format

Version: 1.0
Date: October 08, 2025
SPDX-License-Identifier: BSD-3-Clause
License File: See the LICENSE file in the project root.
Copyright: © 2025 Michael Gardner, A Bit of Help, Inc.
Authors: Michael Gardner, Claude Code
Status: Active

Overview

The Adaptive Pipeline uses a custom binary file format (.adapipe) to store processed files together with the metadata needed for recovery and integrity verification. The format enables byte-for-byte restoration of the original file while preserving a complete record of the processing applied.

Key Features:

  • Complete Recovery: All metadata needed to restore original files
  • Integrity Verification: SHA-256 checksums for both input and output
  • Processing History: Complete record of all processing steps
  • Format Versioning: Backward compatibility through version management
  • Security: Supports encryption with nonce management

File Format Specification

Binary Layout

The .adapipe format uses a reverse-header design for efficient processing:

┌─────────────────────────────────────────────┐
│          PROCESSED CHUNK DATA               │
│         (variable length)                   │
│  - Compressed and/or encrypted chunks       │
│  - Each chunk: [NONCE][LENGTH][DATA]        │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│          JSON HEADER                        │
│         (variable length)                   │
│  - Processing metadata                      │
│  - Recovery information                     │
│  - Checksums                                │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│      HEADER_LENGTH (4 bytes, u32 LE)        │
│  - Length of JSON header in bytes           │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│    FORMAT_VERSION (2 bytes, u16 LE)         │
│  - Current version: 1                       │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│      MAGIC_BYTES (8 bytes)                  │
│  - "ADAPIPE\0" (0x4144415049504500)         │
└─────────────────────────────────────────────┘

Why Reverse Header?

  • Streaming Writes: Chunk data can be written before final metadata (checksums, chunk count) is known; the header is appended afterward
  • Fast Validation: Magic bytes and version are checked with a single seek to the end of the file, before any data is read
  • Efficient Reading: A reader validates the format first, then locates and parses the header
  • Metadata Location: The header position is calculated from the fixed-size trailer at the end of the file
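
Concretely, the offset arithmetic is simple because the trailer has a fixed size. A minimal sketch of how a reader computes where the JSON header begins:

#![allow(unused)]
fn main() {
// Fixed trailer: header length (4) + version (2) + magic (8) = 14 bytes
const TRAILER_SIZE: u64 = 14;

/// Given the total file size and the header length read from the trailer,
/// return the offset where the JSON header begins. Chunk data occupies
/// [0, header_start) and the trailer occupies the final 14 bytes.
fn header_start(file_size: u64, header_length: u64) -> u64 {
    file_size - TRAILER_SIZE - header_length
}
}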

Magic Bytes

#![allow(unused)]
fn main() {
pub const MAGIC_BYTES: [u8; 8] = [
    0x41, 0x44, 0x41, 0x50, // "ADAP"
    0x49, 0x50, 0x45, 0x00  // "IPE\0"
];
}

Purpose:

  • Identify files in .adapipe format
  • Prevent accidental processing of wrong file types
  • Enable format detection tools
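
A detection tool needs only the final 8 bytes of a file. A minimal sketch, reusing MAGIC_BYTES from above:

#![allow(unused)]
fn main() {
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Returns true if the file ends with the .adapipe magic bytes.
fn is_adapipe_file(path: &str) -> std::io::Result<bool> {
    let mut file = File::open(path)?;
    let mut magic = [0u8; 8];
    file.seek(SeekFrom::End(-8))?;
    file.read_exact(&mut magic)?;
    Ok(magic == MAGIC_BYTES)
}
}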

Format Version

#![allow(unused)]
fn main() {
pub const CURRENT_FORMAT_VERSION: u16 = 1;
}

Version History:

  • Version 1: Initial format with compression, encryption, checksum support

Future Versions:

  • Version 2: Enhanced metadata, additional algorithms
  • Version 3: Streaming optimizations, compression improvements

File Header Structure

Header Fields

The JSON header contains comprehensive metadata:

#![allow(unused)]
fn main() {
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

// Derives assumed for JSON (de)serialization of the header
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FileHeader {
    /// Application version (e.g., "0.1.0")
    pub app_version: String,

    /// File format version (1)
    pub format_version: u16,

    /// Original input filename
    pub original_filename: String,

    /// Original file size in bytes
    pub original_size: u64,

    /// SHA-256 checksum of original file
    pub original_checksum: String,

    /// SHA-256 checksum of processed file
    pub output_checksum: String,

    /// Processing steps applied (in order)
    pub processing_steps: Vec<ProcessingStep>,

    /// Chunk size used (bytes)
    pub chunk_size: u32,

    /// Number of chunks
    pub chunk_count: u32,

    /// Processing timestamp (RFC3339)
    pub processed_at: DateTime<Utc>,

    /// Pipeline ID
    pub pipeline_id: String,

    /// Additional metadata
    pub metadata: HashMap<String, String>,
}
}
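
Because the header travels as JSON, FileHeader must round-trip through serde losslessly. A minimal sketch of that round-trip, relying on the derives shown above:

#![allow(unused)]
fn main() {
fn roundtrip_header(header: &FileHeader) -> Result<FileHeader, serde_json::Error> {
    // Serialize to the JSON bytes written before the trailer...
    let json = serde_json::to_vec(header)?;
    // ...then parse them back, exactly as a reader would.
    serde_json::from_slice(&json)
}
}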

Processing Steps

Each processing step records transformation details:

#![allow(unused)]
fn main() {
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProcessingStep {
    /// Step type (compression, encryption, etc.)
    pub step_type: ProcessingStepType,

    /// Algorithm used (e.g., "brotli", "aes-256-gcm")
    pub algorithm: String,

    /// Algorithm-specific parameters
    pub parameters: HashMap<String, String>,

    /// Application order (0-based)
    pub order: u32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ProcessingStepType {
    Compression,
    Encryption,
    Checksum,
    PassThrough,
    Custom(String),
}
}

Example Processing Steps:

{
  "processing_steps": [
    {
      "step_type": "Compression",
      "algorithm": "brotli",
      "parameters": {
        "level": "6"
      },
      "order": 0
    },
    {
      "step_type": "Encryption",
      "algorithm": "aes-256-gcm",
      "parameters": {
        "key_derivation": "argon2"
      },
      "order": 1
    },
    {
      "step_type": "Checksum",
      "algorithm": "sha256",
      "parameters": {},
      "order": 2
    }
  ]
}

Chunk Format

Chunk Structure

Each chunk in the processed data section follows this format:

┌────────────────────────────────────┐
│   NONCE (12 bytes)                 │
│  - Unique for each chunk           │
│  - Used for encryption IV          │
└────────────────────────────────────┘
┌────────────────────────────────────┐
│   DATA_LENGTH (4 bytes, u32 LE)    │
│  - Length of encrypted data        │
└────────────────────────────────────┘
┌────────────────────────────────────┐
│   ENCRYPTED_DATA (variable)        │
│  - Compressed and encrypted        │
│  - Includes authentication tag     │
└────────────────────────────────────┘

Rust Structure:

#![allow(unused)]
fn main() {
pub struct ChunkFormat {
    /// Encryption nonce (12 bytes for AES-GCM)
    pub nonce: [u8; 12],

    /// Length of encrypted data
    pub data_length: u32,

    /// Encrypted (and possibly compressed) chunk data
    pub encrypted_data: Vec<u8>,
}
}
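
The wire framing follows directly from this struct. A minimal encoding sketch (illustrative; not necessarily the crate's own serializer):

#![allow(unused)]
fn main() {
impl ChunkFormat {
    /// Serialize as [NONCE][LENGTH][DATA], matching the layout above.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + 4 + self.encrypted_data.len());
        out.extend_from_slice(&self.nonce);
        out.extend_from_slice(&self.data_length.to_le_bytes());
        out.extend_from_slice(&self.encrypted_data);
        out
    }
}
}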

Chunk Processing

Forward Processing (Compress → Encrypt), sketched in code below:

1. Read original chunk
2. Compress chunk data
3. Generate unique nonce
4. Encrypt compressed data
5. Write: [NONCE][LENGTH][ENCRYPTED_DATA]

Reverse Processing (Decrypt → Decompress):

1. Read: [NONCE][LENGTH][ENCRYPTED_DATA]
2. Decrypt using nonce
3. Decompress decrypted data
4. Verify checksum
5. Write original chunk
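
A per-chunk sketch of the forward path, where compress_chunk and encrypt_chunk are hypothetical helpers and generate_nonce is defined in the Security Considerations section:

#![allow(unused)]
fn main() {
fn process_chunk_forward(
    chunk: &[u8],
    key: &[u8],
) -> Result<ChunkFormat, PipelineError> {
    // Steps 1-2: compress the original chunk (hypothetical helper)
    let compressed = compress_chunk(chunk)?;

    // Step 3: generate a unique nonce for this chunk
    let nonce = generate_nonce();

    // Step 4: encrypt the compressed data (hypothetical helper)
    let encrypted_data = encrypt_chunk(&compressed, key, &nonce)?;

    // Step 5: frame as [NONCE][LENGTH][DATA]
    Ok(ChunkFormat {
        nonce,
        data_length: encrypted_data.len() as u32,
        encrypted_data,
    })
}
}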

Creating Binary Files

Basic File Creation

#![allow(unused)]
fn main() {
use adaptive_pipeline_domain::value_objects::{FileHeader, ProcessingStep};
use std::fs::File;
use std::io::Write;

// Helper functions calculate_sha256, process_chunks, and
// calculate_chunk_count are assumed to be defined elsewhere.
fn create_adapipe_file(
    input_data: &[u8],
    output_path: &str,
    processing_steps: Vec<ProcessingStep>,
) -> Result<(), PipelineError> {
    // Create header
    let original_checksum = calculate_sha256(input_data);
    let mut header = FileHeader::new(
        "input.txt".to_string(),
        input_data.len() as u64,
        original_checksum,
    );

    // Add processing steps
    header.processing_steps = processing_steps;
    header.chunk_count = calculate_chunk_count(input_data.len(), header.chunk_size);

    // Process chunks
    let processed_data = process_chunks(input_data, &header.processing_steps)?;

    // Calculate output checksum
    header.output_checksum = calculate_sha256(&processed_data);

    // Serialize header to JSON
    let json_header = serde_json::to_vec(&header)?;
    let header_length = json_header.len() as u32;

    // Write components in on-disk order: chunk data first, trailer last
    let mut file = File::create(output_path)?;

    // 1. Write processed data
    file.write_all(&processed_data)?;

    // 2. Write JSON header
    file.write_all(&json_header)?;

    // 3. Write header length
    file.write_all(&header_length.to_le_bytes())?;

    // 4. Write format version
    file.write_all(&CURRENT_FORMAT_VERSION.to_le_bytes())?;

    // 5. Write magic bytes
    file.write_all(&MAGIC_BYTES)?;

    Ok(())
}
}

Adding Processing Steps

#![allow(unused)]
fn main() {
impl FileHeader {
    /// Add compression step
    pub fn add_compression_step(mut self, algorithm: &str, level: u32) -> Self {
        let mut parameters = HashMap::new();
        parameters.insert("level".to_string(), level.to_string());

        self.processing_steps.push(ProcessingStep {
            step_type: ProcessingStepType::Compression,
            algorithm: algorithm.to_string(),
            parameters,
            order: self.processing_steps.len() as u32,
        });

        self
    }

    /// Add encryption step
    pub fn add_encryption_step(
        mut self,
        algorithm: &str,
        key_derivation: &str
    ) -> Self {
        let mut parameters = HashMap::new();
        parameters.insert("key_derivation".to_string(), key_derivation.to_string());

        self.processing_steps.push(ProcessingStep {
            step_type: ProcessingStepType::Encryption,
            algorithm: algorithm.to_string(),
            parameters,
            order: self.processing_steps.len() as u32,
        });

        self
    }

    /// Add checksum step
    pub fn add_checksum_step(mut self, algorithm: &str) -> Self {
        self.processing_steps.push(ProcessingStep {
            step_type: ProcessingStepType::Checksum,
            algorithm: algorithm.to_string(),
            parameters: HashMap::new(),
            order: self.processing_steps.len() as u32,
        });

        self
    }
}
}
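
Chained together, these builders record a compress-then-encrypt-then-checksum pipeline (input_data stands in for the original file contents):

#![allow(unused)]
fn main() {
let original_checksum = calculate_sha256(&input_data);

let header = FileHeader::new("input.txt".to_string(), input_data.len() as u64, original_checksum)
    .add_compression_step("brotli", 6)
    .add_encryption_step("aes-256-gcm", "argon2")
    .add_checksum_step("sha256");
}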

Reading Binary Files

Basic File Reading

#![allow(unused)]
fn main() {
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn read_adapipe_file(path: &str) -> Result<FileHeader, PipelineError> {
    let mut file = File::open(path)?;

    // The fixed 14-byte trailer sits at the end of the file:
    // [HEADER_LENGTH: 4][FORMAT_VERSION: 2][MAGIC_BYTES: 8]
    file.seek(SeekFrom::End(-8))?;

    // 1. Read and validate magic bytes
    let mut magic = [0u8; 8];
    file.read_exact(&mut magic)?;

    if magic != MAGIC_BYTES {
        return Err(PipelineError::InvalidFormat(
            "Not an .adapipe file".to_string()
        ));
    }

    // 2. Read format version
    file.seek(SeekFrom::End(-10))?;
    let mut version_bytes = [0u8; 2];
    file.read_exact(&mut version_bytes)?;
    let version = u16::from_le_bytes(version_bytes);

    if version > CURRENT_FORMAT_VERSION {
        return Err(PipelineError::UnsupportedVersion(version));
    }

    // 3. Read header length
    file.seek(SeekFrom::End(-14))?;
    let mut length_bytes = [0u8; 4];
    file.read_exact(&mut length_bytes)?;
    let header_length = u32::from_le_bytes(length_bytes) as usize;

    // 4. Read JSON header
    file.seek(SeekFrom::End(-(14 + header_length as i64)))?;
    let mut json_data = vec![0u8; header_length];
    file.read_exact(&mut json_data)?;

    // 5. Deserialize header
    let header: FileHeader = serde_json::from_slice(&json_data)?;

    Ok(header)
}
}

Reading Chunk Data

#![allow(unused)]
fn main() {
fn read_chunks(
    file: &mut File,
    header: &FileHeader
) -> Result<Vec<ChunkFormat>, PipelineError> {
    let mut chunks = Vec::with_capacity(header.chunk_count as usize);

    // Seek to start of chunk data
    file.seek(SeekFrom::Start(0))?;

    for _ in 0..header.chunk_count {
        // Read nonce
        let mut nonce = [0u8; 12];
        file.read_exact(&mut nonce)?;

        // Read data length
        let mut length_bytes = [0u8; 4];
        file.read_exact(&mut length_bytes)?;
        let data_length = u32::from_le_bytes(length_bytes);

        // Read encrypted data
        let mut encrypted_data = vec![0u8; data_length as usize];
        file.read_exact(&mut encrypted_data)?;

        chunks.push(ChunkFormat {
            nonce,
            data_length,
            encrypted_data,
        });
    }

    Ok(chunks)
}
}

File Recovery

Complete Recovery Process

#![allow(unused)]
fn main() {
fn restore_original_file(
    input_path: &str,
    output_path: &str,
    password: Option<&str>,
) -> Result<(), PipelineError> {
    // 1. Read header
    let header = read_adapipe_file(input_path)?;

    // 2. Read chunks
    let mut file = File::open(input_path)?;
    let chunks = read_chunks(&mut file, &header)?;

    // 3. Process chunks in reverse order
    let mut restored_data = Vec::new();

    for chunk in chunks {
        let mut chunk_data = chunk.encrypted_data;

        // Reverse processing steps
        for step in header.processing_steps.iter().rev() {
            chunk_data = match step.step_type {
                ProcessingStepType::Encryption => {
                    decrypt_chunk(chunk_data, &chunk.nonce, step, password)?
                }
                ProcessingStepType::Compression => {
                    decompress_chunk(chunk_data, step)?
                }
                ProcessingStepType::Checksum => {
                    verify_chunk_checksum(&chunk_data, &step)?;
                    chunk_data
                }
                _ => chunk_data,
            };
        }

        restored_data.extend_from_slice(&chunk_data);
    }

    // 4. Verify restored data
    let restored_checksum = calculate_sha256(&restored_data);
    if restored_checksum != header.original_checksum {
        return Err(PipelineError::IntegrityError(
            "Restored data checksum mismatch".to_string()
        ));
    }

    // 5. Write restored file
    let mut output = File::create(output_path)?;
    output.write_all(&restored_data)?;

    Ok(())
}
}

Processing Step Reversal

#![allow(unused)]
fn main() {
fn reverse_processing_step(
    data: Vec<u8>,
    step: &ProcessingStep,
    password: Option<&str>,
) -> Result<Vec<u8>, PipelineError> {
    match step.step_type {
        ProcessingStepType::Compression => {
            // Decompress
            match step.algorithm.as_str() {
                "brotli" => decompress_brotli(data),
                "gzip" => decompress_gzip(data),
                "zstd" => decompress_zstd(data),
                "lz4" => decompress_lz4(data),
                _ => Err(PipelineError::UnsupportedAlgorithm(
                    step.algorithm.clone()
                )),
            }
        }
        ProcessingStepType::Encryption => {
            // Decrypt
            let password = password.ok_or(PipelineError::MissingPassword)?;
            match step.algorithm.as_str() {
                "aes-256-gcm" => decrypt_aes_256_gcm(data, password, step),
                "chacha20-poly1305" => decrypt_chacha20(data, password, step),
                _ => Err(PipelineError::UnsupportedAlgorithm(
                    step.algorithm.clone()
                )),
            }
        }
        ProcessingStepType::Checksum => {
            // Verify checksum (no transformation)
            verify_checksum(&data, step)?;
            Ok(data)
        }
        _ => Ok(data),
    }
}
}

Integrity Verification

File Validation

#![allow(unused)]
fn main() {
fn validate_adapipe_file(path: &str) -> Result<ValidationReport, PipelineError> {
    let mut report = ValidationReport::new();

    // 1. Read and validate header
    let header = match read_adapipe_file(path) {
        Ok(h) => {
            report.add_check("Header format", true, "Valid");
            h
        }
        Err(e) => {
            report.add_check("Header format", false, &e.to_string());
            return Ok(report);
        }
    };

    // 2. Validate format version
    if header.format_version <= CURRENT_FORMAT_VERSION {
        report.add_check("Format version", true, &format!("v{}", header.format_version));
    } else {
        report.add_check(
            "Format version",
            false,
            &format!("Unsupported: v{}", header.format_version)
        );
    }

    // 3. Validate processing steps
    for (i, step) in header.processing_steps.iter().enumerate() {
        let is_supported = match step.step_type {
            ProcessingStepType::Compression => {
                matches!(step.algorithm.as_str(), "brotli" | "gzip" | "zstd" | "lz4")
            }
            ProcessingStepType::Encryption => {
                matches!(step.algorithm.as_str(), "aes-256-gcm" | "chacha20-poly1305")
            }
            _ => true,
        };

        report.add_check(
            &format!("Step {} ({:?})", i, step.step_type),
            is_supported,
            &step.algorithm
        );
    }

    // 4. Verify output checksum. The 14-byte trailer is header length (4)
    //    + version (2) + magic (8); json_size() is assumed to return the
    //    serialized JSON header length.
    let mut file = File::open(path)?;
    let data_length = file.metadata()?.len() - 14 - header.json_size() as u64;
    let mut processed_data = vec![0u8; data_length as usize];
    file.read_exact(&mut processed_data)?;

    let calculated_checksum = calculate_sha256(&processed_data);
    let checksums_match = calculated_checksum == header.output_checksum;

    report.add_check(
        "Output checksum",
        checksums_match,
        if checksums_match { "Valid" } else { "Mismatch" }
    );

    Ok(report)
}

pub struct ValidationReport {
    checks: Vec<(String, bool, String)>,
}

impl ValidationReport {
    pub fn new() -> Self {
        Self { checks: Vec::new() }
    }

    pub fn add_check(&mut self, name: &str, passed: bool, message: &str) {
        self.checks.push((name.to_string(), passed, message.to_string()));
    }

    pub fn is_valid(&self) -> bool {
        self.checks.iter().all(|(_, passed, _)| *passed)
    }
}
}
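
Validating a file is then a single call followed by a pass/fail check:

#![allow(unused)]
fn main() {
fn check(path: &str) -> Result<(), PipelineError> {
    let report = validate_adapipe_file(path)?;

    if report.is_valid() {
        println!("{}: all checks passed", path);
        Ok(())
    } else {
        Err(PipelineError::IntegrityError(
            format!("{}: one or more checks failed", path)
        ))
    }
}
}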

Checksum Verification

#![allow(unused)]
fn main() {
fn verify_file_integrity(path: &str) -> Result<bool, PipelineError> {
    let header = read_adapipe_file(path)?;

    // Calculate actual checksum
    let mut file = File::open(path)?;
    // Everything before the JSON header and 14-byte trailer is processed data
    let data_length = file.metadata()?.len() - 14 - header.json_size() as u64;
    let mut data = vec![0u8; data_length as usize];
    file.read_exact(&mut data)?;

    let calculated = calculate_sha256(&data);

    // Compare with stored checksum
    Ok(calculated == header.output_checksum)
}

fn calculate_sha256(data: &[u8]) -> String {
    use sha2::{Digest, Sha256};

    let mut hasher = Sha256::new();
    hasher.update(data);
    format!("{:x}", hasher.finalize())
}
}

Version Management

Format Versioning

#![allow(unused)]
fn main() {
pub fn check_format_compatibility(version: u16) -> Result<(), PipelineError> {
    match version {
        1 => Ok(()), // Current version
        v if v < CURRENT_FORMAT_VERSION => {
            // Older version - attempt migration
            migrate_format(v, CURRENT_FORMAT_VERSION)
        }
        v => Err(PipelineError::UnsupportedVersion(v)),
    }
}
}

Format Migration

#![allow(unused)]
fn main() {
fn migrate_format(from: u16, to: u16) -> Result<(), PipelineError> {
    match (from, to) {
        (1, 2) => {
            // Migration from v1 to v2
            // Add new fields with defaults
            Ok(())
        }
        _ => Err(PipelineError::MigrationUnsupported(from, to)),
    }
}
}

Backward Compatibility

#![allow(unused)]
fn main() {
fn read_any_version(path: &str) -> Result<FileHeader, PipelineError> {
    let version = read_format_version(path)?;

    match version {
        1 => read_v1_format(path),
        2 => read_v2_format(path),
        v => Err(PipelineError::UnsupportedVersion(v)),
    }
}
}

Best Practices

File Creation

Always set checksums:

#![allow(unused)]
fn main() {
// ✅ GOOD: Set both checksums
let original_checksum = calculate_sha256(&input_data);
let mut header = FileHeader::new(filename, size, original_checksum);
// ... process data ...
header.output_checksum = calculate_sha256(&processed_data);
}

Record all processing steps:

#![allow(unused)]
fn main() {
// ✅ GOOD: Record every transformation
let header = header
    .add_compression_step("brotli", 6)
    .add_encryption_step("aes-256-gcm", "argon2")
    .add_checksum_step("sha256");
}

File Reading

Always validate format:

#![allow(unused)]
fn main() {
// ✅ GOOD: Validate before processing
let header = read_adapipe_file(path)?;

if header.format_version > CURRENT_FORMAT_VERSION {
    return Err(PipelineError::UnsupportedVersion(
        header.format_version
    ));
}
}

Verify checksums:

#![allow(unused)]
fn main() {
// ✅ GOOD: Verify integrity
let restored_checksum = calculate_sha256(&restored_data);
if restored_checksum != header.original_checksum {
    return Err(PipelineError::IntegrityError(
        "Checksum mismatch".to_string()
    ));
}
}

Error Handling

Handle all error cases:

#![allow(unused)]
fn main() {
match read_adapipe_file(path) {
    Ok(header) => process_file(header),
    Err(PipelineError::InvalidFormat(msg)) => {
        eprintln!("Not a valid .adapipe file: {}", msg);
    }
    Err(PipelineError::UnsupportedVersion(v)) => {
        eprintln!("Unsupported format version: {}", v);
    }
    Err(e) => {
        eprintln!("Error reading file: {}", e);
    }
}
}

Security Considerations

Nonce Management

Never reuse nonces:

#![allow(unused)]
fn main() {
// ✅ GOOD: Generate unique nonce per chunk
fn generate_nonce() -> [u8; 12] {
    let mut nonce = [0u8; 12];
    use rand::RngCore;
    rand::thread_rng().fill_bytes(&mut nonce);
    nonce
}
}

Key Derivation

Use strong key derivation:

#![allow(unused)]
fn main() {
// ✅ GOOD: Argon2 for password-based encryption
fn derive_key(password: &str, salt: &[u8]) -> [u8; 32] {
    use argon2::Argon2;

    // Derive a 32-byte key directly into the output buffer;
    // hash_password_into is the argon2 crate's raw-KDF entry point.
    let mut key = [0u8; 32];
    Argon2::default()
        .hash_password_into(password.as_bytes(), salt, &mut key)
        .expect("key derivation failed");

    key
}
}

Integrity Protection

Verify at every step:

#![allow(unused)]
fn main() {
// ✅ GOOD: Verify after each transformation
fn process_with_verification(
    data: Vec<u8>,
    step: &ProcessingStep
) -> Result<Vec<u8>, PipelineError> {
    let processed = apply_transformation(data, step)?;
    verify_transformation(&processed, step)?;
    Ok(processed)
}
}

Next Steps

Now that you understand the binary file format: