File Deduplication
dedup.flexfs identifies files with identical content within a flexFS volume and optionally replaces duplicates with hard links, reclaiming storage space without altering the visible file structure.
How It Works
- Candidate discovery: The metadata server identifies files that share the same size and block count, a fast, metadata-only operation.
- Checksum grouping: For groups of 3 or more candidates, dedup.flexfs computes xxhash64 checksums in parallel to sub-group files. Pairs skip this step.
- Byte verification: Every candidate is compared byte-for-byte against the retained file to eliminate false positives.
- Retention heuristic: The file with the oldest birth time is retained. Ties are broken by the highest hard link count.
- Hard link replacement: With --fix, each verified duplicate is atomically replaced with a hard link to the retained file.
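The byte verification and hard link replacement steps above can be approximated with standard tools. The following is a minimal sketch, not the code dedup.flexfs runs: the function name replace_with_hardlink and the temporary file suffix are hypothetical, and it assumes both files live on the same filesystem so that the final rename is atomic.

```shell
# Sketch only: verify one pair of files and replace the duplicate with a
# hard link. replace_with_hardlink is a hypothetical helper, not part of flexFS.
#
# Usage: replace_with_hardlink PRIMARY DUPLICATE
# Returns 0 if DUPLICATE shares PRIMARY's inode afterwards, 1 if skipped.
replace_with_hardlink() {
  local primary=$1 duplicate=$2
  local tmp="${duplicate}.dedup-tmp.$$"

  # Already the same inode: nothing to do.
  [ "$primary" -ef "$duplicate" ] && return 0

  # Byte verification: cmp -s exits non-zero if any byte differs.
  cmp -s -- "$primary" "$duplicate" || return 1

  # Create the new link under a temporary name next to the duplicate,
  # then rename it over the duplicate. The rename replaces the old
  # directory entry in a single step, so the path never disappears.
  ln -- "$primary" "$tmp" || return 1
  mv -f -- "$tmp" "$duplicate" || { rm -f -- "$tmp"; return 1; }
}
```

Linking to a temporary name first, rather than deleting the duplicate and then linking, avoids a window in which the duplicate's path does not exist.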
Basic Usage
Dry Run (Report Only)

    dedup.flexfs /mnt/flexfs/data

Output shows duplicate groups with file paths and a summary:

    Bytes: 4194304, Blocks: 1, Checksum: 3a2f1b9c7d4e8f01
      /mnt/flexfs/data/file-a.bin (primary)
      /mnt/flexfs/data/backup/file-a.bin (duplicate)

    3 duplicate files found
    Run with --fix to deduplicate with hard links

Apply Deduplication

    sudo dedup.flexfs --fix /mnt/flexfs/data

Root privileges are required for --fix because replacing files with hard links modifies inode link counts.
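Before running with --fix, it can help to review exactly which paths a dry run flagged. The sketch below extracts them from a saved report. It assumes the report marks duplicates with a trailing "(duplicate)" as in the sample output above. That format is not a documented interface and may change, and list_duplicates is a hypothetical helper name.

```shell
# Sketch only: print the paths marked "(duplicate)" in a dry-run report
# read from standard input. Assumes the line format shown in the sample
# output; list_duplicates is a hypothetical helper, not part of flexFS.
list_duplicates() {
  # Keep lines ending in "(duplicate)", then strip leading whitespace
  # and the trailing marker so only the path remains.
  sed -n 's/^[[:space:]]*\(.*\)[[:space:]](duplicate)[[:space:]]*$/\1/p'
}
```

For example, `dedup.flexfs /mnt/flexfs/data | list_duplicates` would print one flagged path per line under these assumptions.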
Filtering by Size and Blocks
Focus on large files to maximize space savings:

    # Only files 1 MiB or larger
    dedup.flexfs --minSize 1048576 /mnt/flexfs/data

    # Only files with 2-100 blocks
    dedup.flexfs --minBlocks 2 --maxBlocks 100 /mnt/flexfs/data

Space Savings
Space is reclaimed when a duplicate’s inode has a link count of 1 (it is the sole link to its data blocks). If a file already has multiple hard links, replacing one link does not free the underlying blocks, because the remaining links still reference them.
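The link count rule can be checked for an individual file before deduplicating. This is a rough sketch that assumes GNU coreutils stat (the -c format option) and uses the file's apparent size as a stand-in for its data blocks, so the figure is an estimate. reclaimable_bytes is a hypothetical helper name.

```shell
# Sketch only: estimate how many bytes would be freed if FILE were
# replaced by a hard link to another inode.
# Assumes GNU stat: %h is the hard link count, %s the size in bytes.
#
# Usage: reclaimable_bytes FILE
reclaimable_bytes() {
  local links size
  links=$(stat -c %h -- "$1") || return 1
  size=$(stat -c %s -- "$1") || return 1

  if [ "$links" -eq 1 ]; then
    # Sole link to its data: replacing it releases the data.
    printf '%s\n' "$size"
  else
    # Other links still reference the data, so nothing is freed.
    printf '0\n'
  fi
}
```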
Limitations
- Requires a flexFS mount: The path must be within an active flexFS mount. The tool discovers the metadata server by reading .flexfs/volume at the mount root.
- Time-travel mounts: --fix is not supported on time-travel mounts, which are read-only.
- Cross-volume: Deduplication operates within a single volume. Cross-volume deduplication is not supported.
- Content changes: If a file is modified between the metadata scan and the byte comparison, the byte comparison detects the difference and that file is skipped.
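A script that wraps dedup.flexfs may want to confirm that its target directory sits inside a flexFS mount before starting. The sketch below walks up from a directory looking for .flexfs/volume. It assumes that file is visible through ordinary filesystem calls, which this page does not confirm, and flexfs_mount_root is a hypothetical helper name.

```shell
# Sketch only: print the nearest ancestor of DIR (or DIR itself) that
# contains .flexfs/volume, or return 1 if none is found.
# Assumes .flexfs/volume can be seen with a normal existence test.
#
# Usage: flexfs_mount_root DIR
flexfs_mount_root() {
  local dir
  # Resolve DIR to an absolute physical path; fails if DIR is not a directory.
  dir=$(cd -- "$1" 2>/dev/null && pwd -P) || return 1

  while :; do
    if [ -e "$dir/.flexfs/volume" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Reached the filesystem root without finding the marker file.
    [ "$dir" = "/" ] && return 1
    dir=$(dirname -- "$dir")
  done
}
```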
Scheduling
For ongoing deduplication, run dedup.flexfs periodically via cron:

    #!/bin/bash
    /usr/sbin/dedup.flexfs --fix --minSize 1048576 /mnt/flexfs/data >> /var/log/flexfs-dedup.log 2>&1

Complete Flag Reference
| Flag | Type | Default | Description |
|---|---|---|---|
| --fix | bool | false | Replace duplicates with hard links (requires root) |
| --maxBlocks | uint64 | 0 | Maximum blocks filter (0 = no limit) |
| --maxSize | uint64 | 0 | Maximum byte size filter (0 = no limit) |
| --minBlocks | uint64 | 0 | Minimum blocks filter |
| --minSize | uint64 | 0 | Minimum byte size filter |
See the dedup.flexfs CLI reference for the complete flag listing including internal flags.