File Deduplication
dedup.flexfs identifies files with identical content within a flexFS volume and optionally replaces duplicates with hard links, reclaiming storage space without altering the visible file structure.
How It Works
- Candidate discovery: The metadata server identifies files that share the same size and block count (a fast, metadata-only operation). All paths are returned, including multiple hard links to the same inode.
- Checksum grouping: For groups of 3 or more unique inodes, dedup.flexfs computes xxhash64 checksums in parallel to sub-group candidates. Pairs skip this step.
- Byte verification: Each candidate inode is compared byte-for-byte against the retained inode to eliminate false positives. Verification runs once per unique inode, not per path.
- Retention heuristic: The inode with the oldest birth time is retained. Ties are broken by the highest hard link count.
- Hard link replacement: With --fix, all paths for each verified duplicate inode are atomically replaced with hard links to the retained inode.
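The staged comparison above can be illustrated for a single pair of paths with ordinary GNU tools. This is a minimal sketch, not the dedup.flexfs implementation: the function name classify_pair is hypothetical, the metadata stage checks size only (local block counts are not flexFS block counts), and the checksum stage is omitted because the steps above state that pairs skip it.

```shell
#!/usr/bin/env bash
# Illustrative only: a staged duplicate check for two paths on a local filesystem.
# Assumes GNU stat and cmp. classify_pair is a hypothetical helper name.

# Prints one of: linked, different, duplicate
classify_pair() {
    local a=$1 b=$2

    # Paths that already share an inode are hard links, not duplicates.
    if [ "$(stat -c '%d:%i' -- "$a")" = "$(stat -c '%d:%i' -- "$b")" ]; then
        echo linked
        return
    fi

    # Stage 1: cheap metadata check (size only in this sketch).
    if [ "$(stat -c '%s' -- "$a")" != "$(stat -c '%s' -- "$b")" ]; then
        echo different
        return
    fi

    # Stage 2: byte-for-byte verification rules out false positives.
    if cmp -s -- "$a" "$b"; then
        echo duplicate
    else
        echo different
    fi
}
```

Calling classify_pair with two paths prints a single word, so the result is easy to use in a script. The ordering mirrors the design described above: inexpensive metadata checks first, full reads only for candidates that survive them.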
Basic Usage
Dry Run (Report Only)

    dedup.flexfs /mnt/flexfs/data

Output shows duplicate groups with file paths and a summary:

    Bytes: 4194304, Blocks: 1
      /mnt/flexfs/data/file-a.bin (primary)
      /mnt/flexfs/data/backup/file-a.bin (duplicate)

    3 duplicate files found
    Run with --fix to deduplicate with hard links

Apply Deduplication

    sudo dedup.flexfs --fix /mnt/flexfs/data

Root privileges are required for --fix because replacing files with hard links modifies inode link counts.
Filtering by Size and Blocks
Focus on large files to maximize space savings:

    # Only files 1 MiB or larger
    dedup.flexfs --minSize 1048576 /mnt/flexfs/data

    # Only files with 2-100 blocks
    dedup.flexfs --minBlocks 2 --maxBlocks 100 /mnt/flexfs/data

Space Savings
When --fix is applied, all paths referencing a duplicate inode are replaced with hard links to the retained inode. Once all paths are replaced, the duplicate inode’s link count reaches zero and its data blocks are freed.
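The link-count behavior can be observed on any local filesystem. The sketch below is an illustration using GNU coreutils, not what dedup.flexfs runs internally. The helper names relink and link_count are hypothetical. It replaces a path with a hard link by creating the link under a temporary name and renaming it over the duplicate, so the path never disappears.

```shell
#!/usr/bin/env bash
# Illustrative only: hard link replacement and link counts on a local filesystem.
# Assumes GNU ln, mv, and stat. relink and link_count are hypothetical helpers.

# Replace "$dup" with a hard link to "$keep" with no window in which "$dup" is missing.
relink() {
    local keep=$1 dup=$2 tmp
    tmp="$dup.relink.$$"
    ln -- "$keep" "$tmp" || return 1   # second name for the retained inode
    mv -f -- "$tmp" "$dup"             # rename swaps it in over the duplicate
}

# Print the hard link count of a path.
link_count() {
    stat -c '%h' -- "$1"
}
```

If a duplicate inode has two paths, relinking the first one drops its link count from 2 to 1, and relinking the second leaves no paths pointing at it, at which point the filesystem can free its blocks. The retained inode's count rises by one for each replaced path.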
Important Considerations for --fix
- Always dry-run first: Run without --fix to review which files will be affected before making changes.
- Permissions and ownership change: When a duplicate is replaced with a hard link, it inherits the retained file’s ownership, permissions, and timestamps. If duplicates had different owners or permissions, those are silently lost. This is especially important in multi-user environments where files may be owned by different users but happen to have identical content.
- Shared inode side effects: After deduplication, all hard-linked paths point to the same inode and the same data blocks. A write to any path modifies the data seen by all paths. Similarly, chmod, chown, and truncate affect every linked path. If independent copies are needed, copy the file to a new path rather than linking.
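If one path later needs to diverge from the others, it can be turned back into an independent copy. The following is a minimal sketch using GNU coreutils on a local filesystem. The helper name unlink_copy is hypothetical, and behavior on a flexFS mount is assumed to match ordinary POSIX semantics rather than confirmed here.

```shell
#!/usr/bin/env bash
# Illustrative only: give one hard-linked path its own inode again.
# Assumes GNU cp and mv. unlink_copy is a hypothetical helper name.

unlink_copy() {
    local path=$1 tmp
    tmp="$path.copy.$$"
    # cp -p keeps mode and timestamps (and ownership when run as root).
    cp -p -- "$path" "$tmp" || return 1
    # Renaming the copy over the path detaches it from the shared inode.
    mv -f -- "$tmp" "$path"
}
```

After unlink_copy runs on a path, writes to the other linked paths no longer show through it. A later dedup.flexfs --fix run would link the path again for as long as its content still matches.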
Limitations
- Requires a flexFS mount: The path must be within an active flexFS mount. The tool discovers the metadata server by reading .flexfs/volume at the mount root.
- Time-travel mounts: --fix is not supported on time-travel mounts (read-only).
- Cross-volume: Deduplication operates within a single volume. Cross-volume deduplication is not supported.
- Content changes: If a file is modified between the metadata scan and the byte comparison, the byte comparison detects the difference and the file is skipped.
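Before scheduling or scripting the tool, it can help to confirm that a directory sits under a flexFS mount. The sketch below walks up from a directory looking for the .flexfs/volume marker mentioned above. It assumes the marker is visible to ordinary file tests, which this page does not confirm, and the function name find_volume_root is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: locate the nearest ancestor directory containing .flexfs/volume.
# Assumption: the marker can be seen with a plain existence test.
# find_volume_root is a hypothetical helper name.

find_volume_root() {
    local dir
    # Resolve the starting directory to an absolute physical path.
    dir=$(cd -- "$1" 2>/dev/null && pwd -P) || return 1
    while :; do
        if [ -e "$dir/.flexfs/volume" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop at the filesystem root without a match.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname -- "$dir")
    done
}
```

The function prints the matching directory and returns 0, or returns 1 when no ancestor contains the marker, so a wrapper script can skip the run in that case.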
Scheduling
For ongoing deduplication, run dedup.flexfs periodically via cron:

    #!/bin/bash
    /usr/sbin/dedup.flexfs --fix --minSize 1048576 /mnt/flexfs/data >> /var/log/flexfs-dedup.log 2>&1

Complete Flag Reference

| Flag | Type | Default | Description |
|---|---|---|---|
| --fix | bool | false | Replace duplicates with hard links (requires root) |
| --limit | uint64 | 0 | Maximum number of duplicate groups (0 = unlimited). Largest groups first. |
| --maxBlocks | uint64 | 0 | Maximum blocks filter (0 = no limit) |
| --maxSize | uint64 | 0 | Maximum byte size filter (0 = no limit) |
| --minBlocks | uint64 | 0 | Minimum blocks filter |
| --minSize | uint64 | 0 | Minimum byte size filter |
See the dedup.flexfs CLI reference for the complete flag listing including internal flags.