File Deduplication

dedup.flexfs identifies files with identical content within a flexFS volume and optionally replaces duplicates with hard links, reclaiming storage space without altering the visible file structure.

  1. Candidate discovery: The metadata server identifies files that share the same size and block count — a fast, metadata-only operation. All paths are returned, including multiple hard links to the same inode.
  2. Checksum grouping: For groups of 3 or more unique inodes, dedup.flexfs computes xxhash64 checksums in parallel to split the candidates into sub-groups. Groups of exactly two inodes skip this step and go straight to byte verification.
  3. Byte verification: Each candidate inode is compared byte-for-byte against the retained inode to eliminate false positives. Verification runs once per unique inode, not per path.
  4. Retention heuristic: The inode with the oldest birth time is retained. Ties are broken by the highest hard link count.
  5. Hard link replacement: With --fix, all paths for each verified duplicate inode are atomically replaced with hard links to the retained inode.
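
The discovery and verification steps above can be approximated with standard tools, which may help when reasoning about what the tool will report. This is only a sketch and not how dedup.flexfs is implemented: it walks the directory itself instead of querying the metadata server, it assumes GNU find, awk, and cmp, and the function names are hypothetical. Note that find's %b reports 512-byte blocks as seen by the kernel, which is not necessarily the flexFS block count.

```bash
#!/bin/bash
# Illustrative only: approximates candidate discovery and byte verification
# with GNU tools. Function names are hypothetical, not part of dedup.flexfs.

# Print "size blocks inode path" for every regular file whose (size, blocks)
# pair occurs more than once under the given directory. Multiple hard links
# to one inode appear as separate lines with the same inode number.
# Paths containing newlines are not handled.
list_candidates() {
    find "$1" -type f -printf '%s %b %i %p\n' |
        awk '{ key = $1 " " $2; count[key]++; lines[key] = lines[key] $0 "\n" }
             END { for (k in count) if (count[k] > 1) printf "%s", lines[k] }'
}

# Succeed only if both files are byte-for-byte identical.
same_content() {
    cmp -s -- "$1" "$2"
}
```

For example, `list_candidates /mnt/flexfs/data` prints the candidate paths, and `same_content a b && echo identical` confirms a single pair.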

To scan a directory for duplicates without changing anything, run the tool without --fix:

dedup.flexfs /mnt/flexfs/data

Output shows duplicate groups with file paths and a summary:

Bytes: 4194304, Blocks: 1
/mnt/flexfs/data/file-a.bin (primary)
/mnt/flexfs/data/backup/file-a.bin (duplicate)
3 duplicate files found
Run with --fix to deduplicate with hard links

To replace the duplicates with hard links, rerun with --fix:

sudo dedup.flexfs --fix /mnt/flexfs/data

Root privileges are required for --fix because replacing files with hard links modifies inode link counts.

Focus on large files to maximize space savings:

# Only files 1 MiB or larger
dedup.flexfs --minSize 1048576 /mnt/flexfs/data
# Only files with 2-100 blocks
dedup.flexfs --minBlocks 2 --maxBlocks 100 /mnt/flexfs/data
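
The size flags take raw byte counts, so a small helper can make larger thresholds easier to read. The helper name below is hypothetical. The commented invocation uses only --minSize and --maxSize, which appear in the flag table.

```bash
# Hypothetical helper: convert a size in MiB to the byte count the
# --minSize / --maxSize flags expect.
mib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example (not executed here): only consider files between 64 MiB and 4 GiB.
# dedup.flexfs --minSize "$(mib 64)" --maxSize "$(mib 4096)" /mnt/flexfs/data
```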

When --fix is applied, all paths referencing a duplicate inode are replaced with hard links to the retained inode. Once all paths are replaced, the duplicate inode’s link count reaches zero and its data blocks are freed.
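
The effect on a single path can be reproduced by hand with ln and mv. The sketch below is an assumption about the general technique, not the dedup.flexfs implementation: it creates a new link to the retained file under a temporary name in the same directory, then renames it over the duplicate so the path never disappears. The function and temporary file names are hypothetical.

```bash
# Illustrative only: replace the path "$2" with a hard link to "$1".
# Refuses to act unless the two files are byte-for-byte identical.
replace_with_link() {
    local keep="$1" dup="$2" tmp
    [ "$keep" -ef "$dup" ] && return 0           # already the same inode
    cmp -s -- "$keep" "$dup" || return 1         # contents differ: do nothing
    tmp="$(dirname -- "$dup")/.dedup-tmp.$$"     # hypothetical temp name
    ln -- "$keep" "$tmp" || return 1             # new link to the retained inode
    # Renaming within one directory replaces the old entry in a single step.
    mv -f -- "$tmp" "$dup" || { rm -f -- "$tmp"; return 1; }
}
```

Afterwards, `stat -c '%i %h %n' keep dup` (GNU stat) should show the same inode number for both paths and a link count that has grown by one.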

  • Always dry-run first: Run without --fix to review which files will be affected before making changes.
  • Permissions and ownership change: When a duplicate is replaced with a hard link, it inherits the retained file’s ownership, permissions, and timestamps. If duplicates had different owners or permissions, those are silently lost. This is especially important in multi-user environments where files may be owned by different users but happen to have identical content.
  • Shared inode side effects: After deduplication, all hard-linked paths point to the same inode and the same data blocks. A write to any path modifies the data seen by all paths. Similarly, chmod, chown, and truncate affect every linked path. If independent copies are needed, copy the file to a new path rather than linking.
  • Requires a flexFS mount: The path must be within an active flexFS mount. The tool discovers the metadata server by reading .flexfs/volume at the mount root.
  • Time-travel mounts: --fix is not supported on time-travel mounts because they are read-only.
  • Cross-volume: Deduplication operates within a single volume. Cross-volume deduplication is not supported.
  • Content changes: If a file is modified between the metadata scan and the byte comparison, the comparison detects the difference and the file is skipped rather than linked.
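
Two of the considerations above lend themselves to small checks with standard tools. The helpers below are a sketch with hypothetical names, assuming GNU stat and cp. The first reports whether two paths differ in owner, group, or mode, which is worth checking on the dry-run output before running --fix. The second turns one hard-linked path back into an independent copy.

```bash
# Succeed if the two paths differ in owner, group, or permission bits.
meta_differs() {
    [ "$(stat -c '%u:%g %a' -- "$1")" != "$(stat -c '%u:%g %a' -- "$2")" ]
}

# Give one path its own inode again: copy the data to a temporary file in the
# same directory (preserving mode and timestamps, and ownership where
# permitted), then rename the copy over the original path. Other links to the
# old inode are left untouched.
unshare_path() {
    local path="$1" tmp
    tmp="$(dirname -- "$path")/.unshare-tmp.$$"   # hypothetical temp name
    cp -p -- "$path" "$tmp" || return 1
    mv -f -- "$tmp" "$path" || { rm -f -- "$tmp"; return 1; }
}
```

After `unshare_path /mnt/flexfs/data/backup/file-a.bin`, writes to that path no longer affect the other linked paths, at the cost of storing the data twice again.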

For ongoing deduplication, run dedup.flexfs periodically via cron:

/etc/cron.weekly/flexfs-dedup
#!/bin/bash
/usr/sbin/dedup.flexfs --fix --minSize 1048576 /mnt/flexfs/data >> /var/log/flexfs-dedup.log 2>&1
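
On large volumes a run could plausibly still be active when the next one starts. One way to guard against overlap, sketched here as an assumption rather than a documented recommendation, is to wrap the command in flock(1) from util-linux. The helper name and lock file path are hypothetical.

```bash
#!/bin/bash
# Run a command only if no other process holds the given lock file.
# With -n, flock exits immediately with a non-zero status instead of waiting.
run_locked() {
    local lock="$1"
    shift
    flock -n "$lock" "$@"
}

# In the cron script this could be invoked as (not executed here):
# run_locked /var/lock/flexfs-dedup.lock \
#     /usr/sbin/dedup.flexfs --fix --minSize 1048576 /mnt/flexfs/data \
#     >> /var/log/flexfs-dedup.log 2>&1
```

A skipped run leaves no entry in the log unless the wrapper's exit status is reported separately.
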

| Flag | Type | Default | Description |
|---|---|---|---|
| --fix | bool | false | Replace duplicates with hard links (requires root) |
| --limit | uint64 | 0 | Maximum number of duplicate groups (0 = unlimited). Largest groups first. |
| --maxBlocks | uint64 | 0 | Maximum blocks filter (0 = no limit) |
| --maxSize | uint64 | 0 | Maximum byte size filter (0 = no limit) |
| --minBlocks | uint64 | 0 | Minimum blocks filter |
| --minSize | uint64 | 0 | Minimum byte size filter |

See the dedup.flexfs CLI reference for the complete flag listing including internal flags.