File Deduplication

dedup.flexfs identifies files with identical content within a flexFS volume and optionally replaces duplicates with hard links, reclaiming storage space without altering the visible file structure.

  1. Candidate discovery: The metadata server identifies files that share the same size and block count — a fast, metadata-only operation.
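  2. Checksum grouping: For groups of 3 or more candidates, dedup.flexfs computes xxhash64 checksums in parallel to split each group into sub-groups with matching checksums. Groups of exactly two files skip this step and go straight to byte verification.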
  3. Byte verification: Every candidate is compared byte-for-byte against the retained file to eliminate false positives.
  4. Retention heuristic: The file with the oldest birth time is retained. Ties are broken by the highest hard link count.
  5. Hard link replacement: With --fix, each verified duplicate is atomically replaced with a hard link to the retained file.
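
The steps above can be approximated with standard tools to see the idea on a small directory. This is a minimal sketch under stated assumptions, not how dedup.flexfs is implemented: it assumes bash and GNU coreutils, uses sha256sum as a stand-in for xxhash64, hashes every file instead of only groups of 3 or more, and retains the first path in sort order rather than the oldest birth time. The names find_dupes and link_replace are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only; assumes GNU stat/sha256sum and paths without newlines.

# Print verified duplicate pairs as "<retained><TAB><duplicate>".
find_dupes() {
  local dir=$1 prev_key="" keep="" key path size sum
  # Steps 1-2 (simplified): key each file by size and checksum,
  # then sort so that files with equal keys become adjacent.
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' path; do
      size=$(stat -c %s -- "$path")
      sum=$(sha256sum < "$path" | cut -d' ' -f1)  # stand-in for xxhash64
      printf '%s:%s\t%s\n' "$size" "$sum" "$path"
    done | LC_ALL=C sort |
    while IFS=$'\t' read -r key path; do
      # Step 3: report only files that are not already the same inode
      # and that match the retained file byte-for-byte.
      if [ "$key" = "$prev_key" ] && ! [ "$keep" -ef "$path" ] &&
         cmp -s -- "$keep" "$path"; then
        printf '%s\t%s\n' "$keep" "$path"
      else
        prev_key=$key
        keep=$path
      fi
    done
}

# Step 5: create the link under a temporary name, then rename it over
# the duplicate so the path never disappears in between.
link_replace() {
  local keep=$1 dup=$2 tmp="$2.dedup.$$"
  ln -- "$keep" "$tmp" && mv -f -- "$tmp" "$dup"
}
```

Linking under a temporary name and renaming it into place is one common way to get an atomic swap on POSIX filesystems; whether dedup.flexfs uses the same mechanism is not stated here.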

To report duplicates without changing anything, run the tool without --fix:

dedup.flexfs /mnt/flexfs/data

Output shows duplicate groups with file paths and a summary:

Bytes: 4194304, Blocks: 1, Checksum: 3a2f1b9c7d4e8f01
/mnt/flexfs/data/file-a.bin (primary)
/mnt/flexfs/data/backup/file-a.bin (duplicate)
3 duplicate files found
Run with --fix to deduplicate with hard links

To replace the verified duplicates with hard links, pass --fix:

sudo dedup.flexfs --fix /mnt/flexfs/data

Root privileges are required for --fix because replacing files with hard links modifies inode link counts.

Focus on large files to maximize space savings:

# Only files 1 MiB or larger
dedup.flexfs --minSize 1048576 /mnt/flexfs/data
# Only files with 2-100 blocks
dedup.flexfs --minBlocks 2 --maxBlocks 100 /mnt/flexfs/data

Space is reclaimed when a duplicate’s inode has a link count of 1 (it is the sole link to its data blocks). If a file already has multiple hard links, replacing one link does not free the underlying blocks.
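
Whether replacing a particular duplicate would free space can be checked from its link count beforehand. A minimal sketch, assuming GNU coreutils stat (the %h format prints the hard link count) and that the mount reports link counts accurately; is_sole_link is a hypothetical helper, not part of dedup.flexfs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the path is the only link to its inode.
# Assumes GNU stat; on BSD/macOS the equivalent format is `stat -f %l`.
is_sole_link() {
  [ "$(stat -c %h -- "$1")" -eq 1 ]
}

# Report the link status of each path given on the command line.
for f in "$@"; do
  if is_sole_link "$f"; then
    echo "$f: sole link, replacing it frees its blocks"
  else
    echo "$f: has other links, replacing it frees nothing"
  fi
done
```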

  • Requires a flexFS mount: The path must be within an active flexFS mount. The tool discovers the metadata server by reading .flexfs/volume at the mount root.
  • Time-travel mounts: --fix is not supported on time-travel mounts (read-only).
  • Cross-volume: Deduplication operates within a single volume. Cross-volume deduplication is not supported.
  • Content changes: If a file is modified between the metadata scan and the byte comparison, the byte comparison detects the difference and that file is skipped rather than linked.
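
A wrapper script can check the first limitation before invoking the tool. The sketch below walks up from a path until it finds a .flexfs/volume entry and prints the directory that contains it. It assumes that entry is visible to ordinary shell file tests, which is not confirmed above, and flexfs_root is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nearest ancestor directory of $1 that
# contains .flexfs/volume; return non-zero if there is none.
flexfs_root() {
  local dir
  # Resolve to an absolute physical path; fails if $1 is not a directory.
  dir=$(cd -- "$1" 2>/dev/null && pwd -P) || return 1
  while :; do
    if [ -e "$dir/.flexfs/volume" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && return 1
    dir=$(dirname -- "$dir")
  done
}
```

For example, `flexfs_root /mnt/flexfs/data || echo "not inside a flexFS mount"` lets a scheduled job exit early when the volume is not mounted.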

For ongoing deduplication, run dedup.flexfs periodically via cron:

/etc/cron.weekly/flexfs-dedup
#!/bin/bash
/usr/sbin/dedup.flexfs --fix --minSize 1048576 /mnt/flexfs/data >> /var/log/flexfs-dedup.log 2>&1

Flag         Type    Default  Description
--fix        bool    false    Replace duplicates with hard links (requires root)
--maxBlocks  uint64  0        Maximum blocks filter (0 = no limit)
--maxSize    uint64  0        Maximum byte size filter (0 = no limit)
--minBlocks  uint64  0        Minimum blocks filter
--minSize    uint64  0        Minimum byte size filter

See the dedup.flexfs CLI reference for the complete flag listing, including internal flags.