
POSIX Compliance

FlexFS implements a comprehensive set of POSIX filesystem operations through Linux FUSE (Filesystem in Userspace). Applications interact with flexFS exactly as they would with a local filesystem; no code changes or special APIs are required.

The mount client (mount.flexfs) implements the following FUSE operations:

File and Directory Operations

| Operation | FUSE Op | Description |
|---|---|---|
| Lookup | FUSE_LOOKUP | Resolve a file name within a directory to its inode and attributes |
| Create | FUSE_CREATE | Create and open a new regular file |
| MkDir | FUSE_MKDIR | Create a directory |
| MkNod | FUSE_MKNOD | Create a filesystem node (regular file, device special file, FIFO) |
| Symlink | FUSE_SYMLINK | Create a symbolic link |
| Link | FUSE_LINK | Create a hard link to an existing inode |
| Unlink | FUSE_UNLINK | Remove a file |
| RmDir | FUSE_RMDIR | Remove a directory |
| Rename | FUSE_RENAME / FUSE_RENAME2 | Rename or move a file or directory |

File I/O

| Operation | FUSE Op | Description |
|---|---|---|
| Open | FUSE_OPEN | Open a file |
| Read | FUSE_READ | Read data from an open file |
| Write | FUSE_WRITE | Write data to an open file |
| Flush | FUSE_FLUSH | Flush file data (called on each close()) |
| Fsync | FUSE_FSYNC | Synchronize file data to storage |
| Release | FUSE_RELEASE | Close a file handle |
| Fallocate | FUSE_FALLOCATE | Preallocate or deallocate space for a file |
| Lseek | FUSE_LSEEK | Seek for data or holes (SEEK_DATA, SEEK_HOLE) |

Directory Reading

| Operation | FUSE Op | Description |
|---|---|---|
| OpenDir | FUSE_OPENDIR | Open a directory for reading |
| ReadDir | FUSE_READDIR | Read directory entries |
| ReadDirPlus | FUSE_READDIRPLUS | Read directory entries with pre-fetched attributes |
| ReleaseDir | FUSE_RELEASEDIR | Close a directory handle |

Attributes

| Operation | FUSE Op | Description |
|---|---|---|
| GetAttr | FUSE_GETATTR | Get file attributes (stat) |
| SetAttr | FUSE_SETATTR | Set file attributes (chmod, chown, truncate, utimes) |
| Access | FUSE_ACCESS | Check file access permissions |
| StatFs | FUSE_STATFS | Get filesystem statistics (df) |
| ReadLink | FUSE_READLINK | Read the target of a symbolic link |

Extended Attribute Operations

| Operation | FUSE Op | Description |
|---|---|---|
| GetXAttr | FUSE_GETXATTR | Get an extended attribute value |
| SetXAttr | FUSE_SETXATTR | Set an extended attribute |
| ListXAttr | FUSE_LISTXATTR | List all extended attribute names |
| RemoveXAttr | FUSE_REMOVEXATTR | Remove an extended attribute |

Lock Operations

| Operation | FUSE Op | Description |
|---|---|---|
| GetLk | FUSE_GETLK | Test whether a lock could be placed |
| SetLk | FUSE_SETLK | Acquire or release a lock (non-blocking) |
| SetLkW | FUSE_SETLKW | Acquire or release a lock (blocking, with retry) |

Session Operations

| Operation | FUSE Op | Description |
|---|---|---|
| Init | FUSE_INIT | Initialize the FUSE session and negotiate capabilities |
| Destroy | FUSE_DESTROY | Tear down the FUSE session |
| Forget | FUSE_FORGET | Release a cached inode reference |
| BatchForget | FUSE_BATCH_FORGET | Release multiple cached inode references |

File Locking

FlexFS supports both POSIX (fcntl) and BSD (flock) file locking semantics. Locks are coordinated through the metadata server, making them effective across all mount clients for a volume.

POSIX-style byte-range locks are supported via fcntl(F_SETLK), fcntl(F_SETLKW), and fcntl(F_GETLK). These support shared (read) and exclusive (write) locks on arbitrary byte ranges.
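A minimal sketch of placing and probing a byte-range lock from Python with the standard `fcntl.lockf` wrapper, which issues `fcntl(F_SETLK)` when `LOCK_NB` is set and `fcntl(F_SETLKW)` otherwise. The helper names `lock_range`, `unlock_range`, and `is_range_locked_elsewhere` are hypothetical. The probe runs in a forked child because POSIX locks belong to a process, so a process never conflicts with its own locks.

```python
import fcntl
import os


def lock_range(fd: int, start: int, length: int,
               exclusive: bool = True, blocking: bool = False) -> None:
    """Acquire a shared or exclusive POSIX lock on [start, start + length)."""
    cmd = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        cmd |= fcntl.LOCK_NB          # F_SETLK: raise OSError at once on conflict
    fcntl.lockf(fd, cmd, length, start)  # without LOCK_NB this waits (F_SETLKW)


def unlock_range(fd: int, start: int, length: int) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start)


def is_range_locked_elsewhere(path: str, start: int, length: int) -> bool:
    """Report whether another process holds a conflicting lock on the range."""
    pid = os.fork()
    if pid == 0:  # child process: POSIX locks are not inherited across fork
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError:
            os._exit(2)  # could not open the file at all
        try:
            lock_range(fd, start, length)
        except OSError:
            os._exit(1)  # EAGAIN/EACCES: the range is held by someone else
        os._exit(0)      # lock acquired, so there was no conflict
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)
    if code == 2:
        raise OSError(f"probe could not open {path}")
    return code == 1
```

On a local filesystem, locking bytes 100-200 and then probing bytes 5000-5009 reports no conflict. On flexFS with a 4 MiB block size, both ranges fall in the same block, so given the block alignment described in the next paragraph the same probe would be expected to report a conflict.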

Block alignment: Because flexFS cannot guarantee atomic operations within a single block across concurrent mounts, POSIX byte-range locks are aligned to block boundaries. A lock on bytes 100-200 of a file with a 4 MiB block size will effectively lock the entire first block (bytes 0 through 4,194,303). This is a best-effort approach that trades strict byte-range precision for correctness in the distributed case.
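The widening rule can be modeled as a small arithmetic function. This is an illustrative sketch of the documented behavior, not flexFS source: `align_lock_range` is a hypothetical name, and keeping a length of 0 as "to end of file" follows the usual fcntl convention rather than anything stated above.

```python
MIB = 1024 * 1024


def align_lock_range(start: int, length: int, block_size: int) -> tuple:
    """Widen a lock request [start, start + length) to whole blocks.

    Assumes length >= 0; length == 0 means "to end of file" and is kept as-is.
    Returns (aligned_start, aligned_length).
    """
    aligned_start = (start // block_size) * block_size      # round down
    if length == 0:
        return aligned_start, 0
    end = start + length                                      # exclusive end
    aligned_end = -(-end // block_size) * block_size          # round up
    return aligned_start, aligned_end - aligned_start


print(align_lock_range(100, 101, 4 * MIB))  # (0, 4194304): bytes 0 through 4,194,303
```

A request that straddles a block boundary, such as 2 bytes starting at 4,194,303, widens to two full blocks.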

BSD-style whole-file locks are supported via flock(). These are implemented as full-range POSIX locks under the hood, with the lock owner identified by the file handle rather than by the fcntl owner field.
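The per-handle ownership matches ordinary Linux flock() behavior, where a lock belongs to an open file rather than to a process, so two separate open() calls conflict even inside one process. A sketch of that behavior using the standard `fcntl.flock`; `try_flock` is a hypothetical helper.

```python
import fcntl
import os


def try_flock(fd: int, exclusive: bool = True) -> bool:
    """Try a non-blocking whole-file lock; return False if another handle holds it."""
    op = (fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH) | fcntl.LOCK_NB
    try:
        fcntl.flock(fd, op)
        return True
    except BlockingIOError:  # EWOULDBLOCK: a conflicting lock is held elsewhere
        return False
```

Opening the same file twice and calling `try_flock` on each descriptor grants the first lock and refuses the second until the first is released with `fcntl.flock(fd, fcntl.LOCK_UN)`.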

SetLkW (the blocking variant of SetLk) retries the lock acquisition in a polling loop with a 250 ms interval until the lock is granted or an error occurs.
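The blocking behavior can be pictured as a loop around non-blocking attempts. This is a model only: `acquire_blocking` and the `try_lock` callable (standing in for one SetLk round-trip to the metadata server) are hypothetical, and the real client's error handling may differ.

```python
import time
from typing import Callable

RETRY_INTERVAL = 0.25  # seconds, the 250 ms polling interval described above


def acquire_blocking(try_lock: Callable[[], bool],
                     interval: float = RETRY_INTERVAL) -> int:
    """Poll a non-blocking lock attempt until it is granted.

    try_lock returns True when the lock is granted and False on a conflict;
    any other error is raised and ends the loop. Returns the attempt count.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_lock():
            return attempts
        time.sleep(interval)  # wait before asking the metadata server again
```

Polling keeps the server stateless with respect to waiters, at the cost of up to one interval of extra latency after the conflicting lock is released.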

All lock state is stored on the metadata server. When mount client A holds a lock, mount client B sees the conflict via GetLk; a blocking request (SetLkW) waits, while a non-blocking request (SetLk) fails with EAGAIN. Locks are released when the file handle is closed or when they are explicitly unlocked.

Hard Links

FlexFS fully supports hard links via the Link operation. Multiple directory entries can reference the same inode, and the file’s link count (nlink) is maintained by the metadata server. The file’s data blocks are shared across all links; deleting a link decrements the link count, and the data blocks are only freed when the link count reaches zero and no file handles remain open.

Hard link semantics are consistent across mount clients — creating a hard link on one mount client is immediately visible to all other clients.
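These semantics can be observed from any client with standard library calls. A sketch (`hard_link_demo` is a hypothetical name) that checks both names share one inode and that the data survives removal of the original name:

```python
import os
import tempfile


def hard_link_demo(directory: str) -> list:
    """Return the link count observed after each step."""
    original = os.path.join(directory, "data.bin")
    alias = os.path.join(directory, "alias.bin")
    with open(original, "wb") as f:
        f.write(b"shared blocks")
    counts = [os.stat(original).st_nlink]        # one name
    os.link(original, alias)                     # FUSE_LINK
    counts.append(os.stat(original).st_nlink)    # two names, one inode
    if os.stat(original).st_ino != os.stat(alias).st_ino:
        raise RuntimeError("hard link should share the inode")
    os.unlink(original)                          # FUSE_UNLINK removes one name
    counts.append(os.stat(alias).st_nlink)       # data still reachable via alias
    with open(alias, "rb") as f:
        if f.read() != b"shared blocks":
            raise RuntimeError("data lost after unlink")
    return counts


print(hard_link_demo(tempfile.mkdtemp()))  # [1, 2, 1]
```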

Symbolic Links

Symbolic links are created via the Symlink operation and read via ReadLink. The link target is stored as a metadata field on the inode. When encryption is enabled, the link target is encrypted using AES-256-GCM with a deterministic nonce (SHAKE-256 hash of the target string).
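From an application's point of view a symlink is a round-trip of the stored target string. A sketch with a hypothetical helper, assuming (as the text implies) that encryption of the target is transparent to clients so ReadLink returns the original plaintext:

```python
import os
import tempfile


def symlink_roundtrip(directory: str, target: str) -> str:
    """Create a symlink pointing at `target` and read the stored target back."""
    link = os.path.join(directory, "link")
    os.symlink(target, link)   # FUSE_SYMLINK: target stored on the link's inode
    return os.readlink(link)   # FUSE_READLINK: the stored string, unresolved


# The target is stored verbatim and need not exist (a dangling link is valid).
print(symlink_roundtrip(tempfile.mkdtemp(), "../does/not/exist"))  # ../does/not/exist
```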

Special Files

The MkNod operation supports creating:

  • Regular files
  • FIFO (named pipe) nodes
  • Character and block device special files

The device major/minor numbers and file mode are preserved in the inode metadata.
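Applications create these node types through mknod(2), which the kernel forwards as FUSE_MKNOD. A sketch with hypothetical helper names; creating device nodes normally requires CAP_MKNOD (typically root), so an unprivileged user can usually create only FIFOs and regular files this way.

```python
import os
import stat


def make_fifo(path: str, mode: int = 0o644) -> None:
    os.mknod(path, stat.S_IFIFO | mode)          # named pipe


def make_char_device(path: str, major: int, minor: int, mode: int = 0o600) -> None:
    # Usually requires CAP_MKNOD; major/minor are packed with os.makedev.
    os.mknod(path, stat.S_IFCHR | mode, os.makedev(major, minor))


def describe(path: str) -> str:
    """Classify a node from its stored mode and device numbers."""
    st = os.lstat(path)
    if stat.S_ISFIFO(st.st_mode):
        return "fifo"
    if stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        return f"device {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"
    if stat.S_ISREG(st.st_mode):
        return "regular"
    return "other"
```

For example, as root, `make_char_device(path, 1, 3)` creates a node with the same numbers as /dev/null, and `describe(path)` then reports `device 1:3`.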

Extended Attributes

Extended attributes are stored as key-value pairs on each inode in the metadata server. They are enabled with the --xAttr flag (or implicitly by --acl or --rootSquash).

| Operation | Behavior |
|---|---|
| getxattr | Returns the value for a named attribute |
| setxattr | Sets or replaces an attribute. Supports the XATTR_CREATE (fail if the attribute exists) and XATTR_REPLACE (fail if it is absent) flags |
| listxattr | Lists all attribute names on an inode |
| removexattr | Removes a named attribute |

When encryption is enabled, both attribute names and values are encrypted. Names use deterministic encryption (for server-side matching); values use random nonces.
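The create/replace flags map directly onto Python's Linux-only xattr functions. A sketch with hypothetical helper names that reports which errno each call produces; on Linux a missing attribute is reported as ENODATA. Names in the `user.` namespace are used because unprivileged processes may set them.

```python
import errno
import os

FLAG_MODES = {"any": 0, "create": os.XATTR_CREATE, "replace": os.XATTR_REPLACE}


def set_xattr(path: str, name: str, value: bytes, mode: str = "any") -> None:
    """setxattr(2) with "create" (fail if present) or "replace" (fail if absent)."""
    os.setxattr(path, name, value, FLAG_MODES[mode])


def set_xattr_result(path: str, name: str, value: bytes, mode: str) -> str:
    """Return "ok", or the errno name the call failed with."""
    try:
        set_xattr(path, name, value, mode)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
```

The remaining operations use `os.getxattr(path, name)`, `os.listxattr(path)`, and `os.removexattr(path, name)`.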

Access Control Lists

FlexFS supports POSIX extended ACLs (also known as POSIX.1e draft ACLs), which are stored as extended attributes (system.posix_acl_access and system.posix_acl_default). ACLs are enabled with the --acl flag.
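The byte layout of these two attributes is defined by the Linux kernel's POSIX ACL xattr format, which tools such as setfacl and getfacl exchange with the filesystem. A decoder sketch, assuming the values flexFS returns are in that standard format; `decode_posix_acl` and the constant names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

ACL_XATTR_VERSION = 0x0002
ACL_UNDEFINED_ID = 0xFFFFFFFF  # id field of entries that name no user or group
TAG_NAMES = {0x01: "user_obj", 0x02: "user", 0x04: "group_obj",
             0x08: "group", 0x10: "mask", 0x20: "other"}


def decode_posix_acl(blob: bytes) -> List[Tuple[str, Optional[int], str]]:
    """Decode a system.posix_acl_access or system.posix_acl_default value.

    Layout: little-endian u32 version, then 8-byte entries of
    (u16 tag, u16 permission bits, u32 user or group id).
    """
    if len(blob) < 4 or (len(blob) - 4) % 8:
        raise ValueError("malformed POSIX ACL xattr")
    (version,) = struct.unpack_from("<I", blob, 0)
    if version != ACL_XATTR_VERSION:
        raise ValueError(f"unsupported ACL xattr version {version}")
    entries = []
    for offset in range(4, len(blob), 8):
        tag, perm, ident = struct.unpack_from("<HHI", blob, offset)
        rwx = "".join(ch if perm & bit else "-"
                      for ch, bit in (("r", 4), ("w", 2), ("x", 1)))
        qualifier = None if ident == ACL_UNDEFINED_ID else ident
        entries.append((TAG_NAMES.get(tag, hex(tag)), qualifier, rwx))
    return entries
```

On a mount with ACLs enabled, `decode_posix_acl(os.getxattr(path, "system.posix_acl_access"))` would list entries such as `("user", 1000, "r--")` for a named-user entry.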

The ACL implementation includes:

  • Standard POSIX permission checks (user, group, other) for all operations.
  • Extended ACL evaluation when ACL xattrs are present, supporting named user and group entries.
  • SUID/SGID handling: SUID/SGID bits influence the effective user and group for permission checks.
  • Sticky bit: The sticky bit on directories restricts deletion to the file owner, directory owner, or root.
  • Root squashing: When --rootSquash is enabled, operations by uid 0 / gid 0 are remapped to a configurable uid/gid (default 65534/65534, i.e., nobody). Root squashing implies ACL support.
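The permission rules listed above can be sketched as plain functions: root squashing remaps the caller first, then the POSIX.1e access check runs in the order documented in the Linux acl(5) manual page, and the sticky-bit rule governs deletion. This models the standard algorithms, not flexFS's implementation; `Acl`, `squash`, `permits`, and `sticky_allows_delete` are hypothetical names, and SUID/SGID handling and capability overrides are not modeled.

```python
import stat
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

R, W, X = 4, 2, 1


@dataclass
class Acl:
    owner_uid: int
    owner_gid: int
    user_obj: int                     # owner permissions
    group_obj: int                    # owning-group permissions
    other: int                        # everyone else
    named_users: Dict[int, int] = field(default_factory=dict)
    named_groups: Dict[int, int] = field(default_factory=dict)
    mask: Optional[int] = None        # None for a minimal ACL (plain mode bits)


def squash(uid: int, gid: int, squash_uid: int = 65534,
           squash_gid: int = 65534) -> Tuple[int, int]:
    """Root squashing: remap uid 0 / gid 0 before any permission check."""
    return (squash_uid if uid == 0 else uid, squash_gid if gid == 0 else gid)


def permits(acl: Acl, uid: int, gids: Set[int], want: int) -> bool:
    """POSIX.1e access check: owner, named user, groups, then other."""
    def grants(perm: int) -> bool:
        return (perm & want) == want

    mask = (R | W | X) if acl.mask is None else acl.mask
    if uid == acl.owner_uid:
        return grants(acl.user_obj)             # the owner entry is never masked
    if uid in acl.named_users:
        return grants(acl.named_users[uid] & mask)
    matches = [acl.group_obj & mask] if acl.owner_gid in gids else []
    matches += [p & mask for g, p in acl.named_groups.items() if g in gids]
    if matches:
        # Any matching group entry decides the result; "other" is not consulted.
        return any(grants(p) for p in matches)
    return grants(acl.other)


def sticky_allows_delete(dir_mode: int, dir_uid: int, file_uid: int, uid: int) -> bool:
    """In a sticky directory only the file owner, directory owner or root may delete."""
    if not dir_mode & stat.S_ISVTX:
        return True  # ordinary directory write/search permission applies instead
    return uid in (file_uid, dir_uid, 0)
```

Note that after squashing, root no longer has uid 0, so it falls through to the "other" entry and loses the sticky-bit exemption as well.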

Permission and Security Flags

| Flag | Default | Description |
|---|---|---|
| --acl | false | Enable extended ACL support (implies --xAttr) |
| --rootSquash | false | Enable root squashing (implies --acl) |
| --rootSquashUID | 65534 | UID to map root to when root squashing is enabled |
| --rootSquashGID | 65534 | GID to map root to when root squashing is enabled |
| --umask | (none) | Explicit umask override in octal notation (e.g., 0002) |
| --noExec | false | Disable execution of files |
| --noSUID | false | Disable SUID/SGID special permissions |
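Assuming --umask behaves like an ordinary process umask (clearing its bits from the mode requested when a file or directory is created), the effective permissions work out as follows. `apply_umask` is a hypothetical helper, and how flexFS combines this override with the caller's own umask is not specified here.

```python
def apply_umask(requested_mode: int, umask: int) -> int:
    """Permission bits a create or mkdir request ends up with under a umask."""
    return requested_mode & ~umask & 0o7777


for requested in (0o666, 0o777):
    print(oct(apply_umask(requested, 0o002)))  # 0o664, then 0o775
```

A umask of 0002 therefore keeps group write permission, which suits shared project directories, while the common 0022 removes it.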

These flags can be set locally on the mount command line or centrally via volume flags and volume token flags in configure.flexfs.

Mount Options

FlexFS supports standard mount options that affect POSIX behavior:

| Flag | Effect |
|---|---|
| --ro | Read-only mount. All write operations return EROFS. Implies --noATime. |
| --noATime | Do not update access time on file opens. |
| --nonEmpty | Allow mounting over a non-empty directory. |
| --atTime <RFC3339> | Mount the filesystem at a historical point in time (read-only). |

Limitations
  • Byte-range lock granularity: POSIX byte-range locks are aligned to block boundaries. Sub-block locking (e.g., locking bytes 100-200 within a 4 MiB block) will effectively lock the entire block. This prevents data corruption but reduces parallelism for workloads that rely on fine-grained byte-range locking within the same block.

  • Close-to-open consistency: Data written by one mount client becomes visible to other mount clients after the writing client calls close() or fsync() and the reading client opens the file. In-progress writes that have not been flushed may not be immediately visible to other clients.

  • Attribute caching: The kernel caches file attributes for the duration set by --attrValid (default: 1 hour) and directory entries (name lookups) for the duration set by --entryValid (default: 1 second). The metadata server pushes invalidation notifications when remote clients modify files, so cached entries are typically refreshed promptly; the timeouts serve as a fallback when an invalidation is not received.

  • Limited mmap support: Read-only memory mapping (mmap with PROT_READ) works through the kernel page cache. However, writable shared mappings (MAP_SHARED with PROT_WRITE) are not supported because flexFS does not enable FUSE writeback caching. Applications that require writable mmap can use standard read/write calls instead, or copy files to a local filesystem for memory-mapped access.

  • Seek handling: Standard seek operations (SEEK_SET, SEEK_CUR, SEEK_END) are handled by the kernel’s FUSE layer. FlexFS implements SEEK_DATA and SEEK_HOLE via the Lseek operation, which queries the metadata server for sparse-file information.

  • Rename atomicity: Rename operations are atomic within the metadata server (the old and new entries are updated in a single transaction), but the associated block data is not moved — blocks remain in object storage under their original keys. This means rename is metadata-only and very fast, but the block key namespace reflects the original inode, not the file name.
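Close-to-open consistency and metadata-only rename combine into the common "write a temporary file, then rename it into place" pattern. A minimal sketch, assuming the temporary file is created in the same directory as the target; `atomic_publish` and `read_latest` are hypothetical names. The writer fsyncs and closes before renaming, so a reader that opens the path afterwards sees either the complete old file or the complete new one. The returned inode number shows that the rename moves only the name.

```python
import os
import tempfile


def atomic_publish(path: str, data: bytes) -> int:
    """Write data to a temp file, fsync and close it, then rename it over `path`.

    Returns the inode number of the published file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # hand buffered bytes to the kernel
            os.fsync(f.fileno())  # FUSE_FSYNC before the file becomes visible
        ino = os.stat(tmp).st_ino  # close() (FUSE_FLUSH) already happened above
        os.replace(tmp, path)      # FUSE_RENAME: one metadata transaction
        return ino
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_latest(path: str) -> bytes:
    """Reader side: a fresh open() after the writer finished sees the new data."""
    with open(path, "rb") as f:
        return f.read()
```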
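For the mmap limitation, a sketch of the two supported access styles: a read-only mapping (`mmap.ACCESS_READ`, which maps with PROT_READ) and a positional write as the replacement for a writable shared mapping. The helper names are hypothetical; note that mapping a zero-length file raises ValueError.

```python
import mmap
import os


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Read through a read-only mapping served from the kernel page cache."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return m[offset:offset + length]


def patch_in_place(path: str, offset: int, data: bytes) -> None:
    """Instead of writing through MAP_SHARED, update bytes with pwrite(2)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
```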
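For sparse files, the SEEK_DATA and SEEK_HOLE queries can be used to walk a file's data extents. A sketch (`data_extents` is a hypothetical name); a filesystem may legally report the whole file as a single data extent, so callers should treat detected holes as an optimization rather than a guarantee.

```python
import errno
import os
from typing import List, Tuple


def data_extents(fd: int) -> List[Tuple[int, int]]:
    """List (start, end) byte ranges that contain data, skipping holes.

    Moves the file offset; reset it before any sequential reads.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after offset
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # end of file counts as a hole
        extents.append((start, end))
        offset = end
    return extents
```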