POSIX Compliance
FlexFS implements a comprehensive set of POSIX filesystem operations through Linux FUSE (Filesystem in Userspace). Applications interact with flexFS exactly as they would with a local filesystem — no code changes or special APIs are required.
Supported FUSE operations
The mount client (mount.flexfs) implements the following FUSE operations:
Namespace operations
| Operation | FUSE Op | Description |
|---|---|---|
| Lookup | FUSE_LOOKUP | Resolve a file name within a directory to its inode and attributes |
| Create | FUSE_CREATE | Create and open a new regular file |
| MkDir | FUSE_MKDIR | Create a directory |
| MkNod | FUSE_MKNOD | Create a filesystem node (regular file, device special file, FIFO) |
| Symlink | FUSE_SYMLINK | Create a symbolic link |
| Link | FUSE_LINK | Create a hard link to an existing inode |
| Unlink | FUSE_UNLINK | Remove a file |
| RmDir | FUSE_RMDIR | Remove a directory |
| Rename | FUSE_RENAME / FUSE_RENAME2 | Rename or move a file or directory |
File data operations
| Operation | FUSE Op | Description |
|---|---|---|
| Open | FUSE_OPEN | Open a file |
| Read | FUSE_READ | Read data from an open file |
| Write | FUSE_WRITE | Write data to an open file |
| Flush | FUSE_FLUSH | Flush file data (called on each close()) |
| Fsync | FUSE_FSYNC | Synchronize file data to storage |
| Release | FUSE_RELEASE | Close a file handle |
| Fallocate | FUSE_FALLOCATE | Preallocate or deallocate space for a file |
| Lseek | FUSE_LSEEK | Seek for data or holes (SEEK_DATA, SEEK_HOLE) |
Directory operations
| Operation | FUSE Op | Description |
|---|---|---|
| OpenDir | FUSE_OPENDIR | Open a directory for reading |
| ReadDir | FUSE_READDIR | Read directory entries |
| ReadDirPlus | FUSE_READDIRPLUS | Read directory entries with pre-fetched attributes |
| ReleaseDir | FUSE_RELEASEDIR | Close a directory handle |
Attribute operations
| Operation | FUSE Op | Description |
|---|---|---|
| GetAttr | FUSE_GETATTR | Get file attributes (stat) |
| SetAttr | FUSE_SETATTR | Set file attributes (chmod, chown, truncate, utimes) |
| Access | FUSE_ACCESS | Check file access permissions |
| StatFs | FUSE_STATFS | Get filesystem statistics (df) |
| ReadLink | FUSE_READLINK | Read the target of a symbolic link |
Extended attribute operations
| Operation | FUSE Op | Description |
|---|---|---|
| GetXAttr | FUSE_GETXATTR | Get an extended attribute value |
| SetXAttr | FUSE_SETXATTR | Set an extended attribute |
| ListXAttr | FUSE_LISTXATTR | List all extended attribute names |
| RemoveXAttr | FUSE_REMOVEXATTR | Remove an extended attribute |
Locking operations
| Operation | FUSE Op | Description |
|---|---|---|
| GetLk | FUSE_GETLK | Test whether a lock could be placed |
| SetLk | FUSE_SETLK | Acquire or release a lock (non-blocking) |
| SetLkW | FUSE_SETLKW | Acquire or release a lock (blocking, with retry) |
Lifecycle operations
| Operation | FUSE Op | Description |
|---|---|---|
| Init | FUSE_INIT | Initialize the FUSE session and negotiate capabilities |
| Destroy | FUSE_DESTROY | Tear down the FUSE session |
| Forget | FUSE_FORGET | Release a cached inode reference |
| BatchForget | FUSE_BATCH_FORGET | Release multiple cached inode references |
File locking
FlexFS supports both POSIX (fcntl) and BSD (flock) file locking semantics. Locks are coordinated through the metadata server, making them effective across all mount clients for a volume.
POSIX locks (fcntl)
POSIX-style byte-range locks are supported via fcntl(F_SETLK), fcntl(F_SETLKW), and fcntl(F_GETLK). These support shared (read) and exclusive (write) locks on arbitrary byte ranges.
Block alignment: Because flexFS cannot guarantee atomic operations within a single block across concurrent mounts, POSIX byte-range locks are aligned to block boundaries. A lock on bytes 100-200 of a file with a 4 MiB block size will effectively lock the entire first block (bytes 0 through 4,194,303). This is a best-effort approach that trades strict byte-range precision for correctness in the distributed case.
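The alignment rule can be sketched as simple arithmetic. This is a minimal illustration, not the flexFS implementation: the function name `align_lock_range` is hypothetical, and the 4 MiB block size is taken from the example in the text.

```python
from typing import Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, the block size used in the example above


def align_lock_range(start: int, length: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Expand the byte range [start, start + length) to whole blocks.

    Returns (aligned_start, aligned_end), where aligned_end is exclusive.
    Hypothetical helper for illustration only.
    """
    if start < 0 or length <= 0 or block_size <= 0:
        raise ValueError("start must be >= 0; length and block_size must be > 0")
    aligned_start = (start // block_size) * block_size  # round down
    end = start + length  # exclusive end of the requested range
    aligned_end = -(-end // block_size) * block_size  # round up
    return aligned_start, aligned_end


# Bytes 100 through 200 inclusive: start 100, length 101.
first, end_exclusive = align_lock_range(100, 101)
print(first, end_exclusive - 1)  # 0 4194303
```

A request that straddles a block boundary expands to both blocks under this sketch, which is why two writers working in the same block contend even when their byte ranges do not overlap.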
BSD locks (flock)
BSD-style whole-file locks are supported via flock(). These are implemented as full-range POSIX locks under the hood, with the lock owner identified by the file handle rather than by the fcntl owner field.
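Because the owner of a flock() lock is the file handle, two separate open() calls on the same file are separate owners. The sketch below shows this from the application side using Python's standard `fcntl.flock`. The function name and the temporary file are illustrative, and the result on a flexFS mount is expected to match local behavior but has not been verified here.

```python
import fcntl
import os
import tempfile


def flock_two_handles(path: str):
    """Take an exclusive flock on one handle, then try again on a second handle.

    Returns (first_acquired, second_acquired).
    """
    fd1 = os.open(path, os.O_RDWR)
    fd2 = os.open(path, os.O_RDWR)  # separate open(), so a separate lock owner
    try:
        fcntl.flock(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        first = True
        try:
            fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            second = True
        except BlockingIOError:
            second = False  # EWOULDBLOCK: the other handle holds the lock
        return first, second
    finally:
        os.close(fd2)  # closing a handle releases any flock it holds
        os.close(fd1)


with tempfile.NamedTemporaryFile() as tmp:
    print(flock_two_handles(tmp.name))
```

On a typical local Linux filesystem the second request is refused while the first handle holds the lock.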
Blocking locks
SetLkW (the blocking variant of SetLk) retries the lock acquisition in a polling loop with a 250 ms interval until the lock is granted or an error occurs.
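The polling behavior described above can be modeled as a short retry loop. This is a sketch of the pattern only: `try_acquire` is a hypothetical callback standing in for a non-blocking lock request, and the real client may handle cancellation and errors differently.

```python
import time
from typing import Callable

POLL_INTERVAL_SECONDS = 0.25  # the 250 ms interval described above


def acquire_with_polling(
    try_acquire: Callable[[], bool],
    interval: float = POLL_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_acquire() until it returns True; return the number of attempts.

    Any exception raised by try_acquire ends the loop and propagates,
    mirroring "until the lock is granted or an error occurs".
    """
    attempts = 0
    while True:
        attempts += 1
        if try_acquire():
            return attempts
        sleep(interval)  # wait before the next attempt


# Usage: a fake lock that is granted on the third attempt (no real sleeping).
results = iter([False, False, True])
print(acquire_with_polling(lambda: next(results), sleep=lambda seconds: None))  # 3
```

One consequence of polling is that a waiter may notice a released lock up to one interval late, and waiters are not served in any guaranteed order.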
Cross-mount coordination
All lock state is stored on the metadata server. When mount client A holds a lock, mount client B sees the conflict via GetLk, receives EAGAIN from SetLk, and blocks on SetLkW until the lock becomes available. Locks are released when the file handle is closed or explicitly unlocked.
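From an application's point of view, cross-mount coordination looks the same as local locking. The sketch below requests a non-blocking byte-range lock with Python's standard `fcntl.lockf` (which issues fcntl F_SETLK) and treats a refusal as "another owner holds it". The helper name `try_lock_range` and the 4096-byte range are illustrative.

```python
import fcntl
import os
import tempfile


def try_lock_range(fd: int, start: int, length: int, exclusive: bool = True) -> bool:
    """Request a non-blocking POSIX byte-range lock; return True if granted."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except (BlockingIOError, PermissionError):
        # EAGAIN or EACCES: another owner holds a conflicting lock.
        return False


with tempfile.NamedTemporaryFile() as tmp:
    fd = tmp.fileno()
    if try_lock_range(fd, 0, 4096):
        try:
            tmp.write(b"guarded update")  # work done while holding the lock
            tmp.flush()
        finally:
            # Explicit unlock; closing the file would also release the lock.
            fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
```

Note that POSIX locks are owned per process, so a second request from the same process succeeds rather than conflicting. Observing a refusal requires a second process or a second mount client.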
Hard links
FlexFS fully supports hard links via the Link operation. Multiple directory entries can reference the same inode, and the file’s link count (nlink) is maintained by the metadata server. The file’s data blocks are shared across all links; deleting a link decrements the link count, and the data blocks are only freed when the link count reaches zero and no file handles remain open.
Hard link semantics are consistent across mount clients — creating a hard link on one mount client is immediately visible to all other clients.
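The link-count behavior can be observed with standard calls. The sketch below uses a temporary local directory and illustrative file names; on a flexFS mount the same calls would go through Link, Unlink, and GetAttr.

```python
import os
import tempfile


def link_counts(directory: str):
    """Create a file, hard-link it, unlink the original; report what changed.

    Returns (nlink_history, same_inode, still_readable).
    """
    original = os.path.join(directory, "data.bin")
    alias = os.path.join(directory, "alias.bin")
    with open(original, "wb") as f:
        f.write(b"payload")
    counts = [os.stat(original).st_nlink]        # one directory entry
    os.link(original, alias)                      # second entry, same inode
    counts.append(os.stat(original).st_nlink)
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    os.unlink(original)                           # drops one link, data remains
    counts.append(os.stat(alias).st_nlink)
    with open(alias, "rb") as f:
        still_readable = f.read() == b"payload"
    return counts, same_inode, still_readable


with tempfile.TemporaryDirectory() as workdir:
    print(link_counts(workdir))
```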
Symbolic links
Symbolic links are created via the Symlink operation and read via ReadLink. The link target is stored as a metadata field on the inode. When encryption is enabled, the link target is encrypted using AES-256-GCM with a deterministic nonce (SHAKE-256 hash of the target string).
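To make "deterministic nonce" concrete, the sketch below derives a nonce from a target string with SHAKE-256 using Python's `hashlib`. It is an illustration under assumptions, not the flexFS scheme: the 12-byte length (the common AES-GCM nonce size), the UTF-8 encoding, and the absence of any key or domain separator are all guesses, and the encryption step itself is omitted.

```python
import hashlib

NONCE_BYTES = 12  # assumption: the common 96-bit AES-GCM nonce length


def deterministic_nonce(target: str) -> bytes:
    """Derive a nonce from the link target with SHAKE-256 (illustrative only)."""
    return hashlib.shake_256(target.encode("utf-8")).digest(NONCE_BYTES)


a = deterministic_nonce("../shared/config.yaml")
b = deterministic_nonce("../shared/config.yaml")
c = deterministic_nonce("../shared/other.yaml")
print(a == b, a == c)  # True False
```

Because the nonce depends only on the plaintext, identical targets encrypt to identical ciphertexts under the same key, while different targets get different nonces. That makes equality of targets observable, which is the usual trade-off of deterministic encryption.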
Special files
The MkNod operation supports creating:
- Regular files
- FIFO (named pipe) nodes
- Character and block device special files
The device major/minor numbers and file mode are preserved in the inode metadata.
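As a small example of a special file, the sketch below creates a FIFO with Python's standard `os.mkfifo` and confirms its type from the stat mode. The file name and mode are illustrative. Creating device nodes with `os.mknod` normally requires elevated privileges, so it is not shown.

```python
import os
import stat
import tempfile


def make_fifo(directory: str, name: str = "events.fifo", mode: int = 0o600) -> os.stat_result:
    """Create a named pipe and return its stat result (without following links)."""
    path = os.path.join(directory, name)
    os.mkfifo(path, mode)  # the effective mode is still subject to the umask
    return os.lstat(path)


with tempfile.TemporaryDirectory() as workdir:
    info = make_fifo(workdir)
    print(stat.S_ISFIFO(info.st_mode))  # True
```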
Extended attributes (xattrs)
Extended attributes are stored as key-value pairs on each inode in the metadata server. They are enabled with the --xAttr flag (or implicitly by --acl or --rootSquash).
| Operation | Behavior |
|---|---|
| getxattr | Returns the value for a named attribute |
| setxattr | Sets or replaces an attribute. Supports XATTR_CREATE (fail if exists) and XATTR_REPLACE (fail if absent) flags |
| listxattr | Lists all attribute names on an inode |
| removexattr | Removes a named attribute |
When encryption is enabled, both attribute names and values are encrypted. Names use deterministic encryption (for server-side matching); values use random nonces.
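The XATTR_CREATE and XATTR_REPLACE flags can be exercised with Python's standard `os.setxattr` (Linux only). The sketch below is illustrative: the attribute name `user.demo` is made up, and the function returns None when the filesystem holding the file does not accept user xattrs, which on flexFS would be the case without --xAttr.

```python
import errno
import os
import tempfile


def xattr_flag_demo(path: str, name: str = "user.demo"):
    """Exercise XATTR_CREATE and XATTR_REPLACE; return the observed outcomes.

    Returns None if the filesystem does not support user xattrs.
    """
    outcomes = {}
    try:
        os.setxattr(path, name, b"v1", os.XATTR_CREATE)  # attribute is absent: ok
    except OSError as exc:
        if exc.errno in (errno.ENOTSUP, errno.EPERM, errno.EACCES):
            return None
        raise
    try:
        os.setxattr(path, name, b"v2", os.XATTR_CREATE)  # already exists
        outcomes["create_existing"] = "ok"
    except FileExistsError:
        outcomes["create_existing"] = "EEXIST"
    os.setxattr(path, name, b"v3", os.XATTR_REPLACE)     # exists: replaced
    outcomes["value"] = os.getxattr(path, name)
    outcomes["listed"] = name in os.listxattr(path)
    os.removexattr(path, name)
    try:
        os.setxattr(path, name, b"v4", os.XATTR_REPLACE)  # now absent
        outcomes["replace_absent"] = "ok"
    except OSError as exc:
        outcomes["replace_absent"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return outcomes


with tempfile.NamedTemporaryFile() as tmp:
    print(xattr_flag_demo(tmp.name))
```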
Access control lists (ACLs)
FlexFS supports POSIX extended ACLs (also known as POSIX.1e draft ACLs), which are stored as extended attributes (system.posix_acl_access and system.posix_acl_default). ACLs are enabled with the --acl flag.
The ACL implementation includes:
- Standard POSIX permission checks (user, group, other) for all operations.
- Extended ACL evaluation when ACL xattrs are present, supporting named user and group entries.
- SUID/SGID handling: SUID/SGID bits influence the effective user and group for permission checks.
- Sticky bit: The sticky bit on directories restricts deletion to the file owner, directory owner, or root.
- Root squashing: When --rootSquash is enabled, operations by uid 0 / gid 0 are remapped to a configurable uid/gid (default 65534/65534, i.e., nobody). Root squashing implies ACL support.
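Two of the rules in this list, root squashing and the sticky bit, reduce to small predicates. The sketch below models them in isolation. Both function names are hypothetical, the independent remapping of uid and gid is an assumption, and the sticky-bit check presumes that ordinary directory write permission has already been verified.

```python
import stat
from typing import Tuple

NOBODY = 65534  # default squash uid/gid stated above


def squash_root(uid: int, gid: int, squash_uid: int = NOBODY, squash_gid: int = NOBODY) -> Tuple[int, int]:
    """Remap uid 0 and gid 0 to the squash ids; other ids pass through.

    Assumption: uid and gid are remapped independently of each other.
    """
    return (squash_uid if uid == 0 else uid, squash_gid if gid == 0 else gid)


def sticky_allows_delete(caller_uid: int, dir_mode: int, dir_uid: int, file_uid: int) -> bool:
    """Apply only the sticky-bit rule for deleting an entry from a directory."""
    if not dir_mode & stat.S_ISVTX:
        return True  # no sticky bit: no extra restriction
    # Sticky bit set: only root, the directory owner, or the file owner may delete.
    return caller_uid in (0, dir_uid, file_uid)


print(squash_root(0, 0))                            # (65534, 65534)
print(sticky_allows_delete(1001, 0o1777, 0, 1000))  # False
```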
ACL-related mount options
| Flag | Default | Description |
|---|---|---|
| --acl | false | Enable extended ACL support (implies --xAttr) |
| --rootSquash | false | Enable root squashing (implies --acl) |
| --rootSquashUID | 65534 | UID to map root to when root squashing is enabled |
| --rootSquashGID | 65534 | GID to map root to when root squashing is enabled |
| --umask | (none) | Explicit umask override in octal notation (e.g., 0002) |
| --noExec | false | Disable execution of files |
| --noSUID | false | Disable SUID/SGID special permissions |
These flags can be set locally on the mount command line or centrally via volume flags and volume token flags in configure.flexfs.
Mount options
FlexFS supports standard mount options that affect POSIX behavior:
| Flag | Effect |
|---|---|
| --ro | Read-only mount. All write operations return EROFS. Implies --noATime. |
| --noATime | Do not update access time on file opens. |
| --nonEmpty | Allow mounting over a non-empty directory. |
| --atTime <RFC3339> | Mount the filesystem at a historical point in time (read-only). |
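The --atTime flag takes an RFC 3339 timestamp. As a convenience sketch, the helper below formats a timezone-aware Python datetime in that form. The helper name is hypothetical, and the choice of UTC with a Z suffix and whole-second precision is an assumption about what is convenient, not a documented requirement of the flag.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp with a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_rfc3339(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))  # 2024-01-02T03:04:05Z
```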
Known limitations
- Byte-range lock granularity: POSIX byte-range locks are aligned to block boundaries. Sub-block locking (e.g., locking bytes 100-200 within a 4 MiB block) will effectively lock the entire block. This prevents data corruption but reduces parallelism for workloads that rely on fine-grained byte-range locking within the same block.
- Close-to-open consistency: Data written by one mount client becomes visible to other mount clients after the writing client calls close() or fsync() and the reading client opens the file. In-progress writes that have not been flushed may not be immediately visible to other clients.
- Attribute caching: File attributes are cached by the kernel for the duration specified by --attrValid (default: 1 hour) and --entryValid (default: 1 second). The metadata server pushes invalidation notifications when remote clients modify files, so cached entries are typically refreshed promptly. The TTL serves as a fallback for cases where an invalidation is not received.
- Limited mmap support: Read-only memory mapping (mmap with PROT_READ) works through the kernel page cache. However, writable shared mappings (MAP_SHARED with PROT_WRITE) are not supported because flexFS does not enable FUSE writeback caching. Applications that require writable mmap can use standard read/write calls instead, or copy files to a local filesystem for memory-mapped access.
- SEEK_SET/SEEK_CUR/SEEK_END handling: Standard seek operations (SEEK_SET, SEEK_CUR, SEEK_END) are handled by the kernel’s FUSE layer. FlexFS implements SEEK_DATA and SEEK_HOLE via the Lseek operation, which queries the metadata server for sparse file information.
- Rename atomicity: Rename operations are atomic within the metadata server (the old and new entries are updated in a single transaction), but the associated block data is not moved; blocks remain in object storage under their original keys. This means rename is metadata-only and very fast, but the block key namespace reflects the original inode, not the file name.
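For the close-to-open rule in the list above, a writer that wants its data visible to other mount clients should flush and close before signaling readers, and readers should open the file afterwards. The sketch below shows that ordering with standard Python file calls. The function names `publish` and `consume` are illustrative, and how readers learn that the file is ready is left to the application.

```python
import os
import tempfile


def publish(path: str, data: bytes) -> None:
    """Write data and force it out before the file is closed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's userspace buffer
        os.fsync(f.fileno())  # ask the filesystem to persist the data
    # Leaving the with-block calls close().


def consume(path: str) -> bytes:
    """Open the file fresh, after the writer has closed it, and read it."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "result.bin")
    publish(target, b"finished output")
    print(consume(target))  # b'finished output'
```

A reader that already had the file open before the writer closed it is not covered by this guarantee and may need to reopen the file.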