FAQs

Who should be using flexFS?

Anyone who wants to share a file system across multiple servers in the cloud would benefit from using flexFS. It is a general-purpose network file system. That said, flexFS becomes especially compelling when you have large files or when aggregate demand on the file system is high. Such demand typically appears when there are many users reading and writing file data from the file system at the same time, or when map-reduce/scatter-gather style distributed computing workflows are being executed. When aggregate demand is high, flexFS can help save considerable time and associated computing costs.

Is flexFS production ready?

Yes, flexFS is being used to store and analyze thousands of TBs of mission-critical data at multiple organizations.

How does flexFS differ from open-source project X/Y/Z, which is also backed by object storage?

In most cases, the answer is that open-source projects backed by object storage are not particularly performant or reliable. Very few even support the full POSIX standard, including important features such as ACLs, extended ACLs, xattrs, and advisory file locking. The result is that many of your tools will simply not work with them. Certain I/O patterns (e.g. non-sequential writes) are also often not supported. And speaking of support, most don't have any commercial entity backing them. You are typically on your own when you choose to use those projects. FlexFS will work with all of your tools because it supports the full POSIX file system standard, along with some non-POSIX features, and is fully supported by Paradigm4, an ISO-27001 certified software vendor that has been in business for over a decade and provides mission-critical products and services to some of the world's premier life science organizations.

When flexFS is using object storage as its backend, can I access my files directly via the object storage API?

Technically yes, but not really. FlexFS does not represent file data in object storage as the whole files and paths observed via the POSIX interface. Instead, files are subdivided into storage-optimized blocks, which are persisted under keys composed of invariant inode ids and block indices rather than potentially mutable human-readable paths. This design choice was intentional, and was made for both compatibility and performance reasons. When using flexFS, you should treat its use of object storage as a file block storage backend as an implementation detail, and not as an alternative way of accessing file data.
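To make the block layout described above more concrete, here is a minimal Python sketch of how a byte range in a file could map to block keys built from an inode id and a block index. The block size, the key format, and the names `BLOCK_SIZE`, `block_key`, and `keys_for_range` are all assumptions made up for illustration; they are not flexFS's actual on-disk layout.

```python
from typing import List

# Hypothetical block size, chosen for illustration only.
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB (assumed)


def block_key(inode_id: int, block_index: int) -> str:
    """Build an object key from an inode id and a block index (made-up format)."""
    return f"{inode_id:016x}/{block_index:08x}"


def keys_for_range(inode_id: int, offset: int, length: int) -> List[str]:
    """Return the keys of every block touched by bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return [block_key(inode_id, i) for i in range(first, last + 1)]


# A 4 MiB read starting at 6 MiB spans blocks 1 and 2 of inode 42.
print(keys_for_range(42, offset=6 * 1024 * 1024, length=4 * 1024 * 1024))
```

Because the keys in this sketch depend only on the inode id and the block index, renaming or moving the file would leave them unchanged, which is the property the answer above attributes to path-independent keys. It also shows why the objects in the bucket would not look like the files seen through the mount.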

How can I access data already stored in my object storage bucket through flexFS?

At present, flexFS has no support for accessing existing data in object storage. To access file data through flexFS, you must first copy/sync the file data into a flexFS mount.

Why should I use flexFS instead of EFS?

Despite having the word "elastic" in its name, EFS is not especially elastic when it comes to aggregate throughput. Under heavy demand, EFS can service on the order of 20 mounts before its per-mount performance starts to steadily degrade. Furthermore, if your data is actively accessed, EFS can be quite expensive relative to flexFS. In short, flexFS is more elastic, often cheaper than EFS, and has a more predictable cost basis.

Why should I use flexFS instead of FSx for Lustre?

While potentially performant, FSx for Lustre has a variety of drawbacks. For starters, it is not elastic at all. Increasing capacity is a manual procedure and can involve downtime. And it is basically not possible to reduce capacity, except to zero. You also pay for provisioned capacity rather than what you actually use. That means an already high sticker price is effectively higher due to the need to overprovision capacity. Furthermore, aggregate throughput is proportional to provisioned capacity. So, when you need high throughput to support high aggregate demand on the file system, you may need to overprovision capacity by orders of magnitude simply to get the required throughput. With flexFS, you always get the maximum throughput a host can support regardless of the amount of data stored. Plus you only pay for what you actually use. Like with EFS, there is no concept of provisioning capacity with flexFS.
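The overprovisioning effect described above can be illustrated with a little arithmetic. The Python sketch below assumes throughput scales linearly with provisioned capacity. All the numbers, and the function name `capacity_needed_tib`, are hypothetical and are not actual FSx for Lustre figures.

```python
def capacity_needed_tib(data_tib: float, target_mb_s: float, mb_s_per_tib: float) -> float:
    """Capacity to provision when throughput is proportional to capacity.

    You must provision enough for the data itself AND enough to reach the
    target throughput, whichever is larger.
    """
    for_throughput = target_mb_s / mb_s_per_tib
    return max(data_tib, for_throughput)


# All values below are hypothetical, chosen only to show the effect.
data_tib = 2.0          # data actually stored
target_mb_s = 20_000.0  # aggregate throughput the workload needs
mb_s_per_tib = 200.0    # throughput granted per provisioned TiB

provisioned = capacity_needed_tib(data_tib, target_mb_s, mb_s_per_tib)
print(f"provision {provisioned:.0f} TiB to serve {data_tib:.0f} TiB of data "
      f"({provisioned / data_tib:.0f}x overprovisioned)")
```

With these made-up inputs, the throughput requirement rather than the data size determines how much capacity is paid for.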

Why should I use flexFS instead of S3?

There are many reasons why you might want to use flexFS over using S3 directly. Access controls in S3 are blunt and don't provide nearly the fidelity that Linux ACLs and extended ACLs can provide. And don't forget that it is impossible to rearrange the path structure of objects in S3 without performing slow, potentially error-prone copy operations. In fact, metadata operations in general are much faster in flexFS than in S3. Plus, flexFS supports all POSIX metadata utilities (e.g. find, tree, etc.), which are incredibly powerful and useful. Furthermore, if the objects being stored in S3 are files that you need to use POSIX utilities to analyze or manipulate, then you typically need to first download the object to local storage, run your utilities on it, and then potentially upload results or a modified file back to S3. This process can be very costly and carries operational overhead. More information can be found here.

Can flexFS be run in a Docker container?

Yes, as long as the container is running on a Linux host, and is privileged or has otherwise been given access to the /dev/fuse device on the underlying host.
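The two options mentioned above (a privileged container, or explicit access to /dev/fuse) correspond to standard `docker run` flags. The Python sketch below only builds and prints the command line. The image name is a placeholder rather than a published flexFS image, and adding the `SYS_ADMIN` capability is an assumption based on what FUSE mounts commonly require inside containers; some hosts may need additional security options.

```python
import shlex
from typing import List


def docker_run_args(image: str, privileged: bool = False) -> List[str]:
    """Build a `docker run` command that gives the container FUSE access."""
    args = ["docker", "run", "--rm"]
    if privileged:
        # Option 1: full privileges, which include access to /dev/fuse.
        args.append("--privileged")
    else:
        # Option 2: expose only the FUSE device, plus the capability that
        # mounting typically needs (assumed, not confirmed for flexFS).
        args += ["--device", "/dev/fuse", "--cap-add", "SYS_ADMIN"]
    args.append(image)
    return args


# "example/flexfs-client:latest" is a hypothetical image name.
print(shlex.join(docker_run_args("example/flexfs-client:latest")))
```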

Does flexFS support Kubernetes?

Yes, flexFS has a native CSI volume driver for Kubernetes.

Is flexFS a FUSE file system?

Yes, flexFS is a FUSE-based file system.

Aren't FUSE file systems slow?

In terms of latency, FUSE-based file systems are typically slower than file systems which use native kernel modules. However, for network file systems, round-trip time and efficiency of the metadata service are the real dominant factors. In practice, flexFS often outperforms NFS implementations on latency-sensitive operations, even though the NFS client is implemented as a kernel module. In terms of large file sequential throughput, flexFS is often faster than native file systems (e.g. EXT4) running against block devices (e.g. EBS) as well as alternative network file systems (e.g. EFS, FSx for Lustre). In short, FUSE-based network file systems can be very competitive against network file systems with native kernel modules.

What is the advantage of using FUSE?

FUSE allows for file system implementations to be built in user-space against a relatively stable API. File systems built using FUSE are very easy to install, and work seamlessly across various kernel versions without recompilation. They can also potentially be installed and run without root user privileges.

How is flexFS billed?

When using flexFS, you are simply billed a fixed GB-month rate for the amount of file data you store in flexFS. Costs are metered and calculated on an hourly basis. This single-variate approach to billing makes cost prediction and estimation trivial. More information can be found here.
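As a rough illustration of single-variable billing, the Python sketch below estimates a monthly charge from hourly storage samples. The rate, the 730-hour month used for proration, and the names `HOURS_PER_MONTH` and `estimate_cost` are assumptions for this example; the actual rate and metering details are not stated here.

```python
from typing import Iterable

HOURS_PER_MONTH = 730  # assumed average month length for proration


def estimate_cost(hourly_gb_samples: Iterable[float], rate_per_gb_month: float) -> float:
    """Estimate cost from one stored-GB sample per hour at a fixed GB-month rate."""
    gb_hours = sum(hourly_gb_samples)
    gb_months = gb_hours / HOURS_PER_MONTH
    return gb_months * rate_per_gb_month


# Hypothetical usage: 1,000 GB stored for half a month, then 3,000 GB.
samples = [1000.0] * 365 + [3000.0] * 365
hypothetical_rate = 0.10  # currency units per GB-month (made up)
print(f"estimated charge: {estimate_cost(samples, hypothetical_rate):.2f}")
```

Since stored data is the only input, forecasting cost reduces to forecasting how much data you expect to keep.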

Why is copying many small files into flexFS slow?

This is a well-known issue with many storage systems, called the small files problem. For flexFS, which persists file data across a network and typically into high-latency object storage systems, the impact is particularly significant. However, you can mitigate the problem by parallelizing copies rather than using a simple cp command, which performs copies sequentially. This can be done by using a parallel sync utility such as fpsync or manually using a tool such as GNU parallel.
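If neither fpsync nor GNU parallel is at hand, the same idea can be sketched in a few lines of Python using a thread pool. This is a minimal illustration and not a replacement for a real sync tool. The function name `parallel_copy` and the default worker count are arbitrary choices, and the sketch skips empty directories and does no retrying.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir: str, dst_dir: str, workers: int = 32) -> int:
    """Copy every regular file under src_dir into dst_dir using many threads.

    Returns the number of files copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    files = [p for p in src.rglob("*") if p.is_file()]

    # Create destination directories up front so workers never race on mkdir.
    for directory in {dst / p.relative_to(src).parent for p in files}:
        directory.mkdir(parents=True, exist_ok=True)

    def copy_one(path: Path) -> None:
        shutil.copy2(path, dst / path.relative_to(src))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces iteration so any copy error is raised here.
        list(pool.map(copy_one, files))
    return len(files)


# Example (paths are placeholders):
#   parallel_copy("/data/many-small-files", "/mnt/flexfs/dataset")
```

Threads suit this workload because each copy spends most of its time waiting on I/O, so many copies can be in flight at once.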