Grafana Dashboards
This page covers how to set up Grafana to visualize flexFS metrics collected by Prometheus.
Prerequisites
- A running Prometheus instance scraping flexFS metrics
- Grafana 9.0 or later
Adding the data source
- Navigate to Configuration > Data Sources in Grafana (Connections > Data sources in Grafana 10 and later).
- Click Add data source and select Prometheus.
- Set the URL to your Prometheus server (e.g., http://prometheus:9090).
- Click Save & Test to verify the connection.
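The same data source can also be created without the UI. The sketch below builds a request for Grafana's HTTP API (`POST /api/datasources`). The data source name `flexfs-prometheus`, the Grafana URL, and the token are placeholder assumptions, not values from this guide.

```python
import json
import urllib.request


def build_datasource_payload(name: str, prometheus_url: str) -> dict:
    """Return the JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,            # hypothetical display name
        "type": "prometheus",
        "url": prometheus_url,   # same URL you would enter in the UI
        "access": "proxy",       # Grafana's backend issues the queries
        "isDefault": False,
    }


def create_datasource(grafana_url: str, token: str, payload: dict) -> int:
    """POST the payload to Grafana and return the HTTP status code."""
    request = urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    body = build_datasource_payload("flexfs-prometheus", "http://prometheus:9090")
    print(json.dumps(body, indent=2))
    # To send it, supply your own Grafana URL and service account token:
    # create_datasource("http://grafana:3000", "<token>", body)
```

Only `build_datasource_payload` runs by default; the network call is left commented out so the script is safe to execute as-is.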
Key panels
The following panel definitions cover the most important flexFS metrics. You can combine them into a single dashboard or create separate dashboards for the metadata server and proxy server.
Metadata server panels
RPC operations per second
rate(flexfs_meta_rpc_ops_total[5m])
Display as a time series, grouped by method. This shows the throughput of metadata operations (Lookup, GetAttr, SetAttr, Create, Mkdir, etc.).
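To make the ops/s values easier to interpret, here is a simplified sketch of what a `rate()` over a counter computes. It is not Prometheus's implementation: Prometheus also extrapolates to the edges of the 5-minute window, which this sketch omits, and the sample values in the usage lines are made up.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Approximate per-second rate from (timestamp_seconds, counter_value) samples.

    Simplification: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")

    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # Counter reset (for example a server restart): count from zero again.
            increase += current
    return increase / elapsed


if __name__ == "__main__":
    # Hypothetical counter readings taken one minute apart.
    print(simple_rate([(0.0, 100.0), (60.0, 160.0), (120.0, 220.0)]))
```

The reset branch is why a restart of the metadata server shows up as a brief dip rather than a large negative spike.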
RPC latency (p99)
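histogram_quantile(0.99, rate(flexfs_meta_rpc_duration_seconds_bucket[5m]))
Display as a time series, grouped by method. To get exactly one series per method, sum the bucket rates by le and method before taking the quantile, as the sample dashboard JSON on this page does. Configure alerts to fire when p99 exceeds your SLA threshold.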
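The p99 value is an estimate interpolated from histogram buckets, not an exact measurement. The following simplified sketch shows the idea behind `histogram_quantile`; it assumes cumulative buckets sorted by upper bound and ending in +Inf, and the bucket boundaries and counts are invented for illustration rather than taken from flexFS.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q (0..1) from cumulative (upper_bound, count) buckets."""
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies beyond the last finite bucket, so the
                # best available answer is that bucket's upper bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


if __name__ == "__main__":
    # Hypothetical cumulative counts for boundaries 10 ms, 100 ms, 1 s, +Inf.
    example = [(0.01, 50), (0.1, 90), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.99, example))
```

Two consequences are worth keeping in mind when setting alert thresholds: the estimate is only as precise as the bucket boundaries, and it can never exceed the highest finite bucket boundary.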
Active sessions per volume
flexfs_meta_sessions
Display as a stat panel or as a time series grouped by volume_id. This shows how many mount clients are connected to each volume.
Volume size
flexfs_meta_volume_size_bytes
Display as a time series or bar gauge grouped by volume_id. Use the flexfs_meta_volume_info metric to map volume IDs to human-readable names:
flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info
Inode and block counts
flexfs_meta_volume_inodes
flexfs_meta_volume_blocks
flexfs_meta_volume_dentries
Display as stat panels or time series grouped by volume_id.
Volume I/O throughput
rate(flexfs_meta_volume_bytes_read_total[5m])
rate(flexfs_meta_volume_bytes_written_total[5m])
Display as a time series with read and write as separate series.
Database disk usage
flexfs_meta_db_disk_usage_bytes
Display as a time series grouped by volume_id. Compare against flexfs_meta_db_folder_disk_capacity_bytes to calculate utilization:
sum(flexfs_meta_db_disk_usage_bytes) / flexfs_meta_db_folder_disk_capacity_bytes
Because sum() discards all labels, this division only returns data when both sides end up with the same label set. If the result is empty, aggregate both sides the same way, for example by wrapping the capacity metric in sum() as well.
Proxy server panels
REST operations per second
rate(flexfs_proxy_rest_ops_total[5m])
Display as a time series, grouped by method.
REST latency (p99)
histogram_quantile(0.99, rate(flexfs_proxy_rest_duration_seconds_bucket[5m]))
Cache hit rate
Approximate cache effectiveness by comparing read operations to cache block counts over time. A stable or growing flexfs_proxy_cache_clean_blocks alongside read traffic indicates good cache utilization.
Cache utilization
flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes
Compare against the disk quota:
(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes
Dirty block writeback queue
flexfs_proxy_cache_dirty_blocks
flexfs_proxy_cache_dirty_bytes
Display as a time series. A growing dirty queue may indicate that writeback is not keeping up with the write load.
Disk capacity
flexfs_proxy_cache_disk_capacity_bytes
flexfs_proxy_cache_disk_quota_bytes
Display as a bar gauge showing quota usage against total disk capacity.
Sample dashboard JSON
Below is a minimal dashboard definition with the core panels. Customize panel sizes and positions to suit your layout.
{
  "dashboard": {
    "title": "flexFS Overview",
    "tags": ["flexfs"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Meta RPC ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_meta_rpc_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Meta RPC Latency p99",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum(rate(flexfs_meta_rpc_duration_seconds_bucket[5m])) by (le, method))",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Active Sessions",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 },
        "targets": [
          {
            "expr": "sum(flexfs_meta_sessions)",
            "legendFormat": "Total"
          }
        ]
      },
      {
        "title": "Volume Size",
        "type": "bargauge",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 8 },
        "targets": [
          {
            "expr": "flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info",
            "legendFormat": "{{ volume_name }}"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "bytes" } }
      },
      {
        "title": "Proxy REST ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_proxy_rest_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Proxy Cache Utilization",
        "type": "gauge",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 16 },
        "targets": [
          {
            "expr": "(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes",
            "legendFormat": "utilization"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "percentunit", "max": 1 } }
      },
      {
        "title": "Proxy Dirty Blocks",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 16 },
        "targets": [
          {
            "expr": "flexfs_proxy_cache_dirty_blocks",
            "legendFormat": "dirty blocks"
          }
        ]
      }
    ],
    "schemaVersion": 39,
    "version": 1
  }
}
Customization tips
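One customization that benefits from a quick check is the panel layout. The sketch below validates `gridPos` values before you import an edited dashboard, relying on Grafana's 24-column grid. The helper names are hypothetical, and the panel list simply repeats the titles and positions from the sample dashboard.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


def overlaps(a: dict, b: dict) -> bool:
    """Return True if two gridPos rectangles share any grid cell."""
    return (
        a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
    )


def layout_problems(panels: list[dict]) -> list[str]:
    """List panels that run past the grid edge or overlap another panel."""
    problems = []
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{panel['title']}: extends past column {GRID_COLUMNS}")
    for first, second in combinations(panels, 2):
        if overlaps(first["gridPos"], second["gridPos"]):
            problems.append(f"{first['title']} overlaps {second['title']}")
    return problems


# Titles and positions copied from the sample dashboard.
SAMPLE_PANELS = [
    {"title": "Meta RPC ops/s", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
    {"title": "Meta RPC Latency p99", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}},
    {"title": "Active Sessions", "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8}},
    {"title": "Volume Size", "gridPos": {"h": 8, "w": 6, "x": 6, "y": 8}},
    {"title": "Proxy REST ops/s", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}},
    {"title": "Proxy Cache Utilization", "gridPos": {"h": 4, "w": 6, "x": 0, "y": 16}},
    {"title": "Proxy Dirty Blocks", "gridPos": {"h": 8, "w": 6, "x": 6, "y": 16}},
]

if __name__ == "__main__":
    for problem in layout_problems(SAMPLE_PANELS):
        print(problem)
```

To check your own file, load it with `json.load` and pass the `panels` list from under the `dashboard` key to `layout_problems`.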
- Variable for volume: Add a Grafana variable with the query label_values(flexfs_meta_volume_info, volume_name) to filter panels by volume.
- Variable for proxy instance: Use label_values(flexfs_proxy_rest_ops_total, instance) to select specific proxy servers.
- Annotations: Add annotations from flexfs_meta_rpc_ops_total to mark deployment events or configuration changes.
- Thresholds: Set color thresholds on latency panels (green below 10 ms, yellow from 10 ms to 100 ms, red at 100 ms and above) to quickly spot performance issues.
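The variable and threshold tips above could be expressed in dashboard JSON roughly as follows. This is a sketch under assumptions: the variable names `volume` and `proxy` are hypothetical, and the field names follow the Grafana dashboard JSON model as commonly exported, so compare them with an export from your own Grafana version. Thresholds are in seconds because the latency histograms are named `*_duration_seconds`.

```python
import json


def query_variable(name: str, query: str) -> dict:
    """Build a query-type template variable for the dashboard's templating list."""
    return {
        "name": name,
        "type": "query",
        "query": query,
        "refresh": 1,        # re-run the query when the dashboard loads
        "includeAll": True,
        "multi": True,
    }


def latency_thresholds() -> dict:
    """Threshold steps in seconds: green below 10 ms, yellow to 100 ms, then red."""
    return {
        "mode": "absolute",
        "steps": [
            {"color": "green", "value": None},   # base step, serialized as null
            {"color": "yellow", "value": 0.010},
            {"color": "red", "value": 0.100},
        ],
    }


# Fragment to merge into the "dashboard" object.
templating = {
    "list": [
        query_variable("volume", "label_values(flexfs_meta_volume_info, volume_name)"),
        query_variable("proxy", "label_values(flexfs_proxy_rest_ops_total, instance)"),
    ]
}

# Fragment to merge into a latency panel's "fieldConfig".
latency_field_config = {"defaults": {"unit": "s", "thresholds": latency_thresholds()}}

if __name__ == "__main__":
    print(json.dumps({"templating": templating}, indent=2))
    print(json.dumps({"fieldConfig": latency_field_config}, indent=2))
```

Panels only react to a variable once their queries reference it, for example with a label matcher such as `volume_name=~"$volume"` on a series that carries that label.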
Next steps
- Metrics reference: full catalog of all metrics
- Alerting: Prometheus alert rules
- Logging and diagnostics: mount client logs and profiling