Grafana Dashboards

This page covers how to set up Grafana to visualize flexFS metrics collected by Prometheus.

  1. Navigate to Configuration > Data Sources in Grafana (Connections > Data sources in recent Grafana versions).
  2. Click Add data source and select Prometheus.
  3. Set the URL to your Prometheus server (e.g., http://prometheus:9090).
  4. Click Save & Test to verify the connection.
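The same data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The following is a minimal sketch, not an official flexFS procedure: the display name, the Grafana URL, and the API token are placeholders you would supply, and `access: "proxy"` assumes Grafana's backend should issue the queries.

```python
import json
import urllib.request


def build_datasource_payload(prometheus_url, name="Prometheus"):
    """Return the JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,           # display name in Grafana (placeholder)
        "type": "prometheus",
        "url": prometheus_url,  # same URL you would enter in step 3
        "access": "proxy",      # Grafana's backend proxies the queries
    }


def create_datasource(grafana_url, api_token, payload):
    """POST the payload to Grafana; grafana_url and api_token are placeholders."""
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_datasource("http://grafana:3000", token, build_datasource_payload("http://prometheus:9090"))` would perform the equivalent of steps 1 to 4, assuming the token has permission to manage data sources.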

The following panel definitions cover the most important flexFS metrics. You can combine them into a single dashboard or create separate dashboards for the metadata server and proxy server.

rate(flexfs_meta_rpc_ops_total[5m])

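Display as a time series, grouped by method. The raw expression returns one series per label combination, so aggregate with sum(...) by (method) to get one line per method, as the dashboard definition further down does. This shows the throughput of metadata operations (Lookup, GetAttr, SetAttr, Create, Mkdir, etc.).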

histogram_quantile(0.99, rate(flexfs_meta_rpc_duration_seconds_bucket[5m]))

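Display as a time series, grouped by method. When aggregating, keep the le label (sum(...) by (le, method)) so that histogram_quantile still sees the individual buckets. Alerts should trigger if p99 exceeds your SLA threshold.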
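When choosing a p99 threshold, it helps to remember that `histogram_quantile` estimates the quantile by linear interpolation inside the bucket that contains it, so the result is only as precise as the bucket boundaries. The sketch below is a simplified Python rendition of that interpolation for classic cumulative buckets. The bucket bounds and counts are made-up illustration values, not flexFS defaults.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified version of the linear interpolation Prometheus applies.
    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Quantile falls in the +Inf bucket (or an empty one):
                # report the last bound we can actually name.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical per-second bucket rates: le=0.01, le=0.1, le=1, le=+Inf
example = [(0.01, 50.0), (0.1, 90.0), (1.0, 100.0), (math.inf, 100.0)]
p99 = histogram_quantile(0.99, example)
```

With these example buckets, 90% of requests finish within 100 ms, yet the p99 estimate is about 0.91 s because the quantile lands in the wide 0.1 s to 1 s bucket. A threshold that sits inside a wide bucket will therefore react coarsely.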

flexfs_meta_sessions

Display as a stat panel or time series grouped by volume_id. Shows how many mount clients are connected to each volume.

flexfs_meta_volume_size_bytes

Display as a time series or bar gauge grouped by volume_id. Use the flexfs_meta_volume_info metric to map volume IDs to human-readable names:

flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info

flexfs_meta_volume_inodes
flexfs_meta_volume_blocks
flexfs_meta_volume_dentries

Display as stat panels or time series grouped by volume_id.

rate(flexfs_meta_volume_bytes_read_total[5m])
rate(flexfs_meta_volume_bytes_written_total[5m])

Display as a time series with read and write as separate series.

flexfs_meta_db_disk_usage_bytes

Display as a time series grouped by volume_id. Compare against flexfs_meta_db_folder_disk_capacity_bytes to calculate utilization:

sum by (instance) (flexfs_meta_db_disk_usage_bytes) / sum by (instance) (flexfs_meta_db_folder_disk_capacity_bytes)

rate(flexfs_proxy_rest_ops_total[5m])

Display as a time series, grouped by method.

histogram_quantile(0.99, rate(flexfs_proxy_rest_duration_seconds_bucket[5m]))

Approximate cache effectiveness by comparing read operations to cache block counts over time. A stable or growing flexfs_proxy_cache_clean_blocks alongside read traffic indicates good cache utilization.

flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes

Compare against the disk quota:

(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes

flexfs_proxy_cache_dirty_blocks
flexfs_proxy_cache_dirty_bytes

Display as a time series. A growing dirty queue may indicate writeback is not keeping up with write load.

flexfs_proxy_cache_disk_capacity_bytes
flexfs_proxy_cache_disk_quota_bytes

Display as a bar gauge showing quota usage against total disk capacity.
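The expressions above can also be checked outside Grafana, for example from a script, through the Prometheus HTTP API (`/api/v1/query`). The following sketch evaluates the cache utilization expression and parses the instant-vector response. It assumes the series carry the standard `instance` label that Prometheus attaches at scrape time, and the server URL is a placeholder.

```python
import json
import urllib.parse
import urllib.request

CACHE_UTILIZATION = (
    "(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes)"
    " / flexfs_proxy_cache_disk_quota_bytes"
)


def build_query_url(prometheus_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(body):
    """Map each series' instance label to its sample value as a float."""
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % body.get("error"))
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


def fetch_cache_utilization(prometheus_url):
    """Query Prometheus and return {instance: utilization fraction}."""
    url = build_query_url(prometheus_url, CACHE_UTILIZATION)
    with urllib.request.urlopen(url) as response:
        return parse_instant_vector(json.load(response))
```

For example, `fetch_cache_utilization("http://prometheus:9090")` would return one fraction per proxy instance, which is a quick way to confirm that an expression returns data before building a panel around it.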

Below is a minimal dashboard definition with core panels. Customize panel sizes and positions to suit your layout.

{
  "dashboard": {
    "title": "flexFS Overview",
    "tags": ["flexfs"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Meta RPC ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_meta_rpc_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Meta RPC Latency p99",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum(rate(flexfs_meta_rpc_duration_seconds_bucket[5m])) by (le, method))",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Active Sessions",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 },
        "targets": [
          {
            "expr": "sum(flexfs_meta_sessions)",
            "legendFormat": "Total"
          }
        ]
      },
      {
        "title": "Volume Size",
        "type": "bargauge",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 8 },
        "targets": [
          {
            "expr": "flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info",
            "legendFormat": "{{ volume_name }}"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "bytes" } }
      },
      {
        "title": "Proxy REST ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_proxy_rest_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Proxy Cache Utilization",
        "type": "gauge",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 16 },
        "targets": [
          {
            "expr": "(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes",
            "legendFormat": "utilization"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "percentunit", "max": 1 } }
      },
      {
        "title": "Proxy Dirty Blocks",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 16 },
        "targets": [
          {
            "expr": "flexfs_proxy_cache_dirty_blocks",
            "legendFormat": "dirty blocks"
          }
        ]
      }
    ],
    "schemaVersion": 39,
    "version": 1
  }
}
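
The definition above already has the `{"dashboard": ...}` wrapper that Grafana's dashboard HTTP API (`POST /api/dashboards/db`) expects, so it can be pushed from a script. The following is a rough sketch under stated assumptions: the file name `flexfs-overview.json`, the Grafana URL, and the API token are hypothetical placeholders.

```python
import json
import urllib.request


def build_import_payload(definition, overwrite=True):
    """Prepare a dashboard definition for Grafana's POST /api/dashboards/db."""
    dashboard = dict(definition["dashboard"])  # shallow copy, input stays untouched
    # A null id tells Grafana this is not an update of a known dashboard id.
    dashboard.setdefault("id", None)
    return {"dashboard": dashboard, "overwrite": overwrite}


def upload_dashboard(grafana_url, api_token, payload):
    """POST the payload to Grafana; grafana_url and api_token are placeholders."""
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def load_definition(path="flexfs-overview.json"):
    """Read the dashboard JSON from disk (hypothetical file name)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A typical call would be `upload_dashboard("http://grafana:3000", token, build_import_payload(load_definition()))`. Setting `overwrite=True` lets repeated runs replace the previous version of the dashboard, which is convenient while iterating on panel layout.
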
  • Variable for volume: Add a Grafana variable with the query label_values(flexfs_meta_volume_info, volume_name) to filter panels by volume.
  • Variable for proxy instance: Use label_values(flexfs_proxy_rest_ops_total, instance) to select specific proxy servers.
  • Annotations: Add annotations from flexfs_meta_rpc_ops_total to mark deployment events or configuration changes.
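  • Thresholds: Set color thresholds on latency panels (green below 10 ms, yellow from 10 ms up to 100 ms, red at 100 ms and above) to quickly spot performance issues. The duration metrics are reported in seconds, so the corresponding threshold values are 0.01 and 0.1.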
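
As an illustration of the thresholds tip, the sketch below adds a `fieldConfig` with absolute threshold steps to every panel whose title mentions latency. The helper names and the title-matching rule are assumptions made for this example. The values are in seconds because the flexFS duration metrics are, and the `"s"` unit asks Grafana to format them as durations.

```python
def latency_field_config(warn_seconds=0.01, crit_seconds=0.1):
    """Build a Grafana fieldConfig with green/yellow/red latency thresholds."""
    return {
        "defaults": {
            "unit": "s",  # values are seconds; Grafana scales the display
            "thresholds": {
                "mode": "absolute",
                "steps": [
                    {"color": "green", "value": None},  # base step, no lower bound
                    {"color": "yellow", "value": warn_seconds},
                    {"color": "red", "value": crit_seconds},
                ],
            },
        }
    }


def add_latency_thresholds(dashboard):
    """Attach the threshold config to panels with 'latency' in the title.

    Modifies the dashboard dict in place and returns the touched titles.
    """
    touched = []
    for panel in dashboard.get("panels", []):
        if "latency" in panel.get("title", "").lower():
            panel["fieldConfig"] = latency_field_config()
            touched.append(panel["title"])
    return touched
```

Applied to the dashboard definition on this page, this would touch only the "Meta RPC Latency p99" panel. Depending on the panel type, additional display options may be needed before the threshold colors become visible.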