Grafana Dashboards

This page covers how to set up Grafana to visualize flexFS metrics collected by Prometheus.

Prerequisites

A running Prometheus instance scraping flexFS metrics
Grafana 9.0 or later

Adding the data source

Navigate to Configuration > Data Sources in Grafana (Connections > Data sources in Grafana 10 and later).
Click Add data source and select Prometheus.
Set the URL to your Prometheus server (e.g., http://prometheus:9090).
Click Save & Test to verify the connection.
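The same data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI. The following is a minimal sketch, not an official flexFS tool: the data source name, the Grafana URL, and the token are placeholders, and it assumes a service account token with permission to create data sources.

```python
import json
import urllib.request


def build_datasource_payload(name: str, prometheus_url: str) -> dict:
    """Build the JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,  # hypothetical display name, choose your own
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend forwards queries to Prometheus
        "isDefault": False,
    }


def create_datasource(grafana_url: str, api_token: str, payload: dict) -> int:
    """POST the payload to Grafana and return the HTTP status code."""
    request = urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    payload = build_datasource_payload("flexFS Prometheus", "http://prometheus:9090")
    print(json.dumps(payload, indent=2))
    # To actually create the data source, call create_datasource() with your
    # own Grafana URL and token (both are placeholders in this sketch):
    # create_datasource("http://grafana:3000", "<token>", payload)
```

Running the script only prints the payload; the network call is left commented out so nothing is changed until you supply real credentials.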

Key panels

The following panel definitions cover the most important flexFS metrics. You can combine them into a single dashboard or create separate dashboards for the metadata server and proxy server.

Metadata server panels

RPC operations per second

sum by (method) (rate(flexfs_meta_rpc_ops_total[5m]))

Display as a time series, grouped by method. This shows the throughput of metadata operations (Lookup, GetAttr, SetAttr, Create, Mkdir, etc.).

RPC latency (p99)

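histogram_quantile(0.99, sum by (le, method) (rate(flexfs_meta_rpc_duration_seconds_bucket[5m])))

Display as a time series, grouped by method. Keep the le label in the aggregation, because histogram_quantile needs the bucket boundaries. Configure alerts to fire when p99 exceeds your SLA threshold.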
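To see why the p99 value depends on the bucket boundaries, it can help to reproduce the estimate by hand. The sketch below is a simplified Python version of the linear interpolation that PromQL's histogram_quantile applies to classic histograms. The bucket bounds and counts are made-up illustration values, not real flexFS buckets, and the function assumes positive bucket bounds and a final +Inf bucket.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Simplified sketch of PromQL's classic-histogram interpolation.
    Assumes positive upper bounds and a last bucket with bound +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]  # number of observations at or below the quantile
    for index, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    # Interpolate linearly inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    # Hypothetical cumulative counts for bounds in seconds.
    example = [(0.005, 10), (0.01, 50), (0.1, 95), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.9, example))
    print(histogram_quantile(0.99, example))
```

With these illustrative counts, the 0.9 quantile lands inside the 0.01 to 0.1 second bucket, while the 0.99 quantile is pushed to the top of the 0.1 to 1.0 second bucket. Wide buckets near your SLA threshold therefore make the p99 estimate coarse.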

Active sessions per volume

flexfs_meta_sessions

Display as a stat panel or time series grouped by volume_id. Shows how many mount clients are connected to each volume.

Volume size

flexfs_meta_volume_size_bytes

Display as a time series or bar gauge grouped by volume_id. Use the flexfs_meta_volume_info metric to map volume IDs to human-readable names:

flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info

Inode, block, and dentry counts

flexfs_meta_volume_inodes
flexfs_meta_volume_blocks
flexfs_meta_volume_dentries

Display as stat panels or time series grouped by volume_id.

Volume I/O throughput

rate(flexfs_meta_volume_bytes_read_total[5m])
rate(flexfs_meta_volume_bytes_written_total[5m])

Display as a time series with read and write as separate series.

Database disk usage

flexfs_meta_db_disk_usage_bytes

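Display as a time series grouped by volume_id. Compare against flexfs_meta_db_folder_disk_capacity_bytes to calculate utilization. Both sides of the division must share matching labels, so aggregate usage to the label the capacity metric is matched on. The expression below assumes one capacity series per metadata server instance:

sum by (instance) (flexfs_meta_db_disk_usage_bytes) / on(instance) flexfs_meta_db_folder_disk_capacity_bytes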

Proxy server panels

REST operations per second

sum by (method) (rate(flexfs_proxy_rest_ops_total[5m]))

Display as a time series, grouped by method.

REST latency (p99)

histogram_quantile(0.99, sum by (le, method) (rate(flexfs_proxy_rest_duration_seconds_bucket[5m])))

Display as a time series, grouped by method.

Cache hit rate

Approximate cache effectiveness by comparing read operations to cache block counts over time. A stable or growing flexfs_proxy_cache_clean_blocks alongside read traffic indicates good cache utilization.

Cache utilization

flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes

Compare against the disk quota:

(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes

Dirty block writeback queue

flexfs_proxy_cache_dirty_blocks
flexfs_proxy_cache_dirty_bytes

Display as a time series. A growing dirty queue may indicate writeback is not keeping up with write load.
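One way to tell sustained growth from normal fluctuation is to fit a trend line to recent samples, which is also how PromQL's deriv() function works (simple linear regression over the range). The sketch below shows the calculation in Python. The sample values are invented for illustration, and the function name is hypothetical.

```python
def regression_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) samples, per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("timestamps must not all be equal")
    return covariance / variance


if __name__ == "__main__":
    # Hypothetical dirty block counts sampled once a minute.
    dirty_blocks = [(0, 100), (60, 160), (120, 220), (180, 280)]
    slope = regression_slope(dirty_blocks)
    print(f"dirty queue trend: {slope:.2f} blocks/s")
```

A slope that stays positive over a long window suggests the queue is growing rather than oscillating; a slope near zero suggests writeback is keeping pace with the write load.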

Disk capacity

flexfs_proxy_cache_disk_capacity_bytes
flexfs_proxy_cache_disk_quota_bytes

Display as a bar gauge showing the cache quota against total disk capacity.

Sample dashboard JSON

Below is a minimal dashboard definition with core panels. Customize panel sizes and positions to suit your layout.

{
  "dashboard": {
    "title": "flexFS Overview",
    "tags": ["flexfs"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Meta RPC ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_meta_rpc_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Meta RPC Latency p99",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum(rate(flexfs_meta_rpc_duration_seconds_bucket[5m])) by (le, method))",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Active Sessions",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 },
        "targets": [
          {
            "expr": "sum(flexfs_meta_sessions)",
            "legendFormat": "Total"
          }
        ]
      },
      {
        "title": "Volume Size",
        "type": "bargauge",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 8 },
        "targets": [
          {
            "expr": "flexfs_meta_volume_size_bytes * on(volume_id) group_left(volume_name) flexfs_meta_volume_info",
            "legendFormat": "{{ volume_name }}"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "bytes" } }
      },
      {
        "title": "Proxy REST ops/s",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
        "targets": [
          {
            "expr": "sum(rate(flexfs_proxy_rest_ops_total[5m])) by (method)",
            "legendFormat": "{{ method }}"
          }
        ]
      },
      {
        "title": "Proxy Cache Utilization",
        "type": "gauge",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 16 },
        "targets": [
          {
            "expr": "(flexfs_proxy_cache_clean_bytes + flexfs_proxy_cache_dirty_bytes) / flexfs_proxy_cache_disk_quota_bytes",
            "legendFormat": "utilization"
          }
        ],
        "fieldConfig": { "defaults": { "unit": "percentunit", "max": 1 } }
      },
      {
        "title": "Proxy Dirty Blocks",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 6, "x": 6, "y": 16 },
        "targets": [
          {
            "expr": "flexfs_proxy_cache_dirty_blocks",
            "legendFormat": "dirty blocks"
          }
        ]
      }
    ],
    "schemaVersion": 39,
    "version": 1
  }
}
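When you change panel sizes and positions, a quick check of the gridPos values can catch overlapping panels before you import the dashboard. Grafana lays panels out on a grid 24 columns wide. The helper below is a hypothetical sketch (it is not part of Grafana or flexFS) that reports panels which overlap or extend past the grid; the layout in the example mirrors three panels from the definition above.

```python
GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


def find_layout_problems(panels: list[dict]) -> list[str]:
    """Return human-readable descriptions of gridPos problems."""
    problems = []
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] < 0 or pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{panel['title']} extends past the grid")
    for i, first in enumerate(panels):
        for second in panels[i + 1:]:
            a, b = first["gridPos"], second["gridPos"]
            # Two rectangles overlap only if they overlap on both axes.
            overlap_x = a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            overlap_y = a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
            if overlap_x and overlap_y:
                problems.append(f"{first['title']} overlaps {second['title']}")
    return problems


if __name__ == "__main__":
    layout = [
        {"title": "Meta RPC ops/s", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
        {"title": "Meta RPC Latency p99", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}},
        {"title": "Active Sessions", "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8}},
    ]
    print(find_layout_problems(layout) or "layout looks consistent")
```

To check a full dashboard file, load it with json.load and pass the list under the "dashboard" and "panels" keys to the function.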

Customization tips

Variable for volume: Add a Grafana variable with the query label_values(flexfs_meta_volume_info, volume_name) to filter panels by volume.
Variable for proxy instance: Use label_values(flexfs_proxy_rest_ops_total, instance) to select specific proxy servers.
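Annotations: Add annotation queries to mark deployment events or configuration changes. For example, an annotation on resets(flexfs_meta_rpc_ops_total[5m]) > 0 marks the counter resets that occur when a metadata server restarts.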
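Thresholds: Set color thresholds on latency panels (green below 10 ms, yellow from 10 ms up to 100 ms, red at 100 ms and above) to quickly spot performance issues. The duration metrics are reported in seconds, so the threshold values are 0.01 and 0.1.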
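The volume variable and latency thresholds from the tips above can be expressed as fragments of the dashboard JSON. The sketch below generates them in Python. The field names follow Grafana's dashboard JSON model as commonly exported, but exact fields vary between Grafana versions, so treat the output as a starting point and compare it with a dashboard exported from your own instance. The variable name "volume" is an arbitrary choice.

```python
import json


def latency_field_config(warn_seconds: float = 0.01, critical_seconds: float = 0.1) -> dict:
    """fieldConfig fragment with green/yellow/red steps for a latency panel."""
    return {
        "defaults": {
            "unit": "s",  # the duration metrics are in seconds
            "thresholds": {
                "mode": "absolute",
                "steps": [
                    {"color": "green", "value": None},  # base step, serialized as null
                    {"color": "yellow", "value": warn_seconds},
                    {"color": "red", "value": critical_seconds},
                ],
            },
        }
    }


def volume_variable() -> dict:
    """Entry for the dashboard's templating list that selects volumes by name."""
    return {
        "name": "volume",  # hypothetical variable name, referenced as $volume
        "type": "query",
        "query": "label_values(flexfs_meta_volume_info, volume_name)",
        "refresh": 1,  # re-run the query when the dashboard loads
        "includeAll": True,
        "multi": True,
    }


if __name__ == "__main__":
    fragment = {
        "templating": {"list": [volume_variable()]},
        "fieldConfig": latency_field_config(),
    }
    print(json.dumps(fragment, indent=2))
```

A panel query can then filter on the variable, for example by adding the matcher volume_name=~"$volume" to the flexfs_meta_volume_info side of the volume size join.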

Next steps

Metrics reference — full catalog of all metrics
Alerting — Prometheus alert rules
Logging and diagnostics — mount client logs and profiling