Longhorn Storage Performance Analysis Report

K3s Raspberry Pi Cluster

Date: December 29, 2025


1. Executive Summary

This report documents the storage performance testing conducted on a K3s Kubernetes cluster to evaluate whether nodes WITHOUT local NVMe storage can effectively utilize Longhorn distributed storage via iSCSI protocol.

KEY FINDING: iSCSI remote storage performs remarkably well, with sequential read performance within 4% of local NVMe storage. This validates the architectural decision to use diskless worker nodes with remote storage.


2. Environment Overview

Cluster Specifications

Component        Details
Orchestration    K3s v1.33.6+k3s1 (Lightweight Kubernetes)
Architecture     ARM64 (aarch64)
Storage Backend  Longhorn v1.7.2 (Cloud-Native Distributed)
Network          1 Gbps LAN (VLAN 100, 10.100.0.0/24)
Node Count       10 Raspberry Pi nodes (9x Pi5 + 1x Pi4)

Cluster Topology

┌─────────────────────────────────────────────────────────────────────────────┐
│                         K3s CLUSTER (v1.33.6+k3s1)                          │
│                        ARM64 / Raspberry Pi - 10 NODES                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     CONTROL PLANE (HA - 2 nodes)                     │   │
│  │  ┌─────────────────────┐      ┌─────────────────────┐               │   │
│  │  │     rasp-ci-1       │      │     rasp-ci-2       │               │   │
│  │  │   10.100.0.11       │      │   10.100.0.12       │               │   │
│  │  │   Pi5 + 1TB NVMe    │      │   Pi5 + 1TB NVMe    │               │   │
│  │  │   etcd, API, ctrl   │      │   etcd, API, ctrl   │               │   │
│  │  └─────────────────────┘      └─────────────────────┘               │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      WORKER NODES (8 nodes)                         │   │
│  │                                                                     │   │
│  │  NVMe WORKERS:                                                      │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐                         │   │
│  │  │ rasp-ci-3 │ │ rasp-ci-4 │ │ rasp-ci-5 │                         │   │
│  │  │ 10.100.   │ │ 10.100.   │ │ 10.100.   │                         │   │
│  │  │ 0.13      │ │ 0.14      │ │ 0.15      │                         │   │
│  │  │ Pi5+NVMe  │ │ Pi5+NVMe  │ │ Pi5+NVMe  │  ◄── TESTED (NVMe)      │   │
│  │  │ 1TB       │ │ 1TB       │ │ 1TB       │                         │   │
│  │  └───────────┘ └───────────┘ └───────────┘                         │   │
│  │                                                                     │   │
│  │  SD-ONLY WORKERS (iSCSI Remote Storage):                           │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────┐│   │
│  │  │ rasp-ci-6 │ │ rasp-ci-7 │ │ rasp-ci-8 │ │ rasp-ci-9 │ │rasp-ci-││   │
│  │  │ 10.100.   │ │ 10.100.   │ │ 10.100.   │ │ 10.100.   │ │10      ││   │
│  │  │ 0.16      │ │ 0.17      │ │ 0.18      │ │ 0.19      │ │10.100. ││   │
│  │  │ Pi5+Coral │ │ Pi5       │ │ Pi5+Coral │ │ Pi5+Coral │ │0.20    ││   │
│  │  │ USB TPU   │ │ SD only   │ │ M.2 TPU   │ │ M.2 TPU   │ │Pi4     ││   │
│  │  └───────────┘ └───────────┘ └───────────┘ └───────────┘ └────────┘│   │
│  │                      ▲               ▲                              │   │
│  │                      └─── TESTED (iSCSI) ───┘                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Storage Configuration Details

Node        Role           Hardware         Data Storage  Longhorn Role
rasp-ci-1   Control Plane  Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-2   Control Plane  Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-3   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-4   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-5   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-6   Worker         Pi5 + Coral USB  SD only       iSCSI Client
rasp-ci-7   Worker         Raspberry Pi 5   SD only       iSCSI Client
rasp-ci-8   Worker         Pi5 + Coral M.2  SD only       iSCSI Client
rasp-ci-9   Worker         Pi5 + Coral M.2  SD only       iSCSI Client
rasp-ci-10  Worker         Raspberry Pi 4   SD only       iSCSI Client

Total NVMe Capacity: ~5 TB across 5 nodes

Network Configuration

Setting   Value
VLAN ID   100
Subnet    10.100.0.0/24
Gateway   10.100.0.1
Node IPs  10.100.0.11-20

Additional Hardware: Google Coral TPU Accelerators

Three nodes have Google Coral Edge TPU accelerators for ML inference:

Node       TPU Type               Interface  Device
rasp-ci-6  Coral USB Accelerator  USB 3.0    libusb
rasp-ci-8  Coral M.2 Accelerator  PCIe       /dev/apex_0
rasp-ci-9  Coral M.2 Accelerator  PCIe       /dev/apex_0

Longhorn Storage Architecture

Longhorn is a cloud-native distributed block storage system that:

  • Creates replicated volumes across multiple nodes for redundancy
  • Exposes volumes to pods via iSCSI protocol
  • Provides automatic failover if a replica becomes unavailable
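
For illustration, Longhorn volumes are requested through a Kubernetes StorageClass. The sketch below is a minimal, assumed example (the class name and parameter values are illustrative, not the exact manifest deployed in this cluster):

# Illustrative Longhorn StorageClass - values are assumptions, not the
# cluster's actual manifest
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated          # hypothetical name
provisioner: driver.longhorn.io      # Longhorn CSI driver
parameters:
  numberOfReplicas: "2"              # replicas land on NVMe-backed nodes
  staleReplicaTimeout: "2880"        # minutes before a failed replica is discarded
  dataLocality: "disabled"           # diskless nodes always attach over iSCSI
allowVolumeExpansion: true
reclaimPolicy: Delete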

Two Access Patterns in Our Cluster:

Pattern A: Local NVMe Access (rasp-ci-1 to 5)

┌─────────┐      ┌─────────────────┐
│   Pod   │ ───► │  Local NVMe     │   Direct disk I/O
│         │      │  (same node)    │   Lowest latency
└─────────┘      └─────────────────┘

Pattern B: iSCSI Remote Access (rasp-ci-6 to 10)

┌─────────┐      ┌─────────┐      ┌─────────────────┐
│   Pod   │ ───► │  iSCSI  │ ───► │  Remote NVMe    │
│         │      │  over   │      │  (different     │
│         │      │  network│      │   node)         │
└─────────┘      └─────────┘      └─────────────────┘

Network traversal adds latency but enables diskless nodes
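
To confirm that a diskless node is actually using Pattern B, the active iSCSI sessions can be inspected on that node. A minimal check (exact target naming depends on the Longhorn version):

# On an SD-only node: list iSCSI sessions established by Longhorn
sudo iscsiadm -m session

# The attached volume also appears as a regular block device
lsblk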

3. Pre-Requisite Work Completed

Before benchmarking, the following issues were identified and resolved:

Issue: Longhorn manager pods crashing on SD-only nodes

Root Cause:

  • Nodes rasp-ci-7, 8, 9, 10 were missing the open-iscsi package
  • Without iSCSI initiator, nodes could not mount remote Longhorn volumes
  • Longhorn manager pods were in CrashLoopBackOff state

Resolution:

  • Installed open-iscsi package on all 4 affected nodes
  • Enabled and started iscsid systemd service
  • Configured Longhorn to set allowScheduling=false on SD-only nodes (prevents Longhorn from trying to store replicas on diskless nodes)
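
A sketch of these remediation steps, assuming Debian-based nodes and kubectl access to the Longhorn CRDs (rasp-ci-7 shown as the example node; the same setting can also be toggled in the Longhorn UI):

# On each affected node (rasp-ci-7 .. rasp-ci-10): install the iSCSI initiator
sudo apt-get update && sudo apt-get install -y open-iscsi

# Enable and start the iSCSI daemon
sudo systemctl enable --now iscsid

# From a workstation with kubectl: stop Longhorn from placing replicas
# on the diskless node (repeat per SD-only node)
kubectl -n longhorn-system patch nodes.longhorn.io rasp-ci-7 \
  --type merge -p '{"spec":{"allowScheduling":false}}'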

Result:

  • All 10 longhorn-manager pods now running successfully
  • SD-only nodes can mount iSCSI volumes from NVMe nodes

4. Test Methodology

Objective

Compare storage I/O performance between:

  • NVMe LOCAL: Pods on nodes with local NVMe (rasp-ci-3, rasp-ci-4)
  • iSCSI REMOTE: Pods on diskless nodes using iSCSI (rasp-ci-7, rasp-ci-8)

This determines if workloads can be scheduled on SD-only nodes without significant performance degradation.

Test Tool: FIO (Flexible I/O Tester)

FIO is an industry-standard storage benchmarking tool that simulates various I/O workloads with configurable parameters.

Parameters Used:

Parameter     Value   Description
--ioengine    libaio  Linux native async I/O
--direct      1       Bypass OS cache (O_DIRECT)
--runtime     10s     Test duration per workload
--time_based  yes     Run for full duration
--numjobs     1       Single I/O thread
--size        256MB   Test file size

Workloads Tested:

Workload          Block Size  Description
Sequential Write  1 MB        Large file writes (backups, logs)
Sequential Read   1 MB        Large file reads (media, exports)
Random Write      4 KB        Database writes, transactions
Random Read       4 KB        Database queries, app data
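
For reference, the parameters above combine into a fio invocation along these lines, shown here for the 4K random-read workload (the job name and /data mount path are placeholders; the other workloads only change --rw and --bs, e.g. --rw=write --bs=1M for sequential write):

# 4K random read against a file on the mounted Longhorn volume
fio --name=randread-test --filename=/data/fio-testfile \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --size=256M \
    --runtime=10 --time_based \
    --numjobs=1 --group_reporting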

Test Execution Method

Each test node was benchmarked by:

  1. Creating a Longhorn PersistentVolumeClaim (PVC) with node affinity
  2. Deploying a pod with Alpine Linux + FIO on the target node
  3. Running all 4 FIO workloads sequentially
  4. Collecting results
  5. Cleaning up (deleting pod and PVC)
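
A minimal sketch of the per-node test manifests, assuming the default "longhorn" StorageClass and pinning the pod via a kubernetes.io/hostname selector (names, image, and volume size are illustrative):

# Illustrative PVC + benchmark pod for one target node (rasp-ci-7)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fio-test-rasp-ci-7
spec:
  nodeSelector:
    kubernetes.io/hostname: rasp-ci-7   # pin the pod to the node under test
  containers:
    - name: fio
      image: alpine:3.20                # fio installed at runtime (apk add fio)
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-test-pvc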

Two Test Rounds Were Conducted:

  • TEST 1 (Parallel): All 4 nodes tested simultaneously
  • TEST 2 (Sequential): Nodes tested one-at-a-time for isolation

5. Test 1 - Parallel Execution

Test Description

All 4 test nodes (rasp-ci-3, 4, 7, 8) ran FIO benchmarks at the same time. This simulates a real-world scenario with concurrent storage workloads.

Potential Impact: Network contention and shared resource competition may affect iSCSI nodes more than local NVMe nodes.

Test 1 Results

Sequential Write (1M blocks)

rasp-ci-3  [NVMe]   ██████████████████████████████████████████ 25.9 MB/s  ★
rasp-ci-4  [NVMe]   ██████████████████████████████████        21.2 MB/s
rasp-ci-7  [iSCSI]  ████████████████████████████████████████  23.4 MB/s
rasp-ci-8  [iSCSI]  ██████████████████████████████████████    22.6 MB/s
                    0        5       10       15       20       25  MB/s

Sequential Read (1M blocks)

rasp-ci-3  [NVMe]   █████████████████████████████              39.6 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████████      49.7 MB/s
rasp-ci-7  [iSCSI]  ██████████████████████████████████████████ 53.6 MB/s  ★
rasp-ci-8  [iSCSI]  █████████████████████████████████████████  52.5 MB/s
                    0       10       20       30       40       50  MB/s

Random Write (4K blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████    950 IOPS
rasp-ci-4  [NVMe]   ████████████████████████████████████        856 IOPS
rasp-ci-7  [iSCSI]  ██████████████████████████████████████████ 1003 IOPS
rasp-ci-8  [iSCSI]  ███████████████████████████████████████████ 1048 IOPS  ★
                    0      200     400     600     800    1000  IOPS

Random Read (4K blocks)

rasp-ci-3  [NVMe]   ███████████████████████████████████████████ 1364 IOPS  ★
rasp-ci-4  [NVMe]   █████████████████████████████████████████  1281 IOPS
rasp-ci-7  [iSCSI]  ████████████████████████████████████████   1252 IOPS
rasp-ci-8  [iSCSI]  █████████████████████████████████████      1153 IOPS
                    0      300     600     900    1200   1400  IOPS

★ = Highest in category

Test 1 Analysis

SURPRISING FINDING: iSCSI nodes performed BETTER than NVMe in 2 of 4 tests!

Possible explanations:

  • NVMe nodes competing for local disk during parallel test
  • iSCSI spreading I/O across multiple remote NVMe disks
  • Longhorn's intelligent replica distribution
  • Network not being the bottleneck at these throughput levels

NOTE: Parallel testing may not reflect true isolated performance; a sequential round was conducted to verify.


6. Test 2 - Sequential (Isolated) Execution

Test Description

Each node tested individually with no other storage I/O in the cluster. This provides clean, isolated measurements without resource contention.

Execution Order: rasp-ci-3 → rasp-ci-4 → rasp-ci-7 → rasp-ci-8

Each node completed all 4 workloads before the next node started.

Test 2 Results

Sequential Write (1M blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████ 38.9 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████████   36.1 MB/s
rasp-ci-7  [iSCSI]  █████████████████████████               24.9 MB/s
rasp-ci-8  [iSCSI]  ████████████████████████████            28.1 MB/s
                    0       10       20       30       40   MB/s

Sequential Read (1M blocks)

rasp-ci-3  [NVMe]   ██████████████████████████████████████████ 78.7 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████         63.6 MB/s
rasp-ci-7  [iSCSI]  █████████████████████████████████████     69.0 MB/s
rasp-ci-8  [iSCSI]  ████████████████████████████████████      68.0 MB/s
                    0       20       40       60       80   MB/s

Random Write (4K blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████   1354 IOPS
rasp-ci-4  [NVMe]   ██████████████████████████████████████████ 1421 IOPS
rasp-ci-7  [iSCSI]  █████████████████████████████████████      1278 IOPS
rasp-ci-8  [iSCSI]  █████████████████████████████████          1103 IOPS
                    0      400      800     1200    1600   IOPS

Random Read (4K blocks)

rasp-ci-3  [NVMe]   █████████████████████████████████████████████ 1794 IOPS
rasp-ci-4  [NVMe]   ███████████████████████████████████████     1594 IOPS
rasp-ci-7  [iSCSI]  ██████████████████████████████████████      1582 IOPS
rasp-ci-8  [iSCSI]  ████████████████████████████████████        1463 IOPS
                    0      500     1000    1500    2000   IOPS

Test 2 Detailed Metrics

Metric             rasp-ci-3 [NVMe]  rasp-ci-4 [NVMe]  rasp-ci-7 [iSCSI]  rasp-ci-8 [iSCSI]
Seq Write (MB/s)   38.9              36.1              24.9               28.1
Seq Read (MB/s)    78.7              63.6              69.0               68.0
Rand Write (IOPS)  1354              1421              1278               1103
Rand Read (IOPS)   1794              1594              1582               1463

7. Comparative Analysis

Aggregate Comparison: NVMe LOCAL vs iSCSI REMOTE

Based on ISOLATED (Sequential) Test Results - Averaged per storage type:

Metric      NVMe AVG   iSCSI AVG  Difference
Seq Write   37.5 MB/s  26.5 MB/s  NVMe +41% faster
Seq Read    71.2 MB/s  68.5 MB/s  NVMe +4% faster
Rand Write  1388 IOPS  1191 IOPS  NVMe +17% faster
Rand Read   1694 IOPS  1523 IOPS  NVMe +11% faster
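
For clarity, the "Difference" column and the "Efficiency" figures used below are derived directly from these averages; for example, for sequential read:

iSCSI efficiency (Seq Read) = 68.5 / 71.2 ≈ 0.96      →  96%
NVMe advantage   (Seq Read) = 71.2 / 68.5 - 1 ≈ 0.04  →  +4% faster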

Visual Comparison (iSCSI Efficiency vs NVMe)

Sequential Write:  NVMe ████████████████████████████████████████ 100%
                   iSCSI ████████████████████████████            71%

Sequential Read:   NVMe ████████████████████████████████████████ 100%
                   iSCSI ██████████████████████████████████████  96%  ◄ EXCELLENT!

Random Write:      NVMe ████████████████████████████████████████ 100%
                   iSCSI ██████████████████████████████████      86%

Random Read:       NVMe ████████████████████████████████████████ 100%
                   iSCSI ████████████████████████████████████    90%

Test 1 vs Test 2 Comparison

How did results change between parallel and isolated testing?

Metric           PARALLEL (Test 1)  ISOLATED (Test 2)  Change
NVMe Seq Write   23.6 MB/s          37.5 MB/s          +59%
iSCSI Seq Write  23.0 MB/s          26.5 MB/s          +15%
NVMe Seq Read    44.7 MB/s          71.2 MB/s          +59%
iSCSI Seq Read   53.1 MB/s          68.5 MB/s          +29%

KEY INSIGHT: NVMe nodes showed much larger performance gains (+59%) when tested in isolation, suggesting they were bottlenecked during parallel testing (possibly disk contention from Longhorn replication activity).


8. Critical Outcomes & Recommendations

Critical Finding #1: iSCSI READ Performance is Excellent

Sequential read on iSCSI is only 4% slower than local NVMe.

Impact: Read-heavy workloads (web servers, content delivery, databases with read replicas) can run on SD-only nodes with minimal performance penalty.

Recommendation: Schedule read-heavy pods on any available node.


Critical Finding #2: Write Performance Gap is Significant but Acceptable

Sequential write on local NVMe is 41% faster than iSCSI (37.5 MB/s vs 26.5 MB/s); put differently, iSCSI delivers about 71% of NVMe write throughput.

Context: 26.5 MB/s is still very usable for most workloads:

  • Sufficient for database transaction logs
  • Adequate for application logging
  • Acceptable for container image pulls

Recommendation: For write-intensive workloads (databases, CI/CD build caches), prefer scheduling on NVMe nodes using node affinity.
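
One way to express this "prefer NVMe" placement without hard-pinning is a soft node affinity on the cluster's storage label (a sketch; the weight value is arbitrary):

# Soft preference for NVMe nodes; the scheduler can still fall back to SD-only nodes
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: storage
              operator: In
              values: ["nvme"]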


Critical Finding #3: Random I/O Performance is Nearly Equivalent

Random read/write IOPS on iSCSI are within 11-17% of local NVMe.

Impact: Database workloads with random access patterns will perform acceptably on iSCSI nodes.

Recommendation: Don't avoid iSCSI nodes for database workloads; the performance difference is smaller than typically expected.


Critical Finding #4: Network is NOT the Bottleneck

  • 1 Gbps network theoretical max: ~125 MB/s
  • Observed iSCSI throughput: 26-69 MB/s

The network has significant headroom. Current bottleneck is likely:

  • iSCSI protocol overhead
  • Longhorn replication overhead
  • Raspberry Pi CPU/memory limitations

Recommendation: Current 1 Gbps network is sufficient. No immediate need to upgrade to 2.5 Gbps or 10 Gbps networking.


Critical Finding #5: Cluster Capacity Effectively Doubled

  • Before this work: Only 5 nodes (rasp-ci-1 to 5) could run storage workloads
  • After this work: All 10 nodes can run storage workloads

Impact: 100% increase in schedulable capacity for stateful workloads without purchasing additional NVMe drives.

Cost Savings: Avoided purchasing 5 additional NVMe drives (~$250-500 USD depending on capacity)


9. Summary

Performance Summary Table

Workload    NVMe       iSCSI      iSCSI Efficiency
Seq Write   37.5 MB/s  26.5 MB/s  71%
Seq Read    71.2 MB/s  68.5 MB/s  96% ★
Rand Write  1388 IOPS  1191 IOPS  86%
Rand Read   1694 IOPS  1523 IOPS  90%

★ = Exceeds typical expectations for network storage

Conclusion

The Longhorn iSCSI storage configuration is PRODUCTION READY.

SD-only nodes (rasp-ci-6 through rasp-ci-10) can effectively utilize remote NVMe storage with:

  • 96% read efficiency (excellent for most workloads)
  • 71% write efficiency (acceptable for most workloads)
  • 86-90% random I/O efficiency (good for databases)

This architecture successfully enables a heterogeneous cluster where diskless nodes contribute compute capacity while leveraging centralized storage from NVMe-equipped nodes.

Recommended Workload Placement

Workload Type        Recommended Node Placement
Web servers          Any node (read-heavy)
API services         Any node (read-heavy)
Read replicas        Any node
CI/CD runners        Prefer NVMe (write-heavy)
Database primary     Prefer NVMe (write-heavy)
Log aggregation      Prefer NVMe (write-heavy)
Stateless workloads  Any node

Appendix A: Environment Prerequisites Checklist

For teams replicating this setup, ensure:

  • K3s installed with Longhorn storage driver
  • open-iscsi package installed on ALL nodes (apt install open-iscsi)
  • iscsid service enabled and running (systemctl enable --now iscsid)
  • Longhorn nodes configured with allowScheduling=false for diskless nodes
  • Network connectivity between all nodes (1 Gbps minimum recommended)
  • Node labels applied: storage=nvme or storage=sd for scheduling
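
The node labels in the last checklist item can be applied with kubectl, for example:

# Label NVMe-equipped nodes (rasp-ci-1 .. rasp-ci-5)
kubectl label node rasp-ci-3 storage=nvme

# Label SD-only nodes (rasp-ci-6 .. rasp-ci-10)
kubectl label node rasp-ci-7 storage=sd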

Appendix B: Software Versions

Component   Version   Component     Version
K3s         v1.33.6   Longhorn      v1.7.2
containerd  2.1.5     MetalLB       v0.14.9
Traefik     3.5.1     cert-manager  v1.16.2
CoreDNS     1.13.1    libedgetpu    16.0

Appendix C: Node Labels for Scheduling

All nodes have storage labels for workload scheduling:

# For NVMe nodes (rasp-ci-1 to 5)
nodeSelector:
  storage: nvme

# For SD-only nodes (rasp-ci-6 to 10)
nodeSelector:
  storage: sd

Critical: Any workload requiring Longhorn persistent storage must land on a node that can either host replicas locally (storage: nvme) or reach them over iSCSI (storage: sd, now that open-iscsi is installed on all nodes). For write-heavy volumes, prefer nodeSelector: { storage: nvme }.


End of Report
