This report documents the storage performance testing conducted on a K3s Kubernetes cluster to evaluate whether nodes WITHOUT local NVMe storage can effectively utilize Longhorn distributed storage via the iSCSI protocol.
KEY FINDING: iSCSI remote storage performs remarkably well, with sequential read performance within 4% of local NVMe storage. This validates the architectural decision to use diskless worker nodes with remote storage.
| Component | Details |
|---|---|
| Orchestration | K3s v1.33.6+k3s1 (Lightweight Kubernetes) |
| Architecture | ARM64 (aarch64) |
| Storage Backend | Longhorn v1.7.2 (Cloud-Native Distributed) |
| Network | 1 Gbps LAN (VLAN 100, 10.100.0.0/24) |
| Node Count | 10 Raspberry Pi nodes (9x Pi5 + 1x Pi4) |
┌─────────────────────────────────────────────────────────────────────────────┐
│ K3s CLUSTER (v1.33.6+k3s1) │
│ ARM64 / Raspberry Pi - 10 NODES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONTROL PLANE (HA - 2 nodes) │ │
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ rasp-ci-1 │ │ rasp-ci-2 │ │ │
│ │ │ 10.100.0.11 │ │ 10.100.0.12 │ │ │
│ │ │ Pi5 + 1TB NVMe │ │ Pi5 + 1TB NVMe │ │ │
│ │ │ etcd, API, ctrl │ │ etcd, API, ctrl │ │ │
│ │ └─────────────────────┘ └─────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ WORKER NODES (8 nodes) │ │
│ │ │ │
│ │ NVMe WORKERS: │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ rasp-ci-3 │ │ rasp-ci-4 │ │ rasp-ci-5 │ │ │
│ │ │ 10.100. │ │ 10.100. │ │ 10.100. │ │ │
│ │ │ 0.13 │ │ 0.14 │ │ 0.15 │ │ │
│ │ │ Pi5+NVMe │ │ Pi5+NVMe │ │ Pi5+NVMe │ ◄── TESTED (NVMe) │ │
│ │ │ 1TB │ │ 1TB │ │ 1TB │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ │ SD-ONLY WORKERS (iSCSI Remote Storage): │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────┐│ │
│ │ │ rasp-ci-6 │ │ rasp-ci-7 │ │ rasp-ci-8 │ │ rasp-ci-9 │ │rasp-ci-││ │
│ │ │ 10.100. │ │ 10.100. │ │ 10.100. │ │ 10.100. │ │10 ││ │
│ │ │ 0.16 │ │ 0.17 │ │ 0.18 │ │ 0.19 │ │10.100. ││ │
│ │ │ Pi5+Coral │ │ Pi5 │ │ Pi5+Coral │ │ Pi5+Coral │ │0.20 ││ │
│ │ │ USB TPU │ │ SD only │ │ M.2 TPU │ │ M.2 TPU │ │Pi4 ││ │
│ │ └───────────┘ └───────────┘ └───────────┘ └───────────┘ └────────┘│ │
│ │ ▲ ▲ │ │
│ │ └─── TESTED (iSCSI) ───┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Node | Role | Hardware | Data Storage | Longhorn Role |
|---|---|---|---|---|
| rasp-ci-1 | Control Plane | Raspberry Pi 5 | 1TB NVMe | Replica Host |
| rasp-ci-2 | Control Plane | Raspberry Pi 5 | 1TB NVMe | Replica Host |
| rasp-ci-3 | Worker | Raspberry Pi 5 | 1TB NVMe | Replica Host |
| rasp-ci-4 | Worker | Raspberry Pi 5 | 1TB NVMe | Replica Host |
| rasp-ci-5 | Worker | Raspberry Pi 5 | 1TB NVMe | Replica Host |
| rasp-ci-6 | Worker | Pi5 + Coral USB | SD only | iSCSI Client |
| rasp-ci-7 | Worker | Raspberry Pi 5 | SD only | iSCSI Client |
| rasp-ci-8 | Worker | Pi5 + Coral M.2 | SD only | iSCSI Client |
| rasp-ci-9 | Worker | Pi5 + Coral M.2 | SD only | iSCSI Client |
| rasp-ci-10 | Worker | Raspberry Pi 4 | SD only | iSCSI Client |
Total NVMe Capacity: ~5 TB across 5 nodes
| Setting | Value |
|---|---|
| VLAN ID | 100 |
| Subnet | 10.100.0.0/24 |
| Gateway | 10.100.0.1 |
| Node IPs | 10.100.0.11-20 |
Three nodes have Google Coral Edge TPU accelerators for ML inference:
| Node | TPU Type | Interface | Device |
|---|---|---|---|
| rasp-ci-6 | Coral USB Accelerator | USB 3.0 | libusb |
| rasp-ci-8 | Coral M.2 Accelerator | PCIe | /dev/apex_0 |
| rasp-ci-9 | Coral M.2 Accelerator | PCIe | /dev/apex_0 |
Longhorn is a cloud-native distributed block storage system that:
- Creates replicated volumes across multiple nodes for redundancy
- Exposes volumes to pods via iSCSI protocol
- Provides automatic failover if a replica becomes unavailable
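
In practice, applications request Longhorn volumes through a Kubernetes StorageClass. The sketch below shows what such a class might look like for this cluster; the class name, replica count, and timeout are illustrative assumptions, not values taken from the actual configuration.

```yaml
# Illustrative Longhorn StorageClass (name and parameter values are assumptions)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated        # hypothetical name
provisioner: driver.longhorn.io    # Longhorn CSI driver
parameters:
  numberOfReplicas: "2"            # replicas placed on NVMe-backed nodes
  staleReplicaTimeout: "30"        # minutes before an errored replica is cleaned up
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Regardless of where the replicas live, the pod that mounts the volume can run on any node; the difference in access path is illustrated below.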
Two Access Patterns in Our Cluster:
┌─────────┐ ┌─────────────────┐
│ Pod │ ───► │ Local NVMe │ Direct disk I/O
│ │ │ (same node) │ Lowest latency
└─────────┘ └─────────────────┘
┌─────────┐ ┌─────────┐ ┌─────────────────┐
│ Pod │ ───► │ iSCSI │ ───► │ Remote NVMe │
│ │ │ over │ │ (different │
│ │ │ network│ │ node) │
└─────────┘ └─────────┘ └─────────────────┘
Network traversal adds latency but enables diskless nodes
Before benchmarking, the following issues were identified and resolved:
Root Cause:
- Nodes rasp-ci-7, 8, 9, 10 were missing the `open-iscsi` package
- Without an iSCSI initiator, nodes could not mount remote Longhorn volumes
- Longhorn manager pods were in CrashLoopBackOff state

Resolution:
- Installed the `open-iscsi` package on all 4 affected nodes
- Enabled and started the `iscsid` systemd service
- Configured Longhorn to set `allowScheduling=false` on SD-only nodes (prevents Longhorn from trying to store replicas on diskless nodes)
Result:
- All 10 longhorn-manager pods now running successfully
- SD-only nodes can mount iSCSI volumes from NVMe nodes
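
For reference, `allowScheduling` is a field on Longhorn's Node custom resource (`nodes.longhorn.io`). A minimal sketch for one SD-only node is shown below; in practice the flag is usually toggled via the Longhorn UI or `kubectl edit`, and the API version shown reflects Longhorn 1.7 but should be verified against the installed CRDs.

```yaml
# Sketch: disable replica scheduling on a diskless node (disk entries omitted)
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: rasp-ci-7
  namespace: longhorn-system
spec:
  allowScheduling: false   # Longhorn will not place volume replicas on this node
```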
Compare storage I/O performance between:
- NVMe LOCAL: Pods on nodes with local NVMe (rasp-ci-3, rasp-ci-4)
- iSCSI REMOTE: Pods on diskless nodes using iSCSI (rasp-ci-7, rasp-ci-8)
This determines if workloads can be scheduled on SD-only nodes without significant performance degradation.
FIO is an industry-standard storage benchmarking tool that simulates various I/O workloads with configurable parameters.
Parameters Used:
| Parameter | Value | Description |
|---|---|---|
| --ioengine | libaio | Linux native async I/O |
| --direct | 1 | Bypass OS cache (O_DIRECT) |
| --runtime | 10s | Test duration per workload |
| --time_based | yes | Run for full duration |
| --numjobs | 1 | Single I/O thread |
| --size | 256MB | Test file size |
Workloads Tested:
| Workload | Block Size | Description |
|---|---|---|
| Sequential Write | 1 MB | Large file writes (backups, logs) |
| Sequential Read | 1 MB | Large file reads (media, exports) |
| Random Write | 4 KB | Database writes, transactions |
| Random Read | 4 KB | Database queries, app data |
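
Taken together, the parameter and workload tables map onto a fio job file roughly as sketched below. The ConfigMap name, job names, and target directory are assumptions for illustration, not the exact files used during testing.

```yaml
# Hypothetical ConfigMap carrying a fio job file for the four workloads
apiVersion: v1
kind: ConfigMap
metadata:
  name: fio-jobs
data:
  benchmark.fio: |
    [global]
    # async I/O, bypass page cache, 10 s per job, single thread, 256 MB file
    ioengine=libaio
    direct=1
    runtime=10
    time_based
    numjobs=1
    size=256M
    # mount point of the Longhorn volume inside the benchmark pod
    directory=/data

    [seq-write]
    rw=write
    bs=1M

    [seq-read]
    # stonewall = wait for the previous job to finish before starting
    stonewall
    rw=read
    bs=1M

    [rand-write]
    stonewall
    rw=randwrite
    bs=4k

    [rand-read]
    stonewall
    rw=randread
    bs=4k
```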
Each test node was benchmarked by:
- Creating a Longhorn PersistentVolumeClaim (PVC) with node affinity
- Deploying a pod with Alpine Linux + FIO on the target node
- Running all 4 FIO workloads sequentially
- Collecting results
- Cleaning up (deleting pod and PVC)
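
A single-node run then amounts to applying a manifest along these lines. This is a sketch: the PVC size, the `apk add fio` bootstrap, the `kubernetes.io/hostname` pinning, and the reference to the hypothetical `fio-jobs` ConfigMap above are assumptions, not the exact manifests used.

```yaml
# Sketch: Longhorn PVC plus a FIO pod pinned to one target node
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn          # default Longhorn StorageClass name
  resources:
    requests:
      storage: 1Gi                    # comfortably above the 256 MB test file
---
apiVersion: v1
kind: Pod
metadata:
  name: fio-test-rasp-ci-7
spec:
  nodeSelector:
    kubernetes.io/hostname: rasp-ci-7 # pin the pod to the node under test
  restartPolicy: Never
  containers:
    - name: fio
      image: alpine:3.20
      command: ["sh", "-c", "apk add --no-cache fio && fio /jobs/benchmark.fio"]
      volumeMounts:
        - name: data
          mountPath: /data            # matches 'directory' in the job file
        - name: jobs
          mountPath: /jobs
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-test-pvc
    - name: jobs
      configMap:
        name: fio-jobs
```

Cleanup is then a matter of deleting the pod and the PVC before moving on to the next node.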
Two Test Rounds Were Conducted:
- TEST 1 (Parallel): All 4 nodes tested simultaneously
- TEST 2 (Sequential): Nodes tested one-at-a-time for isolation
All 4 test nodes (rasp-ci-3, 4, 7, 8) ran FIO benchmarks at the same time. This simulates a real-world scenario with concurrent storage workloads.
Potential Impact: Network contention and shared resource competition may affect iSCSI nodes more than local NVMe nodes.
rasp-ci-3 [NVMe] ██████████████████████████████████████████ 25.9 MB/s ★
rasp-ci-4 [NVMe] ██████████████████████████████████ 21.2 MB/s
rasp-ci-7 [iSCSI] ████████████████████████████████████████ 23.4 MB/s
rasp-ci-8 [iSCSI] ██████████████████████████████████████ 22.6 MB/s
0 5 10 15 20 25 MB/s
rasp-ci-3 [NVMe] █████████████████████████████ 39.6 MB/s
rasp-ci-4 [NVMe] █████████████████████████████████████ 49.7 MB/s
rasp-ci-7 [iSCSI] ██████████████████████████████████████████ 53.6 MB/s ★
rasp-ci-8 [iSCSI] █████████████████████████████████████████ 52.5 MB/s
0 10 20 30 40 50 MB/s
rasp-ci-3 [NVMe] ████████████████████████████████████████ 950 IOPS
rasp-ci-4 [NVMe] ████████████████████████████████████ 856 IOPS
rasp-ci-7 [iSCSI] ██████████████████████████████████████████ 1003 IOPS
rasp-ci-8 [iSCSI] ███████████████████████████████████████████ 1048 IOPS ★
0 200 400 600 800 1000 IOPS
rasp-ci-3 [NVMe] ███████████████████████████████████████████ 1364 IOPS ★
rasp-ci-4 [NVMe] █████████████████████████████████████████ 1281 IOPS
rasp-ci-7 [iSCSI] ████████████████████████████████████████ 1252 IOPS
rasp-ci-8 [iSCSI] █████████████████████████████████████ 1153 IOPS
0 300 600 900 1200 1400 IOPS
★ = Highest in category
SURPRISING FINDING: iSCSI nodes performed BETTER than NVMe in 2 of 4 tests!
Possible explanations:
- NVMe nodes competing for local disk during parallel test
- iSCSI spreading I/O across multiple remote NVMe disks
- Longhorn's intelligent replica distribution
- Network not being the bottleneck at these throughput levels
NOTE: Parallel testing may not reflect true isolated performance; sequential testing was conducted to verify.
Each node tested individually with no other storage I/O in the cluster. This provides clean, isolated measurements without resource contention.
Execution Order: rasp-ci-3 → rasp-ci-4 → rasp-ci-7 → rasp-ci-8
Each node completed all 4 workloads before the next node started.
rasp-ci-3 [NVMe] ████████████████████████████████████████ 38.9 MB/s
rasp-ci-4 [NVMe] █████████████████████████████████████ 36.1 MB/s
rasp-ci-7 [iSCSI] █████████████████████████ 24.9 MB/s
rasp-ci-8 [iSCSI] ████████████████████████████ 28.1 MB/s
0 10 20 30 40 MB/s
rasp-ci-3 [NVMe] ██████████████████████████████████████████ 78.7 MB/s
rasp-ci-4 [NVMe] █████████████████████████████████ 63.6 MB/s
rasp-ci-7 [iSCSI] █████████████████████████████████████ 69.0 MB/s
rasp-ci-8 [iSCSI] ████████████████████████████████████ 68.0 MB/s
0 20 40 60 80 MB/s
rasp-ci-3 [NVMe] ████████████████████████████████████████ 1354 IOPS
rasp-ci-4 [NVMe] ██████████████████████████████████████████ 1421 IOPS
rasp-ci-7 [iSCSI] █████████████████████████████████████ 1278 IOPS
rasp-ci-8 [iSCSI] █████████████████████████████████ 1103 IOPS
0 400 800 1200 1600 IOPS
rasp-ci-3 [NVMe] █████████████████████████████████████████████ 1794 IOPS
rasp-ci-4 [NVMe] ███████████████████████████████████████ 1594 IOPS
rasp-ci-7 [iSCSI] ██████████████████████████████████████ 1582 IOPS
rasp-ci-8 [iSCSI] ████████████████████████████████████ 1463 IOPS
0 500 1000 1500 2000 IOPS
| Metric | rasp-ci-3 [NVMe] | rasp-ci-4 [NVMe] | rasp-ci-7 [iSCSI] | rasp-ci-8 [iSCSI] |
|---|---|---|---|---|
| Seq Write (MB/s) | 38.9 | 36.1 | 24.9 | 28.1 |
| Seq Read (MB/s) | 78.7 | 63.6 | 69.0 | 68.0 |
| Rand Write (IOPS) | 1354 | 1421 | 1278 | 1103 |
| Rand Read (IOPS) | 1794 | 1594 | 1582 | 1463 |
Based on ISOLATED (Sequential) Test Results - Averaged per storage type:
| Metric | NVMe AVG | iSCSI AVG | Difference |
|---|---|---|---|
| Seq Write | 37.5 MB/s | 26.5 MB/s | NVMe +41% faster |
| Seq Read | 71.2 MB/s | 68.5 MB/s | NVMe +4% faster |
| Rand Write | 1388 IOPS | 1191 IOPS | NVMe +17% faster |
| Rand Read | 1694 IOPS | 1523 IOPS | NVMe +11% faster |
Sequential Write: NVMe ████████████████████████████████████████ 100%
iSCSI ████████████████████████████ 71%
Sequential Read: NVMe ████████████████████████████████████████ 100%
iSCSI ██████████████████████████████████████ 96% ◄ EXCELLENT!
Random Write: NVMe ████████████████████████████████████████ 100%
iSCSI ██████████████████████████████████ 86%
Random Read: NVMe ████████████████████████████████████████ 100%
iSCSI ████████████████████████████████████ 90%
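
To make the two views explicit: the Difference column above compares NVMe against iSCSI, while the efficiency bars compare iSCSI against NVMe. Using the sequential write averages from the table:

$$
\frac{37.5}{26.5} - 1 \approx 0.41 \;\;(\text{NVMe +41\% faster}),
\qquad
\frac{26.5}{37.5} \approx 0.71 \;\;(\text{71\% iSCSI efficiency}).
$$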
How did results change between parallel and isolated testing?
| Metric | PARALLEL (Test 1) | ISOLATED (Test 2) | Change |
|---|---|---|---|
| NVMe Seq Write | 23.6 MB/s | 37.5 MB/s | +59% |
| iSCSI Seq Write | 23.0 MB/s | 26.5 MB/s | +15% |
| NVMe Seq Read | 44.7 MB/s | 71.2 MB/s | +59% |
| iSCSI Seq Read | 53.1 MB/s | 68.5 MB/s | +29% |
KEY INSIGHT: NVMe nodes showed much larger performance gains (+59%) when tested in isolation, suggesting they were bottlenecked during parallel testing (possibly disk contention from Longhorn replication activity).
Sequential read on iSCSI is only 4% slower than local NVMe.
Impact: Read-heavy workloads (web servers, content delivery, databases with read replicas) can run on SD-only nodes with minimal performance penalty.
Recommendation: Schedule read-heavy pods on any available node.
Sequential write on iSCSI reaches about 71% of local NVMe throughput (26.5 MB/s vs 37.5 MB/s); equivalently, NVMe is roughly 41% faster.
Context: 26.5 MB/s is still very usable for most workloads:
- Sufficient for database transaction logs
- Adequate for application logging
- Acceptable for container image pulls
Recommendation: For write-intensive workloads (databases, CI/CD build caches), prefer scheduling on NVMe nodes using node affinity.
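
One way to express that preference without hard-excluding the SD-only nodes is a soft node affinity on the existing `storage=nvme` label. The sketch below is illustrative; the deployment name, image, and weight are assumptions.

```yaml
# Sketch: prefer NVMe-labelled nodes for a write-heavy workload, but allow fallback
apiVersion: apps/v1
kind: Deployment
metadata:
  name: write-heavy-app                 # hypothetical workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: write-heavy-app
  template:
    metadata:
      labels:
        app: write-heavy-app
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: storage
                    operator: In
                    values: ["nvme"]
      containers:
        - name: app
          image: registry.example.com/write-heavy-app:latest   # placeholder
```

Unlike the hard `nodeSelector` shown at the end of this report, this keeps the workload schedulable when all NVMe nodes are busy.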
Random read/write IOPS on iSCSI are within 11-17% of local NVMe.
Impact: Database workloads with random access patterns will perform acceptably on iSCSI nodes.
Recommendation: Don't avoid iSCSI nodes for database workloads; the performance difference is smaller than typically expected.
- 1 Gbps network theoretical max: ~125 MB/s
- Observed iSCSI throughput: 26-69 MB/s
The network has significant headroom: the peak observed iSCSI throughput (~69 MB/s) uses only about 55% of the 1 Gbps line rate. The current bottleneck is likely:
- iSCSI protocol overhead
- Longhorn replication overhead
- Raspberry Pi CPU/memory limitations
Recommendation: Current 1 Gbps network is sufficient. No immediate need to upgrade to 2.5 Gbps or 10 Gbps networking.
- Before this work: Only 5 nodes (rasp-ci-1 to 5) could run storage workloads
- After this work: All 10 nodes can run storage workloads
Impact: 100% increase in schedulable capacity for stateful workloads without purchasing additional NVMe drives.
Cost Savings: Avoided purchasing 5 additional NVMe drives (~$250-500 USD depending on capacity)
| Workload | NVMe | iSCSI | iSCSI Efficiency |
|---|---|---|---|
| Seq Write | 37.5 MB/s | 26.5 MB/s | 71% |
| Seq Read | 71.2 MB/s | 68.5 MB/s | 96% ★ |
| Rand Write | 1388 IOPS | 1191 IOPS | 86% |
| Rand Read | 1694 IOPS | 1523 IOPS | 90% |
★ = Exceeds typical expectations for network storage
The Longhorn iSCSI storage configuration is PRODUCTION READY.
SD-only nodes (rasp-ci-6 through rasp-ci-10) can effectively utilize remote NVMe storage with:
- 96% read efficiency (excellent for most workloads)
- 71% write efficiency (acceptable for most workloads)
- 86-90% random I/O efficiency (good for databases)
This architecture successfully enables a heterogeneous cluster where diskless nodes contribute compute capacity while leveraging centralized storage from NVMe-equipped nodes.
| Workload Type | Recommended Node Placement |
|---|---|
| Web servers | Any node (read-heavy) |
| API services | Any node (read-heavy) |
| Read replicas | Any node |
| CI/CD runners | Prefer NVMe (write-heavy) |
| Database primary | Prefer NVMe (write-heavy) |
| Log aggregation | Prefer NVMe (write-heavy) |
| Stateless workloads | Any node |
For teams replicating this setup, ensure:
- K3s installed with Longhorn storage driver
- `open-iscsi` package installed on ALL nodes (`apt install open-iscsi`)
- `iscsid` service enabled and running (`systemctl enable --now iscsid`)
- Longhorn nodes configured with `allowScheduling=false` for diskless nodes
- Network connectivity between all nodes (1 Gbps minimum recommended)
- Node labels applied: `storage=nvme` or `storage=sd` for scheduling
| Component | Version | Component | Version |
|---|---|---|---|
| K3s | v1.33.6 | Longhorn | v1.7.2 |
| containerd | 2.1.5 | MetalLB | v0.14.9 |
| Traefik | 3.5.1 | cert-manager | v1.16.2 |
| CoreDNS | 1.13.1 | libedgetpu | 16.0 |
All nodes have storage labels for workload scheduling:
```yaml
# For NVMe nodes (rasp-ci-1 to 5)
nodeSelector:
  storage: nvme

# For SD-only nodes (rasp-ci-6 to 10)
nodeSelector:
  storage: sd
```

Critical: Use `nodeSelector: { storage: nvme }` for any workload requiring Longhorn persistent storage to ensure pods are scheduled on nodes that can host replicas OR have proper iSCSI connectivity.
End of Report