Longhorn Storage Performance Analysis Report

K3s Raspberry Pi Cluster

Date: December 29, 2025


1. Executive Summary

This report documents the storage performance testing conducted on a K3s Kubernetes cluster to evaluate whether nodes WITHOUT local NVMe storage can effectively utilize Longhorn distributed storage via iSCSI protocol.

KEY FINDING: iSCSI remote storage performs remarkably well, with sequential read performance within 4% of local NVMe storage. This validates the architectural decision to use diskless worker nodes with remote storage.


2. Environment Overview

Cluster Specifications

Component        Details
Orchestration    K3s v1.33.6+k3s1 (Lightweight Kubernetes)
Architecture     ARM64 (aarch64)
Storage Backend  Longhorn v1.7.2 (Cloud-Native Distributed)
Network          1 Gbps LAN (VLAN 100, 10.100.0.0/24)
Node Count       10 Raspberry Pi nodes (9x Pi5 + 1x Pi4)

Cluster Topology

┌─────────────────────────────────────────────────────────────────────────────┐
│                         K3s CLUSTER (v1.33.6+k3s1)                          │
│                        ARM64 / Raspberry Pi - 10 NODES                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     CONTROL PLANE (HA - 2 nodes)                     │   │
│  │  ┌─────────────────────┐      ┌─────────────────────┐               │   │
│  │  │     rasp-ci-1       │      │     rasp-ci-2       │               │   │
│  │  │   10.100.0.11       │      │   10.100.0.12       │               │   │
│  │  │   Pi5 + 1TB NVMe    │      │   Pi5 + 1TB NVMe    │               │   │
│  │  │   etcd, API, ctrl   │      │   etcd, API, ctrl   │               │   │
│  │  └─────────────────────┘      └─────────────────────┘               │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      WORKER NODES (8 nodes)                         │   │
│  │                                                                     │   │
│  │  NVMe WORKERS:                                                      │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐                         │   │
│  │  │ rasp-ci-3 │ │ rasp-ci-4 │ │ rasp-ci-5 │                         │   │
│  │  │ 10.100.   │ │ 10.100.   │ │ 10.100.   │                         │   │
│  │  │ 0.13      │ │ 0.14      │ │ 0.15      │                         │   │
│  │  │ Pi5+NVMe  │ │ Pi5+NVMe  │ │ Pi5+NVMe  │  ◄── TESTED (NVMe)      │   │
│  │  │ 1TB       │ │ 1TB       │ │ 1TB       │                         │   │
│  │  └───────────┘ └───────────┘ └───────────┘                         │   │
│  │                                                                     │   │
│  │  SD-ONLY WORKERS (iSCSI Remote Storage):                           │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────┐│   │
│  │  │ rasp-ci-6 │ │ rasp-ci-7 │ │ rasp-ci-8 │ │ rasp-ci-9 │ │rasp-ci-││   │
│  │  │ 10.100.   │ │ 10.100.   │ │ 10.100.   │ │ 10.100.   │ │10      ││   │
│  │  │ 0.16      │ │ 0.17      │ │ 0.18      │ │ 0.19      │ │10.100. ││   │
│  │  │ Pi5+Coral │ │ Pi5       │ │ Pi5+Coral │ │ Pi5+Coral │ │0.20    ││   │
│  │  │ USB TPU   │ │ SD only   │ │ M.2 TPU   │ │ M.2 TPU   │ │Pi4     ││   │
│  │  └───────────┘ └───────────┘ └───────────┘ └───────────┘ └────────┘│   │
│  │                      ▲               ▲                              │   │
│  │                      └─── TESTED (iSCSI) ───┘                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Storage Configuration Details

Node        Role           Hardware         Data Storage  Longhorn Role
rasp-ci-1   Control Plane  Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-2   Control Plane  Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-3   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-4   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-5   Worker         Raspberry Pi 5   1TB NVMe      Replica Host
rasp-ci-6   Worker         Pi5 + Coral USB  SD only       iSCSI Client
rasp-ci-7   Worker         Raspberry Pi 5   SD only       iSCSI Client
rasp-ci-8   Worker         Pi5 + Coral M.2  SD only       iSCSI Client
rasp-ci-9   Worker         Pi5 + Coral M.2  SD only       iSCSI Client
rasp-ci-10  Worker         Raspberry Pi 4   SD only       iSCSI Client

Total NVMe Capacity: ~5 TB across 5 nodes

Network Configuration

Setting   Value
VLAN ID   100
Subnet    10.100.0.0/24
Gateway   10.100.0.1
Node IPs  10.100.0.11-20

Additional Hardware: Google Coral TPU Accelerators

Three nodes have Google Coral Edge TPU accelerators for ML inference:

Node       TPU Type               Interface  Device
rasp-ci-6  Coral USB Accelerator  USB 3.0    libusb
rasp-ci-8  Coral M.2 Accelerator  PCIe       /dev/apex_0
rasp-ci-9  Coral M.2 Accelerator  PCIe       /dev/apex_0

Longhorn Storage Architecture

Longhorn is a cloud-native distributed block storage system that:

  • Creates replicated volumes across multiple nodes for redundancy
  • Exposes volumes to pods via iSCSI protocol
  • Provides automatic failover if a replica becomes unavailable
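
For illustration, Longhorn volumes are requested through a Kubernetes StorageClass. The sketch below is a minimal, assumed example (the class name and parameter values are illustrative, not the exact manifest deployed in this cluster):

# Illustrative Longhorn StorageClass - values are assumptions, not the
# cluster's actual manifest
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated          # hypothetical name
provisioner: driver.longhorn.io      # Longhorn CSI driver
parameters:
  numberOfReplicas: "2"              # replicas land on NVMe-backed nodes
  staleReplicaTimeout: "2880"        # minutes before a failed replica is discarded
  dataLocality: "disabled"           # diskless nodes always attach over iSCSI
allowVolumeExpansion: true
reclaimPolicy: Delete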

Two Access Patterns in Our Cluster:

Pattern A: Local NVMe Access (rasp-ci-1 to 5)

┌─────────┐      ┌─────────────────┐
│   Pod   │ ───► │  Local NVMe     │   Direct disk I/O
│         │      │  (same node)    │   Lowest latency
└─────────┘      └─────────────────┘

Pattern B: iSCSI Remote Access (rasp-ci-6 to 10)

┌─────────┐      ┌─────────┐      ┌─────────────────┐
│   Pod   │ ───► │  iSCSI  │ ───► │  Remote NVMe    │
│         │      │  over   │      │  (different     │
│         │      │  network│      │   node)         │
└─────────┘      └─────────┘      └─────────────────┘

Network traversal adds latency but enables diskless nodes
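
To confirm that a diskless node is actually using Pattern B, the active iSCSI sessions can be inspected on that node. A minimal check (exact target naming depends on the Longhorn version):

# On an SD-only node: list iSCSI sessions established by Longhorn
sudo iscsiadm -m session

# The attached volume also appears as a regular block device
lsblk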

3. Pre-Requisite Work Completed

Before benchmarking, the following issues were identified and resolved:

Issue: Longhorn manager pods crashing on SD-only nodes

Root Cause:

  • Nodes rasp-ci-7, 8, 9, 10 were missing the open-iscsi package
  • Without iSCSI initiator, nodes could not mount remote Longhorn volumes
  • Longhorn manager pods were in CrashLoopBackOff state

Resolution:

  • Installed open-iscsi package on all 4 affected nodes
  • Enabled and started iscsid systemd service
  • Configured Longhorn to set allowScheduling=false on SD-only nodes (prevents Longhorn from trying to store replicas on diskless nodes)
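
A sketch of these remediation steps, assuming Debian-based nodes and kubectl access to the Longhorn CRDs (rasp-ci-7 shown as the example node; the same setting can also be toggled in the Longhorn UI):

# On each affected node (rasp-ci-7 .. rasp-ci-10): install the iSCSI initiator
sudo apt-get update && sudo apt-get install -y open-iscsi

# Enable and start the iSCSI daemon
sudo systemctl enable --now iscsid

# From a workstation with kubectl: stop Longhorn from placing replicas
# on the diskless node (repeat per SD-only node)
kubectl -n longhorn-system patch nodes.longhorn.io rasp-ci-7 \
  --type merge -p '{"spec":{"allowScheduling":false}}'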

Result:

  • All 10 longhorn-manager pods now running successfully
  • SD-only nodes can mount iSCSI volumes from NVMe nodes

4. Test Methodology

Objective

Compare storage I/O performance between:

  • NVMe LOCAL: Pods on nodes with local NVMe (rasp-ci-3, rasp-ci-4)
  • iSCSI REMOTE: Pods on diskless nodes using iSCSI (rasp-ci-7, rasp-ci-8)

This determines if workloads can be scheduled on SD-only nodes without significant performance degradation.

Test Tool: FIO (Flexible I/O Tester)

FIO is an industry-standard storage benchmarking tool that simulates various I/O workloads with configurable parameters.

Parameters Used:

Parameter     Value   Description
--ioengine    libaio  Linux native async I/O
--direct      1       Bypass OS cache (O_DIRECT)
--runtime     10s     Test duration per workload
--time_based  yes     Run for full duration
--numjobs     1       Single I/O thread
--size        256MB   Test file size

Workloads Tested:

Workload          Block Size  Description
Sequential Write  1 MB        Large file writes (backups, logs)
Sequential Read   1 MB        Large file reads (media, exports)
Random Write      4 KB        Database writes, transactions
Random Read       4 KB        Database queries, app data
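
For reference, the parameters above combine into a fio invocation along these lines, shown here for the 4K random-read workload (the job name and /data mount path are placeholders; the other workloads only change --rw and --bs, e.g. --rw=write --bs=1M for sequential write):

# 4K random read against a file on the mounted Longhorn volume
fio --name=randread-test --filename=/data/fio-testfile \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --size=256M \
    --runtime=10 --time_based \
    --numjobs=1 --group_reporting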

Test Execution Method

Each test node was benchmarked by:

  1. Creating a Longhorn PersistentVolumeClaim (PVC) with node affinity
  2. Deploying a pod with Alpine Linux + FIO on the target node
  3. Running all 4 FIO workloads sequentially
  4. Collecting results
  5. Cleaning up (deleting pod and PVC)
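
A minimal sketch of the per-node test manifests, assuming the default "longhorn" StorageClass and pinning the pod via a kubernetes.io/hostname selector (names, image, and volume size are illustrative):

# Illustrative PVC + benchmark pod for one target node (rasp-ci-7)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fio-test-rasp-ci-7
spec:
  nodeSelector:
    kubernetes.io/hostname: rasp-ci-7   # pin the pod to the node under test
  containers:
    - name: fio
      image: alpine:3.20                # fio installed at runtime (apk add fio)
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-test-pvc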

Two Test Rounds Were Conducted:

  • TEST 1 (Parallel): All 4 nodes tested simultaneously
  • TEST 2 (Sequential): Nodes tested one-at-a-time for isolation

5. Test 1 - Parallel Execution

Test Description

All 4 test nodes (rasp-ci-3, 4, 7, 8) ran FIO benchmarks at the same time. This simulates a real-world scenario with concurrent storage workloads.

Potential Impact: Network contention and shared resource competition may affect iSCSI nodes more than local NVMe nodes.

Test 1 Results

Sequential Write (1M blocks)

rasp-ci-3  [NVMe]   ██████████████████████████████████████████ 25.9 MB/s  ★
rasp-ci-4  [NVMe]   ██████████████████████████████████        21.2 MB/s
rasp-ci-7  [iSCSI]  ████████████████████████████████████████  23.4 MB/s
rasp-ci-8  [iSCSI]  ██████████████████████████████████████    22.6 MB/s
                    0        5       10       15       20       25  MB/s

Sequential Read (1M blocks)

rasp-ci-3  [NVMe]   █████████████████████████████              39.6 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████████      49.7 MB/s
rasp-ci-7  [iSCSI]  ██████████████████████████████████████████ 53.6 MB/s  ★
rasp-ci-8  [iSCSI]  █████████████████████████████████████████  52.5 MB/s
                    0       10       20       30       40       50  MB/s

Random Write (4K blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████    950 IOPS
rasp-ci-4  [NVMe]   ████████████████████████████████████        856 IOPS
rasp-ci-7  [iSCSI]  ██████████████████████████████████████████ 1003 IOPS
rasp-ci-8  [iSCSI]  ███████████████████████████████████████████ 1048 IOPS  ★
                    0      200     400     600     800    1000  IOPS

Random Read (4K blocks)

rasp-ci-3  [NVMe]   ███████████████████████████████████████████ 1364 IOPS  ★
rasp-ci-4  [NVMe]   █████████████████████████████████████████  1281 IOPS
rasp-ci-7  [iSCSI]  ████████████████████████████████████████   1252 IOPS
rasp-ci-8  [iSCSI]  █████████████████████████████████████      1153 IOPS
                    0      300     600     900    1200   1400  IOPS

★ = Highest in category

Test 1 Analysis

SURPRISING FINDING: iSCSI nodes performed BETTER than NVMe in 2 of 4 tests!

Possible explanations:

  • NVMe nodes competing for local disk during parallel test
  • iSCSI spreading I/O across multiple remote NVMe disks
  • Longhorn's intelligent replica distribution
  • Network not being the bottleneck at these throughput levels

NOTE: Parallel testing may not reflect true isolated performance; a sequential round was conducted to verify.


6. Test 2 - Sequential (Isolated) Execution

Test Description

Each node tested individually with no other storage I/O in the cluster. This provides clean, isolated measurements without resource contention.

Execution Order: rasp-ci-3 → rasp-ci-4 → rasp-ci-7 → rasp-ci-8

Each node completed all 4 workloads before the next node started.

Test 2 Results

Sequential Write (1M blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████ 38.9 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████████   36.1 MB/s
rasp-ci-7  [iSCSI]  █████████████████████████               24.9 MB/s
rasp-ci-8  [iSCSI]  ████████████████████████████            28.1 MB/s
                    0       10       20       30       40   MB/s

Sequential Read (1M blocks)

rasp-ci-3  [NVMe]   ██████████████████████████████████████████ 78.7 MB/s
rasp-ci-4  [NVMe]   █████████████████████████████████         63.6 MB/s
rasp-ci-7  [iSCSI]  █████████████████████████████████████     69.0 MB/s
rasp-ci-8  [iSCSI]  ████████████████████████████████████      68.0 MB/s
                    0       20       40       60       80   MB/s

Random Write (4K blocks)

rasp-ci-3  [NVMe]   ████████████████████████████████████████   1354 IOPS
rasp-ci-4  [NVMe]   ██████████████████████████████████████████ 1421 IOPS
rasp-ci-7  [iSCSI]  █████████████████████████████████████      1278 IOPS
rasp-ci-8  [iSCSI]  █████████████████████████████████          1103 IOPS
                    0      400      800     1200    1600   IOPS

Random Read (4K blocks)

rasp-ci-3  [NVMe]   █████████████████████████████████████████████ 1794 IOPS
rasp-ci-4  [NVMe]   ███████████████████████████████████████     1594 IOPS
rasp-ci-7  [iSCSI]  ██████████████████████████████████████      1582 IOPS
rasp-ci-8  [iSCSI]  ████████████████████████████████████        1463 IOPS
                    0      500     1000    1500    2000   IOPS

Test 2 Detailed Metrics

Metric             rasp-ci-3 [NVMe]  rasp-ci-4 [NVMe]  rasp-ci-7 [iSCSI]  rasp-ci-8 [iSCSI]
Seq Write (MB/s)   38.9              36.1              24.9               28.1
Seq Read (MB/s)    78.7              63.6              69.0               68.0
Rand Write (IOPS)  1354              1421              1278               1103
Rand Read (IOPS)   1794              1594              1582               1463

7. Comparative Analysis

Aggregate Comparison: NVMe LOCAL vs iSCSI REMOTE

Based on ISOLATED (Sequential) Test Results - Averaged per storage type:

Metric      NVMe AVG   iSCSI AVG  Difference
Seq Write   37.5 MB/s  26.5 MB/s  NVMe +41% faster
Seq Read    71.2 MB/s  68.5 MB/s  NVMe +4% faster
Rand Write  1388 IOPS  1191 IOPS  NVMe +17% faster
Rand Read   1694 IOPS  1523 IOPS  NVMe +11% faster
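
For clarity, the "Difference" column and the "Efficiency" figures used below are derived directly from these averages; for example, for sequential read:

iSCSI efficiency (Seq Read) = 68.5 / 71.2 ≈ 0.96      →  96%
NVMe advantage   (Seq Read) = 71.2 / 68.5 - 1 ≈ 0.04  →  +4% faster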

Visual Comparison (iSCSI Efficiency vs NVMe)

Sequential Write:  NVMe ████████████████████████████████████████ 100%
                   iSCSI ████████████████████████████            71%

Sequential Read:   NVMe ████████████████████████████████████████ 100%
                   iSCSI ██████████████████████████████████████  96%  ◄ EXCELLENT!

Random Write:      NVMe ████████████████████████████████████████ 100%
                   iSCSI ██████████████████████████████████      86%

Random Read:       NVMe ████████████████████████████████████████ 100%
                   iSCSI ████████████████████████████████████    90%

Test 1 vs Test 2 Comparison

How did results change between parallel and isolated testing?

Metric           PARALLEL (Test 1)  ISOLATED (Test 2)  Change
NVMe Seq Write   23.6 MB/s          37.5 MB/s          +59%
iSCSI Seq Write  23.0 MB/s          26.5 MB/s          +15%
NVMe Seq Read    44.7 MB/s          71.2 MB/s          +59%
iSCSI Seq Read   53.1 MB/s          68.5 MB/s          +29%

KEY INSIGHT: NVMe nodes showed much larger performance gains (+59%) when tested in isolation, suggesting they were bottlenecked during parallel testing (possibly disk contention from Longhorn replication activity).


8. Critical Outcomes & Recommendations

Critical Finding #1: iSCSI READ Performance is Excellent

Sequential read on iSCSI is only 4% slower than local NVMe.

Impact: Read-heavy workloads (web servers, content delivery, databases with read replicas) can run on SD-only nodes with minimal performance penalty.

Recommendation: Schedule read-heavy pods on any available node.


Critical Finding #2: Write Performance Gap is Significant but Acceptable

Sequential write on local NVMe is 41% faster than iSCSI (37.5 MB/s vs 26.5 MB/s); put differently, iSCSI delivers about 71% of NVMe write throughput.

Context: 26.5 MB/s is still very usable for most workloads:

  • Sufficient for database transaction logs
  • Adequate for application logging
  • Acceptable for container image pulls

Recommendation: For write-intensive workloads (databases, CI/CD build caches), prefer scheduling on NVMe nodes using node affinity.
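
One way to express this "prefer NVMe" placement without hard-pinning is a soft node affinity on the cluster's storage label (a sketch; the weight value is arbitrary):

# Soft preference for NVMe nodes; the scheduler can still fall back to SD-only nodes
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: storage
              operator: In
              values: ["nvme"]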


Critical Finding #3: Random I/O Performance is Nearly Equivalent

Random read/write IOPS on iSCSI are within 11-17% of local NVMe.

Impact: Database workloads with random access patterns will perform acceptably on iSCSI nodes.

Recommendation: Don't avoid iSCSI nodes for database workloads; the performance difference is smaller than typically expected.


Critical Finding #4: Network is NOT the Bottleneck

  • 1 Gbps network theoretical max: ~125 MB/s
  • Observed iSCSI throughput: 26-69 MB/s

The network has significant headroom. Current bottleneck is likely:

  • iSCSI protocol overhead
  • Longhorn replication overhead
  • Raspberry Pi CPU/memory limitations

Recommendation: Current 1 Gbps network is sufficient. No immediate need to upgrade to 2.5 Gbps or 10 Gbps networking.


Critical Finding #5: Cluster Capacity Effectively Doubled

  • Before this work: Only 5 nodes (rasp-ci-1 to 5) could run storage workloads
  • After this work: All 10 nodes can run storage workloads

Impact: 100% increase in schedulable capacity for stateful workloads without purchasing additional NVMe drives.

Cost Savings: Avoided purchasing 5 additional NVMe drives (~$250-500 USD depending on capacity)


9. Summary

Performance Summary Table

Workload    NVMe       iSCSI      iSCSI Efficiency
Seq Write   37.5 MB/s  26.5 MB/s  71%
Seq Read    71.2 MB/s  68.5 MB/s  96% ★
Rand Write  1388 IOPS  1191 IOPS  86%
Rand Read   1694 IOPS  1523 IOPS  90%

★ = Exceeds typical expectations for network storage

Conclusion

The Longhorn iSCSI storage configuration is PRODUCTION READY.

SD-only nodes (rasp-ci-6 through rasp-ci-10) can effectively utilize remote NVMe storage with:

  • 96% read efficiency (excellent for most workloads)
  • 71% write efficiency (acceptable for most workloads)
  • 86-90% random I/O efficiency (good for databases)

This architecture successfully enables a heterogeneous cluster where diskless nodes contribute compute capacity while leveraging centralized storage from NVMe-equipped nodes.

Recommended Workload Placement

Workload Type        Recommended Node Placement
Web servers          Any node (read-heavy)
API services         Any node (read-heavy)
Read replicas        Any node
CI/CD runners        Prefer NVMe (write-heavy)
Database primary     Prefer NVMe (write-heavy)
Log aggregation      Prefer NVMe (write-heavy)
Stateless workloads  Any node

Appendix A: Environment Prerequisites Checklist

For teams replicating this setup, ensure:

  • K3s installed with Longhorn storage driver
  • open-iscsi package installed on ALL nodes (apt install open-iscsi)
  • iscsid service enabled and running (systemctl enable --now iscsid)
  • Longhorn nodes configured with allowScheduling=false for diskless nodes
  • Network connectivity between all nodes (1 Gbps minimum recommended)
  • Node labels applied: storage=nvme or storage=sd for scheduling
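
The node labels in the last checklist item can be applied with kubectl, for example:

# Label NVMe-equipped nodes (rasp-ci-1 .. rasp-ci-5)
kubectl label node rasp-ci-3 storage=nvme

# Label SD-only nodes (rasp-ci-6 .. rasp-ci-10)
kubectl label node rasp-ci-7 storage=sd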

Appendix B: Software Versions

Component   Version   Component     Version
K3s         v1.33.6   Longhorn      v1.7.2
containerd  2.1.5     MetalLB       v0.14.9
Traefik     3.5.1     cert-manager  v1.16.2
CoreDNS     1.13.1    libedgetpu    16.0

Appendix C: Node Labels for Scheduling

All nodes have storage labels for workload scheduling:

# For NVMe nodes (rasp-ci-1 to 5)
nodeSelector:
  storage: nvme

# For SD-only nodes (rasp-ci-6 to 10)
nodeSelector:
  storage: sd

Critical: Any workload requiring Longhorn persistent storage must land on a node that can either host replicas locally (storage: nvme) or reach them over iSCSI (storage: sd, now that open-iscsi is installed on all nodes). For write-heavy volumes, prefer nodeSelector: { storage: nvme }.


End of Report
