@igalshilman
Created February 13, 2026 13:12

RocksDB SIGTRAP on Apple Silicon: Hardened libc++ vs Custom Aligned Allocators

Summary

RocksDB crashes with SIGTRAP (EXC_BREAKPOINT, brk #0x1) on Apple Silicon (aarch64-apple-darwin) when statistics are enabled. The crash occurs inside StatisticsImpl::recordTick or StatisticsImpl::recordInHistogram during normal database operations, including DB::Open.

The root cause is an incompatibility between Apple Clang's hardened libc++ bounds checking and RocksDB's custom cache-line-aligned allocator used by StatisticsData.

Symptoms

  • Process exits with code 133 (128 + 5 = SIGTRAP)
  • lldb shows EXC_BREAKPOINT (code=1, subcode=...) with brk #0x1
  • Crash occurs on background threads (e.g., rs:io-lo) or during DB::Open
  • Stack trace passes through StatisticsImpl::recordTick or StatisticsImpl::recordInHistogram -> CoreLocalArray::Access -> trap

Example stack trace:

frame #0: restate-server`rocksdb::StatisticsImpl::recordInHistogram(...) + 284
frame #1: restate-server`rocksdb::SyncManifest(...)
frame #2: restate-server`rocksdb::DBImpl::NewDB(...)
frame #3: restate-server`rocksdb::DBImpl::Recover(...)
frame #4: restate-server`rocksdb::DBImpl::Open(...)
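The exit code in the first symptom follows the usual shell convention of reporting a signal death as 128 plus the signal number. A minimal sketch of that decoding, where `signal_from_shell_status` is a hypothetical helper and not part of RocksDB or any library:

```cpp
#include <cassert>
#include <csignal>

// Hypothetical helper: decodes the "128 + N" convention shells use when a
// process is killed by signal N. Returns 0 if the status is not a signal exit.
int signal_from_shell_status(int status) {
    assert(status >= 0 && status <= 255);  // exit statuses fit in 8 bits
    return status > 128 ? status - 128 : 0;
}
```

Calling `signal_from_shell_status(133)` yields 5, which is the value of `SIGTRAP` on macOS.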

Environment

  • Apple Silicon (M1/M2/M3/M4, aarch64-apple-darwin)
  • Recent Xcode / Apple Clang toolchain (with hardened libc++ enabled by default)
  • RocksDB compiled from source via the rust-rocksdb crate (using the cc crate)
  • Statistics enabled via options.enable_statistics()

Root Cause Analysis

1. PhysicalCoreID() returns -1 on ARM64 macOS

RocksDB's port::PhysicalCoreID() (port/port_posix.cc:176-199) only has implementations for x86_64. On ARM64 macOS, it falls through to return -1. This forces CoreLocalArray::Access() to take the random fallback path:

if (UNLIKELY(cpuid < 0)) {
    core_idx = Random::GetTLSInstance()->Uniform(1 << size_shift_);
} else {
    core_idx = static_cast<size_t>(BottomNBits(cpuid, size_shift_));
}

This path is functional and produces valid indices. It is not the cause, but it exercises the code path that triggers the real bug.
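To see why the fallback is harmless, the selection logic above can be modeled in isolation. This is a simplified sketch, not RocksDB code: `pick_core_index` is a hypothetical name, and `std::mt19937` stands in for RocksDB's thread-local `Random`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Simplified model of the index selection quoted above (hypothetical name).
// Every result lies in [0, 1 << size_shift), whichever branch is taken.
std::size_t pick_core_index(int cpuid, int size_shift, std::mt19937& rng) {
    assert(size_shift >= 0 && size_shift < 31);
    const std::size_t size = std::size_t{1} << size_shift;
    if (cpuid < 0) {
        // Core ID unavailable (the ARM64 macOS case): pick a slot at random.
        std::uniform_int_distribution<std::size_t> dist(0, size - 1);
        return dist(rng);
    }
    // Core ID known: keep only its lowest size_shift bits.
    return static_cast<std::size_t>(cpuid) & (size - 1);
}
```

With `size_shift = 4`, both branches return a value below 16, so the index itself cannot be what trips the bounds check.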

2. Apple Clang's hardened libc++ inserts bounds checks

Recent versions of Apple Clang enable hardened mode for libc++ by default. Among other things, this adds bounds checking to std::unique_ptr<T[]>::operator[]. The generated code reads the new[] array cookie (element count) stored at ptr - 8 and verifies that the index is within bounds:

ldr  x9, [x21, #0x78]      ; load data_ pointer (unique_ptr's raw pointer)
ldur x10, [x9, #-0x8]      ; read "array count" from ptr - 8
cmp  x10, x8               ; compare with index
b.ls <trap>                ; if count <= index, brk #0x1
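The four instructions above amount to "read a count from the 8 bytes before the data pointer, then require index < count". A rough C++ model of that check follows. It assumes a simplified layout in which the count sits immediately before the first element; `cookie_check_passes` and `make_adjacent_layout` are hypothetical names, not libc++ internals.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Model of the check in the disassembly (hypothetical name).
bool cookie_check_passes(const unsigned char* data, std::uint64_t index) {
    assert(data != nullptr);
    std::uint64_t count = 0;
    std::memcpy(&count, data - 8, sizeof(count));  // ldur x10, [x9, #-0x8]
    return index < count;                          // b.ls traps when count <= index
}

// Builds the layout the check expects: an 8-byte count followed directly by
// `count` 8-byte elements (simplified; element contents do not matter here).
std::vector<unsigned char> make_adjacent_layout(std::uint64_t count) {
    std::vector<unsigned char> block(8 + count * 8, 0);
    std::memcpy(block.data(), &count, sizeof(count));
    return block;
}
```

When the count really is adjacent to the data (`data = block.data() + 8`), the check accepts indices 0 through count - 1 and rejects everything else, which is the intended behavior.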

3. RocksDB's custom aligned allocator breaks the cookie layout

StatisticsData in monitoring/statistics_impl.h is declared with cache-line alignment and a custom allocator:

struct ALIGN_AS(CACHE_LINE_SIZE) StatisticsData {
    std::atomic_uint_fast64_t tickers_[INTERNAL_TICKER_ENUM_MAX] = {{0}};
    HistogramImpl histograms_[INTERNAL_HISTOGRAM_ENUM_MAX];
    // ...
    void* operator new[](size_t s) { return port::cacheline_aligned_alloc(s); }
    void operator delete[](void* p) { port::cacheline_aligned_free(p); }
};

On Apple Silicon, CACHE_LINE_SIZE is 64 or 128 bytes. When new StatisticsData[n] is called:

  1. The compiler computes the total allocation size, including cookie overhead and alignment padding.
  2. The custom operator new[] calls port::cacheline_aligned_alloc, which uses posix_memalign and returns a cache-line-aligned pointer P.
  3. The compiler stores the array cookie near the start of the allocation.
  4. The first array element is placed at the next properly aligned offset: data_ = P + CACHE_LINE_SIZE.

The result is that data_ - 8 points into alignment padding (zeros), not the cookie. The hardened operator[] reads 0 as the array count and traps on any access:

cookie location:  P + 0          (contains actual count, e.g., 16)
padding:          P + 8 ... P + (CACHE_LINE_SIZE - 1)   (zeros)
data_ pointer:    P + CACHE_LINE_SIZE
check reads:      data_ - 8 = P + (CACHE_LINE_SIZE - 8)  (zeros!)
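The layout above can be reproduced with plain byte offsets. The sketch below is a simplified model of the description in this section, not the compiler's actual allocation code: `probe_overaligned_layout` and `LayoutProbe` are hypothetical names, a zero-filled `std::vector` stands in for the aligned allocation, and each element is assumed to occupy one cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct LayoutProbe {
    std::uint64_t cookie_at_start;    // value stored at P + 0
    std::uint64_t value_before_data;  // value found at data_ - 8
};

// Hypothetical model: cookie at P + 0, zero padding, data_ at P + cache_line.
LayoutProbe probe_overaligned_layout(std::uint64_t count, std::size_t cache_line) {
    assert(count > 0 && cache_line >= 16);
    std::vector<unsigned char> block(cache_line * (1 + count), 0);
    unsigned char* p = block.data();              // stands in for P
    std::memcpy(p, &count, sizeof(count));        // cookie written at P + 0
    const unsigned char* data = p + cache_line;   // data_ = P + CACHE_LINE_SIZE

    LayoutProbe out{};
    std::memcpy(&out.cookie_at_start, p, sizeof(std::uint64_t));
    std::memcpy(&out.value_before_data, data - 8, sizeof(std::uint64_t));
    return out;
}
```

For `probe_overaligned_layout(16, 128)`, the cookie at the start holds 16 while the 8 bytes before the data pointer hold 0, which is the value the hardened check ends up comparing against.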

4. Confirmed via lldb inspection

At crash time:

size_shift_      = 4        (array size = 16, correct)
core_idx         = 3        (within bounds, correct)
data_            = 0xbf4400080  (128-byte aligned)
*(data_ - 8)     = 0x0000000000000000  (padding, not the cookie!)

The bounds check compares the count it read (0) against the index (3); because 0 <= 3, it traps. The actual array is properly allocated with 16 elements, and index 3 is valid.

Fix

Disable Apple's hardened libc++ mode when compiling RocksDB's C++ code. In librocksdb-sys/build.rs, add after the NDEBUG define:

config.define("NDEBUG", Some("1"));
// Disable Apple Clang's hardened libc++ bounds checking which is
// incompatible with RocksDB's custom cache-line-aligned allocator in
// StatisticsData (the new[] cookie is not at ptr-8 due to alignment padding).
config.define("_LIBCPP_HARDENING_MODE", Some("_LIBCPP_HARDENING_MODE_NONE"));
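One way to confirm that the define actually reached the C++ compiler is to inspect the macro from a translation unit built with the same flags. The sketch below is an assumption-laden probe, not part of RocksDB or rust-rocksdb: `libcxx_hardening_status` is a hypothetical function, and it relies on libc++ exposing `_LIBCPP_HARDENING_MODE` and `_LIBCPP_HARDENING_MODE_NONE` after a standard header is included, which older libc++ releases and other standard libraries do not do.

```cpp
#include <cassert>
#include <string>  // any libc++ header also brings in its configuration macros

// Hypothetical probe: reports whether this translation unit was compiled
// with libc++ hardening disabled. Falls back gracefully elsewhere.
std::string libcxx_hardening_status() {
#if defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_NONE)
#  if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
    return "libc++, hardening disabled";
#  else
    return "libc++, hardening enabled";
#  endif
#else
    return "hardening mode macros not available";
#endif
}
```

Compiled alongside RocksDB with the define from build.rs, this would be expected to report that hardening is disabled; without the define on a recent Apple toolchain it would be expected to report that hardening is enabled.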

Alternative fixes

  • In RocksDB source (util/core_local.h:82): Replace data_[core_idx] with data_.get()[core_idx] to bypass unique_ptr<T[]>::operator[] and its bounds check.
  • In RocksDB source (monitoring/statistics_impl.h): Remove the custom operator new[]/operator delete[] from StatisticsData and use standard allocation (losing cache-line alignment guarantees).
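The first alternative works because `get()` returns a raw pointer, and indexing a raw pointer does not go through `unique_ptr<T[]>::operator[]`. A minimal standalone sketch of the idea, where `element_via_raw_pointer` is a hypothetical helper and the explicit `size` parameter is an assumption added here so the example still checks its own bounds:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: returns a pointer to element `idx` using plain pointer
// arithmetic on get(), so no library-inserted bounds check is involved.
template <typename T>
T* element_via_raw_pointer(const std::unique_ptr<T[]>& data,
                           std::size_t idx, std::size_t size) {
    assert(idx < size);       // bound supplied by the caller, not by a cookie
    return data.get() + idx;  // equivalent to &data.get()[idx]
}
```

For a `std::unique_ptr<int[]>` holding `{10, 20, 30, 40}`, `*element_via_raw_pointer(a, 2, 4)` reads 30. The trade-off is that the hardened check is skipped for this access, so the caller's own bound has to be trusted.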

Affected Components

Any RocksDB operation that records statistics triggers this, including:

  • DB::Open (during WAL sync, manifest writes)
  • Reads, writes, compactions, flushes
  • Any path that calls RecordTick() or RecordInHistogram() internally

Disabling statistics (rocksdb_disable_statistics = true) avoids the crash but loses observability.
