Time Traveler: Optimal Dimensionality for Hyperbolic Vector Representations in HPC Simulations
High-Dimensional Universe Simulation Kernel in Rust
This section provides a comprehensive Rust-style implementation of a simulation in which "entities" (points) evolve on a dynamic submanifold embedded in a high-dimensional space. Each entity is represented by a high-dimensional state vector whose first 4 components are spacetime coordinates (time t and spatial coordinates x, y, z); the remaining components are latent state variables (e.g. energy, mass, and other properties). We enforce that these state vectors lie on a specific manifold (such as a fixed-radius hypersphere or a Minkowski spacetime surface) via a projection step after each update. The update rule uses nearest neighbors with a Minkowski-like causal filter to ensure influences respect light-cone causality, i.e. no superluminal interaction (agemozphysics.com). We also focus on performance by reusing allocations, aligning data to vector register boundaries, and supporting both single and double precision.
Data Structures and Parameters
We define a generic Universe struct to encapsulate the simulation state and parameters. It uses const generics for the state vector dimensionality D and a generic float type T for precision (e.g. f32 or f64). We include fields for the current and next state arrays, manifold radius, time step, light-speed constant, and (optionally) a neighbor search index (here represented abstractly, as integration with a specific ruvector engine is context-dependent):
use std::ops::{Add, AddAssign, Div, Mul, Sub};

// Define a trait for real-number operations to support both f32 and f64.
trait Real:
    Copy
    + PartialOrd
    + Add<Output = Self>
    + Sub<Output = Self>
    + Mul<Output = Self>
    + Div<Output = Self>
    + AddAssign
{
    fn zero() -> Self;
    fn one() -> Self;
    fn sqrt(self) -> Self;
    /// Lossy conversion from f64, used for constants and counters.
    fn from_f64(v: f64) -> Self;
}

impl Real for f32 {
    fn zero() -> Self { 0.0 }
    fn one() -> Self { 1.0 }
    fn sqrt(self) -> Self { self.sqrt() }
    fn from_f64(v: f64) -> Self { v as f32 }
}

impl Real for f64 {
    fn zero() -> Self { 0.0 }
    fn one() -> Self { 1.0 }
    fn sqrt(self) -> Self { self.sqrt() }
    fn from_f64(v: f64) -> Self { v }
}

// Constants for indexing the state vector (for clarity).
const TIME: usize = 0;
const X: usize = 1;
const Y: usize = 2;
const Z: usize = 3;
// (Indices 4, 5, ... will be used for mass, energy, and other latent variables.)

// Simulation struct encapsulating the universe state.
struct Universe<T: Real, const D: usize> {
    /// Current state vectors for all entities (each of length D).
    current_states: Vec<[T; D]>,
    /// Buffer for next state vectors (to avoid reallocating every step).
    next_states: Vec<[T; D]>,
    /// Constraint radius for the manifold (e.g., radius of the hypersphere).
    radius: T,
    /// Time step increment for each iteration.
    dt: T,
    /// Maximum signal speed (light speed) for causal updates.
    c: T,
    /// Number of entities (for convenience).
    n_entities: usize,
    // Optionally, a nearest-neighbor index from ruvector or similar could be included:
    // neighbor_index: RuVectorIndex<D, T>, (assumed interface for fast neighbor queries)
}

// Implement initialization and simulation methods for Universe.
impl<T: Real, const D: usize> Universe<T, D> {
    /// Initialize the universe with a given number of entities and initial conditions.
    fn new(num_entities: usize, radius: T, dt: T, c: T) -> Self {
        assert!(D >= 4, "State vector must have at least 4 dimensions (t,x,y,z).");
        let current_states = vec![[T::zero(); D]; num_entities];
        let next_states = vec![[T::zero(); D]; num_entities];
        let mut universe = Universe {
            current_states,
            next_states,
            radius,
            dt,
            c,
            n_entities: num_entities,
            // neighbor_index: RuVectorIndex::new(), // if using an external index
        };
        universe.initialize_entities();
        // If an external neighbor index is used, build it on the initial positions:
        // universe.neighbor_index.build(&universe.current_states);
        universe
    }

    /// Custom initialization of entity states (embedding coordinates, energy/mass, etc.).
    fn initialize_entities(&mut self) {
        use rand::Rng; // Assuming the `rand` crate is available for example initialization.
        let mut rng = rand::thread_rng();
        let radius = self.radius;
        for state in &mut self.current_states {
            // Initialize time to zero for all entities (starting simultaneously).
            state[TIME] = T::zero();
            // Randomly position the spatial coordinates (x,y,z) and latent coords.
            // We place them roughly uniformly on the sphere by sampling and projecting.
            for d in X..D {
                state[d] = T::from_f64(rng.gen_range(-1.0..1.0));
            }
            // (If D > 4, indices 4..D include latent variables such as mass, energy, etc.)
            // Example: assign "mass" at index 4 and "energy" at index 5 if available.
            if D > 4 {
                state[4] = T::one(); // assign a base mass of 1 (arbitrary units)
            }
            if D > 5 {
                // Set initial energy, e.g., random in [0, 1).
                state[5] = T::from_f64(rng.gen_range(0.0..1.0));
            }
            // Project the state onto the manifold (enforce the radius constraint
            // for the spatial/latent part).
            Self::project_to_manifold(state, radius);
        }
    }
Notes on Initialization: In initialize_entities, we set initial time to 0 for all entities (simulating a simultaneous start). Spatial components (x, y, z) and any additional latent dimensions are given random values and then projected onto the manifold to satisfy the radius constraint. We also demonstrate assigning an initial mass (index 4) and energy (index 5) if those slots exist. In a real setup, you might sample these from specific distributions or set them based on a scenario.
Manifold Projection Step
To ensure all state vectors remain on the defined submanifold, we provide a projection function. We illustrate two possible manifold constraints:
Fixed-radius hypersphere (Euclidean): All state vectors (excluding the time component) are projected to have a constant Euclidean norm (radius). This could represent, for example, a 3D space of fixed radius with additional latent dimensions forming part of the "shell".
Minkowski spacetime surface: Alternatively, one could enforce a Minkowski metric constraint (e.g., a hyperboloid) for the first 4 components. In that case, one might keep -t^2 + x^2 + y^2 + z^2 = R^2 constant (for some constant R). Here we implement the simpler Euclidean sphere projection and note how Minkowski would differ.
    /// Project a state vector onto the manifold (e.g., enforce a fixed radius or a Minkowski surface).
    fn project_to_manifold(state: &mut [T; D], radius: T) {
        // For a Euclidean sphere constraint on the spatial + latent coordinates (excluding time):
        let mut norm_sq = T::zero();
        // Compute the squared norm of all coordinates except time (index 0).
        for d in 1..D {
            let val = state[d];
            norm_sq = norm_sq + val * val;
        }
        // If the norm is zero (unlikely with random init), skip to avoid division by zero.
        if norm_sq == T::zero() {
            return;
        }
        let norm = norm_sq.sqrt();
        // Scale all non-time coordinates to lie on the shell of the given radius.
        let scale = radius / norm;
        for d in 1..D {
            state[d] = state[d] * scale;
        }
        // (For a Minkowski surface constraint, we would enforce -t^2 + x^2 + y^2 + z^2 = const.
        // That would require adjusting time and/or space components together, not just scaling.
        // One could, for example, solve for t given x, y, z to keep -t^2 + r^2 = const.)
    }
In the code above, project_to_manifold rescales all non-time components so that their Euclidean length equals the specified radius. This effectively projects the point radially onto the surface of a hypersphere. We exclude the time component t from this normalization so that time is free to increase (time evolution is handled separately). If a Minkowski manifold were desired instead, the projection would need to maintain the Minkowski norm. For example, if we want -t^2 + x^2+y^2+z^2 = R^2, we might adjust the time coordinate after moving the spatial coordinates. This is more complex (and possibly not invariant under all updates), so here we stick to the Euclidean projection for demonstration.
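For concreteness, here is a minimal sketch of the Minkowski-surface alternative described above, assuming the convention -t^2 + x^2 + y^2 + z^2 = R^2 and the Real trait defined earlier. It recomputes the time coordinate from the spatial part, so (as noted) it would interact with the separate time increment in update_step; the function name and the clamp-at-zero behavior are illustrative assumptions, not part of the kernel above.
/// Sketch only: project onto the Minkowski surface -t^2 + x^2 + y^2 + z^2 = R^2
/// by solving for t from the spatial coordinates, leaving x, y, z untouched.
fn project_to_minkowski_surface<T: Real, const D: usize>(state: &mut [T; D], r_const: T) {
    debug_assert!(D >= 4, "need at least (t, x, y, z)");
    // Squared spatial radius r^2 = x^2 + y^2 + z^2.
    let mut r_sq = T::zero();
    for d in X..=Z {
        r_sq = r_sq + state[d] * state[d];
    }
    // Solve -t^2 + r^2 = R^2  =>  t = sqrt(r^2 - R^2); clamp at zero if r^2 < R^2
    // (in that case no real t satisfies the constraint for this spatial point).
    let t_sq = r_sq - r_const * r_const;
    state[TIME] = if t_sq > T::zero() { t_sq.sqrt() } else { T::zero() };
}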
Causal Update with Nearest Neighbors (Minkowski Filtering)
The core of the simulation is the update rule that evolves each entity's state from one time step to the next, based on its neighbors. To maintain locality and causality, we use a nearest-neighbors approach:
Each entity finds neighboring entities from the previous step that are close in the embedding space.
We then filter these neighbors using a Minkowski-like condition: an entity can only be influenced by another if the neighbor lies within the light cone, i.e., the neighbor is close enough in space given the time difference. In practice, if Δt is the time difference between the neighbor and the entity (usually one time step), and d is their spatial distance, we require d <= c * Δt. This ensures no influence travels faster than the speed c, analogous to light speed (agemozphysics.com). Neighbors outside this range are ignored.
With the causal neighbors identified, we apply a differentiable update rule (like diffusion of latent variables or energy transfer). In the example below, we implement a simple diffusion-like rule: each entity's new state is a blend of its old state and the neighbors' states. We take an average of neighbor contributions and move the entity's state slightly toward that average (controlled by a small factor α). This can model diffusion or smoothing of values (energies, etc.) across nearby entities. For spatial coordinates, this results in a slight motion toward the neighbors (which could be interpreted as a simplified interaction force). We also increment the time coordinate by dt for all entities.
    /// Perform one simulation step (compute next_states from current_states).
    fn update_step(&mut self) {
        let alpha = T::from_f64(0.1); // update rate (e.g., 0.1)
        let c_dt = self.c * self.dt;  // maximum distance for causal influence
        // Loop over each entity to compute its new state:
        for i in 0..self.n_entities {
            let old_state = self.current_states[i];
            let mut new_state = old_state; // start from the old state as a baseline
            // Accumulators for summing neighbor contributions:
            let mut neighbor_count = 0;
            let mut accum = [T::zero(); D]; // accumulate neighbor states
            // Find neighbors of entity i in the *previous* state.
            // (Here we do a brute-force search for simplicity. In practice, use a spatial index.)
            for j in 0..self.n_entities {
                if i == j { continue; }
                let neighbor = self.current_states[j];
                // Compute Minkowski-like distance components:
                let dt = neighbor[TIME] - old_state[TIME]; // time difference (<= 0 for past neighbors)
                if dt > T::zero() {
                    // Neighbor is in the future relative to i's time; skip to prevent retro-causality.
                    continue;
                }
                let spatial_dist_sq =
                    (neighbor[X] - old_state[X]) * (neighbor[X] - old_state[X]) +
                    (neighbor[Y] - old_state[Y]) * (neighbor[Y] - old_state[Y]) +
                    (neighbor[Z] - old_state[Z]) * (neighbor[Z] - old_state[Z]);
                let spatial_dist = spatial_dist_sq.sqrt();
                // Only consider this neighbor if it is within the light cone: d <= c * |dt|.
                if spatial_dist > c_dt {
                    continue;
                }
                // The neighbor passed the causal filter, so include it.
                neighbor_count += 1;
                for k in 0..D {
                    accum[k] = accum[k] + neighbor[k];
                }
            }
            if neighbor_count > 0 {
                // Compute the average neighbor state (influence).
                for k in 0..D {
                    accum[k] = accum[k] / T::from_f64(neighbor_count as f64);
                }
                // Diffusion update: move a bit towards the neighbor average for non-time components.
                // (Time is left unchanged here; it is updated separately below.)
                for k in 1..D {
                    // For coordinates and latent variables, blend towards the average.
                    new_state[k] = old_state[k] + alpha * (accum[k] - old_state[k]);
                }
            }
            // Increment the time coordinate for this entity (advancing to the next step).
            new_state[TIME] = old_state[TIME] + self.dt;
            // Project the new state back onto the manifold to enforce constraints.
            Self::project_to_manifold(&mut new_state, self.radius);
            // Store the computed new state in the next_states buffer.
            self.next_states[i] = new_state;
        }
        // After computing all new states, swap the buffers so next_states becomes current_states.
        std::mem::swap(&mut self.current_states, &mut self.next_states);
        // If using a spatial index for neighbors, update it with the new positions for the next iteration:
        // self.neighbor_index.update_all(&self.current_states);
    }
}
In this code:
We iterate over all entities i. For each i, we search through all other entities j in the previous state (current_states) to find neighbors. (This brute-force approach is O(N²); in a real scenario with large N, you'd use a fast neighbor search structure, such as a KD-tree or the ruvector engine, to get nearest neighbors or all points within c*dt distance efficiently – a simple grid-based sketch follows this explanation.)
For each candidate neighbor j, we calculate the spatial distance and check the causal condition. The time difference dt = neighbor[TIME] - old_state[TIME] should be non-positive (the neighbor is at an earlier or equal time). In our step-by-step update scheme, all neighbors come from the same previous time, so dt = 0 for simultaneous previous state neighbors (we allow those as simultaneous interaction in this model) or negative if we had stored older states. We also ensure spatial distance <= c*Δt. Since in our synchronous update Δt is the global self.dt, this simplifies to spatial_dist <= c_dt. Neighbors outside this range are skipped, which upholds the idea that no influence can propagate faster than c per time step (analogous to a light-cone limit).
We accumulate the states of all valid neighbors. Then we compute the average neighbor state.
We apply a diffusion update: the new state moves a fraction α (10% in this example) toward the neighbors' average for each component (except time). This means latent variables like energy or other fields will diffuse across nearby entities over time. Spatial coordinates (x, y, z) also shift toward neighbors slightly, which can be interpreted as a simplistic attractive interaction or movement along the manifold.
We increment the entity’s time coordinate by self.dt.
Finally, we project new_state to ensure it stays on the manifold (maintaining the radius constraint). For example, if the diffusion update caused the spatial coordinates to drift off the sphere, this will normalize them back to radius R. (Time is not affected by the projection in our design, remaining simply old_time + dt.)
After updating all entities, we swap current_states and next_states buffers. This avoids reallocating a new array each step and allows the simulation to proceed to the next iteration. If we had an external neighbor index (like from ruvector), we would call its update method here to inform it of the new positions, so that subsequent neighbor queries reflect the moved entities. The ruvector engine presumably can update points efficiently without full rebuild, enabling dynamic simulations.
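To make the spatial-index suggestion concrete without assuming any particular ruvector API, here is a rough sketch of a uniform-grid (cell-list) index over the spatial coordinates; the cell size, the integer cell-coordinate keys, and the use of f64 states are illustrative assumptions rather than part of the kernel above.
use std::collections::HashMap;

/// Sketch of a uniform-grid spatial index over the (x, y, z) coordinates.
/// Entities are bucketed into cubic cells of side `cell_size` (typically c * dt),
/// so a causal-radius query only has to scan the 27 surrounding cells.
struct GridIndex {
    cell_size: f64,
    cells: HashMap<(i64, i64, i64), Vec<usize>>,
}

impl GridIndex {
    fn build<const D: usize>(states: &[[f64; D]], cell_size: f64) -> Self {
        let mut cells: HashMap<(i64, i64, i64), Vec<usize>> = HashMap::new();
        for (i, s) in states.iter().enumerate() {
            cells.entry(Self::key(s, cell_size)).or_default().push(i);
        }
        GridIndex { cell_size, cells }
    }

    fn key<const D: usize>(s: &[f64; D], cell_size: f64) -> (i64, i64, i64) {
        (
            (s[X] / cell_size).floor() as i64,
            (s[Y] / cell_size).floor() as i64,
            (s[Z] / cell_size).floor() as i64,
        )
    }

    /// Return candidate neighbor indices within roughly `cell_size` of `query`.
    /// Callers still apply the exact distance / causality check afterwards.
    fn candidates<const D: usize>(&self, query: &[f64; D]) -> Vec<usize> {
        let (cx, cy, cz) = Self::key(query, self.cell_size);
        let mut out = Vec::new();
        for dx in -1..=1 {
            for dy in -1..=1 {
                for dz in -1..=1 {
                    if let Some(bucket) = self.cells.get(&(cx + dx, cy + dy, cz + dz)) {
                        out.extend_from_slice(bucket);
                    }
                }
            }
        }
        out
    }
}
The candidates returned by such an index would then pass through the same causal filter as in update_step, and the index would be rebuilt (or incrementally updated) after each step as entities move.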
Time-Stepping Loop
Using the Universe struct and its methods, we can now run the simulation in a loop for a certain number of steps. The loop will repeatedly call update_step(). This is the point where one could also introduce agentic extensions or external interventions at each step (for example, adding forces, spawning new entities, or changing properties based on some higher-level logic).
Below is an example of running the simulation for a number of iterations:
fn main() {
    // Simulation parameters:
    let num_entities = 1000;
    let radius = 1.0_f64; // radius of the manifold (e.g., unit sphere in latent space)
    let dt = 0.01_f64;    // time step
    let c = 1.0_f64;      // "speed of light" in our units
    // Initialize the universe simulation.
    let mut universe = Universe::<f64, 8>::new(num_entities, radius, dt, c);
    // (Here we choose T = f64 and D = 8 as an example, meaning each state vector has 8 components:
    // t, x, y, z, mass, energy, and 2 additional latent variables.)
    // Run the simulation for 100 steps.
    for _step in 0..100 {
        universe.update_step();
        // (Optional) Any custom agent behavior or logging could be done here.
    }
    // After the simulation, inspect or output the final state.
    for (i, state) in universe.current_states.iter().enumerate().take(5) {
        println!("Entity {} final state: {:?}", i, state);
    }
}
In this main function, we:
Create a universe with num_entities points on an 8-dimensional manifold (with a 1.0 radius for the spatial+latent part).
Run 100 iterations of the update loop.
Print out the final state of the first few entities for verification.
The code is structured to be reusable as a simulation kernel. One can easily modify the update rule (for example, to implement other interaction laws or incorporate external forces) by editing update_step. The system is amenable to adding agentic behaviors: for instance, one could give each entity an internal decision-making routine that modifies its latent state based on certain conditions, or spawn/remove entities during the loop. The state vector is flexible enough to encode additional information as needed for such extensions.
Performance Considerations
This implementation is written with performance in mind:
Reusing Allocations: We allocate the current_states and next_states vectors once, up front. Throughout the simulation, we recycle these buffers, writing new values into next_states and then swapping. This avoids frequent allocations or deallocations inside the time-stepping loop.
Contiguous Memory: The state vectors for all entities are stored in a single Vec, which keeps the data contiguous in memory. This improves cache locality and makes bulk operations more efficient. We use an Array-of-Structs (AoS) approach (Vec<[T; D]>). In some cases, a Struct-of-Arrays (SoA) layout can yield even better vectorization, with separate arrays for each component (medium.com), but AoS is simpler and still benefits from contiguity for each entity's full state.
Alignment: We can further ensure memory alignment for SIMD by aligning the state vectors. For example, we could declare a wrapper type with an alignment attribute matching 256-bit or 512-bit boundaries (for AVX/AVX-512). Rust allows #[repr(align(32))] on a struct to guarantee 32-byte alignment (medium.com). In our case, if we needed to, we could wrap the [T; D] in a newtype with repr(align(32)) to ensure each state is aligned, which helps the CPU load and store vectors efficiently in one operation (a short sketch of such a wrapper follows this list).
SIMD and Parallelism: The math operations (like computing distances and summing neighbors) are amenable to SIMD acceleration. Rust's portable SIMD API (std::simd, currently nightly-only) or explicit intrinsics could be used to operate on slices of data in parallel. Also, the outer loop over entities can run in parallel threads, since each entity's update is independent (apart from reading the shared current_states). Using a crate like rayon, one could replace the for i in 0..n_entities loop with a parallel iterator over the output buffer to distribute work across threads automatically (see the rayon sketch after this list). This can significantly speed up the simulation for large numbers of entities on multi-core systems.
Optional Precision: By using a generic type T: Real, we support both single (f32) and double (f64) precision. This means one can trade off precision for speed or memory. For example, if f32 is sufficient for a particular simulation, it will use half the memory and potentially utilize SIMD (as eight 32-bit floats fit in a 256-bit AVX register vs. four 64-bit floats). The code can be instantiated as Universe::<f32, D> or Universe::<f64, D> as needed. The trait Real we defined ensures we have the basic operations (like sqrt) available for either type.
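As a sketch of the aligned wrapper mentioned in the Alignment item, assuming 32-byte (AVX) alignment and an illustrative type name:
// Minimal sketch: a 32-byte-aligned newtype around the per-entity state array.
// Each element of a Vec<AlignedState<T, D>> is then padded to a 32-byte boundary.
#[repr(align(32))]
#[derive(Clone, Copy)]
struct AlignedState<T, const D: usize>(pub [T; D]);

// Usage: store Vec<AlignedState<T, D>> instead of Vec<[T; D]> and access the array via `.0`.
And a rough sketch of the rayon-based parallel step (assuming the rayon crate as a dependency; only the split borrow of the two state buffers differs structurally from the serial update_step, whose neighbor loop is elided here):
use rayon::prelude::*;

impl<T: Real + Send + Sync, const D: usize> Universe<T, D> {
    /// Parallel variant of `update_step` (sketch). Each entity writes only its own
    /// slot in `next_states`, so the per-entity work distributes freely across threads.
    fn update_step_parallel(&mut self) {
        let dt = self.dt;
        let radius = self.radius;
        let current = &self.current_states; // shared, read-only view of the previous step
        self.next_states
            .par_iter_mut()
            .enumerate()
            .for_each(|(i, out)| {
                let old_state = current[i];
                let mut new_state = old_state;
                // ... same neighbor accumulation and diffusion as in `update_step`,
                //     reading only `current` ...
                new_state[TIME] = old_state[TIME] + dt;
                Self::project_to_manifold(&mut new_state, radius);
                *out = new_state;
            });
        std::mem::swap(&mut self.current_states, &mut self.next_states);
    }
}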
Finally, note that the neighbor search in our example is naive. In a real high-dimensional simulation with many entities, integrating with a fast nearest-neighbor engine (like the ruvector infrastructure) is crucial. The ruvector engine likely provides efficient queries for nearest neighbors or points within a radius. By building such an index (for example, an HNSW graph or tree structure) and updating it each step (or as needed), we can bring down the neighbor search cost significantly. After each time step, as entities move, we would call an update on the index (or rebuild it periodically). This ensures that even as the manifold evolves, queries remain fast and the simulation scales to a large number of entities.
Causality and Correctness: Our Minkowski filtering approach ensures that no entity is influenced by another outside its causal cone. This concept is derived from relativity: "For each point in spacetime... The time-forward light cone defines those points which could interact [with the] originating point. The light cone itself defines a boundary where no connection can be made." (agemozphysics.com) In other words, events (state updates) cannot affect others if they are separated by more space than time would allow a signal to travel (with speed c). By incorporating this rule, our simulation kernel respects a form of causality constraint, which can be important for realism in physical simulations or simply as a design principle for emergent behavior.
Summary: The provided Rust-style code establishes a flexible framework for simulating a "universe" of entities on a constrained manifold with local interactions and causal updates. It can be used as a foundation for experiments in emergent behavior, distributed agent systems, or physically-inspired simulations in high-dimensional spaces. The structure allows further extension, such as adding more complex forces, dynamic creation/removal of entities, or integrating with specific libraries (like ruvector for neighbor search or linear algebra crates for optimization). The heavy use of generics, constants, and in-place updates follows idiomatic Rust, ensuring the implementation is both efficient and safe for further development.
Optimal Dimensionality for Hyperbolic Vector Representations in HPC Simulations
Introduction
Hyperbolic vector embeddings have emerged as powerful representations for hierarchical and graph-structured data. Unlike Euclidean space, hyperbolic space has constant negative curvature and exponential volume growth, which allows it to embed tree-like structures with low distortion even in low dimensions (arxiv.org). This makes hyperbolic embeddings attractive for high-performance simulations and machine learning tasks that involve hierarchical relationships (e.g. taxonomies or networks). However, choosing the optimal dimensionality for these embeddings is crucial to balance inference accuracy against computational speed, especially in real-time or microsecond-scale simulation environments. In such high-performance contexts (e.g. using a custom vector engine), each additional dimension incurs computational and memory overhead, so one must carefully trade off the richness of the representation against latency and numerical stability constraints.
Hyperbolic Embedding Models and Characteristics
Hyperbolic Embeddings: In an n-dimensional hyperbolic space (denoted ℍ^n), the volume of a ball grows exponentially with its radius (arxiv.org). Intuitively, this provides "more room" to embed exponentially expanding structures like trees. For example, a hierarchy that would quickly crowd the edges of a 2D Euclidean space can be spread out without distortion in a 2D hyperbolic disk. Figure 1 illustrates this: a branching tree embedded in the Poincaré disk (a 2D hyperbolic model) can place many nodes at increasing radii without compressing distances, thanks to the disk's negative curvature. Sibling nodes remain well-separated and parent-child distances stay uniform even near the boundary, avoiding the crowding observed in Euclidean embeddings (bjlkeng.io).
Figure 1: A hierarchical tree (binary branching) embedded in a 2D hyperbolic plane (Poincaré disk model). Hyperbolic distance from the center grows without bound toward the boundary, allowing each level of the tree to occupy a "ring" of roughly equal thickness without overlapping (bjlkeng.io). In a Euclidean plane, by contrast, nodes at deeper levels would crowd together at the outskirts, causing loss of distance resolution.
Poincaré vs. Lorentz Models: Two popular coordinate models for hyperbolic space are the Poincaré ball and the Lorentz (hyperboloid) model. Both are mathematically equivalent (related by isometry) but have different numerical properties. In the Poincaré ball model, points lie inside the unit ball in ℝ^n, and the distance between two vectors u and v is given by:
d_{\mathcal{H}}(u,v) \;=\; \cosh^{-1}\!\Big(1 \;+\; 2\,\frac{\|u-v\|^2}{(1-\|u\|^2)\,(1-\|v\|^2)}\Big)\,. \tag{1}
This formula involves Euclidean norms and an $\operatorname{arcosh}$, and it grows rapidly as points approach the boundary (‖u‖ or ‖v‖ → 1) (bjlkeng.io). In contrast, the Lorentz model represents points on an (n+1)-dimensional hyperboloid defined in Minkowski space. A point is given by coordinates $(x_0, x_1,\dots,x_n)$ satisfying $-x_0^2 + x_1^2+\cdots+x_n^2 = -1$ (with $x_0>0$). The distance can be computed via the Minkowski inner product $\eta(x,y) = -x_0y_0 + \sum_{i=1}^n x_i y_i$ as:
d_{\mathcal{H}}(x,y) \;=\; \cosh^{-1}\!\big(-\,\eta(x,y)\big)\,. \tag{2}
Both models yield the same geometry but have different computational considerations. The Lorentz model’s distance formula is somewhat simpler (a single dot product plus $\cosh^{-1}$) and lacks the division by small numbers that appears in the Poincaré formula. This often translates to more stable gradient calculations in training and simpler distance computations in inference. However, the Lorentz coordinates include an extra dimension (for $x_0$) and can grow large in magnitude for points far from the origin, which has implications for numerical precision (discussed below).
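For reference, here is a minimal Rust sketch of the two distance functions in equations (1) and (2) over plain f64 slices; the function names and the small clamps guarding the arcosh domain and the Poincaré denominator are illustrative assumptions, not part of any particular library.
/// Poincaré-ball distance, equation (1); u and v must satisfy ‖u‖ < 1 and ‖v‖ < 1.
fn poincare_distance(u: &[f64], v: &[f64]) -> f64 {
    let eps = 1e-12; // guard against division by ~0 near the boundary (assumption)
    let diff_sq: f64 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let u_sq: f64 = u.iter().map(|a| a * a).sum();
    let v_sq: f64 = v.iter().map(|a| a * a).sum();
    let arg = 1.0 + 2.0 * diff_sq / ((1.0 - u_sq) * (1.0 - v_sq)).max(eps);
    arg.max(1.0).acosh()
}

/// Lorentz-model distance, equation (2); x and y lie on the hyperboloid, stored as
/// [x_0, x_1, ..., x_n] with -x_0^2 + sum_i x_i^2 = -1 and x_0 > 0.
fn lorentz_distance(x: &[f64], y: &[f64]) -> f64 {
    // Minkowski inner product eta(x, y) = -x_0*y_0 + sum_{i>=1} x_i*y_i.
    let spatial: f64 = x[1..].iter().zip(&y[1..]).map(|(a, b)| a * b).sum();
    let eta = -x[0] * y[0] + spatial;
    // On the hyperboloid -eta >= 1; the clamp only absorbs rounding error.
    (-eta).max(1.0).acosh()
}
Note how the Lorentz version is a single inner product followed by an arcosh, with no division by quantities that vanish near the boundary, which is exactly the simplicity argument made above.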
Geometric Deep Learning in Hyperbolic Space: Many modern algorithms incorporate hyperbolic geometry to exploit these properties. Examples include Poincaré embeddings for hierarchical representations (arxiv.org), hyperbolic graph neural networks, and knowledge graph embeddings. These methods often report that hyperbolic embeddings can represent complex hierarchies with far fewer dimensions than Euclidean embeddings for the same accuracy. By capturing hierarchical structure in the geometry (through negative curvature), a hyperbolic model can use each dimension efficiently to encode exponential growth patterns. This means that a relatively low-dimensional hyperbolic vector can encode a rich tree of relationships that would otherwise require a very high-dimensional Euclidean vector to achieve similar pairwise distance fidelity (arxiv.org). The next section explores this dimensionality–accuracy trade-off in detail.
Dimensionality vs. Accuracy Trade-offs
A major advantage of hyperbolic representations is their ability to achieve high accuracy with low dimensionality. Researchers have observed that even 2–10 dimensions in hyperbolic space can often outperform hundreds of dimensions in Euclidean space for hierarchical data (arxiv.org; proceedings.mlr.press). In one notable result, a two-dimensional hyperbolic embedding of the WordNet taxonomy achieved a mean average precision of 0.989, whereas a prior approach needed 200 dimensions in Euclidean space to reach only 0.87 MAP (arxiv.org). This highlights that hyperbolic geometry can encode hierarchy implicitly, reducing the need for additional vector dimensions. In practical terms, a hyperbolic model finds it easier to place unrelated nodes far apart and related nodes close together without crowding, so each dimension provides significant discriminatory power.
Empirical studies consistently show diminishing returns beyond a certain number of hyperbolic dimensions. For example, Nickel & Kiela (2018) found that on a complex taxonomy (WordNet nouns), a 10-dimensional Lorentz-model embedding outperformed the best 200-dimensional embedding from their earlier Poincaré model (proceedings.mlr.press). In fact, their Lorentz embeddings in just 2 dimensions already captured the hierarchy much more accurately (a 74% improvement in certain metrics) than Poincaré embeddings in 2D (proceedings.mlr.press). Similarly, a summary of experiments from another study showed that with as few as 5 dimensions, Poincaré hyperbolic embeddings achieved massive improvements in reconstruction accuracy over Euclidean methods (bjlkeng.io). In large-scale network embedding tasks, researchers often report using only about 5–10 dimensions in hyperbolic space, since the embedding dimension is typically small (e.g. 10) and yet yields lower distance distortion than a higher-dimensional Euclidean embedding (pure.tue.nl). In other words, beyond a small number of dimensions, hyperbolic embeddings tend to saturate in performance: additional dimensions give only marginal gains because the negative curvature already provides an efficient encoding of similarity (arxiv.org).
It is worth noting that the optimal dimensionality can depend on the data's structure. Purely tree-structured data (like strict hierarchies) might reach peak accuracy at extremely low dimensions (2 or 5), whereas more complex graphs (with multiple latent factors or deviations from a perfect tree) may benefit from slightly higher dimensions (perhaps 10–20) before leveling off. Nonetheless, even in those cases, the required dimension is far lower than one would need in Euclidean space for comparable accuracy (arxiv.org). This efficiency is a direct consequence of hyperbolic geometry's ability to represent hierarchical distances (exponential branching) naturally in a low-dimensional space (arxiv.org).
Trade-off implications: From a simulation perspective, using the smallest sufficient dimension is desirable. Lower dimensionality means fewer coordinates to update or compute distances on, which translates to faster calculations. At the same time, too low a dimension might start to sacrifice fidelity if the data has more complex structure than the embedding can capture. The key is to find the point where adding one more dimension yields negligible improvement in accuracy – that point can be considered the optimal dimension. Published benchmarks (as cited above) suggest that for many hierarchical datasets, this optimal point is often in the single digits or low tens of dimensions (bjlkeng.io; pure.tue.nl). Selecting a dimension in that range typically balances accuracy against speed for hyperbolic embeddings.
Precision, Memory, and Numerical Stability Considerations
Choosing dimensionality in hyperbolic space is also tied to precision and numerical stability concerns, especially under tight computational budgets. Hyperbolic models can push coordinate values to extreme ranges when representing large or deep hierarchies, which strains finite precision arithmetic. There are two critical aspects: (1) the model (Poincaré vs Lorentz) and (2) the floating-point precision (single vs double) used in computation.
Poincaré model stability: In the Poincaré ball, points representing very distant nodes in a hierarchy gravitate exponentially close to the unit sphere boundary. This means coordinate differences can become very small (e.g. 1 − ‖u‖ is tiny), and distance calculations involve subtracting nearly equal quantities. As a result, faithfully distinguishing points near the boundary requires a large number of bits of precision (arxiv.org). Sala et al. (2018) proved that embedding a tree with low distortion in the Poincaré model may demand high-precision arithmetic; using standard 64-bit doubles, there is essentially a maximum radius (a bit under the edge of the disk) beyond which points cannot be distinguished due to rounding (arxiv.org). In practical terms, this means a very deep hierarchy could cause 32-bit floats to produce severe rounding error or even NaNs, and even 64-bit floats can represent points only up to a certain depth, beyond which all points blur at the boundary (arxiv.org). Thus, the Poincaré model has a "hard" representation limit: with Float64, points approaching the boundary lose their accuracy entirely (arxiv.org).
Lorentz model stability: The Lorentz model tends to have the opposite numerical issue. Distances increase as points move farther from the origin along the hyperboloid, meaning the time-like coordinate $x_0 = \cosh(\text{radial distance})$ can grow very large. Large values in $x_0$ or the other coordinates can exceed the dynamic range or precision of floating-point representation. Yu & De Sa (2019) showed that using floats to represent Lorentz embeddings leads to huge errors once points are far out on the hyperboloid (arxiv.org). Essentially, sums like $x_0 + x_0'$ lose precision when $x_0$ is on the order of $10^{16}$ and $x_0'$ is much smaller. Under 64-bit precision, the Lorentz model can faithfully represent points only within a certain radius as well, which in one analysis was even smaller (in hyperbolic distance) than that of the Poincaré model (arxiv.org). However, an important distinction is that the Lorentz model's limit is a "soft" constraint: even if points are numerically inexact when extremely far out, one can often still compute with the large coordinates (e.g. $x_0$ may exceed the range of exact addition, but the coordinates remain finite and usable in dot products) (arxiv.org). This, combined with typically better optimization behavior, means Lorentz embeddings have been observed to be empirically more stable during training than Poincaré embeddings, despite the theoretical representation limits (arxiv.org).
Precision vs. speed: Using double-precision (64-bit) floating point is generally recommended for hyperbolic simulations if the hierarchy is deep, as it significantly extends the range of representable distances. Many hyperbolic embedding experiments are conducted in double precision specifically to avoid numerical issues (arxiv.org). In a microsecond-scale simulation, however, double-precision arithmetic might be slower, and doubles consume twice the memory of floats. If the hierarchical depth or required dynamic range is moderate, 32-bit floats might suffice and would improve speed and memory usage. For example, if all points stay within a radius that 32-bit can handle with acceptable rounding error, floats would cut memory bandwidth in half and likely fit better in vector registers, accelerating computation. The safe approach is to profile and determine whether single precision causes any noticeable degradation in the distance calculations (e.g. by checking for NaNs or deviations in invariants); a small f32-vs-f64 comparison sketch appears after the Vectorization item below. In many real-time applications with limited hierarchy levels, floats may perform well, but caution is warranted for any algorithm that might push values to extremes.
Memory and cache: Memory footprint grows linearly with dimensionality. An n-dimensional embedding vector in double precision uses 8n bytes. In an HPC context, large dimensions can quickly tax memory bandwidth – often a bottleneck in simulation – and also pressure caches. For instance, consider tracking 1 million entities with 16-dimensional vs. 128-dimensional hyperbolic vectors: the latter consumes 8× more memory, and each simulation step must move 8× more data for the same computations. In a microsecond-scale loop, this difference is enormous. Keeping the dimensionality low (and using floats if possible) means more vectors or coordinates can reside in fast cache memory and fewer bytes need to be streamed from main memory per update. This directly improves throughput on modern HPC architectures. In fact, one study notes that an embedding dimension around 10 allows distance computations in essentially constant time, since 10 values easily fit in vector registers or cache lines (pure.tue.nl). Extremely high dimensions, by contrast, would force multiple load/store operations and potentially cache misses, increasing latency per operation.
Vectorization: If using specialized vector hardware (e.g. SIMD units or a custom “ruvector” engine), the dimensionality should ideally align with hardware capabilities. Many vector processors operate on fixed-size blocks (e.g. 4, 8, or 16 numbers at once). If the dimension is a multiple of that width, distance calculations can be perfectly vectorized; if not, there might be under-utilization of some lanes or extra cycles to handle the remainder. For example, if an engine has a 16-wide vector unit, a 16-dimensional or 32-dimensional vector is optimal, whereas a 17-dimensional vector might incur an extra partial operation. This doesn’t usually override the primary consideration of embedding accuracy, but when fine-tuning for microsecond-level performance, it’s a factor. Choosing a dimension that is both minimal for accuracy and friendly to the hardware’s vector length can yield slight performance gains.
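As an illustration of the vector-width point, here is a sketch of a dot-product kernel that assumes the dimension has been padded with zeros to a multiple of an assumed 8-lane width (as on AVX2 with f32), so the loop has no remainder to handle; LANES and the zero-padding convention are assumptions for this sketch.
const LANES: usize = 8;

/// Euclidean dot product over vectors whose length is a multiple of `LANES`.
/// Padding coordinates are expected to be zero so they do not change the result.
fn dot_chunked(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    debug_assert_eq!(a.len() % LANES, 0);
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a.chunks_exact(LANES).zip(b.chunks_exact(LANES)) {
        for i in 0..LANES {
            acc[i] += ca[i] * cb[i]; // lane-wise multiply-add, no remainder loop
        }
    }
    acc.iter().sum()
}
And, referring back to the precision discussion, a small self-contained check (reusing poincare_distance from the earlier sketch, plus a hypothetical f32 twin) that compares single- and double-precision distances for points pushed toward the boundary; the specific norms and offsets are arbitrary illustration values.
fn poincare_distance_f32(u: &[f32], v: &[f32]) -> f32 {
    // Same as equation (1), without the f64 guard, to expose precision loss.
    let diff_sq: f32 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let u_sq: f32 = u.iter().map(|a| a * a).sum();
    let v_sq: f32 = v.iter().map(|a| a * a).sum();
    (1.0 + 2.0 * diff_sq / ((1.0 - u_sq) * (1.0 - v_sq))).max(1.0).acosh()
}

fn check_precision() {
    for &norm in &[0.9_f64, 0.99, 0.999, 0.999_99] {
        // Two nearby points at the same radius, differing slightly in one coordinate.
        let u = [norm, 0.0];
        let v = [norm - 1e-7, 1e-7];
        let d64 = poincare_distance(&u, &v);
        let d32 = poincare_distance_f32(&[u[0] as f32, u[1] as f32],
                                        &[v[0] as f32, v[1] as f32]);
        let rel_err = ((d32 as f64) - d64).abs() / d64.max(f64::MIN_POSITIVE);
        println!("norm={norm}: d64={d64:.6e} d32={d32:.6e} rel_err={rel_err:.2e}");
    }
}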
Recommendations for Selecting Dimensionality in Real-Time Simulation
Balancing speed and accuracy in a high-performance simulation context requires an informed choice of hyperbolic embedding dimensionality. Based on the above insights and published benchmarks, here are specific recommendations:
Start Low and Increase Gradually: Begin with the smallest dimension that conceptually fits the data (e.g. d = 2 or 5 for a simple tree hierarchy). Hyperbolic space is very expressive in low dimensions (arxiv.org), so a minimal vector size often suffices. Increase d incrementally and monitor the improvement in simulation accuracy or distortion. There is usually a knee point where additional dimensions yield diminishing returns (arxiv.org) – use that as the cutoff for the optimal d.
Leverage Theoretical Trade-offs: Be aware that hyperbolic embeddings can trade off precision and dimensionality. If you find that a very low dimension causes slight distortion, consider whether a small increase in dimension can dramatically reduce error (as seen when going from 2D to 5D in some cases (bjlkeng.io)). Often a jump to around 5–10 dimensions captures most hierarchies with high fidelity, and going beyond that yields only minor gains. Aim to stay in the lowest-dimensional regime that meets the accuracy criteria.
Choose the Right Model: For training or dynamic simulations that adjust the embeddings, the Lorentz model often provides more stable gradients and faster convergence in low dimensions (proceedings.mlr.press; arxiv.org). Thus, if you are learning embeddings on the fly in your simulation, using the Lorentz model with Riemannian optimization can achieve a given accuracy in a smaller dimension or with fewer iterations. For inference-only simulations (using fixed embeddings), either model is fine, but the Lorentz model's distance computation (a single Minkowski dot product) may be slightly simpler to implement on vector hardware. Just keep an eye on the range of values: if your simulation might require extremely distant points, Poincaré coordinates have more headroom under double precision (arxiv.org), whereas Lorentz coordinates need careful handling to avoid overflow. In most practical cases (where radii are moderate), we recommend Lorentz for its robustness and efficiency, unless there is a specific reason the Poincaré model suits your application better.
Mind Precision Limits: If the simulation deals with a deep hierarchy or requires very high-precision distance calculations, use 64-bit floating point for safety (arxiv.org). Check whether 32-bit floats cause any anomalies (e.g., distances not obeying the triangle inequality due to rounding, or points "sticking" to the boundary). For many real-time simulations with controlled conditions, 32-bit may be adequate and will double the speed of vector arithmetic and halve memory usage. But always validate – particularly if your hyperbolic vectors venture close to the boundary (Poincaré) or far from the origin (Lorentz), where rounding issues manifest. If necessary, impose a radius limit (clamping the maximum norm of vectors) to keep values within a safe numeric range (arxiv.org), a practice used in some implementations to avoid instability; a minimal clamping sketch is shown after this list.
Optimize Memory Access: Ensure that your data layout and dimension choice are optimized for the memory hierarchy. For example, storing vectors in a structure-of-arrays (SoA) format can allow contiguous memory access for each coordinate across many points, which vectorizes well. Align the dimensional size to cache line boundaries or the vector engine’s native width if possible. For instance, if using an engine with 256-bit registers (able to process four 64-bit doubles at once), a dimension that is a multiple of 4 will utilize this fully. While you should not drastically change the dimension purely for alignment, it’s something to consider if you’re deciding between, say, 7 or 8 dimensions for a marginal accuracy difference.
Benchmark in Context: Finally, use benchmarks or small-scale tests that mimic your simulation's workload to evaluate the performance impact of dimensionality. Measure the per-step computation time and the accuracy metrics as you vary the dimension. Publications provide general guidance (e.g., hyperbolic embeddings often plateau by 10–20 dimensions or fewer (pure.tue.nl; proceedings.mlr.press)), but the optimal point for your specific simulation might differ. If, for example, 8 dimensions give you 99% of the accuracy of 16 dimensions, the speed gain from halving the vector size is likely worth the tiny accuracy drop in a microsecond-critical loop. Use such data to justify your choice.
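As referenced in the "Mind Precision Limits" item, here is a minimal sketch of norm clamping for Poincaré-ball vectors; the threshold of 1 − 1e-5 in the usage comment is an illustrative assumption, not a prescribed constant.
/// Clamp a Poincaré-ball vector so its Euclidean norm stays at or below `max_norm`
/// (strictly inside the unit ball), avoiding the numerically unstable boundary region.
fn clamp_poincare_norm(v: &mut [f64], max_norm: f64) {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for x in v.iter_mut() {
            *x *= scale;
        }
    }
}

// Example usage with an illustrative threshold just inside the boundary:
// clamp_poincare_norm(&mut point, 1.0 - 1e-5);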
In summary, hyperbolic vector representations allow one to use remarkably low dimensionality to capture complex hierarchical structures, which is a boon for high-performance, real-time simulations. By selecting the minimal effective dimension (often in the single digits or teens) and using appropriate models and precision, you can hit a sweet spot where the simulation runs at microsecond-scale speeds without compromising the fidelity of distance or inference computations. Empirical studies and theory concur that hyperbolic embeddings offer excellent accuracy per dimension (arxiv.org; bjlkeng.io), so the goal is to capitalize on that efficiency while respecting the computational limits of your hardware. Adhering to these guidelines will ensure that your simulation leverages the full power of hyperbolic geometry with a balanced approach to speed, precision, and accuracy.
References and Further Reading
Nickel, M. & Kiela, D. (2017). Poincaré Embeddings for Learning Hierarchical Representations. Advances in Neural Information Processing Systems. (Introduced the use of Poincaré ball model for embedding taxonomies; showed drastic reduction in required dimensions for hierarchical data.)
Nickel, M. & Kiela, D. (2018). Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. ICML. (Demonstrated advantages of the Lorentz model and low-dimensional embeddings for large taxonomies, with comparisons to Poincaré.)
Sala, F. et al. (2018). Representation Tradeoffs for Hyperbolic Embeddings. ICML. (Provided theoretical bounds on precision vs. dimensionality in hyperbolic space; combinatorial constructions achieving low distortion with few dimensions at the cost of high precision.)
Mishne, G. et al. (2022). The Numerical Stability of Hyperbolic Representation Learning. ICML. (Analyzed numerical limits of Poincaré and Lorentz models under finite precision, and proposed an alternative parameterization to improve stability.)
Chami, I. et al. (2019). Hyperbolic Graph Convolutional Networks. NeurIPS. (Applied hyperbolic embeddings in deep learning for graph data; discusses selecting curvature and dimension in practice.)
Sarkar, R. (2011). Low Distortion Embeddings of Trees in Hyperbolic Plane. (Proved that trees can be embedded with arbitrarily low distortion in 2D hyperbolic space, motivating later machine learning applications.)