ENN (Euclidean Nearest Neighbor) Performance Optimization

Problem Summary

Goal: compute nearest-neighbor distance between forest fragments at scale in GEE, with operational runtime and memory constraints. Current vectorization plus spatial join per tile is too expensive.

Current Bottlenecks

The current implementation in gee.js uses:

reduceToVectors to polygonize fragments
ee.Join.saveBest with withinDistance for nearest-neighbor search

This is slow because:

Issue	Impact
Vectorization overhead	Large feature collections and heavy geometry
Spatial join complexity	Near O(n^2) in dense tiles
Per-tile processing	Each tile repeats expensive steps

Proposed Direction (Raster-First, Vector-Light)

A. Boundary-Only Vectors + Small Join

Idea: reduce vector load by polygonizing only fragment boundaries or centroids.

Steps:

Create fragment boundaries from the label image.
Vectorize boundaries or centroids (much fewer vertices than full polygons).
Do nearest-neighbor join on boundary vectors.
Rasterize the fragment-level distance back to pixels using reduceConnectedComponents.

Pros:

Fewer features and simpler geometries.
Nearest-neighbor join cost drops significantly.

Cons:

Still uses vectorization; may be heavy on very dense tiles.

B. Raster-First Using Distance Transform + Label Reduction

Idea: stay in raster space and compute a per-fragment minimum distance using distance transforms and label reductions.

High-level plan:

Start from the fragment label image frag.
Extract a 1-pixel boundary mask boundary from frag.
Compute distance transform from boundary to get distance to nearest boundary.
For each fragment, evaluate distance only on a thin outer ring (1-2 pixels) to avoid the distance to its own boundary dominating the minimum.
Reduce per fragment using reduceConnectedComponents and write back to pixels.

Notes:

The outer-ring trick avoids per-fragment exclusion logic.
This gives an edge-to-edge distance approximation that is typically acceptable for ENN; it needs a validation run on a few tiles.

C. Chunked-Label Raster Exclusion (Exact but Heavy)

Idea: compute exact distances by excluding self labels in batches.

Sketch:

Split fragment labels into chunks (e.g., 1-2000, 2001-4000).
For each chunk, create a mask of all other fragments and run distance transform.
Reduce per fragment in that chunk.

Pros:

Exact nearest-neighbor distance.

Cons:

Multiple passes; still heavy but controllable per tile.

Suggested Prototype (A then B)

Start with A (boundary-only vectors) because it is the least risky change. If performance is still too slow, test B on a small set of tiles to validate accuracy and speed.

Pseudocode Sketches

Boundary-Only Vectors

var frag = native_i.rename('frag').selfMask();

var boundary = frag.reduceNeighborhood({
  reducer: ee.Reducer.countDistinct(),
  kernel: ee.Kernel.square(1)
}).gt(1).selfMask();

var boundaryVec = boundary.reduceToVectors({
  geometry: grid_i_buffered,
  scale: 30,
  crs: frag.projection(),
  geometryType: 'centroid',
  maxPixels: 1e13,
  bestEffort: true
});

var joined = ee.Join.saveBest({
  matchKey: 'nearest_feat',
  measureKey: 'nearest_dist_m'
}).apply({
  primary: boundaryVec,
  secondary: boundaryVec,
  condition: ee.Filter.and(
    ee.Filter.withinDistance({distance: searchRadius, leftField: '.geo', rightField: '.geo'}),
    ee.Filter.notEquals({leftField: 'label', rightField: 'label'})
  )
});

Raster-First With Outer Ring

var frag = native_i.rename('frag').selfMask();

var boundary = frag.reduceNeighborhood({
  reducer: ee.Reducer.countDistinct(),
  kernel: ee.Kernel.square(1)
}).gt(1).selfMask();

// Distance in pixels, convert to meters
var distToBoundary = boundary.fastDistanceTransform().sqrt().multiply(30);

// 1-pixel ring outside each fragment
var ring = frag.focal_max(1).and(frag.not()).selfMask();

// Reduce per fragment (min distance on the ring)
var fragMin = distToBoundary.updateMask(ring)
  .reduceConnectedComponents(ee.Reducer.min(), 'frag');

var enn = fragMin.rename('ENN');

Open Questions (Need User Input)

Is edge-to-edge ENN acceptable, or must it be centroid-to-centroid?
Can ENN be stored per fragment (one value per label) instead of per pixel?
What maximum error is acceptable if we use the ring approximation?
Typical fragment counts per tile and target runtime per tile?

Next Step

Pick 5-10 representative tiles and run a benchmark of A and B to compare runtime and accuracy. Use the current vector approach as baseline.

dirceu-jr/gee-mapbiomas-2.md

Select an option

No results found