Goal: compute nearest-neighbor distance between forest fragments at scale in GEE, with operational runtime and memory constraints. Current vectorization plus spatial join per tile is too expensive.
The current implementation in gee.js uses:
reduceToVectorsto polygonize fragmentsee.Join.saveBestwithwithinDistancefor nearest-neighbor search
This is slow because:
| Issue | Impact |
|---|---|
| Vectorization overhead | Large feature collections and heavy geometry |
| Spatial join complexity | Near O(n^2) in dense tiles |
| Per-tile processing | Each tile repeats expensive steps |
Idea: reduce vector load by polygonizing only fragment boundaries or centroids.
Steps:
- Create fragment boundaries from the label image.
- Vectorize boundaries or centroids (much fewer vertices than full polygons).
- Do nearest-neighbor join on boundary vectors.
- Rasterize the fragment-level distance back to pixels using
reduceConnectedComponents.
Pros:
- Fewer features and simpler geometries.
- Nearest-neighbor join cost drops significantly.
Cons:
- Still uses vectorization; may be heavy on very dense tiles.
Idea: stay in raster space and compute a per-fragment minimum distance using distance transforms and label reductions.
High-level plan:
- Start from the fragment label image
frag. - Extract a 1-pixel boundary mask
boundaryfromfrag. - Compute distance transform from
boundaryto get distance to nearest boundary. - For each fragment, evaluate distance only on a thin outer ring (1-2 pixels) to avoid the distance to its own boundary dominating the minimum.
- Reduce per fragment using
reduceConnectedComponentsand write back to pixels.
Notes:
- The outer-ring trick avoids per-fragment exclusion logic.
- This gives an edge-to-edge distance approximation that is typically acceptable for ENN; it needs a validation run on a few tiles.
Idea: compute exact distances by excluding self labels in batches.
Sketch:
- Split fragment labels into chunks (e.g., 1-2000, 2001-4000).
- For each chunk, create a mask of all other fragments and run distance transform.
- Reduce per fragment in that chunk.
Pros:
- Exact nearest-neighbor distance.
Cons:
- Multiple passes; still heavy but controllable per tile.
Start with A (boundary-only vectors) because it is the least risky change. If performance is still too slow, test B on a small set of tiles to validate accuracy and speed.
var frag = native_i.rename('frag').selfMask();
var boundary = frag.reduceNeighborhood({
reducer: ee.Reducer.countDistinct(),
kernel: ee.Kernel.square(1)
}).gt(1).selfMask();
var boundaryVec = boundary.reduceToVectors({
geometry: grid_i_buffered,
scale: 30,
crs: frag.projection(),
geometryType: 'centroid',
maxPixels: 1e13,
bestEffort: true
});
var joined = ee.Join.saveBest({
matchKey: 'nearest_feat',
measureKey: 'nearest_dist_m'
}).apply({
primary: boundaryVec,
secondary: boundaryVec,
condition: ee.Filter.and(
ee.Filter.withinDistance({distance: searchRadius, leftField: '.geo', rightField: '.geo'}),
ee.Filter.notEquals({leftField: 'label', rightField: 'label'})
)
});var frag = native_i.rename('frag').selfMask();
var boundary = frag.reduceNeighborhood({
reducer: ee.Reducer.countDistinct(),
kernel: ee.Kernel.square(1)
}).gt(1).selfMask();
// Distance in pixels, convert to meters
var distToBoundary = boundary.fastDistanceTransform().sqrt().multiply(30);
// 1-pixel ring outside each fragment
var ring = frag.focal_max(1).and(frag.not()).selfMask();
// Reduce per fragment (min distance on the ring)
var fragMin = distToBoundary.updateMask(ring)
.reduceConnectedComponents(ee.Reducer.min(), 'frag');
var enn = fragMin.rename('ENN');- Is edge-to-edge ENN acceptable, or must it be centroid-to-centroid?
- Can ENN be stored per fragment (one value per label) instead of per pixel?
- What maximum error is acceptable if we use the ring approximation?
- Typical fragment counts per tile and target runtime per tile?
Pick 5-10 representative tiles and run a benchmark of A and B to compare runtime and accuracy. Use the current vector approach as baseline.