Yes, TheRock already builds Python with --enable-shared for embedding use cases. The infrastructure exists and is used by rocgdb. rocprofiler-compute just needs to wire into it.
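As a quick sanity check (a generic Python sketch, not TheRock-specific tooling), an interpreter built with `--enable-shared` advertises it through `sysconfig` and names a libpython shared object:

```python
# Generic check, not TheRock-specific: a --enable-shared interpreter
# reports Py_ENABLE_SHARED=1 and names a libpython shared object.
import sysconfig

print(sysconfig.get_config_var("Py_ENABLE_SHARED"))  # 1 on shared builds
print(sysconfig.get_config_var("LDLIBRARY"))         # e.g. libpython3.12.so
```

Running this with TheRock's bundled interpreter confirms the embedding-ready build before wiring rocprofiler-compute into it.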
Source: CI ASAN Run #21463906609
Artifacts Index: https://therock-ci-artifacts.s3.amazonaws.com/21463906609-linux/index-gfx94X-dcgpu-asan.html
From TheRock repo root:
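A hypothetical, stdlib-only stand-in for the fetch step (TheRock ships its own artifact tooling, so the real command may differ):

```python
# Hypothetical stand-in for TheRock's artifact tooling: fetch the CI
# artifacts index referenced above using only the standard library.
import urllib.request

INDEX_URL = (
    "https://therock-ci-artifacts.s3.amazonaws.com/"
    "21463906609-linux/index-gfx94X-dcgpu-asan.html"
)

with urllib.request.urlopen(INDEX_URL) as resp:
    html = resp.read().decode("utf-8")

print(f"fetched {len(html)} bytes from the artifacts index")
```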
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1) -> (d1, d0)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d1)>
#map4 = affine_map<(d0, d1, d2) -> (d2, d0)>
#map5 = affine_map<(d0, d1, d2) -> (d1, d2)>
#map6 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map7 = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
#map8 = affine_map<(d0, d1, d2) -> (d1, d2, d0)>
#map9 = affine_map<(d0, d1, d2) -> (d2)>
```
This document provides architectural guidance for "Quartz", a PyTorch-HUD-style system for ROCm downstream CI/CD orchestration. The junior engineer's instinct to start with a status.json file is understandable but insufficient for the stated requirements; a database-first approach is correct.
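A minimal sketch of what database-first means in practice (the schema below is an illustrative assumption, not Quartz's actual design): each CI run becomes a row, and HUD-style views become queries rather than a hand-maintained status.json.

```python
# Illustrative schema sketch, not Quartz's actual design: persist each CI
# run as a row so status pages become queries instead of a JSON blob.
import sqlite3

conn = sqlite3.connect("quartz.db")
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS ci_runs (
        run_id     TEXT PRIMARY KEY,   -- e.g. a GitHub Actions run id
        pipeline   TEXT NOT NULL,      -- e.g. 'linux-gfx94X-asan'
        commit_sha TEXT NOT NULL,
        status     TEXT NOT NULL,      -- queued | running | passed | failed
        started_at TEXT,
        ended_at   TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_runs_commit ON ci_runs(commit_sha);
    """
)

# A HUD-style view is then just a query, not a regenerated file.
rows = conn.execute(
    "SELECT pipeline, status FROM ci_runs WHERE commit_sha = ?", ("abc123",)
).fetchall()
```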
Extract MoE primitives from /develop/ai-no-fluff/kb/ben/moe_f32_parameterized.mlir:
- `mul_mat_id`: expert-selected matrix multiplication (gather + batch_matmul)
- `moe_ffn_block`: full MoE FFN block composing routing, expert compute, and weighted sum
Key challenge: `moe_ffn_block` depends on `mul_mat_id` and `swiglu`, so the extraction needs a systematic way to compose functions without manually inlining their bodies.
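One way to make the composition systematic (a sketch; the IR bodies below are trivial placeholders, not the real definitions from moe_f32_parameterized.mlir): declare each primitive's dependencies, then emit a single module in dependency order so `@moe_ffn_block` can `func.call` `@mul_mat_id` and `@swiglu` instead of carrying inlined copies of them.

```python
# Sketch only: placeholder IR bodies stand in for the extracted primitives.
# The mechanic is the point: topologically sort declared dependencies and
# emit one module, so composed funcs reference each other via func.call.
DEPS = {
    "mul_mat_id": [],
    "swiglu": [],
    "moe_ffn_block": ["mul_mat_id", "swiglu"],
}
IR = {name: f"func.func @{name}() {{ return }}" for name in DEPS}

def emit_module(root: str) -> str:
    """Emit one MLIR module containing `root` and its transitive deps."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in DEPS[name]:
            visit(dep)
        order.append(name)  # post-order: dependencies land first

    visit(root)
    body = "\n  ".join(IR[name] for name in order)
    return f"module {{\n  {body}\n}}"

print(emit_module("moe_ffn_block"))
```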
| Merge Commit | Individual Commits on Main |
|---|---|
| One atomic integration point | 500 commits sprawled on main |
| `git revert -m1 <merge>` undoes everything | Good luck reverting |
| `git bisect` can skip the whole merge | Bisect walks through 500 commits |
| Main history is readable | Main history is chaos |
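A throwaway demo of the revert row (repo and file names are hypothetical; any recent git works): several commits land via one `--no-ff` merge, and a single `git revert -m 1` undoes all of them at once.

```python
# Throwaway demo: a --no-ff merge of several commits is undone by one
# `git revert -m 1 <merge>`; the same commits landed individually on main
# would each need their own revert. Repo/file names here are hypothetical.
import pathlib, subprocess, tempfile

def git(*args: str, cwd: pathlib.Path) -> None:
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

repo = pathlib.Path(tempfile.mkdtemp())
git("init", "-b", "main", cwd=repo)
git("config", "user.email", "dev@example.com", cwd=repo)
git("config", "user.name", "Dev", cwd=repo)
(repo / "base.txt").write_text("base\n")
git("add", "-A", cwd=repo)
git("commit", "-m", "base", cwd=repo)

git("checkout", "-b", "feature", cwd=repo)
for i in range(3):  # stand-in for the 500 commits
    (repo / f"feature_{i}.txt").write_text(f"change {i}\n")
    git("add", "-A", cwd=repo)
    git("commit", "-m", f"feature commit {i}", cwd=repo)

git("checkout", "main", cwd=repo)
git("merge", "--no-ff", "feature", "-m", "merge feature", cwd=repo)
git("revert", "-m", "1", "--no-edit", "HEAD", cwd=repo)  # one atomic undo

# All three feature files are gone again after a single revert commit.
print(sorted(p.name for p in repo.glob("feature_*.txt")))  # -> []
```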
```mlir
module @aqt_matmul {
  iree_input.global private @_params$0 = dense<[[0.000000e+00, 5.003000e+02, 1.000600e+03], [1500.8999, 2.001200e+03, 2.501500e+03], [3001.7998, 3502.09985, 4.002400e+03], [4502.69971, 5.003000e+03, 5.503300e+03], [6003.59961, 6503.8999, 7004.1997], [7.504500e+03, 8004.7998, 8.505100e+03]]> : tensor<6x3xf32>
  iree_input.global private @_params$1 = dense<5.000000e+00> : tensor<f32>
  func @compute_native(%arg0: tensor<5x6xf32>) -> tensor<5x3xf32> {
    %0 = iree_input.global.load @_params$0 : tensor<6x3xf32>
    %1 = iree_input.global.load @_params$1 : tensor<f32>
    %2 = call @main(%0, %1, %arg0) : (tensor<6x3xf32>, tensor<f32>, tensor<5x6xf32>) -> tensor<5x3xf32>
    return %2 : tensor<5x3xf32>
  }
  func private @main(%arg0: tensor<6x3xf32>, %arg1: tensor<f32>, %arg2: tensor<5x6xf32>) -> tensor<5x3xf32> {
```
```mlir
#device_target_vmvx = #hal.device.target<"vmvx", {executable_targets = [#hal.executable.target<"vmvx", "vmvx-bytecode-fb">]}>
module attributes {hal.device.targets = [#device_target_vmvx]} {
  util.global private @hoisted_1 : !hal.buffer
  util.global private @hoisted_1__offset : index
  util.global private @hoisted_1__size : index
  util.global private @hoisted_0 : !hal.buffer
  util.global private @hoisted : !hal.buffer
  util.global private @hoisted__storage_size : index
  util.global private @hoisted__offset : index
  util.global private @hoisted__size : index
```