Yes, TheRock already builds Python with --enable-shared for embedding use cases. The infrastructure exists and is used by rocgdb. rocprofiler-compute just needs to wire into it.
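As a quick sanity check (a generic Python sketch, not TheRock-specific tooling), an interpreter built with `--enable-shared` advertises it through `sysconfig` and names a libpython shared object:

```python
# Generic check, not TheRock-specific: a --enable-shared interpreter
# reports Py_ENABLE_SHARED=1 and names a libpython shared object.
import sysconfig

print(sysconfig.get_config_var("Py_ENABLE_SHARED"))  # 1 on shared builds
print(sysconfig.get_config_var("LDLIBRARY"))         # e.g. libpython3.12.so
```

Running this with TheRock's bundled interpreter confirms the embedding-ready build before wiring rocprofiler-compute into it.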
Source: CI ASAN Run #21463906609
Artifacts Index: https://therock-ci-artifacts.s3.amazonaws.com/21463906609-linux/index-gfx94X-dcgpu-asan.html
From TheRock repo root:
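A hypothetical, stdlib-only stand-in for the fetch step (TheRock ships its own artifact tooling, so the real command may differ):

```python
# Hypothetical stand-in for TheRock's artifact tooling: fetch the CI
# artifacts index referenced above using only the standard library.
import urllib.request

INDEX_URL = (
    "https://therock-ci-artifacts.s3.amazonaws.com/"
    "21463906609-linux/index-gfx94X-dcgpu-asan.html"
)

with urllib.request.urlopen(INDEX_URL) as resp:
    html = resp.read().decode("utf-8")

print(f"fetched {len(html)} bytes from the artifacts index")
```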
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1) -> (d1, d0)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d1)>
#map4 = affine_map<(d0, d1, d2) -> (d2, d0)>
#map5 = affine_map<(d0, d1, d2) -> (d1, d2)>
#map6 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map7 = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
#map8 = affine_map<(d0, d1, d2) -> (d1, d2, d0)>
#map9 = affine_map<(d0, d1, d2) -> (d2)>
```
This document provides architectural guidance for "Quartz", a PyTorch-HUD-style system for ROCm downstream CI/CD orchestration. The junior engineer's instinct to start with a status.json file is understandable but insufficient for the stated requirements; a database-first approach is correct.
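A minimal sketch of what database-first means in practice (the schema below is an illustrative assumption, not Quartz's actual design): each CI run becomes a row, and HUD-style views become queries rather than a hand-maintained status.json.

```python
# Illustrative schema sketch, not Quartz's actual design: persist each CI
# run as a row so status pages become queries instead of a JSON blob.
import sqlite3

conn = sqlite3.connect("quartz.db")
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS ci_runs (
        run_id     TEXT PRIMARY KEY,   -- e.g. a GitHub Actions run id
        pipeline   TEXT NOT NULL,      -- e.g. 'linux-gfx94X-asan'
        commit_sha TEXT NOT NULL,
        status     TEXT NOT NULL,      -- queued | running | passed | failed
        started_at TEXT,
        ended_at   TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_runs_commit ON ci_runs(commit_sha);
    """
)

# A HUD-style view is then just a query, not a regenerated file.
rows = conn.execute(
    "SELECT pipeline, status FROM ci_runs WHERE commit_sha = ?", ("abc123",)
).fetchall()
```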
Extract MoE primitives from /develop/ai-no-fluff/kb/ben/moe_f32_parameterized.mlir:
- `mul_mat_id`: expert-selected matrix multiplication (gather + batch_matmul)
- `moe_ffn_block`: full MoE FFN block composing routing, expert compute, and weighted sum
Key challenge: `moe_ffn_block` depends on `mul_mat_id` and `swiglu`, so the extraction needs a systematic way to compose functions without manually inlining their bodies.
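One way to make the composition systematic (a sketch; the IR bodies below are trivial placeholders, not the real definitions from moe_f32_parameterized.mlir): declare each primitive's dependencies, then emit a single module in dependency order so `@moe_ffn_block` can `func.call` `@mul_mat_id` and `@swiglu` instead of carrying inlined copies of them.

```python
# Sketch only: placeholder IR bodies stand in for the extracted primitives.
# The mechanic is the point: topologically sort declared dependencies and
# emit one module, so composed funcs reference each other via func.call.
DEPS = {
    "mul_mat_id": [],
    "swiglu": [],
    "moe_ffn_block": ["mul_mat_id", "swiglu"],
}
IR = {name: f"func.func @{name}() {{ return }}" for name in DEPS}

def emit_module(root: str) -> str:
    """Emit one MLIR module containing `root` and its transitive deps."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in DEPS[name]:
            visit(dep)
        order.append(name)  # post-order: dependencies land first

    visit(root)
    body = "\n  ".join(IR[name] for name in order)
    return f"module {{\n  {body}\n}}"

print(emit_module("moe_ffn_block"))
```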
| Merge Commit | Individual Commits on Main |
|---|---|
| One atomic integration point | 500 commits sprawled on main |
| `git revert -m1 <merge>` undoes everything | Good luck reverting |
| `git bisect` can skip the whole merge | Bisect walks through 500 commits |
| Main history is readable | Main history is chaos |
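A throwaway demo of the revert row (repo and file names are hypothetical; any recent git works): several commits land via one `--no-ff` merge, and a single `git revert -m 1` undoes all of them at once.

```python
# Throwaway demo: a --no-ff merge of several commits is undone by one
# `git revert -m 1 <merge>`; the same commits landed individually on main
# would each need their own revert. Repo/file names here are hypothetical.
import pathlib, subprocess, tempfile

def git(*args: str, cwd: pathlib.Path) -> None:
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

repo = pathlib.Path(tempfile.mkdtemp())
git("init", "-b", "main", cwd=repo)
git("config", "user.email", "dev@example.com", cwd=repo)
git("config", "user.name", "Dev", cwd=repo)
(repo / "base.txt").write_text("base\n")
git("add", "-A", cwd=repo)
git("commit", "-m", "base", cwd=repo)

git("checkout", "-b", "feature", cwd=repo)
for i in range(3):  # stand-in for the 500 commits
    (repo / f"feature_{i}.txt").write_text(f"change {i}\n")
    git("add", "-A", cwd=repo)
    git("commit", "-m", f"feature commit {i}", cwd=repo)

git("checkout", "main", cwd=repo)
git("merge", "--no-ff", "feature", "-m", "merge feature", cwd=repo)
git("revert", "-m", "1", "--no-edit", "HEAD", cwd=repo)  # one atomic undo

# All three feature files are gone again after a single revert commit.
print(sorted(p.name for p in repo.glob("feature_*.txt")))  # -> []
```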
```mlir
module @aqt_matmul {
  iree_input.global private @_params$0 = dense<[[0.000000e+00, 5.003000e+02, 1.000600e+03], [1500.8999, 2.001200e+03, 2.501500e+03], [3001.7998, 3502.09985, 4.002400e+03], [4502.69971, 5.003000e+03, 5.503300e+03], [6003.59961, 6503.8999, 7004.1997], [7.504500e+03, 8004.7998, 8.505100e+03]]> : tensor<6x3xf32>
  iree_input.global private @_params$1 = dense<5.000000e+00> : tensor<f32>
  func @compute_native(%arg0: tensor<5x6xf32>) -> tensor<5x3xf32> {
    %0 = iree_input.global.load @_params$0 : tensor<6x3xf32>
    %1 = iree_input.global.load @_params$1 : tensor<f32>
    %2 = call @main(%0, %1, %arg0) : (tensor<6x3xf32>, tensor<f32>, tensor<5x6xf32>) -> tensor<5x3xf32>
    return %2 : tensor<5x3xf32>
  }
  func private @main(%arg0: tensor<6x3xf32>, %arg1: tensor<f32>, %arg2: tensor<5x6xf32>) -> tensor<5x3xf32> {
```
```mlir
#device_target_vmvx = #hal.device.target<"vmvx", {executable_targets = [#hal.executable.target<"vmvx", "vmvx-bytecode-fb">]}>
module attributes {hal.device.targets = [#device_target_vmvx]} {
  util.global private @hoisted_1 : !hal.buffer
  util.global private @hoisted_1__offset : index
  util.global private @hoisted_1__size : index
  util.global private @hoisted_0 : !hal.buffer
  util.global private @hoisted : !hal.buffer
  util.global private @hoisted__storage_size : index
  util.global private @hoisted__offset : index
  util.global private @hoisted__size : index
```