
# code
  - you can and SHOULD use AI when writing code
  - please avoid some of the things that AI likes to do in code:
    - write long, unnecessary comments on self-explanatory code
    - use non-ascii characters in comments
    - generate redundant (even if correct) code
    - write repetitive code that could be easily refactored
    - reimplement functionality that can be taken from a library
  - in short, PRs that are AI-generated without human guidance tend to be unnecessarily long

Setup

start a vstart cluster with RGW

Alternative 1

Add object locking to all bucket creations via a Lua script.

  • upload the following script in the prerequest context:
-- enabling object lock on bucket creation
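-- what follows is a sketch (not from the original note) of how the script
-- might continue; it assumes the prerequest context exposes Request.RGWOp
-- and a writable Request.HTTP.Metadata table, per the RGW Lua scripting docs
if Request ~= nil and Request.RGWOp == "create_bucket" then
  -- this S3 header makes RGW create the bucket with object lock enabled
  Request.HTTP.Metadata["x-amz-bucket-object-lock-enabled"] = "true"
end
  • upload it with radosgw-admin (the script file name is a placeholder; older RGW versions spell the context preRequest):
bin/radosgw-admin script put --infile=enable_lock.lua --context=prerequest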

copied from: https://claude.ai/share/e4bed98a-9049-44b3-9aee-173bba941120

When a Kafka producer sets partitions explicitly, there are several important trade-offs to consider:

Pros of Explicit Partition Assignment

  • Guaranteed Message Ordering: Messages sent to the same partition are guaranteed to maintain their order. This is crucial for use cases where sequence matters, like financial transactions or event sourcing.
  • Predictable Data Locality: You can ensure related messages always go to the same partition, which helps with consumer processing efficiency and stateful operations.
  • Load Distribution Control: You have fine-grained control over how messages are distributed across partitions, allowing you to optimize for your specific access patterns.
  • Deterministic Behavior: Your application's behavior becomes more predictable since you know exactly where each message will land.
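For illustration (not part of the original conversation), a minimal sketch of explicit partition assignment using the kafka-python package; the broker address, topic, payload, and partition number are all placeholders:

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# pinning all of account 42's events to partition 0 preserves their relative order
producer.send("transactions", value=b'{"account": 42, "amount": 100}', partition=0)
producer.flush()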
  • disk layout on the machine:
lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda              8:0    0 893.8G  0 disk
└─sda1           8:1    0 893.7G  0 part /
nvme1n1        259:0    0   1.5T  0 disk
nvme0n1        259:1    0   1.5T  0 disk
├─vg_nvme-lv_1 253:0    0  89.4G  0 lvm
├─vg_nvme-lv_2 253:1    0  89.4G  0 lvm

Basic Bucket Logging Testing

  • to enable our extension to the API when using Python (boto3) or the AWS CLI, the following file has to be placed under: ~/.aws/models/s3/2006-03-01/ (the directory should be created if it does not exist)
  • currently there is no generic solution for other client SDKs
  • start a vstart cluster
  • create a bucket:
aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket:
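for example (the bucket name "log" is a placeholder):
aws --endpoint-url http://localhost:8000 s3 mb s3://log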

how much time did this refactoring work take?

Looking at the timestamps in our conversation, the refactoring work took approximately 45-50 minutes from start to finish.

Timeline Breakdown:

Phase 1: Analysis & Planning (5-10 minutes)

  • Understanding the codebase structure
  • Identifying the RGWOp class hierarchy
  • Finding all canonical_name() overrides

this is a follow-up to https://gist.github.com/yuvalif/d50bcc3d27121732ffbbafe7b7146112, for an RGW of version "squid" or above, where bucket notifications and topics are synced between zones

  • deploy a zonegroup with 2 zones:
MON=1 OSD=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2
  • export credentials:
export AWS_ACCESS_KEY_ID=1234567890
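and the matching secret (assuming the default credentials created by the multisite test script):
export AWS_SECRET_ACCESS_KEY=pencil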
-- Lua script to auto-tier S3 object PUT requests
-- based on this: https://ceph.io/en/news/blog/2024/auto-tiering-ceph-object-storage-part-2/
-- exit script quickly if it is not a PUT request
if Request == nil or Request.RGWOp ~= "put_obj" then
  return
end
local threshold = 1024*1024 -- 1MB
local debug = true
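a sketch of how the script could continue, following the blog post above (the storage class name "COLD" is a placeholder; Request.ContentLength, RGWDebugLog, and a writable Request.HTTP.StorageClass are assumed per the RGW Lua docs):
if Request.ContentLength ~= nil and Request.ContentLength > threshold then
  if debug then
    RGWDebugLog("auto-tiering object " .. Request.Object.Name)
  end
  -- route large objects to a colder storage class
  Request.HTTP.StorageClass = "COLD"
end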
  • start a vstart cluster
  • create a tenanted user:
bin/radosgw-admin user create --display-name "Ka Boom" --tenant boom --uid ka --access_key ka --secret_key boom
  • create a bucket on that tenant:
AWS_ACCESS_KEY_ID=ka AWS_SECRET_ACCESS_KEY=boom aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket with no tenant:
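for example (the bucket name "log" is a placeholder, and the credentials are assumed to be vstart's default testid user):
AWS_ACCESS_KEY_ID=0555b35654ad1656d804 AWS_SECRET_ACCESS_KEY=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== aws --endpoint-url http://localhost:8000 s3 mb s3://log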