
# code
  - you can and SHOULD use AI when writing code
  - please avoid some of the things that AI likes to do in code:
    - write long, unnecessary comments on self-explanatory code
    - use non-ascii characters in comments
    - generate redundant (even if correct) code
    - write repetitive code that could be easily refactored
    - reimplement functionality that can be taken from a library
  - in short, PRs that are AI-generated without human guidance tend to be unnecessarily long

Setup

start a vstart cluster with RGW

Alternative 1

Add object locking to all bucket creations via a Lua script.

  • upload the following script in the prerequest context:
-- enabling object lock on bucket creation
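-- what follows is a sketch (not from the original note) of how the script
-- might continue; it assumes the prerequest context exposes Request.RGWOp
-- and a writable Request.HTTP.Metadata table, per the RGW Lua scripting docs
if Request ~= nil and Request.RGWOp == "create_bucket" then
  -- this S3 header makes RGW create the bucket with object lock enabled
  Request.HTTP.Metadata["x-amz-bucket-object-lock-enabled"] = "true"
end
  • upload it with radosgw-admin (the script file name is a placeholder; older RGW versions spell the context preRequest):
bin/radosgw-admin script put --infile=enable_lock.lua --context=prerequest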

copied from: https://claude.ai/share/e4bed98a-9049-44b3-9aee-173bba941120

When a Kafka producer sets partitions explicitly, there are several important trade-offs to consider:

Pros of Explicit Partition Assignment

  • Guaranteed Message Ordering: Messages sent to the same partition are guaranteed to maintain their order. This is crucial for use cases where sequence matters, like financial transactions or event sourcing.
  • Predictable Data Locality: You can ensure related messages always go to the same partition, which helps with consumer processing efficiency and stateful operations.
  • Load Distribution Control: You have fine-grained control over how messages are distributed across partitions, allowing you to optimize for your specific access patterns.
  • Deterministic Behavior: Your application's behavior becomes more predictable since you know exactly where each message will land.
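For illustration (not part of the original conversation), a minimal sketch of explicit partition assignment using the kafka-python package; the broker address, topic, payload, and partition number are all placeholders:

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# pinning all of account 42's events to partition 0 preserves their relative order
producer.send("transactions", value=b'{"account": 42, "amount": 100}', partition=0)
producer.flush()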
  • disk layout on the machine:
lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda              8:0    0 893.8G  0 disk
└─sda1           8:1    0 893.7G  0 part /
nvme1n1        259:0    0   1.5T  0 disk
nvme0n1        259:1    0   1.5T  0 disk
├─vg_nvme-lv_1 253:0    0  89.4G  0 lvm
├─vg_nvme-lv_2 253:1    0  89.4G  0 lvm

Basic Bucket Logging Testing

  • to enable our extension to the API when using Python (boto3) or the AWS CLI, the following file has to be placed under: ~/.aws/models/s3/2006-03-01/ (the directory should be created if it does not exist)
  • currently there is no generic solution for other client SDKs
  • start a vstart cluster
  • create a bucket:
aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket:
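for example (the bucket name "log" is a placeholder):
aws --endpoint-url http://localhost:8000 s3 mb s3://log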

how much time did this refactoring work take?

Looking at the timestamps in our conversation, the refactoring work took approximately 45-50 minutes from start to finish.

Timeline Breakdown:

Phase 1: Analysis & Planning (5-10 minutes)

  • Understanding the codebase structure
  • Identifying the RGWOp class hierarchy
  • Finding all canonical_name() overrides

this is a follow-up to https://gist.github.com/yuvalif/d50bcc3d27121732ffbbafe7b7146112, for an RGW of version "squid" or above, where bucket notifications and topics are synced between zones

  • deploy a zonegroup with 2 zones:
MON=1 OSD=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2
  • export credentials:
export AWS_ACCESS_KEY_ID=1234567890
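and the matching secret (assuming the default credentials created by the multisite test script):
export AWS_SECRET_ACCESS_KEY=pencil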
-- Lua script to auto-tier S3 object PUT requests
-- based on this: https://ceph.io/en/news/blog/2024/auto-tiering-ceph-object-storage-part-2/
-- exit script quickly if it is not a PUT request
if Request == nil or Request.RGWOp ~= "put_obj" then
  return
end
local threshold = 1024*1024 -- 1MB
local debug = true
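a sketch of how the script could continue, following the blog post above (the storage class name "COLD" is a placeholder; Request.ContentLength, RGWDebugLog, and a writable Request.HTTP.StorageClass are assumed per the RGW Lua docs):
if Request.ContentLength ~= nil and Request.ContentLength > threshold then
  if debug then
    RGWDebugLog("auto-tiering object " .. Request.Object.Name)
  end
  -- route large objects to a colder storage class
  Request.HTTP.StorageClass = "COLD"
end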
  • start a vstart cluster
  • create a tenanted user:
bin/radosgw-admin user create --display-name "Ka Boom" --tenant boom --uid ka --access_key ka --secret_key boom
  • create a bucket on that tenant:
AWS_ACCESS_KEY_ID=ka AWS_SECRET_ACCESS_KEY=boom aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket with no tenant:
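for example (the bucket name "log" is a placeholder, and the credentials are assumed to be vstart's default testid user):
AWS_ACCESS_KEY_ID=0555b35654ad1656d804 AWS_SECRET_ACCESS_KEY=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== aws --endpoint-url http://localhost:8000 s3 mb s3://log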