planetis-m · December 25, 2025 21:53
diff --git a/gistfile1.txt b/gistfile1.txt
 I am building a Python translation system called KDEAI. 
 I have placed the engineering specification in SPEC.md and the implementation plan in PLAN.md.

 Please read SPEC.md carefully. This system has strict requirements regarding file locking, hashing, and determinism. 
 We will implement this iteratively.

 First, please initialize the project structure:
 1. Create the `kdeai/` directory.
 2. Create a `pyproject.toml` using Python 3.12+.
 3. Add dependencies: `polib`, `portalocker`, `spacy`, `sqlite-vector` (if available via pip, otherwise note it), `typer` (for CLI), and `pydantic` (for config validation).
 4. Create an empty `__init__.py` in `kdeai/`.




 Let's implement the Safety Core. Reference SPEC.md Sections 6, 7, and 10.
 Implement the following modules:
 1. `kdeai/locks.py`: Implement the global run lock and per-file locks using `portalocker`. 
 2. `kdeai/hash.py`: Implement the hashing functions exactly as described in Section 10 (translation_hash, state_hash, etc).
 3. `kdeai/snapshot.py`: Implement `locked_read_file` as described in Section 7.2.

 Create a test file `tests/test_safety.py` to verify that locks work and hashes are deterministic. Run the test.
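
As a starting point for the determinism test, here is a minimal sketch of a stable field hash. The function name `stable_hash` and the length-prefixed layout are assumptions for illustration only; the real field order and encoding must come from SPEC.md Section 10.

```python
import hashlib


def stable_hash(*fields: str) -> str:
    """Hash a sequence of text fields deterministically.

    Hypothetical layout (NOT taken from SPEC.md): each field is encoded as
    UTF-8 and prefixed with its byte length, so ("a", "bc") and ("ab", "c")
    can never collide by concatenation.
    """
    h = hashlib.sha256()
    for field in fields:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # unambiguous field boundary
        h.update(data)
    return h.hexdigest()
```

A test in `tests/test_safety.py` could then assert that repeated calls with the same fields return the same digest, and that moving a character across a field boundary changes it.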



 Reference SPEC.md Sections 5 and 8. 
 Implement:
 1. `kdeai/config.py`: Load JSON, compute `config_hash` and `embed_policy_hash`. 
 2. `kdeai/po_model.py`: Use `polib` to parse files. Implement the canonical text derivation and `source_key` logic.

 Ensure `source_key` generation is strictly tested against the spec definitions.



 Reference SPEC.md Sections 12, 13, 14, and 15.
 Implement `kdeai/db.py`.
 - Define the SQL schemas for Workspace TM, Reference TM, Examples, and Glossary.
 - Implement the connection helpers with the specific PRAGMAs mentioned in Section 12 (WAL, busy_timeout, foreign_keys).
 - Create a helper to validate the `meta` table keys as per Section 14.
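
The connection helper and `meta` check could be sketched roughly as below, using only the standard `sqlite3` module. The function names, the default timeout, and the `meta(key, value)` column layout are assumptions; the authoritative PRAGMA values and required keys are in SPEC.md Sections 12 and 14.

```python
import sqlite3


def connect(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a database with the three PRAGMAs named above.

    The 5000 ms default is a placeholder, not a value from the spec.
    """
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode=WAL")
    con.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    con.execute("PRAGMA foreign_keys=ON")
    return con


def validate_meta(con: sqlite3.Connection, required: dict) -> list:
    """Return a list of problems; an empty list means the meta table is valid.

    Assumes a table shaped like meta(key TEXT, value TEXT).
    """
    found = dict(con.execute("SELECT key, value FROM meta"))
    problems = []
    for key in sorted(required):
        want, got = required[key], found.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems
```

Note that WAL mode only takes effect for file-backed databases; an in-memory connection reports `memory` as its journal mode.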



 We are moving to the Logic layer. Reference SPEC.md Sections 16, 17, and 22.
 Implement `kdeai/workspace_tm.py` and `kdeai/retrieve_tm.py`.
 - Ensure `index_file_snapshot_tm` is atomic.
 - Ensure retrieval follows the scope order (Session -> Workspace -> Reference).
 - Implement the "Exact-only" rule for TM reuse.
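
To make the scope order concrete, here is a small sketch in which each scope is an in-memory mapping instead of a real database. The names `retrieve_exact` and the `(source_key, lang)` lookup key are assumptions; only the Session -> Workspace -> Reference order and the exact-only rule come from the text above.

```python
from typing import Mapping, Optional, Sequence, Tuple

# A scope is a (name, table) pair; the table maps (source_key, lang) to msgstr.
Scope = Tuple[str, Mapping[Tuple[str, str], str]]


def retrieve_exact(
    source_key: str, lang: str, scopes: Sequence[Scope]
) -> Optional[Tuple[str, str]]:
    """Return (scope_name, msgstr) from the first scope with an exact hit.

    Only exact key matches are considered (no fuzzy matching), and empty
    translations are skipped so that a later scope can still answer.
    """
    for name, table in scopes:
        hit = table.get((source_key, lang))
        if hit:
            return name, hit
    return None
```

The caller passes the scopes in priority order, for example `[("session", s), ("workspace", w), ("reference", r)]`, so the ordering rule lives in one place.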



 Reference SPEC.md Sections 19, 20, and 21.
 Implement `kdeai/plan.py` and `kdeai/apply.py`.
 - The Plan JSON must be deterministic (Section 19).
 - The Apply phase must follow the "Two-phase apply" algorithm (Section 20.3).
 - Implement the validators in `kdeai/validate.py`.

 This is the most complex logic. Write a test case that creates a mock plan and attempts to apply it with `apply-mode=strict`.



 Finally, implement the CLI interface in `kdeai/cli.py` using Typer.
 Reference SPEC.md Section 23.
 Wire up the commands: init, plan, apply, index, doctor.
 Ensure the entry point handles the Global Run Lock (Section 6.3) before doing anything else.



 We need to implement the Glossary system. Please check SPEC.md Sections 13.4, 15.4, and especially Section 18 (Normalization and Matching).

 Implement `kdeai/glossary.py`:
 1. Implement the `kdeai_glossary_norm_v1` using spaCy. It must handle tokenization and lemmatization exactly as specified.
 2. Implement the "Miner" that reads from a Reference TM and builds the immutable Glossary DB.
 3. Implement the "Matcher" using a Trie structure to find terms in source text deterministically.
 4. Ensure the `meta` table validation checks the `normalization_id`.

 Update `kdeai/db.py` if necessary to support the Glossary schema. Create a small test to verify that the Trie correctly matches terms in a sample sentence.
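
A token-level Trie matcher could look roughly like the sketch below. It assumes the input is already normalized into a list of tokens (in the real module that would be the spaCy-based `kdeai_glossary_norm_v1` output) and uses a leftmost-longest policy, which is an assumption rather than something taken from Section 18.

```python
_END = object()  # sentinel key marking "a term ends at this node"


class TermTrie:
    """Trie over token sequences; class and method names are hypothetical."""

    def __init__(self):
        self.root = {}

    def add(self, tokens, term_id):
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[_END] = term_id

    def match(self, tokens):
        """Return (start, end, term_id) hits, scanning left to right.

        At each position the longest term wins and matching resumes after it,
        so the result is fully determined by the input order.
        """
        hits = []
        i = 0
        while i < len(tokens):
            node, best, j = self.root, None, i
            while j < len(tokens) and tokens[j] in node:
                node = node[tokens[j]]
                j += 1
                if _END in node:
                    best = (i, j, node[_END])
            if best:
                hits.append(best)
                i = best[1]
            else:
                i += 1
        return hits
```

For the sample-sentence test, adding the terms `["file"]` and `["file", "manager"]` and matching against the tokens of "open the file manager and a file" should yield the two-token term once and the single-token term once.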



 Now implement the Examples system. Check SPEC.md Sections 13.3, 15.3, and 17.2.

 Implement `kdeai/examples.py`:
 1. Implement the logic to build an Examples DB from Workspace or Reference TMs.
 2. Ensure it respects `embed_policy_hash`.
 3. Implement the retrieval using `sqlite-vector`.
   * Note: If `sqlite-vector` cannot be easily installed in this environment, please implement the SQL query generation assuming the extension is loaded, and provide a clear place in `kdeai/db.py` where the extension should be loaded via `con.enable_load_extension`.
 4. Enforce the "Non-empty eligibility" rules for selecting examples.



 Implement `kdeai/prompt.py` based on SPEC.md Section 5.4 and 8.3.
 This module should:
 1. Construct the `source_text_v1` string (ctx + id + pl).
 2. Format the few-shot examples and glossary terms into a system/user prompt structure.
 3. This module should NOT call the LLM; it should only build the payload.



 Implement `kdeai/plan.py` and the `plan` command in `kdeai/cli.py`.
 Reference SPEC.md Section 19 (Plan Format) and Section 17 (Retrieval).

 The Planner must:
 1. Iterate through .po files.
 2. Perform Retrieval:
   - Check TM scopes in order (Session -> Workspace -> Reference).
   - If exact match found: Create a `copy_tm` plan item.
   - If no match: Retrieve Examples & Glossary, and create an `llm` plan item.
 3. Serialize the Plan to JSON using the "Canonical JSON" rules (sorted keys) to ensure the `plan_id` is deterministic.
 4. Ensure no DB-local IDs (rowids) are written to the plan.
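
For the deterministic serialization step, a minimal sketch with the standard `json` and `hashlib` modules is shown below. The compact separators, the exclusion of the `plan_id` field from its own hash, and the use of SHA-256 are assumptions; the exact "Canonical JSON" rules are defined in SPEC.md Section 19.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def compute_plan_id(plan: dict) -> str:
    """Hash the canonical form of the plan, ignoring any existing plan_id."""
    body = {k: v for k, v in plan.items() if k != "plan_id"}
    return hashlib.sha256(canonical_json(body).encode("utf-8")).hexdigest()
```

Because keys are sorted at every nesting level, two plans built with different dict insertion orders produce the same ID. List order is still significant, so plan items need a stable order before hashing.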




 Implement `kdeai/validate.py` based on SPEC.md Section 21.
 Create the following validators:
 1. `NonEmptyValidator`: Rejects empty strings.
 2. `PluralConsistencyValidator`: Checks if the number of plural forms matches the header.
 3. `TagIntegrityValidator`: Ensures placeholders present in the source (printf-style such as `%s`, or brace-style such as `{name}`) are preserved in the translation.
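
The three checks can be sketched as plain functions before wrapping them in validator classes. The placeholder regex is an assumption that covers `%s`-style, positional `%1$s`, KDE-style `%1`, and `{name}`; the authoritative placeholder grammar is whatever SPEC.md Section 21 defines.

```python
import re
from collections import Counter

# Hypothetical placeholder pattern; literal "%%" is deliberately not handled.
_PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sdif]|%\d+|\{[A-Za-z_][A-Za-z0-9_]*\}")


def non_empty(msgstr: str) -> bool:
    """Reject empty or whitespace-only translations."""
    return bool(msgstr.strip())


def plural_consistent(msgstr_plural: dict, nplurals: int) -> bool:
    """Require exactly the indices 0..nplurals-1 (nplurals from the PO header)."""
    return sorted(msgstr_plural) == list(range(nplurals))


def tags_preserved(source: str, translation: str) -> bool:
    """Require the same placeholders, with the same counts, in any order."""
    found_src = Counter(_PLACEHOLDER.findall(source))
    found_dst = Counter(_PLACEHOLDER.findall(translation))
    return found_src == found_dst
```

Comparing counts rather than sets means a translation that drops one of two `%s` placeholders is still rejected, while reordering is allowed.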



 Implement `kdeai/apply.py` and the `apply` command in `kdeai/cli.py`.
 Reference SPEC.md Section 20.

 Implement the strict "Two-Phase Apply":
 1. Phase A: Locked Read & Hash Check (Verify `base_sha256`).
 2. Phase B: Apply patches in memory, run `validate.py`, and serialize.
 3. Phase C: Locked Write (Atomic `os.replace`).
 4. Implement the `strict` vs `rebase` modes.
 5. If `post-index` is on, call the workspace indexer *after* releasing the file lock.
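
The hash check and atomic write could be sketched as follows. Locking is left out on purpose so the example stays standard-library only; comments mark where the per-file lock would be held. The names `apply_strict` and `BaseChanged` are hypothetical, and only the `strict` mode is shown.

```python
import hashlib
import os
import tempfile


class BaseChanged(Exception):
    """Raised when the file no longer matches the plan's base_sha256."""


def apply_strict(path: str, base_sha256: str, transform) -> None:
    # Phase A (per-file lock held in the real code): read and verify the base hash.
    with open(path, "rb") as fh:
        original = fh.read()
    if hashlib.sha256(original).hexdigest() != base_sha256:
        raise BaseChanged(path)

    # Phase B: patch, validate, and serialize purely in memory.
    new_bytes = transform(original)

    # Phase C (lock held again): write a sibling temp file, then swap it in.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(new_bytes)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file in the target's own directory keeps `os.replace` on a single filesystem, which is what makes the swap atomic.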


 Implement `kdeai/doctor.py` and `kdeai/gc.py`.
 Reference SPEC.md Section 24 and 22.6.

 1. `kdeai doctor`: Check that the Run Lock works, config hash matches, and DB pointers are valid.
 2. `kdeai gc`: Implement cleaning up old workspace TM entries (TTL).
 3. Wire these into `kdeai/cli.py`.


 Please review `kdeai/cli.py`.
 Ensure all commands defined in SPEC.md Section 23.1 are implemented:
 - init, plan, apply, translate, index, reference build, examples build, glossary build, gc, doctor.

 Ensure every command acquires the Global Run Lock (Section 6.3) at startup and handles the `LockException` gracefully.



 We need to implement the LLM interaction layer using `dspy`.

 1. Add `dspy` to `pyproject.toml` dependencies.
 2. Create a new module `kdeai/llm_provider.py`.
 3. In this module, create a `configure_dspy(config: Config)` function.
   - It should read `config.prompt.embedding_policy.model_id` (or add a specific generation model key to config if missing from the spec, default to `gpt-4o` for now).
   - Use `dspy.configure` to set up the LM.
   - Ensure this respects the `config.json` settings.

 Note: The spec forbids LLM calls inside file locks. This module will be used strictly *between* the Planning and Applying phases.



 Implement the Translation Logic in `kdeai/llm.py`.

 1. Define a `dspy.Signature` named `TranslationSignature`.
   - Inputs: 
     - `source_context` (msgctxt)
     - `source_text` (msgid)
     - `plural_text` (msgid_plural)
     - `target_lang`
     - `glossary_context` (string of comma-separated terms)
     - `few_shot_examples` (string format of retrieved TM matches)
   - Outputs:
     - `translated_text`
     - `translated_plural` (if applicable)

 2. Create a `dspy.Module` named `KDEAITranslator`.
   - It should take the `PromptData` we built in `kdeai/prompt.py`.
   - It should call the ChainOfThought or Predict module using the signature.
   - It needs to handle the logic for singular vs plural outputs.

 3. Create a helper function `batch_translate_plan(plan: Plan, config: Config) -> Plan`.
   - This function takes a generated Plan (which currently has empty translations for 'llm' items).
   - It iterates through the plan items marked as `llm`.
   - It calls the `KDEAITranslator`.
   - It **updates the Plan items in-place** with the resulting `msgstr`.
   - It adds the `KDEAI-AI: model=...` comment to the plan item as per Spec Section 9.2.
   
   
 
 Now update `kdeai/cli.py` to implement the `translate` command properly. 
 This command must strictly follow the "Single-Threaded, Lock-Free LLM" rule from Spec Section 2.4.

 Implement `translate` with this exact workflow:

 1. **Phase 1: Planning (Locked)**
   - Acquire file lock.
   - Read file (`locked_read_file`).
   - Run the Planner to generate a Plan.
   - **Release the file lock.** (CRITICAL)

 2. **Phase 2: Inference (Unlocked)**
   - Check if the plan has any `llm` action items.
   - If yes, call `kdeai.llm.batch_translate_plan(plan, config)`.
   - This runs the DSPy logic over the network without holding disk locks.

 3. **Phase 3: Applying (Locked)**
   - Acquire file lock again.
   - Re-verify the file hash (it might have changed, though unlikely in single-user mode).
   - Call the Applier with the fully populated Plan.
   - **Release the file lock.**

 Refactor the `translate` command loop to ensure this "Lock -> Unlock -> Lock" pattern is enforced for every file processed.
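
The per-file control flow could be sketched as below. A `threading.Lock` stands in for the `portalocker` file lock, and the three callables are hypothetical stand-ins for the Planner, `batch_translate_plan`, and the Applier; the plan shape (`items` with an `action` field) is also an assumption.

```python
import threading


def translate_file(path, lock: threading.Lock, plan_fn, llm_fn, apply_fn):
    """Run one file through Lock -> Unlock -> Lock.

    The lock is released before llm_fn runs, so no network call ever
    happens while the file is locked.
    """
    # Phase 1: planning, under the file lock.
    with lock:
        plan = plan_fn(path)

    # Phase 2: inference, with the lock released.
    if any(item["action"] == "llm" for item in plan["items"]):
        plan = llm_fn(plan)

    # Phase 3: applying, under the lock again. apply_fn is expected to
    # re-verify the file hash before it writes anything.
    with lock:
        return apply_fn(path, plan)
```

Passing the phases in as callables makes the pattern easy to test: a fake `llm_fn` can assert that the lock is not held when it is called.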

