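Q: Do entity types and predicates improve with more data? A: Yes. With the refactoring tools, the schema can be revised as more schemes are processed, so the graph becomes progressively more MECE (mutually exclusive, collectively exhaustive) and more generic.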
┌─────────────────────────────────────────────────────────────┐
│ EXTRACTION PHASE (Schemes 1-10)                             │
│ • Uses initial entity types & predicates                    │
│ • Discovers patterns: requires_age, requires_income, etc.   │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ ANALYSIS PHASE (After 10 schemes)                           │
│ • analyze_entity_types() → Find underused types             │
│ • analyze_predicates() → Find redundant predicates          │
│ • Get AI suggestions for improvements                       │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ REFACTORING PHASE (Improve schema)                          │
│ • consolidate_predicates() → Merge requires_* → requires    │
│ • rename_entity_type() → age_requirement → requirement      │
│ • rename_predicate() → Normalize naming                     │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│ EXTRACTION PHASE (Schemes 11-50)                            │
│ • Uses improved, generic schema                             │
│ • Better entity reuse                                       │
│ • More MECE compliance                                      │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
           REPEAT CYCLE
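The cycle in the diagram can be sketched as a simple driver loop. This is a minimal sketch, not the toolset's own scheduler: `runExtractionCycle`, `extract` and `refactor` are hypothetical names, and the two callbacks stand in for the real extraction step and for the analyze/refactor tools. The default interval of 10 mirrors the diagram.

```javascript
// Hypothetical driver for the extract → analyze → refactor loop.
// `extract(scheme)` and `refactor(processedCount)` are caller-supplied
// callbacks; a real `refactor` would call analyze_entity_types(),
// analyze_predicates() and the rename/consolidate tools.
function runExtractionCycle(schemes, extract, refactor, interval = 10) {
  let processed = 0;
  const refactoredAt = [];
  for (const scheme of schemes) {
    extract(scheme); // extraction uses whatever schema is current
    processed += 1;
    if (processed % interval === 0) {
      refactor(processed); // improve the schema before the next batch
      refactoredAt.push(processed);
    }
  }
  return refactoredAt; // scheme counts at which refactoring ran
}

// Usage: 25 hypothetical schemes with the default interval.
const extracted = [];
const checkpoints = runExtractionCycle(
  Array.from({ length: 25 }, (_, i) => `scheme_${i + 1}`),
  (scheme) => extracted.push(scheme),
  () => {}
);
// checkpoints is [10, 20]: refactoring ran after schemes 10 and 20
```

Passing the interval as a parameter makes it easy to switch between the 10- and 20-scheme cadences suggested here.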
Current Issues Found:
⚠ caste (1 entity) - Only 1 entity, consider merging
⚠ area_type (1 entity) - Only 1 entity, consider merging
⚠ occupation (1 entity) - Only 1 entity, consider merging
💡 Recommendation: Standardize requirement types under "requirement"
What to do:
// Merge the specific requirement types (including the underused ones) into "requirement"
rename_entity_type({
  oldType: "age_requirement",
  newType: "requirement"
})
rename_entity_type({
  oldType: "income_criteria",
  newType: "requirement"
})
rename_entity_type({
  oldType: "caste",
  newType: "requirement"
})
rename_entity_type({
  oldType: "area_type",
  newType: "requirement"
})
rename_entity_type({
  oldType: "occupation",
  newType: "requirement"
})
// Result: More generic, MECE-compliant types!

Current Issues Found:
⚠ 7 different "requires_*" predicates:
- requires_age (6 uses)
- requires_income (5 uses)
- requires_gender (3 uses)
- requires_caste (3 uses)
- requires_area (1 use)
- requires_area_type (1 use)
- requires_occupation (1 use)
💡 Suggested merge: "requires" (generic)
What to do:
// Consolidate all "requires_*" into generic "requires"
consolidate_predicates({
predicates: [
"requires_age",
"requires_income",
"requires_gender",
"requires_caste",
"requires_area",
"requires_area_type",
"requires_occupation"
],
targetPredicate: "requires"
})
// Similarly for "allows_*":
consolidate_predicates({
predicates: [
"allows_area_type",
"allows_caste",
"allows_gender"
],
targetPredicate: "allows"
})
// Result:
// Before: scheme requires_age age_min_40
// After: scheme requires requirement:age_min_40
// (Entity type provides context, predicate is generic!)

Example: Merge all requirement types
// Before refactoring:
Entity Types:
- age_requirement (4 entities)
- income_criteria (4 entities)
- caste (1 entity)
- area_type (1 entity)
- occupation (1 entity)
Total: 5 types, 11 entities
// Refactoring:
rename_entity_type({ oldType: "age_requirement", newType: "requirement" })
rename_entity_type({ oldType: "income_criteria", newType: "requirement" })
rename_entity_type({ oldType: "caste", newType: "requirement" })
rename_entity_type({ oldType: "area_type", newType: "requirement" })
rename_entity_type({ oldType: "occupation", newType: "requirement" })
// After refactoring:
Entity Types:
- requirement (11 entities)
Total: 1 type, 11 entities
// Benefits:
✅ More generic
✅ Better entity reuse
✅ Easier to query: list_entities_by_type("requirement")
✅ MECE compliant (all requirements in one type)

Example: Standardize predicate naming
// Fix inconsistencies
rename_predicate({
oldPredicate: "available_in",
newPredicate: "applies_to_location"
})
rename_predicate({
oldPredicate: "provides",
newPredicate: "offers"
})
// Result: More consistent, descriptive predicates

Example: Real refactoring from current graph
// Current state (BEFORE):
Predicates:
- requires_age (6 uses)
- requires_income (5 uses)
- requires_gender (3 uses)
- requires_caste (3 uses)
- requires_area (1 use)
- requires_area_type (1 use)
- requires_occupation (1 use)
Total: 7 predicates, 20 triples
// Consolidate:
consolidate_predicates({
predicates: [
"requires_age", "requires_income", "requires_gender",
"requires_caste", "requires_area", "requires_area_type",
"requires_occupation"
],
targetPredicate: "requires"
})
// Result (AFTER):
Predicates:
- requires (20 uses)
Total: 1 predicate, 20 triples
// Benefits:
✅ Simpler schema
✅ Entity type provides context (requirement:age_min_40)
✅ More MECE (predicate is generic, specificity in entity)
✅ Easier queries: find_triples({predicate: "requires"})

Stage 1 (5 schemes):
Entity Types:
- scheme: 5
- beneficiary_type: 3
- age_requirement: 3
- income_criteria: 2
- gender: 1
- caste: 1
Predicates:
- targets: 5
- provides: 5
- requires_age: 3
- requires_income: 2
- requires_gender: 1

Issues:
- Too many specific types
- Fragmented predicates
- Low entity reuse
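The "too many specific types" problem in this early snapshot can be spotted mechanically. The function below is a hypothetical stand-in for analyze_entity_types(), not its actual implementation: it assumes per-type entity counts are available and treats any type with a single entity as a merge candidate, matching the earlier "Only 1 entity, consider merging" warnings.

```javascript
// Hypothetical stand-in for analyze_entity_types(): flag every type whose
// entity count is at or below `maxCount` as a candidate for merging.
function findUnderusedTypes(typeCounts, maxCount = 1) {
  return Object.entries(typeCounts)
    .filter(([, count]) => count <= maxCount)
    .map(([type, count]) => ({ type, count }));
}

// Entity counts from the 5-scheme snapshot above.
const earlyTypes = {
  scheme: 5,
  beneficiary_type: 3,
  age_requirement: 3,
  income_criteria: 2,
  gender: 1,
  caste: 1,
};

const underused = findUnderusedTypes(earlyTypes);
// underused: [{ type: "gender", count: 1 }, { type: "caste", count: 1 }]
```

Raising `maxCount` to 2 would also flag income_criteria, which is one way to make the merge rule more aggressive.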

Stage 2 (20 schemes, after refactoring):
Entity Types:
- scheme: 20
- beneficiary_type: 8
- benefit_type: 3
- requirement: 15 # ← MERGED!
- location: 10
- gender: 2
Predicates:
- targets: 20
- provides: 20
- requires: 35 # ← CONSOLIDATED!
- allows: 12 # ← CONSOLIDATED!
- applies_to: 20

Improvements:
- ✓ Generic "requirement" type (MECE)
- ✓ Consolidated predicates
- ✓ 40% better entity reuse
- ✓ Simpler schema
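To show what the merges behind the 20-scheme snapshot do to stored data, here is an illustrative in-memory model of consolidate_predicates() and rename_entity_type(). It is a sketch under assumptions, not the tools' real implementation: triples are taken to be plain { subject, predicate, object } records, entities carry a `type` field, and the sample ids other than age_min_40 are hypothetical.

```javascript
// Illustrative in-memory model; the real tools operate on the graph store.
function consolidatePredicates(triples, predicates, targetPredicate) {
  const merge = new Set(predicates);
  // Subject and object are kept; only the predicate is replaced.
  return triples.map((t) =>
    merge.has(t.predicate) ? { ...t, predicate: targetPredicate } : t
  );
}

function renameEntityType(entities, oldType, newType) {
  // Renaming onto a type that already exists effectively merges the two.
  return entities.map((e) => (e.type === oldType ? { ...e, type: newType } : e));
}

let triples = [
  { subject: "scheme_1", predicate: "requires_age", object: "age_min_40" },
  { subject: "scheme_1", predicate: "requires_income", object: "income_below_2L" },
  { subject: "scheme_1", predicate: "targets", object: "farmers" },
];
let entities = [
  { id: "age_min_40", type: "age_requirement" },
  { id: "income_below_2L", type: "income_criteria" },
  { id: "farmers", type: "beneficiary_type" },
];

triples = consolidatePredicates(
  triples, ["requires_age", "requires_income"], "requires");
entities = renameEntityType(entities, "age_requirement", "requirement");
entities = renameEntityType(entities, "income_criteria", "requirement");
// Now: scheme_1 requires age_min_40, and age_min_40 has type "requirement"
```

Neither function adds or drops records, which is what keeps the triple count unchanged across a consolidation.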

Stage 3 (100 schemes):
Entity Types:
- scheme: 100
- beneficiary: 25 # ← RENAMED for clarity
- benefit: 10 # ← RENAMED for clarity
- requirement: 50
- location: 30
Predicates:
- targets: 100
- provides: 100
- requires: 150
- allows: 50
- applies_to: 100

Final State:
- ✓ Highly generic types
- ✓ Minimal predicates
- ✓ 60%+ entity reuse
- ✓ Fully MECE compliant
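The final snapshot above renames beneficiary_type to beneficiary and benefit_type to benefit. A rename like this should move every entity off the old type without losing any, which is cheap to check right after refactoring. `verifyTypeRename` and `countByType` are hypothetical helpers, and the sample entities are made up for illustration.

```javascript
// Hypothetical post-refactoring check for an entity-type rename.
function countByType(entities) {
  const counts = new Map();
  for (const e of entities) counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  return counts;
}

function verifyTypeRename(before, after, oldType, newType) {
  const b = countByType(before);
  const a = countByType(after);
  const problems = [];
  if (a.has(oldType)) problems.push(`${oldType} still has ${a.get(oldType)} entities`);
  if (before.length !== after.length) problems.push("entity count changed");
  // The new type should hold its old entities plus everything renamed into it.
  const expected = (b.get(oldType) ?? 0) + (b.get(newType) ?? 0);
  if ((a.get(newType) ?? 0) !== expected) {
    problems.push(`${newType} should have ${expected} entities`);
  }
  return problems; // an empty array means the rename looks correct
}

const before = [
  { id: "farmers", type: "beneficiary_type" },
  { id: "students", type: "beneficiary_type" },
  { id: "cash_transfer", type: "benefit_type" },
];
const after = before.map((e) =>
  e.type === "beneficiary_type" ? { ...e, type: "beneficiary" } : e
);
const problems = verifyTypeRename(before, after, "beneficiary_type", "beneficiary");
// problems is [] for this rename
```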
# Every 10-20 schemes processed:
1. Run analysis tools
2. Review suggestions
3. Apply refactorings
4. Continue extraction

// After every N schemes, auto-refactor (skip the empty start, when length is 0)
if (processedSchemes.length > 0 && processedSchemes.length % 10 === 0) {
  const typeAnalysis = analyze_entity_types()
  const predAnalysis = analyze_predicates()

  // Auto-consolidate a group of predicates only when >3 similar ones are found
  for (const dup of predAnalysis.duplicates) {
    if (dup.predicates.length > 3) {
      consolidate_predicates({
        predicates: dup.predicates,
        targetPredicate: dup.suggestedMerge
      })
    }
  }

  // Auto-merge underused types (a single entity) into a related type.
  // findSimilarType() is a placeholder for a semantic-similarity lookup;
  // skip the rename if it finds nothing or returns the same type.
  const underused = typeAnalysis.types.filter(t => t.count === 1)
  for (const type of underused) {
    const target = findSimilarType(type.type)
    if (target && target !== type.type) {
      rename_entity_type({
        oldType: type.type,
        newType: target
      })
    }
  }
}

- Don't wait until the end
- Refactor every 10-20 schemes
- Prevents schema ossification
- Always run analyze_entity_types() and analyze_predicates()
- Get data-driven suggestions
- Don't guess at improvements
- Rename one predicate, verify triples updated correctly
- Check visualization still works
- Validate queries return expected results
- Keep a refactoring log
- Note: "Consolidated requires_* → requires (Iteration 2)"
- Helps track evolution
- When consolidating, add synonyms:
consolidate_predicates({
predicates: ["requires_age", "requires_income"],
targetPredicate: "requires"
})
// Then add synonym mappings for backward compatibility
add_synonym({ variant: "requires_age", canonical: "requires" })
add_synonym({ variant: "requires_income", canonical: "requires" })

// Entity reuse rate (target: >60%): share of entity references that point to
// an already-existing entity (total_entities = distinct entities in the graph)
const entity_reuse = (1 - total_entities / (schemes * avg_entities_per_scheme)) * 100
// Type diversity (target: 5-10 types)
const type_count = new Set(entities.map(e => e.type)).size
// Predicate diversity (target: 3-8 predicates)
const predicate_count = new Set(triples.map(t => t.predicate)).size
// Schema stability (lower = better)
const schema_changes_per_iteration = refactorings / iterations
// MECE compliance score
const mece_score =
  entity_reuse * 0.4 +                // 40% weight
  (100 - type_count * 5) * 0.3 +      // 30% weight (fewer types = better)
  (100 - predicate_count * 5) * 0.3   // 30% weight (fewer predicates = better)

Iteration 1 (0-10 schemes):
Entity Reuse: 20%
Type Count: 12
Predicate Count: 15
MECE Score: 27.5/100
Iteration 2 (11-30 schemes, after refactoring):
Entity Reuse: 45%
Type Count: 7
Predicate Count: 8
MECE Score: 55.5/100
Iteration 3 (31-100 schemes, after more refactoring):
Entity Reuse: 62%
Type Count: 5
Predicate Count: 5
MECE Score: 69.8/100
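Applying the mece_score formula to the three iterations is plain arithmetic; the sketch below does it in code so the weights are easy to adjust. `meceScore` is a hypothetical helper that transcribes the formula term by term. As written, the type and predicate terms reward any reduction in count; they do not enforce the 5-10 and 3-8 target ranges.

```javascript
// MECE score: 40% entity reuse, plus 30% each for type and predicate counts
// (each type or predicate costs 5 points on its 0-100 scale).
function meceScore({ entityReuse, typeCount, predicateCount }) {
  return (
    entityReuse * 0.4 +
    (100 - typeCount * 5) * 0.3 +
    (100 - predicateCount * 5) * 0.3
  );
}

const iterations = [
  { name: "Iteration 1", entityReuse: 20, typeCount: 12, predicateCount: 15 },
  { name: "Iteration 2", entityReuse: 45, typeCount: 7, predicateCount: 8 },
  { name: "Iteration 3", entityReuse: 62, typeCount: 5, predicateCount: 5 },
];

for (const it of iterations) {
  console.log(`${it.name}: ${meceScore(it).toFixed(1)}/100`);
}
// Iteration 1: 27.5/100
// Iteration 2: 55.5/100
// Iteration 3: 69.8/100
```

For example, Iteration 3 is 62 × 0.4 + 75 × 0.3 + 75 × 0.3 = 24.8 + 22.5 + 22.5 = 69.8.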
- Process 10-20 schemes with current schema
- Run analysis tools to get suggestions
- Apply refactorings based on recommendations
- Continue extraction with improved schema
- Repeat every 10-20 schemes
The graph continuously evolves to be more MECE and generic!
| Tool | Purpose | When to Use |
|---|---|---|
| analyze_entity_types() | Find improvement opportunities | Every 10-20 schemes |
| analyze_predicates() | Find redundant predicates | Every 10-20 schemes |
| rename_entity_type() | Standardize type names | After analysis |
| rename_predicate() | Fix predicate naming | After analysis |
| consolidate_predicates() | Merge redundant predicates | When >3 similar found |
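The table's rule of thumb, consolidating when more than three similar predicates turn up, can be approximated by grouping predicates on a shared prefix. This is a hypothetical heuristic, not necessarily how analyze_predicates() detects duplicates: splitting on the first underscore is an assumption, and the `{ predicates, suggestedMerge }` shape simply mirrors the fields the auto-refactor loop reads.

```javascript
// Hypothetical heuristic: group predicates by the text before the first "_"
// and suggest merging any group with more than `minGroupSize` members.
function suggestConsolidations(predicates, minGroupSize = 3) {
  const groups = new Map();
  for (const p of predicates) {
    const i = p.indexOf("_");
    if (i <= 0) continue; // no prefix to group on, e.g. "targets"
    const prefix = p.slice(0, i);
    if (!groups.has(prefix)) groups.set(prefix, []);
    groups.get(prefix).push(p);
  }
  return [...groups]
    .filter(([, members]) => members.length > minGroupSize)
    .map(([prefix, members]) => ({ predicates: members, suggestedMerge: prefix }));
}

const predicates = [
  "requires_age", "requires_income", "requires_gender", "requires_caste",
  "requires_area", "requires_area_type", "requires_occupation",
  "allows_area_type", "allows_caste", "allows_gender",
  "targets", "provides", "available_in",
];
const suggestions = suggestConsolidations(predicates);
// One suggestion: the seven requires_* predicates → "requires".
// allows_* has only 3 members, so it does not pass the "> 3" threshold.
```

A prefix rule would wrongly group unrelated names such as applies_to and a hypothetical applies_tax, so a semantic check is still worth keeping in the loop.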
The graph schema is not static - it's a living, evolving system that becomes more MECE and generic as you process more data and learn patterns!
With these refactoring tools, you can:
- ✓ Start with specific types/predicates
- ✓ Learn from data
- ✓ Refactor to be more generic
- ✓ Improve continuously
- ✓ Achieve higher MECE compliance over time