@rh0dium
Created December 14, 2025 17:44
Migration Plan: Haystack to django-opensearch-dsl (Issue #2501)


Problem Statement

Issue #2501: elasticsearch-py==7.17.12 requires urllib3<2, causing dependency conflicts with other packages that have moved to urllib3 2.x.

Solution

Migrate from django-haystack + elasticsearch-py to django-opensearch-dsl + opensearch-py, which supports urllib3 2.x (urllib3!=2.2.0,!=2.2.1,<3,>=1.26.19).


Background & Rationale

Source Commit

Commit: 170f197fc97e6fb4a59e4ba1818c7c5bf58d562e
Title: Replace AWSSignedTransport with OpenSearchTransport in settings
Date: Tue Jan 14 14:43:42 2025

Why the Custom Transport Exists

The custom OpenSearchTransport class in settings/utils/aws_elasticsearch_transport.py was created to solve a compatibility issue between:

  1. elasticsearch-py 7.x client - Has a built-in "product check" that verifies it's talking to genuine Elasticsearch
  2. AWS OpenSearch - A fork of Elasticsearch that returns different headers/taglines

The Problem:

  • elasticsearch-py 7.x verifies the server by checking:
    • Tagline: "You Know, for Search" (OpenSearch returns "The OpenSearch Project...")
    • Header: X-Elastic-Product: Elasticsearch (OpenSearch doesn't send this)
  • Without modification, elasticsearch-py raises UnsupportedProductError when connecting to OpenSearch

The Workaround: The custom transport overrides _do_verify_elasticsearch() and _ProductChecker to:

  • Detect OpenSearch via "opensearch" in tagline.lower()
  • Bypass the strict Elasticsearch validation for OpenSearch connections
  • Use requests_aws4auth.AWS4Auth for AWS IAM authentication
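
The check and its workaround can be pictured with a small stand-alone sketch. This is a simplified illustration under stated assumptions, not elasticsearch-py's actual implementation: `classify_server` and its return labels are hypothetical names, and the inputs are assumed to be the parsed `GET /` body and the response headers described above.

```python
from typing import Mapping

ES_TAGLINE = "You Know, for Search"


def classify_server(info: Mapping, headers: Mapping[str, str]) -> str:
    """Classify a `GET /` response as elasticsearch, opensearch, or unsupported."""
    tagline = str(info.get("tagline", ""))
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {key.lower(): value for key, value in headers.items()}
    if normalised.get("x-elastic-product") == "Elasticsearch":
        return "elasticsearch"
    if tagline == ES_TAGLINE:
        return "elasticsearch"
    # The workaround's detection rule: any tagline that mentions OpenSearch.
    if "opensearch" in tagline.lower():
        return "opensearch"
    return "unsupported"
```

A strict client only accepts the first two branches; the custom transport effectively adds the third so that OpenSearch is not rejected.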

Code Evolution:

  1. Original (6b7f814c8a): AWSSignedTransport using opensearchpy.AWSV4SignerAuth
  2. Current (170f197fc9): OpenSearchTransport with custom product checker

Why This Is Technical Debt

  1. Performance bug: The _do_verify_elasticsearch method doesn't cache results properly - it makes an extra GET / request on every API call
  2. Maintenance burden: Custom transport code must be updated if elasticsearch-py internals change
  3. Version lock: Prevents upgrading elasticsearch-py beyond 7.x
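
To make the performance bug concrete, here is a minimal sketch of what correct caching would look like: the verification runs once and later calls reuse the stored result. `CachedProductCheck` is a hypothetical name, and the `verify` callable stands in for whatever issues the `GET /` request; it is not code from the existing transport.

```python
from typing import Callable, Optional


class CachedProductCheck:
    """Run an expensive verification once and reuse the result afterwards."""

    def __init__(self, verify: Callable[[], bool]) -> None:
        self._verify = verify  # e.g. a function that issues GET /
        self._result: Optional[bool] = None

    def __call__(self) -> bool:
        if self._result is None:  # only the first call reaches the server
            self._result = self._verify()
        return self._result


# Usage: three API calls trigger a single verification request.
calls = []
check = CachedProductCheck(lambda: calls.append("GET /") or True)
check(); check(); check()
print(len(calls))  # prints 1
```

Removing the custom transport makes this moot, since the migration drops the product check entirely.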

urllib3 Dependency Comparison

| Package | urllib3 Requirement | Status |
| --- | --- | --- |
| elasticsearch==7.17.12 | urllib3<2,>=1.21.1 | BLOCKS urllib3 2.x |
| opensearch-py==3.1.0 | urllib3!=2.2.0,!=2.2.1,<3,>=1.26.19 | SUPPORTS urllib3 2.x |
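
The two requirement strings can be checked against a candidate urllib3 version with a few lines of standard-library Python. This is a deliberately small sketch: `satisfies` is a hypothetical helper that handles only plain numeric versions and the operators used here, and the version numbers in the usage lines are illustrative. For real tooling, `packaging.specifiers.SpecifierSet` is the more robust choice.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def _parse(version: str) -> tuple:
    """Turn '2.2.1' into (2, 2, 1); pre-release tags are not handled."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(left: tuple, right: tuple) -> tuple:
    """Pad both tuples with zeros so that '2' and '2.0.0' compare equal."""
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = _pad(_parse(version), _parse(bound))
        if not _OPS[op](left, right):
            return False
    return True


ELASTICSEARCH_SPEC = "<2,>=1.21.1"
OPENSEARCH_SPEC = "!=2.2.0,!=2.2.1,<3,>=1.26.19"
print(satisfies("2.5.0", ELASTICSEARCH_SPEC))  # False: blocked by <2
print(satisfies("2.5.0", OPENSEARCH_SPEC))     # True
```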

Docker Considerations

Current State:

  • Local dev uses Elasticsearch 7.10.2 (docker.elastic.co/elasticsearch/elasticsearch:7.10.2)
  • Kibana 8.6.0 is a version mismatch (should be 7.10.x for ES 7.10.2)
  • AWS production runs OpenSearch 2.x

Migration Approach:

  • Switch local Docker to OpenSearch 2.13.0 to align with AWS production
  • Replace Kibana with OpenSearch Dashboards 2.13.0
  • Disable security plugin for local dev simplicity (DISABLE_SECURITY_PLUGIN=true)

Version Rationale:

  • OpenSearch 2.13+ is under AWS standard support until Nov 2025+
  • Versions 2.3-2.9 lose standard support Nov 7, 2025
  • 2.13.0 provides good stability and opensearch-py 3.1.0 compatibility


Scope Summary

  • 12 SearchIndex classes → Document classes
  • 12 search templates → field definitions (or keep templates)
  • 2 custom integrations (AxisAdmin search, AxisSearchFilter)
  • 1 Celery task (batch index updates)
  • Settings across 6 environments
  • Docker local Elasticsearch → OpenSearch

Phase 1: Dependencies & Settings

1.1 Update pyproject.toml

# Remove:
"django-haystack~=3.3.0",
"elasticsearch~=7.17.12",  # hold - tied to ES 7.x server

# Add:
"django-opensearch-dsl~=0.6.2",
# Keep: "opensearch-py~=3.1.0" (already present)

1.2 Update settings/base.py

Replace HAYSTACK_CONNECTIONS with OPENSEARCH_DSL:

OPENSEARCH_DSL = {
    "default": {
        "hosts": "localhost:9200",
    }
}

1.3 Update settings for each environment

  • settings/production.py - AWS OpenSearch with auth
  • settings/staging.py
  • settings/beta.py
  • settings/gamma.py
  • settings/demo.py
  • settings/dev_docker.py

Production example:

OPENSEARCH_DSL = {
    "default": {
        "hosts": OPENSEARCH_HOST,
        "http_auth": awsauth,
        "use_ssl": True,
        "verify_certs": True,
    }
}
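
Since six environment files repeat the same structure, one option is a small helper that builds a connection entry. The sketch below is an assumption about how that could be organised, not part of the plan: `opensearch_connection` is a hypothetical name, and `http_auth` stands for the `awsauth` object, whose construction is not shown here.

```python
from typing import Any, Dict, Optional


def opensearch_connection(host: str, http_auth: Optional[Any] = None) -> Dict[str, Any]:
    """Build one OPENSEARCH_DSL connection entry.

    Without credentials the entry matches the local-dev shape; with
    credentials it enables TLS and certificate verification, as in the
    production example.
    """
    config: Dict[str, Any] = {"hosts": host}
    if http_auth is not None:
        config.update(http_auth=http_auth, use_ssl=True, verify_certs=True)
    return config


# Local development: plain connection, no auth.
OPENSEARCH_DSL = {"default": opensearch_connection("localhost:9200")}
```

The design choice is simply to keep the TLS flags tied to the presence of credentials, so an environment cannot send auth over an unverified connection by accident.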

1.4 Remove custom transport

Delete settings/utils/aws_elasticsearch_transport.py (no longer needed - opensearch-py handles this natively)

1.5 Update INSTALLED_APPS

# Remove: "haystack"
# Add: "django_opensearch_dsl"

Phase 2: Convert SearchIndex → Document Classes

Pattern: Haystack → django-opensearch-dsl

Before (Haystack):

from haystack import indexes
class UserIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    def get_model(self):
        return User

After (django-opensearch-dsl):

from django_opensearch_dsl import Document, fields
from django_opensearch_dsl.registries import registry

@registry.register_document
class UserDocument(Document):
    # Define fields explicitly instead of template
    first_name = fields.TextField()
    last_name = fields.TextField()
    email = fields.TextField()
    # ... other fields

    class Index:
        name = "users"
        settings = {"number_of_shards": 1, "number_of_replicas": 0}

    class Django:
        model = User
        fields = ["id", "username"]  # Simple fields auto-mapped

2.1 Documents to Create (12 total)

Create documents.py in each app:

| App | File | Index Name |
| --- | --- | --- |
| core | axis/core/documents.py | users |
| company | axis/company/documents.py | companies |
| community | axis/community/documents.py | communities |
| geographic | axis/geographic/documents.py | cities |
| home | axis/home/documents.py | eep_program_home_statuses |
| floorplan | axis/floorplan/documents.py | floorplans, simulations |
| subdivision | axis/subdivision/documents.py | subdivisions |
| invoicing | axis/invoicing/documents.py | invoices |
| user_management | axis/user_management/documents.py | accreditations |
| customer_hirl | axis/customer_hirl/documents.py | hirl_projects, verification_report_cells |

2.2 Special Cases

UserDocument - has should_update() logic:

class UserDocument(Document):
    class Django:
        model = User
        # Implement via queryset_pagination or ignore_signals

VerificationReportCellDocument - has custom prepare_text():

  • Convert template rendering to explicit prepare_* methods
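
As a rough illustration of moving from a rendered template to an explicit method, the text-building logic can live in a plain function that a `prepare_text(self, instance)` method then delegates to. Everything named here is hypothetical: `ReportCell` and its fields are invented stand-ins, since the real model's attributes are not listed in this plan.

```python
from dataclasses import dataclass


@dataclass
class ReportCell:
    """Hypothetical stand-in for the real model; field names are invented."""
    sheet: str
    label: str
    value: str


def build_search_text(cell: ReportCell) -> str:
    """Join the searchable attributes into one blob, skipping empty ones."""
    parts = [cell.sheet, cell.label, cell.value]
    return "\n".join(part.strip() for part in parts if part and part.strip())


print(build_search_text(ReportCell("Sheet1", " Roof ", "")))
```

Keeping the function free of framework imports means it can be unit-tested without an index or a database.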

HIRLProjectDocument - has complex index_queryset():

  • Use get_queryset() method with select_related/prefetch_related

Phase 3: Update Integrations

3.1 AxisAdmin Search

File: axis/core/admin/axis_admin.py

Replace haystack SearchQuerySet with opensearch-dsl queries:

# Before:
from haystack.query import SearchQuerySet
sqs = SearchQuerySet().models(model).auto_query(search_term)

# After:
from django_opensearch_dsl.registries import registry
doc_class = next(iter(registry.get_documents(models=[model])))  # get_documents may return a set, so avoid [0]
search = doc_class.search().query("multi_match", query=search_term, fields=["*"])

3.2 AxisSearchFilter

File: axis/core/api_v3/filters/axis.py

Replace haystack query API with opensearch-dsl:

# Before:
from haystack.query import SearchQuerySet
from haystack.inputs import AutoQuery
qs = SearchQuerySet().models(model).filter(text=AutoQuery(search_str))

# After:
from django_opensearch_dsl.registries import registry
doc_class = next(iter(registry.get_documents(models=[model])))  # get_documents may return a set, so avoid [0]
search = doc_class.search().query("query_string", query=search_str)
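
One caveat with `query_string`: raw user input is parsed as query syntax, so an unbalanced quote or a stray colon can produce a parse error. A minimal sketch of escaping the input first is shown below. `escape_query_string` is a hypothetical helper, and whether escaping is desirable depends on whether users are expected to type operators and wildcards themselves.

```python
import re

# Characters with special meaning in query_string syntax.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text: str) -> str:
    """Backslash-escape reserved characters and drop < and >, which cannot be escaped."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


print(escape_query_string('AC/DC "live'))  # prints AC\/DC \"live
```

The escaped string can then be passed as `query=` in place of the raw `search_str`.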

3.3 Celery Task

File: axis/customer_hirl/tasks/update_verification_report_index.py

Replace batch update with django-opensearch-dsl bulk:

# Before:
from haystack import connections
search_backend = connections["default"].get_backend()
search_backend.update(index, batch)

# After:
from django_opensearch_dsl.registries import registry
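doc_class = next(iter(registry.get_documents(models=[Model])))  # get_documents may return a set, so avoid [0]
doc_class().update(queryset, action="index")  # action named explicitly; confirm against the installed version's signature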
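
If the task keeps its batch-oriented shape, the batching itself needs nothing from the search library. The helper below is a generic sketch (`batched` is a local name, not an API of django-opensearch-dsl) that splits any iterable of primary keys or instances into fixed-size chunks, each of which could then be handed to the document's update call.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


print(list(batched(range(7), 3)))  # prints [[0, 1, 2], [3, 4, 5], [6]]
```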

Phase 4: Docker & Local Development

4.1 OpenSearch Version Selection

AWS Production: OpenSearch 2.x
Recommended Local Version: OpenSearch 2.13.0 (or 2.11.1)

Version rationale:

  • 2.13+ is under standard support until Nov 2025+
  • 2.x versions 2.3-2.9 lose standard support Nov 7, 2025
  • OpenSearch 2.13 has good stability and opensearch-py 3.1.0 compatibility
  • Aligns with AWS production 2.x series

4.2 Update docker-compose.yml

Current:

elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
  container_name: elasticsearch
  environment:
    - discovery.type=single-node
    - ES_JAVA_OPTS=-Xms512m -Xmx512m

New:

opensearch:
  image: opensearchproject/opensearch:2.13.0
  container_name: opensearch
  environment:
    - discovery.type=single-node
    - DISABLE_INSTALL_DEMO_CONFIG=true
    - DISABLE_SECURITY_PLUGIN=true  # For local dev simplicity
    - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
  ports:
    - "9200:9200"
    - "9600:9600"  # Performance analyzer
  volumes:
    - ../docker/${COMPOSE_PROJECT_NAME:-axis}/opensearch/data:/usr/share/opensearch/data
  networks:
    - main
  restart: unless-stopped
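
After bringing the container up, a quick readiness poll can confirm the node answers before indexes are created. The sketch below assumes the port mapping above and plain HTTP (security plugin disabled); `fetch_health` and `wait_until_ready` are hypothetical helper names. A single-node cluster commonly reports yellow once indexes with replicas exist, so yellow is treated as ready.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_health(url: str = "http://localhost:9200/_cluster/health") -> Dict:
    """Fetch cluster health over plain HTTP."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_ready(fetch: Callable[[], Dict] = fetch_health,
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll until status is green or yellow; return False if attempts run out."""
    for _ in range(attempts):
        try:
            if fetch().get("status") in ("green", "yellow"):
                return True
        except OSError:
            pass  # container still starting: connection refused or timed out
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with no arguments polls the local container; passing a different `fetch` makes the loop easy to test without a running cluster.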

4.3 Update docker-compose.e2e.yml

Same pattern for E2E testing environment (port 9201).

4.4 Update Kibana → OpenSearch Dashboards

Current (version mismatch):

kibana:
  image: docker.elastic.co/kibana/kibana:8.6.0

New:

opensearch-dashboards:
  image: opensearchproject/opensearch-dashboards:2.13.0
  container_name: opensearch-dashboards
  environment:
    - OPENSEARCH_HOSTS=["http://opensearch:9200"]
    - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
  ports:
    - "5601:5601"
  depends_on:
    - opensearch

4.5 Update dev_docker.py

# Before:
HAYSTACK_CONNECTIONS["default"]["URL"] = "http://elasticsearch:9200"

# After:
OPENSEARCH_DSL = {
    "default": {
        "hosts": "opensearch:9200",
    }
}

Phase 5: Management Commands

New commands (django-opensearch-dsl):

# Create indexes
python manage.py opensearch index create

# Populate indexes
python manage.py opensearch document index

# Delete indexes
python manage.py opensearch index delete

Remove old haystack commands:

  • rebuild_index → opensearch document index --rebuild
  • update_index → opensearch document index
  • clear_index → opensearch index delete

Phase 6: Cleanup

6.1 Delete haystack templates

Remove all templates/search/indexes/*/ directories:

  • axis/core/templates/search/
  • axis/company/templates/search/
  • ... (12 total)

6.2 Delete search_indexes.py files

  • axis/core/search_indexes/
  • axis/company/search_indexes/
  • ... (12 total)

6.3 Delete custom transport

  • settings/utils/aws_elasticsearch_transport.py

6.4 Update imports throughout codebase

Search for and remove:

  • from haystack import *
  • from haystack.query import *
  • from haystack.constants import *
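
Finding the remaining imports can be scripted rather than done by hand. The following is a small sketch using only the standard library; `find_haystack_imports` is a hypothetical helper, and the pattern only covers ordinary `import` and `from ... import` statements at the start of a line (not dynamic imports or string references such as `INSTALLED_APPS` entries).

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

_HAYSTACK_IMPORT = re.compile(
    r"^\s*(?:from\s+haystack(?:\.\w+)*\s+import\b|import\s+haystack\b)"
)


def find_haystack_imports(root: Path) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for every haystack import under root."""
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if _HAYSTACK_IMPORT.match(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    for path, number, line in find_haystack_imports(Path("axis")):
        print(f"{path}:{number}: {line}")
```

An empty result after PR 5 is a cheap check that the cleanup is complete, and the same function could back a CI guard.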

Files to Modify

Settings:

  • settings/base.py
  • settings/production.py
  • settings/staging.py
  • settings/beta.py
  • settings/gamma.py
  • settings/demo.py
  • settings/dev_docker.py
  • settings/e2e_docker.py
  • settings/steven/test_lite.py

Dependencies:

  • pyproject.toml

Delete:

  • settings/utils/aws_elasticsearch_transport.py

Create (documents.py):

  • axis/core/documents.py
  • axis/company/documents.py
  • axis/community/documents.py
  • axis/geographic/documents.py
  • axis/home/documents.py
  • axis/floorplan/documents.py
  • axis/subdivision/documents.py
  • axis/invoicing/documents.py
  • axis/user_management/documents.py
  • axis/customer_hirl/documents.py

Modify (integrations):

  • axis/core/admin/axis_admin.py
  • axis/core/api_v3/filters/axis.py
  • axis/customer_hirl/tasks/update_verification_report_index.py

Delete (search_indexes):

  • All search_indexes/*.py files (12 apps)
  • All templates/search/indexes/ directories

Docker:

  • docker-compose.yml
  • docker-compose.e2e.yml

Testing Strategy

  1. Unit tests: Update test settings to use SimpleEngine equivalent or mock
  2. Integration tests: Test each Document class indexes correctly
  3. E2E tests: Verify search functionality in E2E environment
  4. Production verification: Test against AWS OpenSearch staging before production

Rollback Plan

If issues arise:

  1. Revert pyproject.toml changes
  2. Restore haystack settings
  3. Keep old search_indexes.py files until migration is verified
  4. Docker can run either ES 7.x or OpenSearch

PR Breakdown (Multiple PRs)

PR 1: Dependencies & Infrastructure

  • Update pyproject.toml (add django-opensearch-dsl, keep haystack temporarily)
  • Update Docker (OpenSearch image)
  • Add django_opensearch_dsl to INSTALLED_APPS
  • Add OPENSEARCH_DSL settings (alongside existing HAYSTACK_CONNECTIONS)
  • Goal: Both systems can run in parallel

PR 2: Document Classes (All 12)

  • Create all documents.py files
  • All 12 Document classes with proper field mappings
  • Convert template-based indexing to explicit fields
  • Handle special cases (should_update, prepare_text, complex querysets)
  • Goal: Documents are created but not yet wired up

PR 3: Integration Updates

  • Update axis/core/admin/axis_admin.py (AxisAdmin search)
  • Update axis/core/api_v3/filters/axis.py (AxisSearchFilter)
  • Update axis/customer_hirl/tasks/update_verification_report_index.py (Celery task)
  • Add feature flag or setting to switch between haystack/opensearch-dsl
  • Goal: Search functionality works with new backend
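
The feature flag in PR 3 could be as simple as a lookup that picks one of two search callables. The sketch below is one possible shape under stated assumptions: `make_search`, the backend keys, and the idea of reading the flag from a setting such as `SEARCH_BACKEND` are all hypothetical, and the lambdas are stubs standing in for the real Haystack and OpenSearch query code.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[int]]  # search term -> matching primary keys


def make_search(backends: Dict[str, SearchFn], active: str) -> SearchFn:
    """Return the search callable selected by the flag value."""
    try:
        return backends[active]
    except KeyError:
        raise ValueError(f"unknown search backend: {active!r}") from None


# Stub backends for illustration only.
backends: Dict[str, SearchFn] = {
    "haystack": lambda term: [1],
    "opensearch": lambda term: [2],
}
search = make_search(backends, "opensearch")
print(search("roof"))  # prints [2]
```

Failing loudly on an unknown flag value avoids silently falling back to the old backend during the staged rollout.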

PR 4: Settings Migration

  • Update all settings files to use OPENSEARCH_DSL exclusively
  • Remove HAYSTACK_CONNECTIONS
  • Remove custom transport (aws_elasticsearch_transport.py)
  • Update test settings
  • Goal: Haystack settings fully removed

PR 5: Cleanup

  • Remove haystack from dependencies
  • Remove elasticsearch-py from dependencies
  • Delete all search_indexes/*.py files
  • Delete all templates/search/indexes/ directories
  • Remove haystack imports throughout codebase
  • Goal: Clean codebase, urllib3 conflict resolved

Deployment Order

  1. PR 1 → Deploy to staging (verify OpenSearch connectivity)
  2. PR 2 → Deploy to staging (create indexes, verify documents)
  3. PR 3 → Deploy to staging (verify search works end-to-end)
  4. PR 4 → Deploy to staging (verify AWS OpenSearch production-like config)
  5. PR 5 → Deploy to staging, then production
  6. Rebuild indexes in production: python manage.py opensearch document index --rebuild