@kurokobo
Last active December 17, 2025 01:17
Dify: Weaviate 1.19 to 1.27+ Migration Guide (community-edited, simplified)
"""
# NOTE: THIS SCRIPT IS DEPRECATED AND OUTDATED
## TL;DR;
Use the official migration script instead.
https://github.com/langgenius/dify-docs/blob/main/assets/migrate_weaviate_collections.py
## Background
This script was originally released as a community-edited version of the draft of the script presented by the Dify Team,
to address some issues encountered during migration of Weaviate collections in certain environments,
before the official script was finalized, as a temporary workaround.
However, all modifications made in this script have already been backported to the official script.
Therefore, this unofficial script is deprecated and will not be maintained in the future.
We strongly recommend using the latest official script.
If you face any issues with the official script, please report them to the Dify Team via their GitHub repository or any supported channels.
You can see the revisions made in this script by checking the git history: https://gist.github.com/kurokobo/51fbe7f92f4526957e12dacfa7783cdf/revisions
The original source for this script can be found at: https://github.com/langgenius/dify/issues/27291#issuecomment-3501003678.
The key changes made in this script were:
- Retrieve Weaviate connection info from environment variables to make this script run in the Worker container.
- Switch to cursor-based pagination in "replace_old_collection", since the migration could fail with large collections.
- Fix an issue where both the old and new collections remained without being deleted after migrating an empty collection.
"""
import sys
print("WARNING: This migration script is DEPRECATED and OUTDATED.")
print("Please use the following official migration script instead:")
print("https://github.com/langgenius/dify-docs/blob/main/assets/migrate_weaviate_collections.py")
print("This script will now exit without making any changes.")
sys.exit(1)
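
The cursor-based pagination mentioned in the key changes above can be sketched generically as follows. This is a minimal illustration, not the code of the official script: `fetch_page` is a hypothetical stand-in for whatever client call returns the objects after a given ID, and `make_fake_store` is an in-memory substitute used only to make the sketch runnable.

```python
from typing import Callable, Iterator, Optional

# Hypothetical record shape: (object_id, payload). Not a Weaviate client type.
Record = tuple[str, dict]
FetchPage = Callable[[Optional[str], int], list[Record]]


def iter_all(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Record]:
    """Yield every record by repeatedly asking for the page after the last seen ID."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return  # an empty page means the collection is exhausted
        yield from page
        cursor = page[-1][0]  # resume after the last ID returned


def make_fake_store(n: int) -> FetchPage:
    """In-memory stand-in for a vector store, ordered by ID (illustration only)."""
    rows = [(f"id-{i:04d}", {"n": i}) for i in range(n)]

    def fetch_page(after: Optional[str], limit: int) -> list[Record]:
        if after is None:
            start = 0
        else:
            start = next(i for i, (oid, _) in enumerate(rows) if oid == after) + 1
        return rows[start:start + limit]

    return fetch_page
```

Unlike offset-based paging, each request here depends only on the last ID seen, so the cost per page does not grow with the position in the collection. An empty collection simply yields nothing.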

Weaviate 1.19 to 1.27+ Migration Guide for Dify

  • ⚠️ This guide is not officially supported by the Dify Team.
  • ⚠️ This is a community-edited, simplified version of the official migration guide presented by the Dify Team.

Complete guide to safely migrate Dify knowledge bases from Weaviate 1.19 to 1.27/1.33.


✅ NOTE: BEFORE PROCEEDING FURTHER

If your environment contains only a small number of knowledge bases, you may be able to resolve the issue with the following much simpler steps instead of the more involved process on this page.

  1. Open the Settings page for your knowledge.
  2. Change the Embedding Model to something else.
  3. On the Documents page, wait until all documents become Available.
  4. Open the Settings page again and change the Embedding Model back to the original.
  5. On the Documents page, wait again until all documents become Available.
  6. Repeat these steps for each knowledge base.

The steps described in the following sections are aimed at large environments, where it's not feasible to manually edit every knowledge base.


Outline

This guide covers the following two cases.
While Case A is recommended for a safer migration, this guide can also be applied to Case B:

  • Case A
    • You are currently running a version of Dify 1.9.1 or earlier with Weaviate 1.19 included.
    • All knowledge is functioning properly.
  • Case B
    • You have already upgraded to Weaviate 1.27+ and are running Dify 1.9.2 or later.
    • The knowledge created with the previous version is corrupted, and you have no backup to revert to the earlier version.

The procedure in this guide is as follows:

  1. Take a complete backup of your current Dify environment.
  2. If your Dify version is 1.9.1 or earlier, upgrade Dify.
  3. Operate the weaviate container and modify the directory structure of the LSM data.
  4. Operate the worker container and run the migration script.
  5. Perform cleanup.

Migration Procedure

Note:
This procedure cannot be rolled back by any means other than a restore. Attempting to roll back using anything other than a restore may make things worse.
We recommend that you follow the steps to take a full backup first, in preparation for a possible restore.


Step 1: Backup Your Environment

Stop your Dify services:

cd /path/to/dify/docker
docker compose down

Then make a full copy or archive of your entire docker directory (for example, /path/to/dify/docker) as a safety measure.

If you encounter issues later, you can restore this backup to revert to the original state.
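
If you prefer to script the backup, the following is a minimal sketch using only the Python standard library. The function name `backup_directory` and the archive naming scheme are assumptions for illustration. Choose a destination outside the directory being archived, and note that you may need sufficient privileges to read files owned by the containers.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source: Path, dest_dir: Path) -> Path:
    """Create a timestamped .tar.gz of `source` inside `dest_dir` and return its path."""
    source = source.resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_dir = dest_dir.resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = dest_dir / f"{source.name}-backup-{stamp}"
    # Archive members are stored relative to the parent, e.g. "docker/.env".
    archive = shutil.make_archive(
        str(base_name),
        "gztar",
        root_dir=str(source.parent),
        base_dir=source.name,
    )
    return Path(archive)
```

For example, `backup_directory(Path("/path/to/dify/docker"), Path("/path/to/backups"))` would write an archive such as `docker-backup-<timestamp>.tar.gz` into the backup directory.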


Step 2: Upgrade to Weaviate 1.27+ (Only for Case A)

This step is only for Case A - users currently on Dify 1.9.1 or earlier with Weaviate 1.19.
If you are already running Weaviate 1.27+ (Case B), you can skip this step.

Follow the upgrade guide to move to the latest (or a specific) Dify version that uses Weaviate 1.27+.


Step 3: Fix Orphaned LSM Data

If your Dify has stopped, start it and wait until it has fully launched.

cd /path/to/dify/docker
docker compose up -d

Ensure your Weaviate is using image version 1.27.0 or higher.

cd /path/to/dify/docker
docker compose ps weaviate  # The "IMAGE" column should show "semitechnologies/weaviate:1.27.0" or higher
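
When checking the tag by script rather than by eye, compare the version numerically, because a plain string comparison would rank "1.9" above "1.27". The following is a small sketch of such a check. The helper names `weaviate_version` and `is_supported` are hypothetical, and the image string is assumed to follow the usual `repository:tag` form.

```python
import re


def weaviate_version(image: str) -> tuple[int, ...]:
    """Extract the numeric tag from an image reference like 'repo/name:1.27.0'."""
    # Take the text after the last colon so a registry port is not mistaken for the tag.
    _, _, tag = image.rpartition(":")
    match = re.match(r"v?(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"no numeric tag in {image!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))  # pad "1.27" to (1, 27, 0)
    return tuple(parts[:3])


def is_supported(image: str, minimum: tuple[int, ...] = (1, 27, 0)) -> bool:
    """Return True if the image tag is at least `minimum`."""
    return weaviate_version(image) >= minimum
```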

Enter the shell of your weaviate container:

cd /path/to/dify/docker
docker compose exec -it weaviate /bin/sh

Then run the following commands inside the container to fix LSM data:

cd /var/lib/weaviate
for dir in vector_index_*_node_*_lsm; do
  [ -d "$dir" ] || continue
  
  # Extract index ID and shard ID
  index_id=$(echo "$dir" | sed -n 's/vector_index_\([^_]*_[^_]*_[^_]*_[^_]*_[^_]*\)_node_.*/\1/p')
  shard_id=$(echo "$dir" | sed -n 's/.*_node_\([^_]*\)_lsm/\1/p')
  
  # Create target directory and copy
  mkdir -p "vector_index_${index_id}_node/$shard_id/lsm"
  cp -a "$dir/"* "vector_index_${index_id}_node/$shard_id/lsm/"
  
  echo "✓ Copied $dir"
done
exit
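
To make the name mapping performed by the loop above easier to follow, here is the same transformation sketched in Python. It roughly mirrors the two sed expressions (five underscore-separated segments for the index ID, then the shard ID) but uses a single anchored pattern. The function name `target_path` is hypothetical, and the sketch only computes paths without touching the filesystem.

```python
import re
from typing import Optional

# vector_index_<5 underscore-separated segments>_node_<shard>_lsm
LSM_DIR = re.compile(r"^vector_index_((?:[^_]*_){4}[^_]*)_node_([^_]*)_lsm$")


def target_path(dirname: str) -> Optional[str]:
    """Map a flat LSM directory name to the nested layout, or None if it does not match."""
    match = LSM_DIR.match(dirname)
    if match is None:
        return None
    index_id, shard_id = match.groups()
    return f"vector_index_{index_id}_node/{shard_id}/lsm"
```

A directory named `vector_index_<id>_node_<shard>_lsm` is therefore copied to `vector_index_<id>_node/<shard>/lsm`; the original directory is left in place until the cleanup step.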

Then restart the weaviate container so that the changes are recognized:

cd /path/to/dify/docker
docker compose restart weaviate

Step 4: Migrate Schema

Place the migrate_weaviate_collections.py script in your /path/to/dify/docker/volumes/app/storage/ directory, then enter the shell of your worker container:

cp /path/to/migrate_weaviate_collections.py /path/to/dify/docker/volumes/app/storage/
cd /path/to/dify/docker
docker compose exec -it worker /bin/bash

Then run the following commands inside the container to execute the migration script:

uv run --no-cache /app/api/storage/migrate_weaviate_collections.py
exit

Restart Dify services:

docker compose down
docker compose up -d

Verify in Dify UI:

  1. Go to your Dify console
  2. Open your knowledge bases
  3. Try "Retrieval Testing"
  4. Should work without errors!

Step 5: Cleanup (Optional)

After successful migration, you can delete orphaned files to free up space.
Enter the shell of your weaviate container:

cd /path/to/dify/docker
docker compose exec -it weaviate /bin/sh

Then run the following commands inside the container to delete orphaned files:

cd /var/lib/weaviate
rm -rf vector_index_*_node_*
exit
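
Before deleting anything, it can help to confirm which names the glob above actually matches. The sketch below applies the same pattern to a list of directory names with Python's `fnmatch`; the sample names and the helper `split_orphans` are hypothetical. It shows that the old flat directories match, while the new `vector_index_..._node` directories created in Step 3 do not, because they have no trailing `_` after `node`.

```python
from fnmatch import fnmatchcase

ORPHAN_PATTERN = "vector_index_*_node_*"  # same glob as the rm command above


def split_orphans(names: list[str]) -> tuple[list[str], list[str]]:
    """Return (would_delete, would_keep) for the given directory names."""
    would_delete = [n for n in names if fnmatchcase(n, ORPHAN_PATTERN)]
    would_keep = [n for n in names if not fnmatchcase(n, ORPHAN_PATTERN)]
    return would_delete, would_keep
```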

Also, you can delete the migration script from your storage volume:

rm /path/to/dify/docker/volumes/app/storage/migrate_weaviate_collections.py

Files Needed

Credits

  • Original migration approach: Dify team
  • LSM recovery method: Chinese Dify community user
  • Combined solution: Community effort
@tomy-kyu

tomy-kyu commented Dec 7, 2025

@kurokobo
Unfortunately, the migration has not been successful.
It seems some trial and error may be required.

Due to the gRPC migration, it appears that after switching Weaviate to gRPC, the index_node_id returned from the vector database is now being treated as a UUID object rather than a string.
This has caused SQLAlchemy to generate UUID-type parameters, and PostgreSQL is failing to perform the type conversion.
Consequently, the following error was being output:

Run failed: (psycopg2.errors.UndefinedFunction) operator does not exist: character varying = uuid
LINE 1: ...T_TIMESTAMP WHERE document_segments.index_node_id = 'c6f0cc1...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: UPDATE document_segments SET hit_count=(document_segments.hit_count + %(hit_count_1)s), updated_at=CURRENT_TIMESTAMP WHERE document_segments.index_node_id = %(index_node_id_1)s::UUID AND document_segments.dataset_id = %(dataset_id_1)s::UUID]
[parameters: {'hit_count_1': 1, 'index_node_id_1': UUID('c6f0cc12-0522-41b5-b22f-55e8dcbdc79a'), 'dataset_id_1': 'c04750cd-58e6-4e24-85d5-852dd3c4d321'}]
(Background on this error at: https://sqlalche.me/e/20/f405)

Therefore, during the migration process, explicit modifications should be made to return UUID values as strings instead. While I plan to try implementing these corrections, I wanted to note that this approach appears unworkable in its current state, so I'm leaving this comment for reference.
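
The kind of normalization described above could look like the following sketch. It is not Dify's actual code: `as_node_id` is a hypothetical helper that only illustrates converting a UUID object to its string form before it is used as a query parameter against a character varying column.

```python
import uuid
from typing import Union


def as_node_id(value: Union[str, uuid.UUID]) -> str:
    """Return the canonical string form of an ID, whether given as str or UUID."""
    if isinstance(value, uuid.UUID):
        return str(value)
    # Parsing validates plain strings and normalizes them to lowercase, hyphenated form.
    return str(uuid.UUID(value))
```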

@kurokobo
Author

kurokobo commented Dec 7, 2025

@tomy-kyu
Could you please provide the exact steps to reproduce your issue?
After the migration, what operations are you performing, and where is the error being logged?

In any case, it would be best to clarify whether this is a problem with this guide itself or whether it could also happen when following the official guide. If necessary, we shouldn't try to resolve everything here, but instead open an issue in the official repo.

@tomy-kyu

tomy-kyu commented Dec 7, 2025

@kurokobo
Correct. We're currently running the correction script, and if it executes successfully, we'll include this information as reference. The sequence of events was as follows:

  • Used this guide to migrate from 1.8.1 to 1.9.2
  • After execution, tested queries on the knowledge base screen and confirmed they returned zero results
  • Executed the knowledge search node in the workflow app
  • The problematic error occurred

@kurokobo
Author

kurokobo commented Dec 7, 2025

@tomy-kyu
Thanks, I tried to reproduce your issue, but I couldn't.

  1. Deploy a whole new Dify 1.8.1 instance, install the OpenAI plugin, and configure the OpenAI API key.
  2. Create new Knowledge with the High Quality method using the OpenAI embedding model.
  3. Ensure that the Retrieval Testing works.
  4. Create new workflow app including Knowledge Retrieval node with the Knowledge created above.
  5. Ensure that the new app can retrieve the Knowledge, and retrieval count increased.
  6. Follow my guide to upgrade Dify to 1.9.2 and Weaviate to 1.27.0.

Now I can confirm that both Retrieval Testing and the workflow app work, and the retrieval count increases as expected.
Adding docs to the existing knowledge also works.

@tomy-kyu

tomy-kyu commented Dec 7, 2025

@kurokobo

Thank you for your assistance with the verification process.
I've now understood that under normal circumstances, your system should function correctly.

My environment does indeed have some differences:

  • Weaviate was originally configured with the latest version 1.28 during initial setup.
  • We customized several processing steps to use gse as the tokenizer.
  • Our knowledge configuration always utilized parent-child chunks.

What occurred in my environment may be attributable to these factors.
Given the practical challenges of completely restoring and reimplementing all these changes at this time, I'm considering continuing to operate the tenant on version 1.10.1.fix1 for the moment.

Fortunately, we have two tenants (production and development), and the target is the development tenant. I expect the development tenant will continue to serve as a testing environment for versions 1.10 and above.

The production tenant remains on version 1.8.1. I'll freeze this version for ongoing operation for the time being, then consider setting up a new production tenant as needed and performing an environment migration.

I apologize for not being able to provide more useful information.
Best wishes for your future endeavors.

@suntao2015005848

run: uv run /app/api/storage/migrate_weaviate_collections_ce.py
error:
error: failed to create directory /home/dify/.cache/uv: Permission denied (os error 13)
fix:
docker compose exec -it -u root worker /bin/bash
uv run /app/api/storage/migrate_weaviate_collections_ce.py

@kurokobo
Author

@suntao2015005848
Good catch, thanks!
It seems that /home/dify no longer exists in the container starting from version 1.11.0, which caused the cache directory for uv to fail to be created.

As you suggested, using -u root would work, but an even simpler solution is just to add --no-cache to the uv command. I've updated the guide accordingly.

Thanks again!
