The high memory usage you're seeing while indexing a large CSV file in Dify is most likely caused by holding a large number of embeddings in memory at once. Here are a few suggestions to help manage memory usage:
Batch Processing: Instead of processing all 350,000 lines at once, process the CSV file in smaller batches. Only one batch then needs to be held in memory at a time, which lowers peak memory usage (see the sketch after this list).
Increase Memory Allocation: If possible, raise the memory limit of the Docker container running Dify, for example through Docker Desktop's resource settings or a mem_limit entry in your docker-compose file.
Optimize Embedding Size: If feasible, reduce the dimensionality of the embeddings (for example by using a smaller embedding model). This decreases the memory required per embedding and thus the overall footprint.
Use Swap Space: Configure your system to use swap space, which can help handle memory spikes by temporarily using disk space as additional memory. However, this may slow down processing due to the slower read/write speeds of disk storage compared to RAM.
Monitor and Log Memory Usage: Continue using tools like objgraph to track object counts over time and identify leaks or inefficiencies in the code (a short objgraph sketch follows the batching example below).
Implementing these strategies can help mitigate the memory issues you're facing during the document indexing process.
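As an illustration of the batching suggestion, here is a minimal sketch. The `embed_texts` and `store_vectors` callables are hypothetical stand-ins for whatever embedding and vector-store calls your pipeline actually uses:

```python
import csv

BATCH_SIZE = 1000  # tune to fit the container's available memory

def index_csv_in_batches(path, embed_texts, store_vectors):
    """Embed a large CSV one batch at a time instead of all at once."""
    batch = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            batch.append(",".join(row))
            if len(batch) >= BATCH_SIZE:
                store_vectors(embed_texts(batch))  # persist, then discard
                batch.clear()                      # free memory before the next batch
    if batch:                                      # flush the final partial batch
        store_vectors(embed_texts(batch))
```

For the monitoring suggestion, objgraph's growth helper can be called between batches:

```python
import objgraph

# Prints the object types whose instance counts grew since the previous
# call; invoking this between batches helps spot steadily accumulating objects.
objgraph.show_growth(limit=10)
```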
Self Checks
Dify version
0.15.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
✔️ Expected Behavior
Document status turns to available.
❌ Actual Behavior
The Docker container restarts after running out of memory.
Document status stays pending at indexing.
Roughly 10,000 embeddings take 10,076 * 1536 * 8 bytes / 1024 / 1024 ≈ 118 MB of memory (the double inside a Python float is 8 bytes).
350,000 embeddings will therefore take about 118 MB * 35 / 1024 ≈ 4 GB of memory.
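A quick sanity check of this estimate (raw 8-byte values only; Python float objects and list overhead add more on top):

```python
dims, bytes_per_value = 1536, 8

per_10k = 10_076 * dims * bytes_per_value / 1024 / 1024
print(f"~10,000 embeddings: {per_10k:.0f} MB")   # ~118 MB

total = 350_000 * dims * bytes_per_value / 1024 ** 3
print(f"350,000 embeddings: {total:.1f} GB")     # ~4.0 GB

# Storing vectors as numpy float32 arrays (4 bytes per value) would
# roughly halve this raw footprint, per the embedding-size suggestion above.
```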
key result:
full result: