
Vector store search results should remove duplicate records, or existing memories will be deleted #2165

Open
yp05327 opened this issue Jan 21, 2025 · 2 comments



yp05327 commented Jan 21, 2025

🐛 Describe the bug

mem0/mem0/memory/main.py, lines 167 to 178 at a5355f7:

```python
for new_mem in new_retrieved_facts:
    messages_embeddings = self.embedding_model.embed(new_mem)
    new_message_embeddings[new_mem] = messages_embeddings
    existing_memories = self.vector_store.search(
        query=messages_embeddings,
        limit=5,
        filters=filters,
    )
    for mem in existing_memories:
        retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
logging.info(f"Total existing memories: {len(retrieved_old_memory)}")
```

The search results from the vector store are handled by the code above.
There is no step that removes duplicate records from retrieved_old_memory, which causes existing memories to be deleted:

[Screenshot: memory update events, with duplicated records marked DELETE]
id 0 and id 1 are the records in the vector store, while id 2 and id 3 are the same as id 0 and id 1; you can see that the events for id 2 and id 3 are marked as DELETE.
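
Illustratively (the texts below are made up, and the sequential re-indexing is inferred from the screenshot), retrieved_old_memory ends up holding each stored record twice before it is sent to the update LLM:

```python
# Illustrative only: without deduplication, the same two stored records are
# collected once per extracted fact and re-indexed with sequential
# temporary ids before being sent to the update LLM.
retrieved_old_memory = [
    {"id": "0", "text": "lives in Tokyo"},         # real record A
    {"id": "1", "text": "works as an engineer"},   # real record B
    {"id": "2", "text": "lives in Tokyo"},         # duplicate of A
    {"id": "3", "text": "works as an engineer"},   # duplicate of B
]
# The LLM treats 2 and 3 as redundant copies of 0 and 1 and emits DELETE
# events for them; those ids map back to the same underlying memories, so
# the real records get deleted.
```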

To reproduce this bug, prepare a message that will be split into two records.
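
For example, a minimal repro sketch, assuming the default mem0 OSS client (the message text is made up; any input the fact-extraction LLM splits into two facts should trigger it):

```python
# Minimal repro sketch; requires the default mem0 setup (OPENAI_API_KEY).
from mem0 import Memory

m = Memory()

# First call stores two records (two extracted facts from one message).
m.add("I live in Tokyo and I work as a software engineer.", user_id="alice")

# Second call extracts the same two facts; the vector search for each fact
# returns both stored records, so retrieved_old_memory holds each record
# twice and the update LLM marks the duplicates as DELETE.
m.add("I live in Tokyo and I work as a software engineer.", user_id="alice")

print(m.get_all(user_id="alice"))  # the original memories may now be gone
```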


yp05327 commented Jan 21, 2025

Here is a sample fix for this bug: yp05327@dcc3bd7

chinmay3012 commented

The issue at hand involves the retrieved_old_memory list in mem0/memory/main.py, where duplicate records from the vector store search results are not being removed. This can lead to unintended deletion of old memories.

Proposed Solution:

To address this, we can modify the code to ensure that only unique records are added to the retrieved_old_memory list. This can be achieved by using a set to track the IDs of the memories that have already been added, preventing duplicates.

Implementation Steps:

1. Initialize a set for tracking: create an empty set named `seen_ids` to keep track of memory IDs that have been processed.
2. Filter duplicates: before appending a memory to `retrieved_old_memory`, check whether its ID is already in `seen_ids`. If it is not, append the memory and add its ID to the set.
Code Modification:

Here's how you can implement the above logic in mem0/memory/main.py:

```python
retrieved_old_memory = []
seen_ids = set()  # Initialize an empty set to track seen memory IDs

for new_mem in new_retrieved_facts:
    messages_embeddings = self.embedding_model.embed(new_mem)
    new_message_embeddings[new_mem] = messages_embeddings
    existing_memories = self.vector_store.search(
        query=messages_embeddings,
        limit=5,
        filters=filters,
    )
    for mem in existing_memories:
        if mem.id not in seen_ids:  # Check if the memory ID has already been processed
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
            seen_ids.add(mem.id)  # Add the memory ID to the set

logging.info(f"Total existing memories: {len(retrieved_old_memory)}")
```

Explanation:

- `seen_ids = set()` initializes an empty set to keep track of memory IDs that have been encountered.
- `if mem.id not in seen_ids:` checks, before a memory is appended to `retrieved_old_memory`, whether the memory's ID has already been added.
- `retrieved_old_memory.append(...)` and `seen_ids.add(mem.id)`: if the memory ID is not in `seen_ids`, the memory is appended to `retrieved_old_memory` and its ID is added to the set to prevent future duplicates.
Testing the Fix:

After implementing the above changes, it's crucial to test the functionality to ensure that duplicates are effectively removed and that no unintended side effects occur. This can be done by adding unit tests that cover scenarios with duplicate memories and verifying that the retrieved_old_memory list contains only unique entries.
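
A sketch of such a test: `SimpleNamespace` stands in for the vector store's result objects, and `dedupe_search_results` is a hypothetical helper that mirrors the patched loop above, not a function in mem0's codebase.

```python
# Hedged unit-test sketch for the deduplication logic (run with pytest).
from types import SimpleNamespace


def dedupe_search_results(results):
    """Keep only the first occurrence of each memory ID."""
    retrieved_old_memory = []
    seen_ids = set()
    for mem in results:
        if mem.id not in seen_ids:
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
            seen_ids.add(mem.id)
    return retrieved_old_memory


def test_duplicate_search_results_are_removed():
    # Two facts matching the same two stored records produce duplicates.
    results = [
        SimpleNamespace(id="a1", payload={"data": "lives in Tokyo"}),
        SimpleNamespace(id="b2", payload={"data": "works as an engineer"}),
        SimpleNamespace(id="a1", payload={"data": "lives in Tokyo"}),
        SimpleNamespace(id="b2", payload={"data": "works as an engineer"}),
    ]
    deduped = dedupe_search_results(results)
    assert [m["id"] for m in deduped] == ["a1", "b2"]
```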

By implementing this solution, the system will handle vector store search results more robustly, preventing the accidental deletion of old memories due to duplicate entries.
