
Vector store search results should remove duplicate records, or existing memories will be deleted #2165

Open
yp05327 opened this issue Jan 21, 2025 · 2 comments



yp05327 commented Jan 21, 2025

🐛 Describe the bug

mem0/mem0/memory/main.py, lines 167 to 178 at a5355f7:

```python
for new_mem in new_retrieved_facts:
    messages_embeddings = self.embedding_model.embed(new_mem)
    new_message_embeddings[new_mem] = messages_embeddings
    existing_memories = self.vector_store.search(
        query=messages_embeddings,
        limit=5,
        filters=filters,
    )
    for mem in existing_memories:
        retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
logging.info(f"Total existing memories: {len(retrieved_old_memory)}")
```

The search results from the vector store are handled by the code above.
There is no step that removes duplicate records from retrieved_old_memory, which causes existing memories to be deleted:

[Screenshot: memory update events, with duplicated records marked DELETE]
id 0 and id 1 are the records in the vector store, while id 2 and id 3 are the same as id 0 and id 1; you can see that the events for id 2 and id 3 are marked as DELETE.
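
Illustratively (the texts below are made up, and the sequential re-indexing is inferred from the screenshot), retrieved_old_memory ends up holding each stored record twice before it is sent to the update LLM:

```python
# Illustrative only: without deduplication, the same two stored records are
# collected once per extracted fact and re-indexed with sequential
# temporary ids before being sent to the update LLM.
retrieved_old_memory = [
    {"id": "0", "text": "lives in Tokyo"},         # real record A
    {"id": "1", "text": "works as an engineer"},   # real record B
    {"id": "2", "text": "lives in Tokyo"},         # duplicate of A
    {"id": "3", "text": "works as an engineer"},   # duplicate of B
]
# The LLM treats 2 and 3 as redundant copies of 0 and 1 and emits DELETE
# events for them; those ids map back to the same underlying memories, so
# the real records get deleted.
```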

To reproduce this bug, prepare a message that will be split into two records.
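
For example, a minimal repro sketch, assuming the default mem0 OSS client (the message text is made up; any input the fact-extraction LLM splits into two facts should trigger it):

```python
# Minimal repro sketch; requires the default mem0 setup (OPENAI_API_KEY).
from mem0 import Memory

m = Memory()

# First call stores two records (two extracted facts from one message).
m.add("I live in Tokyo and I work as a software engineer.", user_id="alice")

# Second call extracts the same two facts; the vector search for each fact
# returns both stored records, so retrieved_old_memory holds each record
# twice and the update LLM marks the duplicates as DELETE.
m.add("I live in Tokyo and I work as a software engineer.", user_id="alice")

print(m.get_all(user_id="alice"))  # the original memories may now be gone
```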


yp05327 commented Jan 21, 2025

Here is a sample fix for this bug: yp05327@dcc3bd7

chinmay3012 commented

The issue at hand involves the retrieved_old_memory list in mem0/memory/main.py, where duplicate records from the vector store search results are not being removed. This can lead to unintended deletion of old memories.

Proposed Solution:

To address this, we can modify the code to ensure that only unique records are added to the retrieved_old_memory list. This can be achieved by using a set to track the IDs of the memories that have already been added, preventing duplicates.

Implementation Steps:

1. Initialize a set for tracking: create an empty set named `seen_ids` to keep track of memory IDs that have been processed.
2. Filter duplicates: before appending a memory to `retrieved_old_memory`, check whether its ID is already in `seen_ids`. If it is not, append the memory and add its ID to the set.
Code Modification:

Here's how you can implement the above logic in mem0/memory/main.py:

```python
retrieved_old_memory = []
seen_ids = set()  # Initialize an empty set to track seen memory IDs

for new_mem in new_retrieved_facts:
    messages_embeddings = self.embedding_model.embed(new_mem)
    new_message_embeddings[new_mem] = messages_embeddings
    existing_memories = self.vector_store.search(
        query=messages_embeddings,
        limit=5,
        filters=filters,
    )
    for mem in existing_memories:
        if mem.id not in seen_ids:  # Check if the memory ID has already been processed
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
            seen_ids.add(mem.id)  # Add the memory ID to the set

logging.info(f"Total existing memories: {len(retrieved_old_memory)}")
```

Explanation:

- `seen_ids = set()` initializes an empty set to keep track of memory IDs that have been encountered.
- `if mem.id not in seen_ids:` checks, before a memory is appended to `retrieved_old_memory`, whether the memory's ID has already been added.
- `retrieved_old_memory.append(...)` and `seen_ids.add(mem.id)`: if the memory ID is not in `seen_ids`, the memory is appended to `retrieved_old_memory` and its ID is added to the set to prevent future duplicates.
Testing the Fix:

After implementing the above changes, it's crucial to test the functionality to ensure that duplicates are effectively removed and that no unintended side effects occur. This can be done by adding unit tests that cover scenarios with duplicate memories and verifying that the retrieved_old_memory list contains only unique entries.
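
A sketch of such a test: `SimpleNamespace` stands in for the vector store's result objects, and `dedupe_search_results` is a hypothetical helper that mirrors the patched loop above, not a function in mem0's codebase.

```python
# Hedged unit-test sketch for the deduplication logic (run with pytest).
from types import SimpleNamespace


def dedupe_search_results(results):
    """Keep only the first occurrence of each memory ID."""
    retrieved_old_memory = []
    seen_ids = set()
    for mem in results:
        if mem.id not in seen_ids:
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})
            seen_ids.add(mem.id)
    return retrieved_old_memory


def test_duplicate_search_results_are_removed():
    # Two facts matching the same two stored records produce duplicates.
    results = [
        SimpleNamespace(id="a1", payload={"data": "lives in Tokyo"}),
        SimpleNamespace(id="b2", payload={"data": "works as an engineer"}),
        SimpleNamespace(id="a1", payload={"data": "lives in Tokyo"}),
        SimpleNamespace(id="b2", payload={"data": "works as an engineer"}),
    ]
    deduped = dedupe_search_results(results)
    assert [m["id"] for m in deduped] == ["a1", "b2"]
```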

By implementing this solution, the system will handle vector store search results more robustly, preventing the accidental deletion of old memories due to duplicate entries.
