
Help with KeyError for Custom Scoring Plugin #204

Open
xjtx21 opened this issue Feb 8, 2025 · 0 comments


xjtx21 commented Feb 8, 2025

Hello,

Firstly, I just want to say I love REINVENT and am having a lot of fun with the program.

I am an amateur in computational chemistry, and I'm trying to figure out how to create a custom scoring plugin, but I'm running into some trouble. I would really appreciate some guidance.

  1. I copied the general structure of the comp_group_count.py file and adapted it to calculate the distance between two substructures. I called it comp_closeness.py (the file is at the bottom of this post).
  2. I placed comp_closeness.py in the /top/dir/somewhere/reinvent/reinvent_plugins/components directory.
  3. I added /top/dir/somewhere/reinvent to the Python path with sys.path.insert (I am a little confused about why I had to do this, because all the other components work).
  4. I then added the following block to my config file:
```toml
# Closeness between 2 specified SMARTS
[[stage.scoring.component]]
[stage.scoring.component.Closeness]
[[stage.scoring.component.Closeness.endpoint]]
name = "Closeness Distance"
weight = 0.5
params.smarts = "[Cl,n;H1]"
```
  5. I ran REINVENT and got the following error:
```
Traceback (most recent call last):
  File "/home/jt/anaconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main_script())
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 195, in main_script
    main(args)
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 164, in main
    runner(
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/RL/run_staged_learning.py", line 296, in run_staged_learning
    packages = create_packages(reward_strategy, stages, rdkit_smiles_flags2)
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/RL/run_staged_learning.py", line 157, in create_packages
    scoring_function = Scorer(stage.scoring)
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/scoring/scorer.py", line 80, in __init__
    self.components = get_components(config.component)
  File "/home/jt/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/scoring/config.py", line 53, in get_components
    Component, ComponentParams = component_registry[component_type_lookup]
KeyError: 'closeness'
CPU times: user 21.6 ms, sys: 34.2 ms, total: 55.7 ms
Wall time: 4.06 s
```
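About step 3: `sys.path.insert` only changes `sys.path` inside the Python process that calls it. The `CPU times` / `Wall time` lines look like IPython `%time`/`%%time` output, and the traceback starts in the `.../envs/reinvent4/bin/reinvent` console script and loads REINVENT from `site-packages`, so REINVENT appears to have run as a separate process that would not see a `sys.path.insert` made in the notebook. Environment variables such as `PYTHONPATH`, by contrast, are inherited by child processes. The stdlib sketch below checks whether a module can be found in the current interpreter and in a fresh child interpreter started with an extra `PYTHONPATH` entry. The helper names are hypothetical, and the module name `reinvent_plugins.components.comp_closeness` is only an assumption based on where step 2 places the file.

```python
import importlib.util
import os
import subprocess
import sys

# Code run in the child interpreter: prints True/False for one module name
_CHECK = (
    "import importlib.util as u\n"
    "try:\n"
    "    print(u.find_spec({name!r}) is not None)\n"
    "except ModuleNotFoundError:\n"
    "    print(False)\n"
)


def is_importable(module_name: str) -> bool:
    """True if module_name can be found on *this* interpreter's sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. reinvent_plugins) is itself missing
        return False


def env_with_pythonpath(extra_dir: str) -> dict:
    """Copy of os.environ with extra_dir prepended to PYTHONPATH.

    Unlike sys.path.insert, PYTHONPATH is passed on to child processes.
    """
    env = dict(os.environ)
    current = env.get("PYTHONPATH")
    env["PYTHONPATH"] = extra_dir + os.pathsep + current if current else extra_dir
    return env


def is_importable_in_child(module_name: str, extra_dir: str) -> bool:
    """Start a fresh interpreter with extra_dir on PYTHONPATH and ask the same question."""
    result = subprocess.run(
        [sys.executable, "-c", _CHECK.format(name=module_name)],
        env=env_with_pythonpath(extra_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    name = "reinvent_plugins.components.comp_closeness"  # assumed module name (step 2)
    root = "/top/dir/somewhere/reinvent"  # placeholder directory from step 3
    print("this process:", is_importable(name))
    print("child process:", is_importable_in_child(name, root))
```

With the placeholder path replaced by the real directory, the second line shows what a separately launched interpreter, which is the situation the `reinvent` command is in, can see.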

Please note that I haven't fully worked out the scoring function, the transforms, or the use of molcache in comp_closeness.py yet. I'll get to that once REINVENT can find the component.

Any help is appreciated!

Thanks,
JT

```python
"""
Score the distance between 2 SMARTS patterns
"""

__all__ = ["Closeness"]

from typing import List

from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
from pydantic.dataclasses import dataclass

from .component_results import ComponentResults
from reinvent_plugins.mol_cache import molcache
from .add_tag import add_tag

@add_tag("__parameters")
@dataclass
class Parameters:
    """Parameters for the scoring component

    Note that all parameters are always lists because components can have
    multiple endpoints and so all the parameters from each endpoint is
    collected into a list.  This is also true in cases where there is only one
    endpoint.
    """

    smarts: List[str]


@add_tag("__component")
class Closeness:
    def __init__(self, params: Parameters):
        # Parameters could include SMARTS patterns or other settings
        self.patterns = []

        for smarts in params.smarts:
            pattern = Chem.MolFromSmarts(smarts)

            if pattern:
                self.patterns.append(pattern)

        if not self.patterns:
            raise ValueError(f"{__name__}: no valid SMARTS patterns found")

        self.number_of_endpoints = len(params.smarts)

    @molcache
    def __call__(self, smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return 0.0, {"error": "Invalid SMILES"}

        mol = Chem.AddHs(mol)  # Add hydrogens if not present
        AllChem.EmbedMolecule(mol)  # Generate 3D conformer

        if mol.HasSubstructMatch(self.pattern1) and mol.HasSubstructMatch(self.pattern2):
            matches1 = mol.GetSubstructMatches(self.pattern1)
            matches2 = mol.GetSubstructMatches(self.pattern2)

            min_distance = float('inf')
            for match1 in matches1:
                for match2 in matches2:
                    distance = mol.GetConformer().GetAtomPosition(match1[0]).Distance(mol.GetConformer().GetAtomPosition(match2[0]))
                    min_distance = min(min_distance, distance)

            # return min_distance, {"distance": min_distance}
            return ComponentResults(min_distance)
        else:
            return 0.0, {"message": "Molecule does not contain both specified atom types."}
```
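Separately from the KeyError, the draft `Closeness.__call__` will likely fail once the component is found: `self.pattern1` and `self.pattern2` are never assigned (`__init__` only fills `self.patterns`), and the method returns a tuple on some paths and a `ComponentResults` on another. The SMARTS in the config may also not mean what is intended: in SMARTS, `,` (OR) binds more tightly than `;` (low-precedence AND), so `[Cl,n;H1]` reads as (Cl or aromatic n) AND exactly one hydrogen, which a chlorine bonded to carbon never satisfies, whereas `[Cl,n&H1]` reads as Cl OR an aromatic nH. The minimal RDKit sketch below computes the distance with two explicit patterns, outside REINVENT. The function names `min_distance` and `min_distances`, the fixed embedding seed, and returning NaN for unscorable molecules are illustrative assumptions, not part of REINVENT. It also assumes, as in the `comp_group_count.py` template, that `@molcache` passes `__call__` a list of RDKit molecules and that `ComponentResults` takes one NumPy array per endpoint, so a batch helper like `min_distances` would produce one such array.

```python
from typing import List, Optional

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def min_distance(mol: Optional[Chem.Mol], pattern_a: Chem.Mol, pattern_b: Chem.Mol,
                 seed: int = 42) -> float:
    """Smallest 3D distance (angstrom) between the first atom of any match of
    pattern_a and the first atom of any match of pattern_b.

    Returns NaN if the molecule is invalid, either pattern is absent, or
    conformer embedding fails (hypothetical convention, see lead-in).
    """
    if mol is None:
        return float("nan")

    # Match on the heavy-atom molecule; AddHs appends hydrogens after the
    # existing atoms, so these indices remain valid in mol_h below.
    atoms_a = {m[0] for m in mol.GetSubstructMatches(pattern_a)}
    atoms_b = {m[0] for m in mol.GetSubstructMatches(pattern_b)}
    pairs = [(a, b) for a in atoms_a for b in atoms_b if a != b]  # skip self-pairs
    if not pairs:
        return float("nan")

    mol_h = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) < 0:  # -1 signals failure
        return float("nan")

    pos = mol_h.GetConformer().GetPositions()  # (n_atoms, 3) NumPy array
    return float(min(np.linalg.norm(pos[a] - pos[b]) for a, b in pairs))


def min_distances(mols: List[Optional[Chem.Mol]], smarts_a: str, smarts_b: str) -> np.ndarray:
    """Batch version: one score per molecule, in input order."""
    pattern_a = Chem.MolFromSmarts(smarts_a)
    pattern_b = Chem.MolFromSmarts(smarts_b)
    if pattern_a is None or pattern_b is None:
        raise ValueError("invalid SMARTS pattern")
    return np.array([min_distance(m, pattern_a, pattern_b) for m in mols], dtype=float)


if __name__ == "__main__":
    smiles = ["Clc1ccc2[nH]ccc2c1", "CCO"]  # 5-chloroindole, ethanol
    mols = [Chem.MolFromSmiles(s) for s in smiles]
    print(min_distances(mols, "[Cl]", "[nH]"))  # ethanol has neither atom, so NaN
```

Returning NaN rather than 0.0 keeps "could not score" distinct from "distance of zero"; how REINVENT treats NaN in a component's scores is worth checking against the built-in components before relying on it.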