Diversity metrics
Designing evaluation metrics is an important part of the challenge. These metrics assess the quality and diversity of generated samples. Here, contributions from medicinal chemists and statisticians are especially welcome.
Measures of diversity are based on distance metrics in chemical space: a distance tells us how chemically close two molecules are. The most popular choice is the Tanimoto distance on Morgan fingerprints. We don't need the details of its definition here. The point is that these fingerprints are hand-crafted features, and it is probably better to replace them with learned (deep learning) features, as suggested in the MoleculeNet benchmark.
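To make the Tanimoto distance concrete, here is a minimal sketch. It assumes each fingerprint has already been computed (for example by a cheminformatics toolkit such as RDKit) and is represented as a set of "on" bit indices. The bit sets below are made up for illustration and do not come from real molecules. The Tanimoto similarity is the number of shared on-bits divided by the number of bits set in either fingerprint, and the distance is one minus that similarity.

```python
from __future__ import annotations


def tanimoto_similarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity between two binary fingerprints,
    each given as a set of on-bit indices: |a & b| / |a | b|."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def tanimoto_distance(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto distance = 1 - similarity; 0 means identical bits, 1 means no overlap."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical on-bit sets standing in for two Morgan fingerprints.
fp_1 = {3, 17, 42, 128, 511}
fp_2 = {3, 17, 99, 128}
print(tanimoto_distance(fp_1, fp_2))  # 3 shared bits out of 6 distinct bits -> 0.5
```

Swapping Morgan fingerprints for learned features would change only how the vectors are produced; the distance would then typically be a different function (e.g. cosine or Euclidean), since Tanimoto is defined on binary or count vectors.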
Let’s denote:
- Td the distance in chemical space (for example, the Tanimoto distance).
- A the set of generated molecules with desired properties. Its size is denoted |A|.
- B the training set.
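To illustrate how this notation is typically combined, the sketch below computes two common distance-based summaries. These are assumptions chosen for illustration, not necessarily the metrics the challenge will adopt: the mean Td over all |A|(|A|-1)/2 pairs of generated molecules (an "internal diversity" of A), and, for each molecule in A, the Td to its nearest neighbour in the training set B (a rough measure of novelty). Fingerprints are again assumed to be sets of on-bit indices, and the function names and example data are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean


def tanimoto_distance(fp_a: frozenset[int], fp_b: frozenset[int]) -> float:
    """Td between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumed convention: two empty fingerprints are identical
    return 1.0 - len(fp_a & fp_b) / union


def internal_diversity(generated: list[frozenset[int]]) -> float:
    """Hypothetical metric: mean Td over all unordered pairs in A."""
    if len(generated) < 2:
        raise ValueError("need at least two generated molecules")
    return mean(tanimoto_distance(a1, a2) for a1, a2 in combinations(generated, 2))


def nearest_training_distances(
    generated: list[frozenset[int]], training: list[frozenset[int]]
) -> list[float]:
    """Hypothetical metric: for each a in A, Td to its closest molecule in B."""
    return [min(tanimoto_distance(a, b) for b in training) for a in generated]


# Made-up fingerprints: A (generated) and B (training set).
A = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({5, 6})]
B = [frozenset({1, 2, 3}), frozenset({7, 8})]

print(internal_diversity(A))            # mean of 0.5, 1.0, 1.0
print(nearest_training_distances(A, B))  # [0.0, 0.5, 1.0]
```

A nearest-neighbour distance of 0 flags a generated molecule whose fingerprint exactly matches one in B; note that the pairwise loop is quadratic in |A|, which may matter for large generated sets.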