How can machine learning help us to evaluate the risk posed by emerging contaminants?
Description:
Only a small fraction of chemical mixture toxicity can be explained by the known contaminants monitored with targeted analytical methods, because only a few thousand chemicals have ever been monitored and have experimental toxicity values available. At the same time, LC/HRMS spectra carry information about the polarity, size, and functional groups of the detected compounds, all of which also affect their toxicity. This suggests that mixture toxicity could be predicted directly from non-targeted LC/HRMS data. To explore this possibility, we combine estimated physicochemical properties and empirical spectral information acquired in LC/HRMS analysis with machine learning to predict the lethal concentration for 50% of the population (LC50) for rainbow trout, bluegill, fathead minnow, algae, and water flea, using values retrieved from the CompTox Chemicals Dashboard (>800 compounds), alongside the concentration (exposure). The machine learning approaches used include gradient boosting, random forest, and support vector machines.
For model training, experimental LC50 values were used together with theoretical structural fingerprints calculated from the SMILES representation of each compound. The LC50 values range from 0.0001 to 57000 mg/L (nearly nine orders of magnitude), with an average experimental repeatability of 2.5x (max >100x). The root-mean-square errors (RMSE) for the training and test sets are 4x and 10x, respectively, on the concentration scale.
For validation, MS/MS data from MassBank are used alongside spectra measured in-house with LC-Orbitrap. The fingerprints are calculated with Finger:ID from the experimental MS/MS spectra and used to predict the LC50 values. An RMSE of 10x is observed, which agrees with the RMSE obtained with theoretical fingerprints. Analysis of the model reveals that the most important features are the exact mass of the compound and the presence of aromatic rings, sulfur, halogens, and oxygen. The exact mass is inversely correlated with the polarity of the chemicals and is therefore related to baseline toxicity.
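An error reported as a fold value such as "10x" is commonly obtained by computing the RMSE on log10-transformed concentrations and converting it back to the concentration scale. The abstract does not state the exact calculation, so the following is a minimal sketch under that assumption. The function name `fold_rmse` and the example values are hypothetical placeholders, not data from the study.

```python
import math


def fold_rmse(observed_mg_per_l, predicted_mg_per_l):
    """Return the RMSE of log10 concentrations, expressed as a fold error.

    Assumption: the fold error is 10 ** RMSE(log10 observed - log10 predicted).
    """
    if len(observed_mg_per_l) != len(predicted_mg_per_l):
        raise ValueError("observed and predicted must have the same length")
    if not observed_mg_per_l:
        raise ValueError("at least one value is required")

    # Squared differences between log10-transformed concentrations
    squared_errors = [
        (math.log10(obs) - math.log10(pred)) ** 2
        for obs, pred in zip(observed_mg_per_l, predicted_mg_per_l)
    ]
    rmse_log10 = math.sqrt(sum(squared_errors) / len(squared_errors))

    # Back-transform from log10 units to a multiplicative (fold) error
    return 10 ** rmse_log10


# Placeholder LC50 values in mg/L; every prediction is off by a factor of 10
observed = [1.0, 10.0, 0.01]
predicted = [10.0, 1.0, 0.1]
print(fold_rmse(observed, predicted))
```

Under this convention over- and under-prediction by the same factor are penalized equally, which suits LC50 values that span many orders of magnitude.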
Speaker: Anneli Kruve - Stockholm University
Anneli Kruve graduated in 2011 from the University of Tartu and continued her studies as a post-doc at the Technion, Israel. She was a Humboldt fellow at Freie Universität Berlin (2017–2018). In 2019 Kruve joined Stockholm University, where she is in charge of the mass spectrometry laboratory. Her field of study is the fundamentals and applications of mass spectrometry. Specifically, she uses modeling and machine learning to understand ionization processes in electrospray and to unravel the quantification and structural assignment challenges in environmental non-targeted screening. Recently, her group has developed machine learning models that can be used to quantify the contaminants detected with non-targeted LC/HRMS even if analytical standards are not available. Kruve's research group is also investigating the structural characterization of small and large molecules with high-resolution ion mobility.
Category
2023 Call for Invited Abstracts
Description
Session Number: S09-02
Session Type: Symposium
Session Date: Sunday 3/19/2023
Session Time: 1:30 PM - 4:45 PM
Room Number: 116
Track: Forensics & Toxicology
Category: Environmental, Liquid Chromatography/Mass Spectrometry, Toxicology
Register for Pittcon 2023