Harness the power of the Synthetic Accessibility Score (SAS)
Differentiating between ‘easy-to-make’ and ‘difficult-to-make’ molecules is a hard but widely useful task, e.g., for prioritizing compounds in virtual screening pipelines. By combining a modern deep-learning model with data collected using our renowned retrosynthetic planning software, we deliver the SYNTHIA™ Synthetic Accessibility Score (SAS) service, a tool suited to high-throughput in-silico compound processing.
At present, combinatorial chemistry and generative modelling are used to construct gigantic compound datasets. However, the actual synthesis of many molecules obtained with such methods may be challenging. To address this problem, synthetic accessibility measures are used to determine molecule feasibility as early as possible in the drug discovery pipeline.
The SYNTHIA™ SAS API service predicts this ‘molecular complexity’ in terms of the number of synthetic steps from small, commercially available building blocks. The machine learning model underpinning SAS has been pre-trained on synthetic scenarios obtained with algorithms from the SYNTHIA™ Retrosynthetic Planning Tool. Finally, our cloud-hosted, ISO-27001 certified product can easily process millions of molecules daily and up to a thousand molecules in a single query, enabling SYNTHIA™ SAS predictions to be used more widely in the drug design process.
Input/output for SAS model
Input molecules need to be provided in the widely used SMILES text format, and the API endpoint supports batch requests. Each input SMILES must describe a single-fragment molecule.
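A batch can be prepared on the client side before it is sent. The sketch below is a minimal, hypothetical pre-processing step, not part of the SAS API: it sets aside SMILES strings that contain the `.` disconnection symbol (a rough proxy for multi-fragment input, since SMILES ring-closure digits can in principle bond across a dot) and splits the rest into chunks of at most 1000 molecules, the per-query limit stated earlier. The names `split_single_fragment`, `make_batches`, and `MAX_BATCH` are illustrative assumptions.

```python
from typing import Iterable, List, Tuple

MAX_BATCH = 1000  # assumed per-query limit, taken from the product description


def split_single_fragment(smiles_list: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate SMILES without a '.' from those that contain one.

    This is a string-level heuristic only; it does not parse or validate SMILES.
    """
    single: List[str] = []
    multi: List[str] = []
    for smi in smiles_list:
        smi = smi.strip()
        if not smi:
            continue  # ignore empty entries
        (multi if "." in smi else single).append(smi)
    return single, multi


def make_batches(smiles_list: List[str], size: int = MAX_BATCH) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [smiles_list[i:i + size] for i in range(0, len(smiles_list), size)]


# Example: the salt is set aside, the two single-fragment molecules form one batch.
single, multi = split_single_fragment(["CCO", "[Na+].[Cl-]", "c1ccccc1"])
batches = make_batches(single)
```

Each element of `batches` could then be submitted as one query; how the request itself is built depends on the API documentation.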
The returned measure, defined here as the Synthetic Accessibility Score (SAS), is a single floating-point number in the range 0-10, assigned to each corresponding input molecule. The score approximates how many steps it takes to synthesize the molecule from commercially available building blocks. The lowest numbers (values close to 0) are returned for chemicals that are predicted to be easy to make (or that may even be commercially available). Higher numbers are returned when the model forecasts that more synthetic steps are needed to obtain the requested compound. For scores close to the maximal value (10), synthesis is predicted to be either extremely complex (many reaction steps) or even unfeasible, e.g., due to exotic structural motifs in the molecule. In general, the lower the score, the easier the molecule should be to synthesize.
If some of the molecules in a request are invalid (e.g., hypervalent atoms, incomplete rings, improper protonation of aromatic atoms, or multi-fragment input), the request will still be processed. Scores for such entries will be null, and appropriate comments will be returned alongside them in the response structure.
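On the client side, scored and rejected entries can be separated before ranking. The sketch below assumes a hypothetical response layout, a JSON list of records with `smiles`, `score`, and `comment` keys; the actual SAS response schema is not described here and may differ. The numbers and comment text in the example are made up purely for illustration and are not real SAS predictions.

```python
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical response body; field names and values are illustrative assumptions.
EXAMPLE_RESPONSE = """
[
  {"smiles": "c1ccccc1C(=O)O", "score": 1.5, "comment": null},
  {"smiles": "C1CC", "score": null, "comment": "invalid input"},
  {"smiles": "CCO", "score": 0.5, "comment": null}
]
"""


def partition_results(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (scored, rejected); scored entries are sorted easiest-first."""
    scored = [r for r in records if r.get("score") is not None]
    rejected = [r for r in records if r.get("score") is None]
    # Lower score means the molecule is predicted to be easier to make.
    scored.sort(key=lambda r: r["score"])
    return scored, rejected


records = json.loads(EXAMPLE_RESPONSE)  # JSON null becomes Python None
scored, rejected = partition_results(records)
for r in rejected:
    print(f"skipped {r['smiles']}: {r['comment']}")
```

Keeping the rejected entries, rather than dropping them silently, makes it possible to inspect the returned comments and repair the input.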
Predictive model characteristics
SYNTHIA™ SAS v1.0 is based on a regressor that includes a graph convolutional neural network (GCNN). Such an architecture allows the model to learn an internal representation of each molecule by operating on its graph structure rather than on pre-computed molecular descriptors. In particular, the model consists of a bond-level directed message passing neural network (D-MPNN) followed by a feedforward neural network (FNN). The implementation was adapted from the Chemprop open-source project.
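To illustrate what bond-level directed message passing means, here is a deliberately simplified toy in plain Python. It is not the SAS or Chemprop implementation: all learned weight matrices are replaced by identity maps, bond features are omitted, concatenation is replaced by addition, and the function name `toy_dmpnn` is hypothetical. It keeps only the core idea: hidden states live on directed bonds, and the state of bond u→v is updated from the bonds entering u, excluding the reverse bond v→u.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def vec_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def toy_dmpnn(atom_feats: Dict[int, Vector],
              bonds: List[Tuple[int, int]],
              steps: int = 2) -> Vector:
    """Return a molecule-level vector from atom features and undirected bonds."""
    dim = len(next(iter(atom_feats.values())))
    # Every undirected bond becomes two directed edges.
    edges = [(u, v) for u, v in bonds] + [(v, u) for u, v in bonds]
    incoming = defaultdict(list)  # atom -> directed edges pointing into it
    for u, v in edges:
        incoming[v].append((u, v))

    # Initial edge state: source-atom features (stand-in for a learned input layer).
    h0 = {(u, v): relu(atom_feats[u]) for u, v in edges}
    h = dict(h0)
    for _step in range(steps):
        new_h = {}
        for u, v in edges:
            msg = [0.0] * dim
            for k, _u in incoming[u]:
                if k != v:  # skip the reverse edge v -> u
                    msg = vec_add(msg, h[(k, u)])
            # Skip connection to the initial state, then nonlinearity.
            new_h[(u, v)] = relu(vec_add(h0[(u, v)], msg))
        h = new_h

    # Readout: each atom sums its incoming edge states; atoms are summed per molecule.
    molecule = [0.0] * dim
    for atom, x in atom_feats.items():
        m = [0.0] * dim
        for edge in incoming[atom]:
            m = vec_add(m, h[edge])
        molecule = vec_add(molecule, relu(vec_add(x, m)))
    return molecule


# Three-atom chain 0-1-2 with one-dimensional atom features.
vector = toy_dmpnn({0: [1.0], 1: [1.0], 2: [1.0]}, [(0, 1), (1, 2)])
```

In a full model of this kind, the resulting molecule vector would be passed to a feedforward network that outputs the regression target, here the score.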
The machine learning model was trained using the results of the SYNTHIA™ automatic retrosynthesis module as the target value. A specialized, normalized SYNTHIA™ score was used to reflect the number of steps. For example, the SYNTHIA™ search settings did not penalize non-selective reactions, used an implicit protection strategy, kept the price contribution to the score minimal, and allowed only small building blocks. Additionally, a smoothing function was applied to produce a better gradient among high scores, with the aim of better resolving hard-to-synthesize molecules (see also Fig. 1).