Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning

Vornholt, Tobias and Mutny, Mojmir and Schmidt, Gregor and Schellhaas, Christian and Tachibana, Ryo and Panke, Sven and Ward, Thomas R and Krause, Andreas and Jeschek, Markus, Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning.”, 2024. preprint

Abstract: Tailored enzymes hold great potential to accelerate the transition to a sustainable bioeconomy. Yet, enzyme engineering remains challenging as it relies largely on serendipity and is, therefore, highly laborious and prone to failure. The efficiency and success rates of engineering campaigns may be improved substantially by applying machine learning to construct a comprehensive representation of the sequence-activity landscape from small sets of experimental data. However, it often proves challenging to reliably model a large protein sequence space while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning and model-guided library design. We applied this strategy to efficiently engineer an artificial metalloenzyme (ArM) catalysing a new-to-nature hydroamination reaction. By combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds according to user-defined objectives. Crucial characteristics of our enhanced enzyme engineering pipeline include i) the cost-effective generation of information-rich experimental data sets, ii) the integration of an explorative round to improve the performance of the model, as well as iii) the consideration of experimental noise during modelling. Our approach led to an order-of-magnitude boost in the hit rate of screening while making efficient use of experimental resources. Smart search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.

Full text here