Tianyu Lu

@tylu

Joined on May 17, 2019

## 1. Abstract

We present modifications to the design algorithm Conditioning by Adaptive Sampling (CbAS) (Brookes et al.) and apply it to the design of plastic-degrading enzymes. We curated a dataset of 212 unique PETase sequences and their relative catalytic activities, 159 of which have an experimental Tm value. We implemented CbAS in PyTorch and report the optimization trajectory along with the model uncertainties. Generative models discovered by CbAS consistently sample sequences that are predicted to surpass the catalytic activity and thermostability of the best sequence in the training data. We plan to synthesize the generated sequences in the wet lab for iGEM Toronto's 2021 project.

## 2. Motivation

Our main motivation for applying CbAS to protein design is the promising results of iGEM Toronto's 2019 project. In that project, the best sequence as predicted by CbAS was synthesized; not only did it fold, but it also had a catalytic activity competitive with the rationally designed enzyme of Austin et al. As such, we sought to monitor the CbAS optimization trajectory more carefully, with the modifications to the code described below.

## 3. Generative Model

### Variational Autoencoder

We train a VAE with two hidden layers in the encoder, two hidden layers in the decoder, and a 20-dimensional latent space. The model trained on the initial set of PETase sequences is denoted vae_0 and parameterizes the prior probability distribution over PETase sequences. For the prior, we ensure that the resulting model is not overfitted to the training data by selecting the model with the lowest loss on a held-out set over 100 training epochs. This is only done for vae_0, since it is unclear what overfitting means for the subsequent VAE models during optimization.

*Figure: 1000 samples from vae_0 with no decoder noise.*
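For concreteness, here is a minimal PyTorch sketch of a VAE with the stated shape: two hidden layers in each of the encoder and decoder and a 20-dimensional latent space. The one-hot sequence encoding, the hidden width of 256, the 21-token alphabet, and the names `SequenceVAE` and `vae_loss` are illustrative assumptions, not the exact iGEM Toronto implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequenceVAE(nn.Module):
    """Hypothetical VAE over one-hot encoded protein sequences.

    Two hidden layers in the encoder, two in the decoder, 20-d latent space.
    Layer widths and the token alphabet size are guesses.
    """

    def __init__(self, seq_len, n_tokens=21, hidden=256, latent=20):
        super().__init__()
        in_dim = seq_len * n_tokens
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),  # logits over tokens at each position
        )
        self.seq_len, self.n_tokens = seq_len, n_tokens

    def forward(self, x):
        # x: (batch, seq_len, n_tokens) one-hot sequences
        h = self.encoder(x.flatten(1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        logits = self.decoder(z).view(-1, self.seq_len, self.n_tokens)
        return logits, mu, logvar

def vae_loss(logits, x_tokens, mu, logvar):
    """Reconstruction cross-entropy plus KL divergence to a standard normal prior."""
    recon = F.cross_entropy(logits.transpose(1, 2), x_tokens, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Under these assumptions, sampling "with no decoder noise" as in the figure caption would correspond to drawing z from the standard normal prior and taking the argmax of the decoder logits at each position rather than sampling residues from them.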
## 1. Abstract

We present a generalizable and automated pipeline for protein design. Our model can be applied to the optimization of any protein class, even those with scarce data. We first train an AdaBoost regressor that predicts a protein property from sequence alone. We then train a recurrent neural network (RNN) that generates novel protein sequences. Generated sequences are evaluated by the regressor, and those that pass a specified threshold are added to the training set for the RNN to be retrained. This iterative process continues until convergence or experimental validation.

## 2. Generative Model

### Intuition

We will use a recurrent neural network as a generative model. The structure of an RNN is shown below. The $\mathbf{x}_i$ are the input vectors, the boxes in $A$ are described in more detail in the Formal Definition section, and the $\mathbf{h}_i$ are the outputs for each cell. These types of neural networks are well-suited to sequence data, such as amino acid sequences. We were first drawn to the generative ability of RNNs by an experiment done by Andrej Karpathy [cite]. He trained an RNN on the entire Shakespeare corpus and asked it to generate new Shakespeare text. Remarkably, the sample shown below closely captures Shakespeare's writing style. He also trained an RNN on Linux source code.
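As a sketch of the kind of generative RNN described here (not the exact model from this note), the PyTorch snippet below defines a small LSTM trained to predict the next amino acid token, which can then sample novel sequences one residue at a time. The class name `SequenceRNN`, the token alphabet size, and the hidden size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SequenceRNN(nn.Module):
    """Hypothetical next-token LSTM over integer-encoded amino acid sequences."""

    def __init__(self, n_tokens=22, embed=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_tokens)

    def forward(self, tokens, state=None):
        # tokens: (batch, length) integer residues, including start/stop symbols
        h, state = self.lstm(self.embed(tokens), state)
        return self.head(h), state  # logits over the next residue at each position

    @torch.no_grad()
    def sample(self, start_token, stop_token, max_len=300):
        """Generate one sequence by sampling from the model's next-residue distribution."""
        tok = torch.tensor([[start_token]])
        state, out = None, []
        for _ in range(max_len):
            logits, state = self.forward(tok, state)
            tok = torch.multinomial(torch.softmax(logits[:, -1], dim=-1), 1)
            if tok.item() == stop_token:
                break
            out.append(tok.item())
        return out
```

In the iterative pipeline described in the abstract, sequences drawn from `sample` would be scored by the property regressor (for example, scikit-learn's `AdaBoostRegressor` on fixed-length sequence features), and those above the activity threshold would be appended to the RNN's training set before retraining.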
# Installing Apps on the iPad

Steps 1 to 2 update the code on your local machine by pulling the latest version from the cloud. BuddingMindsLab projects are [hosted here](https://github.com/BuddingMindsLab?tab=repositories).

## Step 1

Go to Source Control > Pull

![](https://i.imgur.com/jnUVp6j.jpg)

This doesn't work in this example because CircularArenaController.swift has local changes, indicated by an *M* beside the file, that are not present on GitHub. Either commit these files to save yo