# Questions for Andrew

## Data Science

1. What are priority _problems to solve_? (Explicitly NOT "what models need to be built"...) Is the problem space focused on the research side, or is it more commercial? (My personal interests lie squarely on the research side of things.)
2. What are examples of you building a model from initial data collected at the bench, which was then fed back to help design better experiments at the bench?
3. How much effort have you all put into automated experiment design coupled with statistical/machine learning models? Bayesian design of experiments with ML models as part of the loop?
4. Are there "standard" scientific decision-making workflows for which automated machine learning modeling systems should be built to aid scientists? One example I am thinking of is an automated "local" model building system that I made at NIBR, which automatically fits standard models for chemical series and could help scientists decide whether or not to continue pursuing a given chemical series. 14,000 ML models overnight by one person :). (The hard part was then building the system to get it into the hands of chemists, which involved convincing management to resource engineers to build the thing.)

## Informatics Systems

1. What struggles have you had with data systems over your tenure at Moderna? Can you give an example?
2. If one wants to access _data dumps_, is that easy? (At NIBR, I wasn't allowed to get read-only access to databases, which was frustrating -- this was a legacy IT policy that kept me from doing work at my fastest.)
3. Does Moderna have a system that handles the digital capture of raw laboratory data as it is being generated, _with metadata automatically captured_? For example, if RNA-seq experiments are done in different mouse genotypes, are those captured in a standardized format that makes it easy to analyze for confounders?
   As another example, if we're engineering a protein replacement therapy and its immunogenicity profile is being collected, is key information like T-cell HLA genotype, primary vs. immortalized cell line, and more being _automatically_ recorded?
4. Can laboratory notebook information be queried via APIs, and is it made easily available? (As far as I have seen, getting data out of NIBR's ELN requires contacting someone in NX to do a data dump first...)
5. In bench research, are experiments planned out with a vision towards being scaled and run systematically, or are they designed in a one-off fashion? More generally, are assays standardized and robotized, or is there constant assay development? If both, what's the balance that you have seen?

## Tech Stack

Specific tools I'm curious to know whether you all are using:

1. Dask
2. Conda
3. FastAPI
4. QHub
5. K8s

Other questions:

1. What is the version control system that you all are using?
2. How about the continuous integration system? Is there an IT team that manages this, or did you have to roll this on your own?
3. For burst compute, do you all have to rely on raw AWS with Terraform, or are there abstractions in place (e.g. in the style of QHub) to make access to burst compute a bit easier? (Another example is Coiled Computing.)
4. Is there a PaaS (platform as a service) of some kind, such as a K8s cluster, on which we can deploy ML model APIs? Alternatively, is there an ML model registry that serves and deploys such models' APIs?
5. How are models treated? As code? Are there versioning systems in place for models as a whole? I have been thinking about model versioning in terms of building versioned Docker containers, or API interfaces that _support workflows_. Is this culturally accepted at MTX?
6. Has security ever overbearingly or subtly stopped you from doing what you thought was right for the company?

## Span of science at MTX

1. What are the roles of the discovery chemistry and protein science teams?
   (It's sounding like a mini-NIBR organizationally, so I'm just curious.)
2. What's the DSAI relationship to "Computational Science" organizationally? (Headed by Wei Zheng.)
3. What's the "New Venture Lab"?
4. How many infectious disease projects are going on? This is related to my grad school work, so I have a particular soft spot for it, and from a scientific perspective, I've been curious to know if we could make a near-real-time evolving vaccine rather than a static one-shot vaccine that intercepts current and highest-likelihood future sequences.

## Patenting/Publishing/Open Source/Conferences

1. Can I continue working on my personal open source projects (e.g. pyjanitor, nxviz, blogging and a newsletter about DS in general), or does Moderna have an Apple-like policy where employees are not allowed to do any open source work, even in personal time? If the latter, can this be negotiated?
2. Have you all published work at ML conferences (NeurIPS/ICLR/ICML)?
3. It remains important for me to stay connected to the SciPy, PyCon and PyData communities via their conferences, as that is where I get to learn new things and keep abreast of tooling. At NIBR, I went to these conferences while still maintaining a virtual presence at work to respond to issues if they came up. Is conference attendance supported? As long as the science is moving forward, would I be able to continue attending these conferences?
4. Have you been involved in machine learning patents with Moderna? It is one of my personal goals to have published a patent if I remain in an industrial computational scientist role.

## Culture

1. How have you all navigated the buy vs. build conundrum? What systems have you all chosen to build vs. just buy? Do you have an example of a decision that you were involved in? (I am asking because I was asked to evaluate external vendors/consulting firms/machine learning startups before...)
2. How have you all chosen projects to prioritize? If the answer is "by impact", then the natural question is, by what criteria is impact decided? How much is influenced by the smart people that are hired on the front lines vs. what leadership decides top-down?
3. On the DS team, have "good practices" (sane folder structure, documentation written alongside code, version control, data pulled in from accessible and defined locations) been encouraged?
4. Is there a strong implicit cultural leaning towards on-site work? Remote work has been enjoyable, and apart from the COVID-19 episode, I've also done another year of remote-to-Basel work (where the team is based in Basel and I'm the only one in the US), supporting enzyme engineering with machine learning and bench-side decision-making.
5. Is there an implicit cultural leaning towards overworking? With a child at home now, being able to leave when necessary, and leave on time to pick her up without feeling guilted into working long hours, is important to me.
6. At NIBR, I get the sense that building models that are fit for a problem is not a priority, and that we should rely on published academic work or published GOOG/MSFT/OpenAI models. This is in sharp contrast to my personal conviction that models should be designed for specific leverage against a problem, and that we shouldn't _just_ rely on "large pre-trained models" (they have their place, but are not often what we need). What is your philosophical belief on this matter?

## Role

1. I saw in the description that the role involves: "Grow a tightly knit team focused on diverse research problems such as biological sequence design, image analysis, and cheminformatics". Does this imply becoming a team lead? By growing the team, what is the anticipated team size?
2. Why is cheminformatics on the radar of Moderna, when the primary therapeutic modality is RNA?
3. Statistics, bioinformatics, and DSAI -- what kind of relationship have you all had working together (or not)?
4. Is this role reporting into you, Andrew? If so, what is your preference for check-ins and updates?

## Financial

1. I currently have a base salary of approximately $150K, alongside unvested NVS stock. What is the base salary range one can expect for this role? Four years ago I wouldn't have asked this question, but now I have a child to worry about, so finances have increased in importance for me.