Hello, I’m Cathy, and I live in Pittsburgh, PA. I am a computational biologist by training and work mainly with transcriptomics and genomics datasets. However, I’ve also had the chance to complete some interesting data analysis and machine learning projects.
Course Projects
- 2018: Adversarial Text Generation using Long Short Term Memory Networks with Christopher Kottke and Emilee Holtzapple. This was a project to apply various LSTM-based methods to generate text. We used the text of Sherlock Holmes by Sir Arthur Conan Doyle, taken from the Project Gutenberg dataset and processed so that all letters were lower case and non-ASCII punctuation was removed. I worked on training and comparing a character-based and a word-based LSTM on this dataset, both implemented in Keras. The character-based model generated text that didn’t make much sense, but the word-based model could generate text with consistent capitalization and punctuation.
- 2019: Course notes for Probabilistic Graphical Models. An overview of Bayesian networks, their properties, and how they help model the joint probability distribution over a set of random variables. Code.
- 2019: Latent Variable Models to Uncover Neural Population Dynamics with Tze-hui Koh, Hillary Wehry and Darby Losey. This was a project to extend the published model g-LARA to model neuron states across brain regions with low dimensionality. The data used are simulations based on the neural recordings from the published paper: multi-electrode spike train recordings of neurons from the V1 and V2 visual areas in monkey. The original model, g-LARA, has a latent variable to represent each distinct neural population. The extension we implemented, g-LAFA, adds a modification that diagonally constrains the covariance matrix of the observation model so that it can represent directional influences between the populations. In this project, I used TCDF, an attention-based CNN, to try to detect causal relationships between the two neural populations in simulation.
- 2021: [Donors choose data](https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose) with Haoyun Lei.
Hackathon projects
- 2018: Analysis of interaction between mouse motor cortex and striatum with Abhinav Sharma, Akash Umakantha, and Elissa Ye. Won first prize at the 2018 BrainHub Neurohackathon.
- 2018: Processing method for classifying cancer pathology reports with Natalie Lie. Obtained second prize at the June 9-11, 2018 Cancer Informatics Hackathon. My contribution was cleaning the data and implementing and training the scikit-learn SVM on the provided cancer data registry.
- 2024: TrAILs: Patient-Trial Matching Agent with Xueying (Shirley) Jia, Andrew Chen, Yuning Zheng, and Channdavel Kong. Obtained second prize in the biotech track of Nucleate BioHack.
This blog is built on top of the Jekyll Now theme.