Exploring ggh4x

The ggh4x package extends ggplot2 with some useful additional functionality.

Applying for and obtaining OPT

Having just gone through this ~80 day process, I am documenting it here. I applied for OPT online at the USCIS website on April 17, 2023, and received the notice of receipt the same day. I obtained the approval notice online on July 15, 2023, and got the physical EAD card on July 22, 2023.

Seurat v4.3 dotplot algorithm

The dotplot is one of the most common ways to visualize gene expression in a Seurat object. The dotplot function of Seurat is an easy way to make one, but the default options can make the plot difficult to interpret. The full set of options is listed in the manual on their website and described in more detail in their code.

MacBook setup

Setting up a new MacBook with programs for bioinformatics.

Seurat to AnnData conversions

A lot of single-cell data packages are built in R, and the standard data formats in commonly used packages such as Seurat and SingleCellExperiment bundle count data with metadata in a single object. When moving the data over to Python, we can preserve this structure using the AnnData format.

Quickly making large DAGs with DAGitty

DAGitty is really useful for representing causal diagrams and has wonderful documentation. It also comes with a handy R package, available on CRAN, which can be used to query the DAG for causal effect identification, for example by checking d-separation of variables.

Printing high resolution figures from R

I currently make most figures in R with ggplot2 before exporting them to other programs, which means that they need to be exported in high resolution.

Anki with LaTeX

Anki is a great spaced repetition flashcard app that I started using when I read Michael Nielsen’s post Augmenting Long-term Memory. I make the cards on my computer and review them on my phone with AnkiDroid. I first used this system to learn since there are great public decks for that.

GMT file import in R

As outlined in the GSEA wiki, GMT files store functional pathway information for genes. However, as the figure there makes clear, each line uses tabs as separators and there may be a different number of genes per line.
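The ragged layout described above (a set name, a description, then a variable number of genes, all tab-separated) is why a fixed-width table reader does not fit well. As a minimal sketch of the parsing logic, written in Python with the standard library rather than R, the function below reads each line into a dictionary entry. The function name `read_gmt` and the example gene sets are made up for illustration and are not taken from the post.

```python
import io


def read_gmt(handle):
    """Parse GMT lines into {set_name: (description, [genes])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines that lack a name and a description.
        if len(fields) < 2 or not fields[0]:
            continue
        # The first two fields are fixed; everything after them is a gene.
        name, description, *genes = fields
        # Drop empty strings left behind by trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Hypothetical two-line GMT file with a different number of genes per line.
example = io.StringIO(
    "PATHWAY_A\tfirst made-up set\tGENE1\tGENE2\tGENE3\n"
    "PATHWAY_B\tsecond made-up set\tGENE2\n"
)

gene_sets = read_gmt(example)
for name, (description, genes) in gene_sets.items():
    print(name, len(genes), genes)
```

Keeping the genes as a list per set, rather than padding rows to equal length, avoids introducing missing values that would have to be filtered out later.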

R Markdown files

R Markdown files are a cross between markdown and R files since they can load and run chunks of code alongside markdown-formatted text.

Moving between species in R

It’s often useful to compare data against a published dataset from another species. These are the most common tasks I complete for this purpose (and the corresponding libraries in R). Unfortunately, converting between species always seems to introduce missing identifiers, so I have tried to choose the method that avoids this as much as possible.

Transferring large files from Google Drive on the command line using gdrive

Downloading large Google Drive files via gdrive on Mac and Linux seems to be faster than using the Wi-Fi connection directly or with something like curl.

Working with GRanges

GRanges seems to be the standard data structure for representing genomic coordinates in R, provided by Bioconductor’s GenomicRanges package. The vignette describes useful associated attributes and functions, such as names for each region.

Moving to the US from Canada as a graduate student

I am a Canadian citizen and moved to the US with F-1 status when I started my degree. I couldn’t find many accounts of how to do this online at the time of writing, so I’ve written up some observations I made (primarily about financial considerations).

Using GTF files to extract information about genes, transcripts and related features

The Ensembl GTF file contains comprehensive gene and transcript information for model organisms, e.g. human and mouse. It can be used in RNA-seq alignment and quantification programs such as STAR.
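To make the structure of a GTF file concrete, here is a minimal sketch in Python (standard library only, not the approach used in the post itself). It assumes the usual nine tab-separated columns, with the last column holding `key "value";` pairs. The helper name `parse_gtf_line` and the example records are made up for illustration, and repeated attribute keys keep only their last value in this simplified version.

```python
import re

# The first eight GTF columns; the ninth is the free-form attribute column.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]

# Matches pairs such as: gene_id "GENE0001"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Return one GTF record as a dict, or None for comments and bad lines."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Simplification: a key that appears twice keeps only its last value.
    record["attributes"] = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
    return record


# Made-up records in GTF layout, not real Ensembl entries.
example_lines = [
    "#!made-up header line\n",
    '1\texample\tgene\t100\t900\t.\t+\t.\t'
    'gene_id "GENE0001"; gene_name "Abc1";\n',
    '1\texample\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "GENE0001"; transcript_id "TX0001"; gene_name "Abc1";\n',
]

records = [r for r in map(parse_gtf_line, example_lines) if r is not None]

# Map gene IDs to gene names using only the "gene" feature rows.
gene_names = {
    r["attributes"]["gene_id"]: r["attributes"].get("gene_name")
    for r in records
    if r["feature"] == "gene"
}
print(gene_names)
```

Filtering on the feature column before reading attributes keeps the lookup to one row per gene, since transcript and exon rows repeat the same gene-level attributes.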