Exploring ggh4x
The ggh4x package extends ggplot2 with some useful additional functionality, such as nested facets and finer control over panel sizes and per-panel scales.
Mostly Genomics
Seurat’s DotPlot function is easy to use, but its default scaling options can make the plot difficult to interpret. The full set of options is listed in the manual on the Seurat website and described in more detail in the source code.
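To see why the defaults can mislead, here is a simplified Python re-implementation of how a Seurat-style dot plot colours its dots. It assumes Seurat's defaults of scale = TRUE, col.min = -2.5 and col.max = 2.5, and that the per-group averages are computed before scaling. Under those assumptions, each gene's average expression is z-scored across groups and then clipped, so colour only ranks groups within a gene and says nothing about expression level between genes. The function name is hypothetical and this is a sketch, not Seurat's code.

```python
import numpy as np

def dotplot_color_values(avg_exp, col_min=-2.5, col_max=2.5, scale=True):
    """Approximate the colour values of a Seurat-style dot plot.

    avg_exp: array of shape (n_genes, n_groups) holding each gene's
    average (non-log) expression in each group. Assumes Seurat-like
    defaults; this is an illustrative sketch, not Seurat's implementation.
    """
    avg_exp = np.asarray(avg_exp, dtype=float)
    if not scale:
        # Without scaling, colour reflects log1p(average expression).
        return np.log1p(avg_exp)
    # z-score each gene (row) across groups; ddof=1 matches R's scale().
    mean = avg_exp.mean(axis=1, keepdims=True)
    sd = avg_exp.std(axis=1, ddof=1, keepdims=True)
    z = (avg_exp - mean) / sd
    # Clip extreme z-scores to the colour limits.
    return np.clip(z, col_min, col_max)

# Two hypothetical genes whose expression differs 100-fold.
avg = np.array([[0.1, 0.2, 0.3],
                [10.0, 20.0, 30.0]])
print(dotplot_color_values(avg))  # both rows are approximately [-1, 0, 1]
```

Both genes get the same colours despite the 100-fold difference, and with only two groups every gene lands at about ±0.71 regardless of how different the groups are.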
Setting up a new MacBook with programs for bioinformatics.
A lot of single-cell data packages are built in R, and the standard data formats in commonly used packages such as Seurat and SingleCellExperiment bundle count data and metadata into a single object. When moving the data over to Python, we can preserve this structure using the AnnData format.
DAGitty is really useful for drawing causal diagrams and has wonderful documentation. It also comes with a handy R package, dagitty, available on CRAN, which can query a DAG for properties relevant to causal effect identification, such as whether two variables are d-separated.
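d-separation can be checked with a classic criterion: restrict the DAG to the variables involved and their ancestors, "moralize" it by linking parents that share a child and dropping edge directions, delete the conditioning set, and test whether X and Y are still connected. The Python function below is an illustrative sketch of that criterion, not the dagitty package's implementation; the function name and the edge-list input format are assumptions of mine.

```python
from collections import defaultdict, deque
from itertools import combinations

def d_separated(edges, xs, ys, zs):
    """Return True if node sets xs and ys are d-separated given zs.

    edges: iterable of (parent, child) pairs describing a DAG.
    Uses the moralized ancestral graph criterion.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # 1. Collect the query nodes and all of their ancestors.
    ancestral, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in ancestral:
            ancestral.add(node)
            stack.extend(parents[node])

    # 2. Moralize the ancestral subgraph: undirected child-parent links
    #    plus a link between every pair of parents that share a child.
    neighbors = defaultdict(set)
    for child in ancestral:
        for p in parents[child]:
            neighbors[p].add(child)
            neighbors[child].add(p)
        for a, b in combinations(parents[child], 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # 3. Delete the conditioning set and search for a path from xs to ys.
    seen = xs - zs
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in ys:
            return False
        for nb in neighbors[node] - zs - seen:
            seen.add(nb)
            queue.append(nb)
    return True

chain = [("X", "M"), ("M", "Y")]      # X -> M -> Y
collider = [("X", "C"), ("Y", "C")]   # X -> C <- Y
print(d_separated(chain, {"X"}, {"Y"}, set()))     # False
print(d_separated(chain, {"X"}, {"Y"}, {"M"}))     # True
print(d_separated(collider, {"X"}, {"Y"}, set()))  # True
print(d_separated(collider, {"X"}, {"Y"}, {"C"}))  # False
```

The collider case shows the behaviour that most often surprises people: conditioning on a common effect opens a path between otherwise independent causes.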
I currently make most figures in R with ggplot2 and then export them to other programs, so they need to be exported at high resolution.
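The pixel size of a raster export is its physical size times the resolution, which is worth checking before exporting (ggplot2's ggsave() takes width, height, units and dpi arguments for this). A tiny Python helper for the arithmetic; the function name is hypothetical and the 300 dpi default is an assumption, a common print requirement rather than something from this post.

```python
# How many of each unit make up one inch.
UNITS_PER_INCH = {"in": 1.0, "cm": 2.54, "mm": 25.4}

def pixel_dimensions(width, height, units="in", dpi=300):
    """Return (width_px, height_px) for a figure of the given physical size."""
    per_inch = UNITS_PER_INCH[units]
    return (round(width / per_inch * dpi), round(height / per_inch * dpi))

print(pixel_dimensions(7, 5))            # (2100, 1500)
print(pixel_dimensions(180, 120, "mm"))  # (2126, 1417)
```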
Anki is a great spaced-repetition flashcard app that I started using after reading Michael Nielsen’s post Augmenting Long-term Memory. I make the cards on my computer and review them on my phone with AnkiDroid. I first used this system to learn since there are great public decks for that.
As outlined in the GSEA wiki, GMT files store gene sets for functional pathways, one gene set per line: its name, a description, and then its member genes. As the figure there makes clear, the fields are tab-separated and the number of genes can differ from line to line.
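A minimal parser sketch in Python that copes with the variable line length by splitting on tabs and treating everything after the first two fields (gene-set name and description) as genes. The function name and the toy gene sets are hypothetical.

```python
import io

def read_gmt(handle):
    """Parse GMT text into {gene_set_name: (description, [genes])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets

example = io.StringIO(
    "SET_A\tfirst toy set\tTP53\tMDM2\tCDKN1A\n"
    "SET_B\tsecond toy set\tMYC\n"
)
sets = read_gmt(example)
print({name: len(genes) for name, (_, genes) in sets.items()})  # {'SET_A': 3, 'SET_B': 1}
```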
R Markdown files are a cross between Markdown and R scripts: they interleave Markdown-formatted text with chunks of code that can be loaded and run as the document is rendered.
It’s often useful to compare data against a published dataset from another species. These are the tasks I most often complete for this purpose (and the corresponding R libraries). Unfortunately, converting between species always seems to introduce missing identifiers, so I have tried to choose the methods that avoid this as much as possible.
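Part of the problem is that any gene without an entry in the ortholog table simply drops out of the comparison. A small Python sketch that maps identifiers and reports what went missing, so the loss is at least visible. The function name and the three-entry table are hypothetical toys; in practice the mapping would come from an ortholog resource such as Ensembl's, which also contains one-to-many matches that this sketch ignores.

```python
def map_orthologs(genes, ortholog_table):
    """Map gene identifiers between species and report unmapped ones.

    ortholog_table: dict from source-species ID to target-species ID
    (assumed one-to-one here for simplicity).
    """
    mapped, missing = {}, []
    for gene in genes:
        target = ortholog_table.get(gene)
        if target is None:
            missing.append(gene)
        else:
            mapped[gene] = target
    return mapped, missing

toy_table = {"TP53": "Trp53", "GAPDH": "Gapdh", "ACTB": "Actb"}
query = ["TP53", "GAPDH", "HLA-A"]
mapped, missing = map_orthologs(query, toy_table)
print(mapped)   # {'TP53': 'Trp53', 'GAPDH': 'Gapdh'}
print(missing)  # ['HLA-A']
print(f"{len(missing) / len(query):.0%} of genes unmapped")  # 33% of genes unmapped
```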
Downloading large Google Drive files with the gdrive command-line tool on Mac and Linux seems to be faster than downloading them directly over Wi-Fi or with something like curl.
GRanges seems to be the standard data structure for representing genomic coordinates in R, provided by Bioconductor’s GenomicRanges package. Its vignette describes useful associated attributes and functions, such as names for each region.
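GRanges follows Bioconductor's convention of 1-based, closed intervals, so a region's width is end - start + 1, whereas BED files are 0-based and half-open. The Python dataclass below is a hypothetical stand-in for a single GRanges element (not part of GenomicRanges) that makes the conversion explicit; the class and function names are my own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Region:
    """Hypothetical Python analogue of one GRanges element."""
    seqname: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    strand: str = "*"
    name: Optional[str] = None

    @property
    def width(self) -> int:
        return self.end - self.start + 1

    def overlaps(self, other: "Region") -> bool:
        # Closed intervals overlap if each starts before the other ends;
        # strand is ignored here for simplicity.
        return (self.seqname == other.seqname
                and self.start <= other.end
                and other.start <= self.end)

def from_bed(chrom: str, bed_start: int, bed_end: int,
             name: Optional[str] = None) -> Region:
    """Convert 0-based half-open BED coordinates to 1-based closed ones."""
    return Region(chrom, bed_start + 1, bed_end, name=name)

peak = from_bed("chr1", 0, 100, name="peak1")
print(peak.start, peak.end, peak.width)         # 1 100 100
print(peak.overlaps(Region("chr1", 100, 200)))  # True
print(peak.overlaps(Region("chr1", 101, 200)))  # False
```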
I am a Canadian citizen and moved to the US on F-1 status when I started my degree. I couldn’t find many accounts of how to do this online at the time of writing, so I’ve written up some observations I made (primarily about financial considerations).
The Ensembl GTF file contains comprehensive gene and transcript annotations for model organisms such as human and mouse. It can be used by RNA-seq alignment and quantification programs such as STAR.
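GTF files are tab-separated with nine columns, the last holding semicolon-separated key "value" attributes such as gene_id and gene_name. Below is a minimal Python reader sketch. It assumes no semicolons appear inside quoted values and keeps only the last value when a key repeats (Ensembl repeats some keys, such as tag). The function names and the toy records are hypothetical.

```python
import io

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]

def parse_attributes(text):
    """Parse 'key "value"; key "value";' into a dict (last repeat wins)."""
    attrs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def read_gtf(handle, feature=None):
    """Yield one dict per GTF record, optionally keeping one feature type."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # header lines (Ensembl uses '#!') and blanks
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(GTF_COLUMNS):
            continue  # skip malformed lines
        record = dict(zip(GTF_COLUMNS, fields))
        if feature is not None and record["feature"] != feature:
            continue
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        record.update(parse_attributes(record.pop("attribute")))
        yield record

toy_gtf = io.StringIO(
    "#!genome-build toy\n"
    '1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "toyGene";\n'
    '1\ttoy\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)
genes = list(read_gtf(toy_gtf, feature="gene"))
# GTF coordinates are 1-based and closed, so width is end - start + 1.
print(genes[0]["gene_name"], genes[0]["end"] - genes[0]["start"] + 1)  # toyGene 401
```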