This is a basic tutorial in R for switching between the two most common data formats: wide and long. The example dataset we will use is made up of RT-PCR threshold cycle (Ct) values under two conditions (A and B), with two replicates each.
Most bioinformaticians don’t like to do experiments. We love the biological stories, but give us a pipette and we will quickly groan. I have spent the past two years in a virology lab, mainly working on computational problems, and eternally debating if I should plunge into the pool of experimental biology. Up until now, I have only dipped my toes in a few basic experiments, but I have recently decided to stop working on purely computational projects and transition to ones that require experimental skills. Here are my reasons.
"No cilantro, please"
Yet I know when it arrives
It will be present
The first time I tried cilantro I didn’t realize it; I just thought somebody had emptied a bottle of Old Spice on my pizza in an attempt to poison me. Cilantro tastes like soap to approximately 10% of the people who have had their genotype analyzed by 23andMe. The currently accepted explanation is that those of us who passionately despise cilantro were born with a genetic variant known as a single-nucleotide polymorphism (or SNP, pronounced ‘snip’).
I keep a 2,000-line file full of R commands that I considered worthy of remembering at some point during the past 5 years. Life is too short, so here are 7 tips that don’t get enough publicity, ranked from most to least awesome.
I keep forgetting how to defend myself against the feeling of being stuck (writers call it writer’s block, painters call it feeling uninspired, runners don’t have a name for it, they just suck it up and keep running), so this post is a reminder of the two techniques that help me neutralize the resistance and get back to work.
Whenever I read a paper that describes an experimental technique that I’m not familiar with (which is often, since the lab bench is not where bioinformatics students spend most of their time), I compensate by doing background research. Lately, I have been reading a lot about the antiviral effects of interferon-stimulated genes (ISGs), and since John Schoggins is giving a seminar at my university on Monday, I thought I would post about a clever technique that he used in his Nature paper to quantify the impact of ISGs on viral replication.
I gave a presentation today at the Microbiology departmental retreat at Boston University, so I thought I’d share it.
The goal of the research project I have been working on for the past two years, as part of my Bioinformatics PhD at Boston University, is to use blood samples from infected patients to determine which virus is causing the infection.
Traditional diagnostic methods, like detection by ELISA, are only effective after the virus has had enough time to replicate in the blood of the patient. However, for some viral infections, when this happens it is already too late for the treatment to be effective.
The approach that we have taken in my lab is to measure the transcriptional changes that take place in the peripheral blood cells, and identify patterns of expression that are unique to each type of infection. This indirect way of viral detection has the potential to reduce diagnosis time by several days, since it is known that these transcriptional changes precede the appearance of virus particles in the blood, and we believe that they are specific enough to discriminate between viruses.
D3 is an awesome tool to build interactive visualizations. I use it extensively for the initial stages of my research projects, but converting D3 charts from SVG to PDF and polishing them for publication can be a harrowing experience. This post describes a few techniques that make this process more pleasant. If you have any additional ideas, please leave a comment at the bottom of the post.
We finished Part 2 by creating an associative array, increasing the value of each of its keys for every line that contained a matching key, and using the END rule to print each key’s final count.
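Here is a minimal sketch of that counting pattern, not the code from Part 2 itself. It assumes tab-separated input with the feature type in the third column; the function name count_by_type and the three sample rows are made up for illustration.

```shell
# Count how many lines carry each value of column 3.
# count_by_type is a hypothetical helper name; it reads from standard input.
count_by_type() {
  awk -F '\t' '
    { count[$3]++ }                        # one array slot per distinct key
    END {
      for (key in count)                   # runs once, after the last line
        print key "\t" count[key]
    }
  ' | LC_ALL=C sort                        # for-in order is unspecified in AWK
}

# Made-up sample rows: one gene line and two exon lines.
printf 'chr1\tsrc\tgene\nchr1\tsrc\texon\nchr1\tsrc\texon\n' | count_by_type
```

AWK does not guarantee the order in which a for-in loop visits array keys, which is why the sketch pipes the result through sort.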
Back in Part 1, we learned how to tell AWK to select specific lines from a tab-separated transcriptome file (using the rule '$3 == "gene"') and return a specific column (using the
In this post, we are going to count how many exons make up each protein-coding gene.
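As a rough sketch of where this is heading (not necessarily how the post itself does it), the same associative-array idea can be keyed on the gene identifier. This version assumes a GTF-style file with the feature type in column 3 and Ensembl-style attributes (gene_id and gene_biotype) in column 9; other annotation sources may name the biotype attribute differently. The function name exons_per_gene and the sample rows are made up.

```shell
# Count exon lines per protein-coding gene.
# Assumptions: tab-separated input, feature type in column 3, and column 9
# containing attributes such as: gene_id "G1"; gene_biotype "protein_coding";
exons_per_gene() {
  awk -F '\t' '
    $3 == "exon" && $9 ~ /gene_biotype "protein_coding"/ {
      if (match($9, /gene_id "[^"]+"/)) {
        # Skip the 9 characters of "gene_id " plus the opening quote,
        # and drop the closing quote.
        id = substr($9, RSTART + 9, RLENGTH - 10)
        exons[id]++
      }
    }
    END { for (id in exons) print id "\t" exons[id] }
  ' | LC_ALL=C sort
}

# Made-up sample rows: G1 has two exon lines, G2 is not protein-coding,
# and G3 has one exon line.
{
  printf 'chr1\tsrc\tgene\t1\t9\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
  printf 'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
  printf 'chr1\tsrc\texon\t6\t9\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
  printf 'chr1\tsrc\texon\t20\t30\t.\t+\t.\tgene_id "G2"; gene_biotype "lincRNA";\n'
  printf 'chr1\tsrc\texon\t40\t50\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";\n'
} | exons_per_gene
```

Note that this counts exon lines, so an exon listed under several transcripts of the same gene is counted once per transcript.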