Data Analysis in the Musser Lab

A Practical Guide to Reproducible Research

Author

Jacob Musser, in collaboration with Claude Code

Published

February 2026

Welcome

This guide teaches reproducible, collaborative data analysis workflows for the Musser Lab. Most lab projects use both R and Python. R is often best for statistical analysis, plotting, and many important legacy packages used for RNA-seq and proteomic analysis (e.g. Seurat, DESeq2, and limma). Python is often best for automating analysis pipelines (including in bash), manipulating files, and working with sequences and strings; it also offers a growing number of RNA-seq analysis libraries (e.g. Scanpy, metacell) and plotting libraries (e.g. Matplotlib). The goal of this guide is to help you set up a standard workflow for creating computational projects that are well managed, documented, collaborative, and reproducible.

Who This Guide Is For

This guide is for lab members who have some experience writing code — maybe you’ve used R in RStudio, written Python scripts, or worked through a data analysis tutorial. You should be comfortable with basics like variables, functions, and reading error messages, but you don’t need to be an expert. You also don’t need any prior experience with the specific tools we’ll use (Positron, conda, renv, Git) — we’ll walk you through all of those.

How to Use This Guide

Start with Part 1: Quick Start. This is the hands-on introduction. You’ll install a few tools, then work through a real single-cell RNA-seq analysis in Positron — learning the IDE, Quarto documents, and interactive coding along the way. This is where every new lab member should begin.

Parts 2–4 are reference material that you’ll use as you need it:

  • Part 2: Core Tools — Deeper coverage of each tool in the lab stack: Positron, project organization, Quarto, renv, conda, and Git/GitHub. Come here when you need to understand how something works or troubleshoot it.
  • Part 3: Working with AI — How to work effectively with Claude Code as a thinking partner and coding collaborator. Covers AI fluency, getting started, project configuration, daily workflows, the lab toolkit, and safety.
  • Part 4: Workflows — Practical guides for common tasks: setting up a new project, collaborating with others, ensuring reproducibility, and troubleshooting.
Important: Work in Progress

Parts 1–3 are complete. Part 4 (Workflows) is still being written — some chapters are drafts.

By the End of Your First Project

After working through Part 1, you should be able to:

  1. Open a project folder in Positron and navigate the IDE (file explorer, console, environment pane, plots pane)
  2. Run R code interactively in a Quarto document (.qmd) and see results in real time
  3. Perform a basic single-cell RNA-seq analysis — from loading a count matrix to generating a UMAP
  4. Render a Quarto document into a self-contained HTML report
  5. Use renv to manage R package dependencies for a project
Claude Code

Throughout this guide, you’ll see orange boxes like this one. Each shows how Claude Code, an AI coding assistant, can help with that chapter’s topic, and includes an example prompt with a brief explanation of what Claude Code will do.

These aren’t magic incantations — they’re examples of how to ask for help effectively. The key is being specific: include the error message, the file name, or what you’re trying to accomplish. Claude Code works best when you give it context.

You’ll install Claude Code in the Installation chapter and can start using these prompts right away.

Our Goal

We want every lab member to feel confident doing computational work — not just running scripts someone else wrote, but understanding and building their own analyses. The tools in this guide are chosen to make that easier: modern editors that help you explore data interactively, environment managers that keep your projects organized, version control that lets you share code and collaborate without fear of breaking things, and AI assistants like Claude Code that can help you write, debug, and learn faster. Everything we do is oriented around one principle: your analyses should be easy to reproduce and share with others in the lab and beyond.