Analysis of flow cytometry data using the Tidyverse suite • flod

Introduction

flod is an R package for statistical learning on flow cytometry data. Its primary concern is getting your data into R’s rich modeling ecosystem as quickly as possible. Which modeling approach you choose is then up to you, but the package was developed with tidymodels in mind.

flod also provides a few convenience functions for feature engineering, visualization, and reporting, based on domain knowledge of traditional flow cytometry analysis (e.g. using FlowJo).

Motivation

Flow cytometry is a widely used method in modern life science research. Typically, it involves analyzing whole cells (although particle applications do exist) for the expression of a specific set of proteins and/or other biomolecules. The analysis depends on labelling the molecules of interest with probes and reading the light they emit. Most often, the signal is fluorescence emitted by antibody-fluorochrome conjugates specific to the antigen of interest. The flow cytometer reads the emission for each event, transforms it into an expression value, and stores it in the universal .fcs file standard. The power of flow cytometry is that it comes with reasonable performance metrics across the board: the operator can analyze up to thousands of cells per second for several markers at once (up to more than a dozen), and its cost and difficulty of use are moderate enough to make it an accessible instrument in most modern facilities.

The motivation for developing flod is to address one outstanding issue of flow cytometry: its analysis. Most flow cytometry experiments rely on third-party software for their analysis, and that software is driven by a point-and-click GUI. While this has been pivotal to the huge success of flow cytometry, it leaves the analysis with three problems: low reproducibility, difficulty in ensuring the integrity of the data analysis, and a workflow that is time-consuming and largely repetitive. flod addresses all three points by lifting flow cytometry into R, where the operator can use the excellent ecosystem for data analysis and automation to ensure reproducible and robust data analysis.