Research in the Deeds lab is focused on understanding the dynamics and function of complex molecular networks within cells. We use a variety of approaches to study this problem, including the development of new data-analysis tools, supervised and unsupervised machine learning, mathematical modeling, biophysical modeling, and experiment. See the sections below for more details on our recent work!
THE SURPRISING FEATURES OF SINGLE-CELL DATA
Over the past 20 years, technological advances have allowed us to characterize the molecular states and behavior of individual cells at an unprecedented scale. In particular, recent advances in single-cell RNA-sequencing (scRNA-seq) now provide measurements of the expression level of essentially every gene in the genome across tens of thousands to millions of single cells. The resulting data are extremely complex and high-dimensional, so extracting meaningful biological insights from them requires advanced computational techniques.
The figure to the left is a typical example of the output of this kind of analysis: a 2-dimensional UMAP projection of scRNA-seq data from peripheral blood mononuclear cells (PBMCs) [citation]. Each dot is an individual cell, and the dots are colored according to the “cell type” assigned by a clustering algorithm of the kind typically used in scRNA-seq data analysis. While these plots may be aesthetically pleasing, our group has shown that they do not faithfully represent the structure of the underlying data. In a recent preprint, we showed that the dimensionality-reduction steps typically employed in scRNA-seq analysis (e.g., UMAP in this example) distort over 95% of the local neighborhoods in these datasets. In other words, cells that appear “close” in the UMAP plot are generally “far apart” in the underlying gene expression space. Since dimensionality reduction tools are universally employed in scRNA-seq analysis pipelines, it is unclear how much of the real structure of the data is reflected in published analyses.
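One common way to quantify this kind of distortion is to compare each cell's k nearest neighbors in the original expression space with its k nearest neighbors in the low-dimensional embedding. The Python sketch below is illustrative only and is not necessarily the procedure used in our preprint: the function names, the choice of k, the simulated Poisson count matrix, and the 50% cutoff for calling a neighborhood “distorted” are all hypothetical, and PCA stands in for UMAP so that the example needs only NumPy.

```python
import numpy as np

def knn_sets(X, k):
    """For each row (cell) of X, return the set of indices of its k nearest
    neighbors by Euclidean distance, excluding the cell itself."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                     # a cell is not its own neighbor
    return [set(row) for row in np.argsort(d2, axis=1)[:, :k]]

def neighborhood_preservation(X_high, X_low, k=15):
    """Per-cell fraction of high-dimensional k-NN that are still k-NN in the embedding."""
    high, low = knn_sets(X_high, k), knn_sets(X_low, k)
    return np.array([len(h & l) / k for h, l in zip(high, low)])

def pca_embed(X, n_components=2):
    """Project centered X onto its top principal components (a stand-in for UMAP here)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical data: 300 cells x 1000 genes of Poisson counts, log-transformed
counts = rng.poisson(1.0, size=(300, 1000)).astype(float)
X = np.log1p(counts)
emb = pca_embed(X, 2)
frac = neighborhood_preservation(X, emb, k=15)
# Hypothetical criterion: a neighborhood is "distorted" if < 50% of its neighbors survive
print("fraction of distorted neighborhoods:", np.mean(frac < 0.5))
```

Replacing `pca_embed` with a UMAP embedding of real data (for example, from the umap-learn package) would apply the same comparison to the kind of plot shown in the figure.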
Indeed, more recently, our group showed that there is essentially no evidence that scRNA-seq data are organized into discrete groups of distinct cell types. In other words, if you look at the underlying data, you don’t find a group of, say, B cells off in one corner of gene expression space and a separate group of T cells off in another corner. Rather, in every dataset we have analyzed, the cells of all the different types lie completely on top of one another in gene expression space. This structure persists even after applying the feature selection and data transformation approaches commonly used in the field. Our findings suggest that new analytical tools are needed that can analyze these data despite this overlapping structure. Our work also calls into question the concordance between available data and the predictions of “Waddington’s landscape,” an 80-year-old paradigm that describes the molecular basis of the stable differentiation of cell types during development.
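A simple way to probe whether annotated cell types occupy distinct regions of gene expression space is a k-nearest-neighbor label-agreement score: for each cell, the fraction of its nearest neighbors that carry the same label, compared with the fraction expected if labels were scattered at random. This Python sketch is illustrative and not necessarily the analysis behind our results; the function name, k, and the two simulated scenarios are hypothetical.

```python
import numpy as np

def knn_label_agreement(X, labels, k=10):
    """Return (observed, expected): the mean fraction of each cell's k nearest
    neighbors (Euclidean) sharing its label, and the value expected if labels
    were assigned to cells at random with the same label counts."""
    labels = np.asarray(labels)
    n = len(labels)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # exclude each cell from its own neighbors
    nbrs = np.argsort(dist, axis=1)[:, :k]
    observed = (labels[nbrs] == labels[:, None]).mean()
    counts = {lab: np.sum(labels == lab) for lab in np.unique(labels)}
    # Chance level: a random other cell shares label c with probability (n_c - 1) / (n - 1)
    expected = np.mean([(counts[lab] - 1) / (n - 1) for lab in labels])
    return observed, expected

rng = np.random.default_rng(0)
n_per, n_genes = 100, 50
labels = np.array(["A"] * n_per + ["B"] * n_per)

# Hypothetical scenario 1: two "types" in well-separated regions of expression space
separated = np.vstack([rng.normal(0.0, 1.0, (n_per, n_genes)),
                       rng.normal(10.0, 1.0, (n_per, n_genes))])
# Hypothetical scenario 2: both "types" drawn from the same distribution (fully overlapping)
overlapping = rng.normal(0.0, 1.0, (2 * n_per, n_genes))

obs_sep, exp_sep = knn_label_agreement(separated, labels)
obs_ov, exp_ov = knn_label_agreement(overlapping, labels)
print(f"separated:   observed={obs_sep:.3f}, chance={exp_sep:.3f}")
print(f"overlapping: observed={obs_ov:.3f}, chance={exp_ov:.3f}")
```

Discrete, well-separated types should score near 1, while types that sit on top of one another should score near the chance level.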
Our lab is currently building on these findings to develop new approaches to dimensionality reduction, clustering, differential expression analysis, and other common analyses in scRNA-seq.
THE DYNAMICS OF MACROMOLECULAR ASSEMBLY
Many key cellular functions, like protein synthesis and degradation, are carried out by complex “molecular machines.” One example is the proteasome Core Particle (CP), which degrades other proteins and is critical for regulating protein levels within the cell (see figure to the right). This machine is made up of 28 protein subunits, and the cell cannot synthesize the entire machine in one piece as you see it here. Rather, the cell makes each of these 28 pieces separately, and they must then assemble into a particular structure for the machine to function. Our lab uses a combination of mathematical modeling, biophysical modeling, and experiment to understand how this process of self-assembly works.
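To give a flavor of what a mathematical model of assembly can look like, the Python sketch below simulates a deliberately simplified, hypothetical scheme in which a 28-subunit complex grows one subunit at a time under mass-action kinetics, integrated with forward Euler. It is not our model of CP assembly, which is considerably more complex; the function name, the single nucleation and growth rate constants, and the time step are all illustrative assumptions.

```python
import numpy as np

def simulate_sequential_assembly(n_subunits=28, k_nuc=1.0, k_grow=1.0,
                                 c0=1.0, t_end=200.0, dt=0.01):
    """Hypothetical toy model: an i-mer binds a free monomer to form an
    (i+1)-mer, irreversibly. Monomer dimerization (nucleation) has rate
    constant k_nuc; every later addition has rate constant k_grow.
    Mass-action ODEs are integrated with forward Euler.
    Returns c, where c[i] is the concentration of i-mers at t_end (c[0] unused)."""
    n = n_subunits
    c = np.zeros(n + 1)
    c[1] = c0                                # start with free monomers only
    for _ in range(int(round(t_end / dt))):
        m = c[1]
        r = np.zeros(n + 1)                  # r[i]: rate of i-mer + monomer -> (i+1)-mer
        r[1] = k_nuc * m * m                 # nucleation: monomer + monomer
        r[2:n] = k_grow * c[2:n] * m         # growth of intermediates; complete n-mers stop
        dc = np.zeros(n + 1)
        dc[1] = -2.0 * r[1] - r[2:n].sum()   # nucleation uses two monomers, growth uses one
        dc[2:n + 1] += r[1:n]                # gain of (i+1)-mers
        dc[2:n] -= r[2:n]                    # loss of intermediates that grew
        c += dt * dc
    return c

for k_nuc in (1.0, 0.01):
    c = simulate_sequential_assembly(k_nuc=k_nuc)
    yield_complete = 28 * c[28] / 1.0        # fraction of subunits in complete 28-mers
    print(f"k_nuc={k_nuc}: yield of complete structures = {yield_complete:.3f}")
```

Varying the ratio of `k_nuc` to `k_grow` in a model like this is one way to ask how the relative speeds of initiation and growth affect how many subunits end up in finished structures.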