Our lab builds on our protein-DNA sequence-to-affinity inference method, which emerged as the top performer in a recent benchmark study that compared all 26 known supervised learning algorithms for protein-DNA affinity. The Riley lab also integrates these affinity models with DNA accessibility (DNaseI-seq) and transcriptome (RNA-seq) data in a systems biology approach to find transcription factor (TF) markers and other biomarkers of developmental and cancer phenotypes. For example, we seek to identify the vital TFs, and possibly their mutations and/or splice variants, in the aberrant gene regulatory pathway(s) that drive tumor formation, tumor growth, drug resistance, and/or metastasis of a cancer. Recent advances in high-throughput in vivo and in vitro protein-DNA binding assays and mRNA sequencing now give us a unique opportunity to integrate these measurements and create accurate and mechanistically insightful biophysical models. In turn, these models will advance our scientific goal of understanding and controlling the mechanisms of gene regulation gone awry.


Selected Publications


Riley et al. Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE
eLife 4 (2015): e06397. PMC

Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.

Weirauch et al. Evaluation of methods for modeling transcription factor sequence specificity
Nature Biotech. 31, 126–134 (2013)

Our biophysical sequence-to-affinity inference method, FeatureREDUCE, emerged as the top performer in a recent benchmark study that compared all 26 known PBM supervised learning algorithms. The FeatureREDUCE algorithm parameterizes the relative affinity of all possible DNA sequences in terms of a small set of free energy parameters associated with base pair substitutions and possible dependencies between positions. In addition, our algorithm accounts for the considerable PBM-specific biases in the data, and uses robust regression techniques to resist overfitting to noise. In all, we achieve quantification of relative binding affinities at an unprecedented level of accuracy.
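
The parameterization described above can be illustrated with a minimal sketch. It is not the FeatureREDUCE implementation: it only assumes the standard additive free energy picture, in which each base substitution contributes a penalty (ddG) relative to a reference sequence and the relative affinity is exp(-ddG/RT). The function name, the three-position model, the single dinucleotide dependency term, and every numeric value are hypothetical.

```python
import math

RT = 0.593  # kcal/mol, approximate value of RT near 298 K

# Hypothetical free energy penalties (kcal/mol) per position, relative to the
# reference base at that position (penalty 0.0). Not fitted to any real data.
DDG_SINGLE = [
    {"A": 0.0, "C": 1.1, "G": 0.8, "T": 1.4},
    {"A": 1.3, "C": 0.0, "G": 1.0, "T": 0.7},
    {"A": 0.9, "C": 1.2, "G": 0.0, "T": 1.6},
]

# Hypothetical correction for a dependency between adjacent positions,
# keyed by (index of the first position, dinucleotide).
DDG_PAIR = {(0, "CA"): -0.4}


def relative_affinity(seq, single=DDG_SINGLE, pair=DDG_PAIR, rt=RT):
    """Return the affinity of seq relative to the reference sequence."""
    if len(seq) != len(single):
        raise ValueError("sequence length must match the model width")
    # Independent (mononucleotide) contributions add up in free energy.
    ddg = sum(position[base] for position, base in zip(single, seq))
    # Optional dependency terms correct for non-additive dinucleotides.
    ddg += sum(pair.get((i, seq[i:i + 2]), 0.0) for i in range(len(seq) - 1))
    return math.exp(-ddg / rt)


if __name__ == "__main__":
    for s in ("ACG", "CCG", "CAG"):
        print(s, round(relative_affinity(s), 4))
```

In this sketch a model of width 3 needs only 9 substitution parameters plus any dependency terms, yet it assigns a relative affinity to all 64 possible sequences, which is the sense in which a small parameter set covers the full sequence space.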

Slattery, Riley et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins
Cell 147, 1270-1282 (2011)

We present a new in vitro high-throughput protocol to determine the sequence specificity of TF-DNA complexes and quantify relative binding affinities at an unprecedented level of accuracy. Our new SELEX-seq method combines classical protein-DNA SELEX (Systematic Evolution of Ligands by EXponential Enrichment) assays with massively parallel sequencing. Our biophysical model accounts for potential biases in the initial pool of dsDNA oligos, and includes two separate supervised learning methods to generate replicate affinity models that can be compared for quality assurance. Our new method led to three discoveries: (1) Hox DNA binding specificities change when they bind with the cofactor Exd, (2) the binding specificities of Exd-Hox heterodimers group into three classes, and (3) preferred binding sites of anterior and posterior Hox proteins have distinct shapes.
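
One way to picture the correction for bias in the initial pool is an enrichment ratio. The sketch below is a simplification and not the published pipeline: it assumes that the relative affinity of a k-mer can be approximated by its frequency after selection divided by its frequency in the initial pool, raised to the power 1/rounds. It uses raw k-mer frequencies with a pseudocount for the initial pool, where a fitted background model could be used instead. All function and parameter names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer window in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def relative_affinities(selected_reads, initial_reads, k, rounds=1, pseudocount=1.0):
    """Estimate relative affinities from k-mer enrichment (illustrative only)."""
    selected = kmer_counts(selected_reads, k)
    initial = kmer_counts(initial_reads, k)
    selected_total = sum(selected.values())
    # The pseudocount is spread over all 4**k possible k-mers so that k-mers
    # unseen in the initial pool do not cause a division by zero.
    initial_total = sum(initial.values()) + pseudocount * 4 ** k
    enrichment = {}
    for kmer, count in selected.items():
        f_selected = count / selected_total
        f_initial = (initial[kmer] + pseudocount) / initial_total
        enrichment[kmer] = (f_selected / f_initial) ** (1.0 / rounds)
    best = max(enrichment.values())
    # Scale so that the most enriched k-mer has relative affinity 1.
    return {kmer: value / best for kmer, value in enrichment.items()}


if __name__ == "__main__":
    initial_pool = ["ACGT", "TTTT"]
    after_selection = ["ACGT", "ACGT", "TTTT"]
    print(relative_affinities(after_selection, initial_pool, k=4))
```

Dividing by the initial-pool frequency is what keeps a k-mer that was simply over-represented in the starting library from being mistaken for a high-affinity site.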


Riley et al. The p53HMM algorithm: using profile hidden markov models to detect p53-responsive genes
BMC Bioinformatics 2009, 10:111

We show that Profile Hidden Markov Models (PHMMs) can considerably boost predictive power over position weight matrices (PWMs) when the binding motif is degenerate and tolerates insertions and/or deletions at various positions. In addition, when the response element (RE) has a known repeated and/or palindromic motif, this prior knowledge can be used to correspond parameters in the model to exploit the redundancy in the motif. We present a novel “Corresponded Baum-Welch” training algorithm that significantly boosts the predictive power of the p53RE model. When the motif is not known, all possible motifs for the given size can be sampled, and cross-validation techniques leveraged, to infer the correct motif that maximizes predictive power. The maximally predictive p53-binding motif corresponds the four quarter-sites in a combined-palindromic structure.
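
The idea of exploiting redundancy in a palindromic motif can be illustrated on a much simpler model than a PHMM. The sketch below is not the Corresponded Baum-Welch algorithm: it only assumes that "corresponding" parameters amounts to sharing them between mirrored positions, and it applies that to a plain position count matrix by pooling the count of base b at position i with the count of the complementary base at the mirrored position. The function names and example sites are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def position_counts(sites):
    """Build a per-position base count matrix from equal-length sites."""
    length = len(sites[0])
    counts = [{base: 0 for base in "ACGT"} for _ in range(length)]
    for site in sites:
        if len(site) != length:
            raise ValueError("all sites must have the same length")
        for i, base in enumerate(site):
            counts[i][base] += 1
    return counts


def tie_palindromic(counts):
    """Pool counts between each position and its reverse-complement mirror."""
    length = len(counts)
    tied = []
    for i in range(length):
        mirror = counts[length - 1 - i]
        tied.append(
            {base: counts[i][base] + mirror[COMPLEMENT[base]] for base in "ACGT"}
        )
    return tied


if __name__ == "__main__":
    raw = position_counts(["ACGT", "AAGT"])
    for column in tie_palindromic(raw):
        print(column)
```

Because mirrored positions share their evidence, each tied column is estimated from twice as many observations as an untied one, which is the kind of gain that parameter sharing is meant to provide when training data are limited.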


Dr. Todd Riley
Assistant Professor of Biology
University of Massachusetts Boston
100 Morrissey Blvd. | ISC Building Room 4730
Boston, Massachusetts 02125
Phone: (617) 287-3236