Originally written and developed by Dr. Todd Riley while he was a postdoc in the Bussemaker Lab, FeatureREDUCE is the software tool that we continue to enhance in order to build increasingly accurate biophysical sequence-to-affinity models from high-throughput protein-DNA and protein-RNA binding data. FeatureREDUCE is now a collaborative research and software development project between the Riley and Bussemaker Labs. It provides a flexible, robust framework for incorporating high-order sequence and structural features into protein-NA (nucleic acid) affinity models. As a result, FeatureREDUCE quantifies relative binding affinities with high accuracy: our software emerged as the top performer in a recent benchmark study (Weirauch et al., Nature Biotechnology, 2013).

By directly correlating genome-wide mRNA expression, in-vitro TF binding data (e.g. PBM, SELEX-seq), or in-vivo TF binding data (e.g. ChIP-seq, ChIP-chip) with associated nucleotide sequences, FeatureREDUCE can discover the sequence-specific binding affinity of a TF from a single experiment or a set of replicate experiments. FeatureREDUCE can also build affinity models from both normally-distributed, real-valued binding data (e.g. PBM) and Poisson-distributed, integer-valued binding data (e.g. SELEX-seq).
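The difference between the two data types can be sketched as a choice of objective function. The following Python snippet is a minimal illustration, not FeatureREDUCE's actual code: the function names `gaussian_loss` and `poisson_loss` are hypothetical, and the model predictions are assumed to be supplied by the caller.

```python
import math

def gaussian_loss(observed, predicted):
    """Sum of squared residuals. Minimizing it is equivalent to maximizing
    the likelihood under normally-distributed noise of fixed variance
    (appropriate for real-valued intensities such as PBM data)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def poisson_loss(counts, expected):
    """Poisson negative log-likelihood for integer read counts
    (e.g. SELEX-seq). `expected` holds the model's predicted mean
    count for each sequence and must be strictly positive."""
    return sum(
        mu - k * math.log(mu) + math.lgamma(k + 1)
        for k, mu in zip(counts, expected)
    )

if __name__ == "__main__":
    # Illustrative numbers only, not real binding data.
    print(gaussian_loss([1.0, 2.0], [1.0, 4.0]))
    print(poisson_loss([3, 0, 7], [2.5, 0.4, 6.0]))
```

In both cases the affinity-model parameters would be chosen to minimize the loss; only the assumed noise model changes.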

FeatureREDUCE has the following functionality:

(1) Uses a refined representation of binding specificity, in which high-order sequence features (on top of the positional-independence model), such as dependencies between nucleotides, are detected and modeled explicitly using additional free energy parameters. The resulting FSAM (feature-specific affinity model) can be used to predict the relative binding affinity for any oligomer of a specified length.

(2) Accounts for certain biases that are specific to the PBM technology, including positional binding biases along the length of the PBM probes.

(3) Uses a robust, multivariate, gradient-descent method to find the highest-affinity k-mer to be used as the seed sequence.

(4) Has the ability to detect a symmetric motif (common when the TF binds as a homodimer), and then generate a more accurate and robust symmetric model (with about half as many parameters).

(5) Employs robust regression techniques, which prevent over-fitting and allow for improved estimation of biophysical parameters.

(6) Can also solve the nonlinear saturation model, which includes the free-protein concentration parameter [P] in the objective function of the protein-DNA binding reaction at equilibrium.

(7) Includes a Poisson regression framework to build accurate affinity models from the integer-valued read counts found in high-throughput-sequencing binding data (e.g. SELEX-seq data).
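
To make items (1) and (6) concrete, here is a minimal Python sketch of how a feature-based affinity model might score a sequence. Everything in it is an assumption made for illustration: the seed sequence, the free-energy values, the single dinucleotide dependency feature, and the function names `relative_affinity` and `occupancy` are hypothetical and do not come from FeatureREDUCE itself. The occupancy formula is the standard equilibrium binding expression [P]Ka / (1 + [P]Ka).

```python
import math

# Hypothetical free-energy penalties (ddG/RT) relative to the seed "ACGT".
# One dictionary per position; the seed base at each position costs 0.
MONO_DDG = [
    {"A": 0.0, "C": 1.2, "G": 1.5, "T": 0.9},
    {"A": 1.1, "C": 0.0, "G": 2.0, "T": 1.4},
    {"A": 1.8, "C": 1.6, "G": 0.0, "T": 1.0},
    {"A": 0.7, "C": 1.3, "G": 1.9, "T": 0.0},
]

# Hypothetical dependency feature: (0-based start position, dinucleotide)
# mapped to an additional free-energy correction on top of the
# positional-independence terms.
DINUC_DDG = {(1, "CA"): -0.5}

def relative_affinity(seq):
    """Affinity of `seq` relative to the seed, exp(-ddG/RT)."""
    if len(seq) != len(MONO_DDG):
        raise ValueError("sequence length must match the model length")
    ddg = sum(MONO_DDG[i][base] for i, base in enumerate(seq))
    for (pos, dinuc), correction in DINUC_DDG.items():
        if seq[pos:pos + 2] == dinuc:
            ddg += correction
    return math.exp(-ddg)

def occupancy(seq, p_scaled):
    """Equilibrium occupancy under saturation.

    `p_scaled` is the free-protein concentration [P] multiplied by the
    association constant of the seed, so that [P] * Ka(seq) equals
    p_scaled * relative_affinity(seq).
    """
    x = p_scaled * relative_affinity(seq)
    return x / (1.0 + x)

if __name__ == "__main__":
    for s in ("ACGT", "ACAT", "TCGT"):
        print(s, relative_affinity(s), occupancy(s, 1.0))
```

When `p_scaled` is small the occupancy is nearly proportional to the relative affinity; as it grows, high-affinity sequences saturate, which is the nonlinearity that item (6) refers to.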

Download the FeatureREDUCE package


Dr. Todd Riley
Assistant Professor of Biology
University of Massachusetts Boston
100 Morrissey Blvd. | ISC Building Room 4730
Boston, Massachusetts 02125
Phone: (617) 287-3236