Dr. Riley teaches both the undergraduate Bio-360 and graduate Bio-664 Bioinformatics courses. The undergraduate course uses popular online bioinformatics tools, while the graduate course introduces R programming with popular R Bioconductor packages. The undergraduate course also includes a 1-credit Bio-361 Biopython Lab.

SYLLABUS BIOLOGY 664

**Old Title**: Bioinformatics for Molecular Biologists

**Potential New Title**: Integrated Bioinformatics Using R for Both Wet and Dry Scientists

**Weekly Schedule**: 3:30 – 5:00 Tuesday & Thursday

**Location**: Wheatley Biology Conference Room W-3-022

**Instructor**: Todd Riley

**Office Hours**: Tuesdays and Thursdays from 10:30 to 12:00. Please email if you need to schedule a different time.

**Office**: ISC 4730 (4th floor)

**Email**: todd.riley@umb.edu

**Phone**: 617-287-3236

**I. Overview**: This course has changed quite a bit – starting with the initial changes made last year. Members of our department have agreed that our graduate students need to learn how to use R to analyze their biological data. Loaded with hundreds of packages, the R programming environment really has become the de facto standard for analyzing many different types of biological data. Although R is sometimes clumsy, it is always very powerful – with the latest and greatest statistical tools and also great graphing capability for producing publication-quality figures. In short, we think it’s important that our graduate students learn at least some R to help them analyze their data. So the course has been revamped to include R labs. Each class will begin with a lecture and then end with an R lab where we will apply what we’ve learned. We will be changing the name of the course to reflect its old and new goals:

**Goal 1**: Provide our students with both the computational foundation and the statistical foundation to competently analyze biological data using the R statistical programming environment.

**Goal 2**: Provide a graduate level bioinformatics course that is accessible and rewarding for both wet lab scientists (e.g. molecular biologists) and dry scientists (e.g. machine learning, data mining, statistics).

**Goal 3**: Provide the skillset necessary to design, execute, and analyze a basic research project using R and to highlight the analysis in a “publishable” paper that contains high-quality figures generated in R.

**Getting the Most Out of This Class**: What you get out of a class is highly dependent upon what you put into it. If you are serious about learning how to analyze biological data using the latest and greatest tools to find answers to important questions in biology, you’ve come to the right place! I highly recommend that you commit yourself to diligently studying all the material in each chapter – including the exercises at the end of each chapter. It is also important not to look at the solutions to the exercises until after you have implemented your own solution. The best way to learn how to approach a problem and implement a sound solution is to go through the iterative process yourself.

**Note**: This course analyzes data from molecular and cellular biology, including genomics and systems biology. A similar course with a focus on ecology is taught by Jarrett Byrnes. Here is the course info: http://jarrettbyrnes.info/biol697/. For those students who straddle both disciplines, I would suggest you take both courses! Also, this course serves as a possible follow-up to our Bio-360/361 undergraduate bioinformatics course and lab. In the Bio-361 lab, students learn bioinformatics skills using Biopython instead of R and Bioconductor. However, the undergraduate course is not a prerequisite for this course.

**Prerequisites**: An undergraduate course or a graduate course in molecular biology or genetics (Biol 370 or Biol 675 or permission of the instructor). You are also required to have a basic knowledge of algebra and introductory calculus (although no calculus will be used). Undergraduate courses in probability theory, computer science, and genetics are useful, but not required. Students who are new to programming should read chapter 1 of Adler before or during the first week of the course. Students who are new to genetics (or rusty) should read Schultz before or during the first week of the course.

Also, you must install the latest versions of R and RStudio on your laptop before the first day of class. If you are running Windows, please install Cygwin as well and choose “C:\” as your root directory during the installation. The default Cygwin settings for everything else will suffice. You will also need to add the “C:\bin” directory to your path. You can edit your path environment variable by going to Control Panel -> System -> Advanced system settings -> Environment Variables -> System variables. (Cygwin is a full suite of Unix shells and utility programs ported to the Windows platform.) **Please bring your laptop to all classes**.

**II. Required Text**: Applied Statistics for Bioinformatics using R – 2nd Edition (DRAFT), Wim P. Krijnen and Todd R. Riley

**Chapters of the 2nd Edition** are provided by the links below.

**PLEASE NOTE: The new chapters below are updated often. After clicking on one of the links below, be sure to hit the “Refresh” button to make sure that you are getting the most recent version:**

- Newly Revised Chapter 1
- Newly Revised Chapter 2
- Newly Revised Chapter 3
- Newly Revised Chapter 4
- Newly Revised Chapter 5
- Newly Revised Chapter 6
- Newly Revised Chapter 7
- Newly Revised Chapter 8
- Newly Revised Chapter 9
- Newly Revised Chapter 10

**Ordering a Hard Copy from Campus Printing**: http://www.umb.edu/quinn_graphics/quinngraphics

Instructor’s Editorial Comments: I think that this book is technically great and well suited to this course, since it is completely centered around using R. On the downside, the 1st edition has some typos, grammatical errors, and awkward English. Hopefully, the 2nd edition improves upon these weaknesses. I plan to go through most of the material in each chapter, and then finish with ChIP-seq and RNA-seq analysis in R, which are not covered in the book.

**Highly Recommended Text**: Adler, J. (2009) R in a Nutshell: A Desktop Quick Reference. O’Reilly.

Purchase: [amazon]

Instructor’s Editorial Comments: This is a great R Reference Manual. In fact, I use this book and I’ve been programming in R for years.

**Recommended Text**: Schultz, M. (2009) The Stuff of Life: A Graphic Guide to Genetics and DNA

Purchase: [amazon]

Instructor’s Editorial Comments: This book provides a great up-to-date, cartoon-style overview of genetics that is both highly informative and somewhat entertaining.

**Useful Online References for R**

Quick-R: http://www.statmethods.net/

R & Bioconductor Manual: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual

Apply Family in R: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-…

Apply Functions: http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm

Producing Simple Graphs with R: http://www.harding.edu/fmccown/R/

A Handbook of Statistical Analyses Using R – Brian S. Everitt and Torsten Hothorn (PDF)

Statistics Using R with Biological Examples – Kim Seefeld, Ernst Linder (PDF)

simpleR – John Verzani (PDF)

R Fundamentals and Programming Techniques – Thomas Lumley (PDF)

A list of tutorials in R from universities around the world: http://pairach.com/2012/02/26/r-tutorials-from-universities-around-the-world/

**III. Assignments and Final Independent Project**:

1. Problem Sets – 100% of your overall grade is determined by your grades on the problem sets

2. Independent Final Project – I also recommend submitting an extra-credit, final project on biological data of your choosing

**1. Problem Sets**: All problem sets will be done in an R script (*.r file). A major goal of this course is that each student learns how to write clear, concise, well-documented, reusable R code. Please follow the guidelines below:

- The R scripts must be uploaded into Blackboard under “Course Materials” by the due date before the beginning of class.
- Liberally comment your code with lines beginning with the “#” character. The purpose of commenting is to help others and yourself understand the logic behind your code later on. The more commenting you have, the more reusable your code will be later on!
- Define column and row labels in all matrices and data.frames – which makes your data structures more readable and understandable.
- Use named row and column referencing whenever possible – which will make your R code more reusable.
- Use numbered indexing as little as possible. Instead, use named referencing, nrow(), and ncol().
- The submitted R script must load any necessary libraries and/or datasets not included in the base installation of R.
- All lines inside the submitted R script must run error free.
- Use the following naming scheme for all your submitted R scripts: biology664.spring2014.hw**N**.**firstName**.**lastName**.r (replace bold text).
- Make sure that your full name is at the top of the text inside the R script.
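As a rough illustration of the guidelines above, here is a minimal sketch of what a well-commented script using named referencing might look like. The student name, gene names, sample names, and values are all made up for this example; it is not taken from any problem set, and it needs only base R.

```r
# Author: Jane Doe (hypothetical student name, for illustration only)
# Purpose: show named referencing, nrow(), and commenting on a toy data.frame.

# Build a small expression table. Rows are genes and columns are conditions;
# all labels and values here are invented for the example.
expr <- data.frame(
  control = c(5.1, 4.8, 7.2),
  treated = c(6.3, 4.9, 9.0),
  row.names = c("geneA", "geneB", "geneC")
)

# Named referencing: look up a value by its row and column labels rather
# than by position, so the code still works if the table is reordered.
geneA.treated <- expr["geneA", "treated"]

# Add a derived column by name.
expr$log2.fold.change <- log2(expr$treated / expr$control)

# Loop over every row using nrow() instead of a hard-coded row count.
for (i in seq_len(nrow(expr))) {
  cat(rownames(expr)[i], ":", round(expr[i, "log2.fold.change"], 2), "\n")
}

# Report the gene with the largest change by name, not by index.
top.gene <- rownames(expr)[which.max(expr$log2.fold.change)]
```

Because every lookup goes through a row or column label, adding more genes or reordering the columns would not require changing any of the indexing code.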

**2. Independent Final Project**: Each student can also submit an independent, extra-credit project that should be designed and executed by the student. This project can be of considerable benefit to the student if it is closely related to his/her thesis research project or professional research at work. Three potential ideas are to study a gene, pathway, and/or phenotype closely related to your research. For a gene or pathway, you can study its expression and/or regulation in relation to stimuli or phenotypes. For a phenotypic study, you can use RNA, protein, and/or epigenetic data to find potential biomarkers. I’m open to ideas, so feel free to run them by me. You are welcome to use your own data, but you may need to augment it with outside data if you don’t have enough. Of course, you need to use R for your analysis.

**A. Project Proposal**: If a student wishes to submit an independent project, he/she should prepare a brief proposal, 2-4 pages, describing the independent project and must submit this proposal no later than October 27. The proposal should be divided into four sections:

1. Background and objectives: A description of the background of the biological system and the question(s) that you hope to answer.

2. Computational methods: The computational methods that you intend to use to answer the question(s) in your proposal.

3. Discussion: A brief description of how you plan to evaluate the biological significance of the results of your computer analysis. It’s very important in science to motivate your audience to care about your work with its “Impact” or “Significance”.

4. Several references describing the background of your proposed project.

The proposal will not be graded, because its sole purpose is to determine whether the objectives of the project are reasonable and interesting.

Please note that the final project should be designed to test a biological hypothesis. I don’t consider projects that are purely technical, such as designing PCR primers, to be appropriate at the graduate level.

**B. Final Report**: The optional final report should be in the form of a scientific paper, divided into the following sections: (1) Abstract, (2) Background and objectives, (3) Computational methods, (4) Results and discussion, (5) Conclusions, (6) A brief description of how the conclusions of your analyses could be tested using biochemical or genetic techniques, (7) References.

**References**: Please follow the Cell Journal guidelines for references EXACTLY. I highly recommend that you use a referencing and bibliography software package like EndNote, Zotero, etc. It will make your life much easier! References in the text should include the authors’ names and dates:

– One author: (Pearson, 1996)

– Two authors: (Smith and Waterman, 1981)

– Three or more authors: (Altschul et al., 1990)

– Multiple references: (Pearson, 1996; Smith and Waterman, 1981; Altschul et al., 1990)

The references in the bibliography should also adhere to the Cell Journal format:

– Journal article: Lipman, D.J., Pearson, W.R. (1985). Rapid and sensitive protein similarity searches. Science 227, 1435-1441.

– Book chapter: Schuler, G.D. (1998). Sequence alignment and database searching. In: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, AD Baxevanis and BFF Ouellette, eds. Wiley Interscience, New York, NY.

**Organization**: Please try to organize the sequence information and the interpretations as clearly as possible. It is unreasonable to expect the reader to hunt through large numbers of pages to find data supporting a specific conclusion. There are two acceptable ways of organizing the figures. First, the sequence data and text can be integrated into the body of the paper. Second, the sequence data can be compiled into a series of clearly-labeled appendices.

**Figures**: Every figure should have a caption adequately describing the contents of the figure without having to resort to reading the main text. There must be **at least 4 figures** created by the student, and **at least 3 of them should be created in R**.

**Length**: The final report should be 10-15 pages double-spaced, not including computer output or references.

**IV. Classroom Policy**

**Honesty**: The Homework assignments are intended to be done individually. You can talk with each other, but all submitted work should be done strictly on your own. All students are expected to follow the University’s Code of Student Conduct. (If you are caught cheating, whether you are the giver, receiver, or collaborator, the consequences can be dire.)

**Accommodations**: Section 504 of the Americans with Disabilities Act of 1990 offers guidelines for curriculum modifications and adaptations for students with documented disabilities. If applicable, students may obtain adaptation recommendations from the Ross Center for Disability Services, Campus Center 2nd Floor, 2100 Street, Room 2010, 617-287-7430. The student must present these recommendations and discuss them with each professor within a reasonable period, preferably by the end of the Add/Drop period.

**V. Course Material**

**R Code**: Unfortunately, some R code in the textbook drifts off the page and is lost. Also, solutions are missing for some exercises and don’t work for others. To solve these problems, I’ve provided all the R Code from each chapter (including solutions to the exercises) in R scripts separated by chapter. I’ve also reformatted some of the code to increase readability. Please use these R scripts in class with RStudio:

Dr Kesseli’s Question R Script

Chapter 2 Supplemental R Script

Chapter 3 Supplemental R Script

Chapter 4 Supplemental R Script

Chapter 8 Supplemental R Script

**PDFs**: Other course material including lecture slides and papers will be posted below in the Course Schedule:

**VI. Course Schedule**

**Note**: The following course schedule is tentative and may change depending on the needs and wishes of the students. We may spend more time on certain course material and skip other material based on student feedback.

Tuesday, September 5 – Chapter 1: Brief Introduction into Using R – programming in RStudio

Thursday, September 7 – Chapter 1: Brief Introduction into Using R – vectors, lists, matrices, data.frames

- Lecture 2 Slides – Intro to Cancer and the Golub et al. paper
- Lecture 2 Handouts
- Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring – Golub et al., Science 1999

Tuesday, September 12 – Chapter 1: Brief Introduction into Using R – more matrices and data.frames

Thursday, September 14 – Chapter 2: Data Display and Descriptive Statistics – univariate data display

Tuesday, September 19 – Chapter 2: Data Display and Descriptive Statistics – descriptive statistics

- Online courseware to assist students in: (1) acquiring the fundamental toolset, and then (2) learning how to apply these tools effectively and efficiently in R while analyzing biological data.
- JBstatistics videos are excellent and numerous. He covers almost any topic in basic statistics. (Also, he has a Canadian accent. He says “zed” instead of “zee”.)

- Using R Markdown with RStudio
- Knitr Website
- Knitr in a Nutshell – Markdown
- Knitr in a Nutshell – LaTeX
- Knitr with Markdown example
- Knitr with LaTeX example
- Short “Using Knitr” Video
- proTeXt for Windows and MacTeX for Mac
- LaTeX Homepage
- LaTeX Cheat Sheet
- Lecture 3 Slides – Correlation & Intro to Microarray Technologies
- Lecture 3 Handouts

Thursday, September 21 – Chapter 3: Important Distributions – binomial, Poisson, normal, cumulative

- JBstatistics – Introduction to Discrete Random Variables and Discrete Probability Distributions
- JBstatistics – Introduction to the Bernoulli Distribution
- JBstatistics – An Introduction to the Binomial Distribution
- JBstatistics – An Introduction to the Poisson Distribution
- JBstatistics – An Introduction to Continuous Probability Distributions
- JBstatistics – An Introduction to the Normal Distribution
- JBstatistics – Introduction to the Central Limit Theorem
- JBstatistics – The Relationship Between the Binomial and Poisson Distributions
- JBstatistics – The Normal Approximation to the Binomial Distribution
- JBstatistics – Finding Probabilities and Percentiles for a Continuous Probability Distribution

- Lecture 4 Slides – Binomial, Poisson, and Normal Distributions
- Lecture 4 Handouts
- Problem Set 2: Creating and manipulating data.frames part 2

Tuesday, September 26 – Chapter 3: Important Distributions – Χ^{2}, T, F, hypergeometric

- JBstatistics – An Introduction to the Chi-Square Distribution
- JBstatistics – An Introduction to the t Distribution (non-technical)
- JBstatistics – An Introduction to the t Distribution (Includes some mathematical details)
- JBstatistics – An Introduction to the F Distribution
- JBstatistics – An Introduction to the Geometric Distribution
- JBstatistics – An Introduction to the Hypergeometric Distribution
- JBstatistics – Overview of Some Discrete Probability Distributions (Binomial,Geometric,Hypergeometric,Poisson,NegB)

- Lecture 5 Slides – Z, T, F, and Hypergeometric Distributions
- Lecture 5 Handouts
- Problem Set 3: Probability distributions with data.frames

Thursday, September 28 – Chapter 4: Estimation and Inference – Z-test, t-Test, F-test, binomial

- JBstatistics – An Introduction to Hypothesis Testing
- JBstatistics – Z-Scores (As a Descriptive Measure of Relative Standing)
- JBstatistics – Z Tests for One Mean: Introduction
- JBstatistics – Z Tests for One Mean: The Rejection Region Approach
- JBstatistics – Z Tests for One Mean: The p-value
- JBstatistics – What is a p-value? (Updated and extended version)
- JBstatistics – Type I Errors, Type II Errors, and the Power of the Test
- JBstatistics – Calculating Power and the Probability of a Type II Error (A One-Tailed Example)
- JBstatistics – One-Sided Test or Two-Sided Test?
- JBstatistics – Introduction to Confidence Intervals
- JBstatistics – Intro to Confidence Intervals for One Mean (Sigma Known)
- JBstatistics – The Relationship Between Confidence Intervals and Hypothesis Tests
- JBstatistics – Standardizing Normally Distributed Random Variables
- JBstatistics – What Factors Affect the Power of a Z Test?
- JBstatistics – t Tests for One Mean: Introduction
- JBstatistics – t Tests for One Mean: An Example
- JBstatistics – Hypothesis Tests on One Mean: A t Test Example
- JBstatistics – Hypothesis tests on one mean: t test or z test?
- JBstatistics – Inference for Two Means: Introduction
- JBstatistics – Pooled-Variance t Tests and Confidence Intervals: Introduction (Two Means)
- JBstatistics – Welch (Unpooled Variance) t Tests and Confidence Intervals: Introduction (Two Means)
- JBstatistics – Pooled or Unpooled Variance t Tests and Confidence Intervals? (To Pool or not to Pool?)
- JBstatistics – Finding Percentiles and Areas for the F Distribution Using R
- JBstatistics – Hypothesis Tests for Equality of Two Variances (F-test)

- Lecture 6 Slides – Z, T, and F Tests
- Lecture 6 Handouts

Tuesday, October 3 – Chapter 4: Estimation and Inference – Χ^{2}-test, Fisher’s exact test, normality tests, outliers, Wilcoxon rank-sum test

- JBstatistics – Chi-square tests for count data: Finding the p-value
- JBstatistics – Chi-square tests: Goodness of Fit for the Binomial Distribution
- JBstatistics – Hypothesis Tests for One Population Variance (Χ^{2}-test)
- JBstatistics – t Tests for One Mean: Investigating the Normality Assumption
- JBstatistics – Pooled-Variance t Procedures: Investigating the Normality Assumption

- Lecture 7 Slides – F, Binomial, Χ^{2}, and Wilcoxon Rank Sum Tests
- Lecture 7 Handouts

Thursday, October 5 – Chapter 5: Linear Models – lm, rlm, ANOVA

- JBstatistics – Introduction to Simple Linear Regression
- JBstatistics – Simple Linear Regression: Always Plot Your Data!
- JBstatistics – Simple Linear Regression: The Least Squares Regression Line
- JBstatistics – Simple Linear Regression: Interpreting Model Parameters
- JBstatistics – Simple Linear Regression: Transformations
- JBstatistics – Simple Linear Regression: An Example
- JBstatistics – Introduction to One-Way ANOVA
- JBstatistics – Finding the P-value in One-Way ANOVA
- JBstatistics – One-Way ANOVA: The Formulas
- JBstatistics – A One-Way ANOVA Example

- Lecture 8 Slides – Simple, Generalized, and Multivariate Regression and ANOVA
- Lecture 8 Handouts
- Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival – Chiaretti et al., Blood 2003
- A practical guide to understanding Kaplan-Meier curves – Rich et al., Otolaryngology–Head and Neck Surgery 2010
- Cox Proportional-Hazards Regression for Survival Data – Fox, 2002
- Problem Set 4: Hypothesis testing using data.frames

Tuesday, October 10 – Chapter 5: Linear Models – assumptions, robust test, R^{2}

- Lecture 9 Slides – 2-way ANOVA, Assumptions, and Robust Regression
- Lecture 9 Handouts

Thursday, October 12 – Chapter 5: Linear Models – applications, exercises

Tuesday, October 17 – Chapter 6: Microarray Analysis – preprocessing, filtering, linear models, annotating

- Lecture 11 Slides – Microarray Processing and the MLL.b Dataset
- Lecture 11 Handouts
- Problem Set 5: Linear Models Using data.frames
- Classification of pediatric acute lymphoblastic leukemia by gene expression profiling – Ross et al., Blood 2003

Thursday, October 19 – Chapter 6: Microarray Analysis – GO analysis, interpreting

- Lecture 12 Slides – Bayesian Statistics, Contrasts, and Gene Ontologies
- Lecture 12 Handouts
- Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments – G Smyth 2004

Tuesday, October 24 – Chapter 6: Microarray Analysis – exercises

- Using the GEOquery package – Sean Davis
- Building R objects from ArrayExpress datasets – Audrey Kauffmann
- Problem Set 6: Microarray Analysis Using data.frames

Thursday, October 26 – Chapter 7: Cluster Analysis and Trees – distance, single linkage, k-means

- Lecture 13 Slides – Hierarchical and K-means Clustering and Gaussian Mixture Models
- Lecture 13 Handouts

Tuesday, October 31 – Chapter 7: Cluster Analysis and Trees – correlation coefficient, PCA

- Lecture 14 Slides – Pearson and Spearman Correlations and PCA
- Lecture 14 Handouts
- Package ‘rpart’ – Ripley
- Bioconductor’s multtest package – Dudoit and Ge
- Applications of Multiple Testing Procedures: ALL Data – Pollard, et al.
- Problem Set 7: Cluster Analysis and Trees Using data.frames

Thursday, November 2 – Chapter 8: Classification Methods – ROC curves, AUROC, trees, Random Forests

- Lecture 15 Slides – Confusion Matrix, ROC Curves, CARTs, Decision Trees, and Random Forests
- Lecture 15 Handouts
- ROCR R Package – Sing, et al.
- randomForest R Package – Breiman and Cutler

Tuesday, November 7 – Chapter 8: Classification Methods – SVMs, neural nets, logistic regression

- Lecture 16 Slides – SVMs, Neural Nets, and Logistic Regression
- Lecture 16 Handouts
- SVM R Package e1071 – Meyer, et al.
- nnet R Package – Ripley and Venables
- Generalized Linear Models in R – Geyer
- Package ‘faraway’ – Julian Faraway

Thursday, November 9 – Chapter 9: Analyzing Sequences – querying, pattern-matching, motif-finding, PWMs

- Lecture 17 Slides – Sequence Motifs, Pattern Matching, Motif Finding, PWMs, and Sequence Logos
- Lecture 17 Handouts
- Package ‘seqinr’ – Charif, et al.
- Package ‘Biostrings’ – Pages, et al.
- Efficient genome searching with Biostrings and the BSgenome data packages – Herve Pages
- Biostrings Quick Overview – Herve Pages
- Biostrings/BSgenome Overview Slides – Herve Pages
- AAindex: Amino Acid Index Database – Kawashima, et al.
- The TFBSTools package overview – Ge Tan
- Package ‘PWMEnrich’ – Robert Stojnic

Tuesday, November 14 – Chapter 9: Analyzing Sequences – local and global alignments, substitution matrices

- Lecture 18 Slides – Local and Global Alignments, Substitution Matrices, PAM and BLOSUM Matrices
- Lecture 18 Handouts
- Pairwise Sequence Alignments – Patrick Aboyoun
- Problem Set 8: Classification Methods Using data.frames

Thursday, November 16 – Chapter 9: Analyzing Sequences – dynamic programming, BLAST, MUSCLE

- Lecture 19 Slides – Dynamic Programming, Needleman-Wunsch, Smith-Waterman, BLAST, MUSCLE
- Lecture 19 Handouts
- Package ‘annotate’ – Gentleman
- Package ‘muscle’ – Edgar, Kalinka

Tuesday, November 21 – RNA-seq Analysis, File Formats

- Lecture 20 Slides – RNA-seq Analysis and File Formats
- Lecture 20 Handouts
- easyRNASeq.r
- easyRNASeq Package
- easyRNASeq Overview
- DESeq Package
- DESeq Overview
- edgeR Package
- edgeR Overview
- Problem Set 9: Analyzing Sequences

Thursday, November 23 – Thanksgiving Vacation

Tuesday, November 28 – RNA-seq Analysis

- Computational methods for transcriptome annotation and quantification using RNA-seq – Garber, et al. 2011
- From RNA-seq reads to differential expression results – Oshlack, et al. 2010

Thursday, November 30 – Chapter 10: Markov Models – random sampling, transition matrix, stationary distribution

Tuesday, December 5 – Chapter 10: Markov Models – phylogenetic trees

- Lecture 22 Slides – Phylogenetic Trees, Maximum Parsimony, Maximum Likelihood
- Lecture 22 Handouts
- ape Package (Analyses of Phylogenetics and Evolution)
- phangorn Package

Thursday, December 7 – Chapter 10: Markov Models – Hidden Markov Models, profile HMMs, PFAM

- Lecture 23 Slides – Hidden Markov Models, profile HMMs, and PFAM
- Lecture 23 Handouts
- HMMcopy Package
- Integrating copy number polymorphisms into array CGH analysis using a robust HMM – Shah, et al. 2006
- Lecture 24 Slides – Presentation Skills
- Lecture 24 Handouts

Tuesday, December 12 – Extra make-up class

Tuesday, December 19 – Optional independent project due by 5pm!