Table of Contents
Chapter 1: Biology in a Nutshell
1.1 Biological Overview
1.2 Cells
1.3 Inheritance
1.3.1 Mitosis and Meiosis
1.3.2 Recombination and Variation
1.3.3 Biological String Manipulation
1.3.4 Genes
1.3.5 Consequences of Variation: Evolution
1.4 Information Storage and Transmission
1.4.1 DNA
1.4.2 RNA
1.4.3 Proteins
1.4.4 Coding
1.5 Experimental Methods
1.5.1 Working with DNA and RNA
1.5.2 Working with Proteins
1.5.3 Types of Experiments
References
Chapter 2: Words
2.1 The Biological Problem
2.2 Biological Words: k = 1 (Base Composition)
2.3 Introduction to Probability
2.3.1 Probability Distributions
2.3.2 Independence
2.3.3 Expected Values and Variances
2.3.4 The Binomial Distribution
2.4 Simulating from Probability Distributions
2.5 Biological Words: k = 2
2.6 Introduction to Markov Chains
2.6.1 Conditional Probability
2.6.2 The Markov Property
2.6.3 A Markov Chain Simulation
2.7 Biological Words with k = 3: Codons
2.8 Larger Words
2.9 Summary and Applications
References
Exercises
Chapter 3: Word Distributions and Occurrences
3.1 The Biological Problem
3.1.1 Restriction Endonucleases
3.1.2 The Problem in Computational Terms
3.2 Modeling the Number of Restriction Sites in DNA
3.2.1 Specifying the Model for a DNA Sequence
3.2.2 The Number of Restriction Sites
3.2.3 Test with Data
3.2.4 Poisson Approximation to the Binomial Distribution
3.2.5 The Poisson Process
3.3 Continuous Random Variables
3.4 The Central Limit Theorem
3.4.1 Confidence Interval for Binomial Proportion
3.4.2 Maximum Likelihood Estimation
3.5 Restriction Fragment Length Distributions
3.5.1 Application to Data
3.5.2 Simulating Restriction Fragment Lengths
3.6 k-word Occurrences
References
Exercises
Chapter 4: Physical Mapping of DNA
4.1 The Biological Problem
4.2 The Double-Digest Problem
4.2.1 Stating the Problem in Computational Terms
4.2.2 Generating the Data
4.2.3 Computational Analysis of Double Digests
4.2.4 What Did We Just Do?
4.3 Algorithms
4.4 Experimental Approaches to Restriction Mapping
4.5 Building Contigs from Cloned Genome Fragments
4.5.1 How Many Clones Are Needed?
4.5.2 Building Restriction Maps from Mapped Clones
4.5.3 Progress in Contig Assembly
4.6 Minimal Tiling Clone Sets and Fingerprinting
Chapter 5: Genome Rearrangements
5.1 The Biological Problem
5.1.1 Modeling Conserved Synteny
5.1.2 Rearrangements of Circular Genomes
5.2 Permutations
5.2.1 Basic Concepts
5.2.2 Estimating Reversal Distances by Cycle Decomposition
5.2.3 Estimating Reversal Distances Between Two Permutations
5.3 Analyzing Genomes with Reversals of Oriented Conserved Segments
5.4 Applications to Complex Genomes
5.4.1 Synteny Blocks
5.4.2 Representing Genome Rearrangements
5.4.3 Results from Comparison of Human and Mouse Genomes
Chapter 6: Sequence Alignment
6.1 The Biological Problem
6.2 Basic Example
6.3 Global Alignment: Formal Development
6.4 Local Alignment: Rationale and Formulation
6.5 Number of Possible Global Alignments
6.6 Scoring Rules
6.7 Multiple Alignment
6.8 Implementation
Chapter 7: Rapid Alignment Methods: FASTA and BLAST
7.1 The Biological Problem
7.2 Search Strategies
7.2.1 Word Lists and Comparison by Content
7.2.2 Binary Searches
7.2.3 Rare Words and Sequence Similarity
7.3 Looking for Regions of Similarity Using FASTA
7.4 BLAST
7.4.1 Anatomy of BLAST: Finding Local Matches
7.4.2 Anatomy of BLAST: Scores
7.5 Scoring Matrices for Protein Sequences
7.5.1 Rationale for Scoring: Statement of the Problem
7.5.2 Calculating Elements of the Substitution Matrices
7.5.3 How Do We Create the BLOSUM Matrices?
7.6 Tests of Alignment Methods
Chapter 8: DNA Sequence Assembly
8.1 The Biological Problem
8.2 Reading DNA
8.2.1 Biochemical Preliminaries
8.2.2 Dideoxy Sequencing
8.2.3 Analytical Tools: DNA Sequencers
8.3 The Three-Step Method: Overlap, Layout, and Multiple Alignment
8.4 High-Throughput Genome Sequencing
8.4.1 Computational Tools
8.4.2 Genome-Sequencing Strategies
8.4.3 Whole-Genome Shotgun Sequencing of Eukaryotic Genomes
Chapter 9: Signals in DNA
9.1 The Biological Problem
9.1.1 How Are Binding Sites on DNA Identified Experimentally?
9.1.2 How Do Proteins Recognize DNA?
9.1.3 Identifying Signals in Nucleic Acid Sequences
9.2 Representing Signals in DNA: Independent Positions
9.2.1 Probabilistic Framework
9.2.2 Practical Issues
9.3 Representing Signals in DNA: Markov Chains
9.4 Entropy and Information Content
9.5 Signals in Eukaryotic Genes
9.6 Using Scores for Classification
Chapter 10: Similarity, Distance, and Clustering
10.1 The Biological Problem
10.2 Characters
10.3 Similarity and Distance
10.3.1 Dissimilarities and Distances Measured on Continuous Scales
10.3.2 Scaling Continuous Character Values
10.4 Clustering
10.4.1 Agglomerative Hierarchical Clustering
10.4.2 Interpretations and Limitations of Hierarchical Clustering
10.5 K-means
10.6 Classification
Chapter 11: Measuring Expression of Genome Information
11.1 The Biological Problem
11.2 How Are Transcript Levels Measured?
11.3 Principles and Practice of Microarray Analysis
11.3.1 Basics of Nucleic Acids Used for Microarrays
11.3.2 Making and Using Spotted Microarrays
11.4 Analysis of Microarray Data
11.4.1 Normalization
11.4.2 Statistical Background
11.4.3 Experimental Design
11.5 Data Interpretation
11.5.1 Clustering of Microarray Expression Data
11.5.2 Principal Components Analysis
11.5.3 Confirmation of Results
11.6 Examples of Experimental Applications
11.6.1 Gene Expression in Human Fibroblasts
11.6.2 Gene Expression During Drosophila Development
11.6.3 Gene Expression in Diffuse Large B-cell Lymphomas
11.6.4 Analysis of the Yeast Transcriptome Using SAGE
11.7 Protein Expression
11.7.1 2DE/MALDI-MS
11.7.2 Protein Microarrays
11.8 The End of the Beginning
Chapter 12: Inferring the Past: Phylogenetic Trees
12.1 The Biological Problem
12.1.1 Example: Relationships Among HIV Strains
12.1.2 Example: Relationships Among Human Populations
12.1.3 Reading Trees
12.2 Tree Terminology
12.2.1 Conventions
12.2.2 Numbers of Trees
12.3 Parsimony and Distance Methods
12.3.1 Parsimony Methods
12.3.2 Distance Methods
12.4 Models for Mutations and Estimation of Distances
12.4.1 A Stochastic Model for Base Substitutions
12.4.2 Estimating Distances
12.5 Maximum Likelihood Methods
12.5.1 Representing a Tree
12.5.2 Computing Probabilities on a Tree
12.5.3 Maximum Likelihood Estimation
12.5.4 Statistics and Trees
12.6 Problems with Tree-Building
Chapter 13: Genetic Variation in Populations
13.1 The Biological Problem
13.2 Mendelian Concepts
13.3 Variation in Human Populations
13.3.1 Describing Variation Across Populations
13.3.2 Population Structure
13.4 Effects of Recombination
13.4.1 Relationship Between Recombination and Distance
13.4.2 Genetic Markers
13.5 Linkage Disequilibrium (LD)
13.5.1 Quantitative Description of LD
13.5.2 How Rapidly Does LD Decay?
13.5.3 Factors Affecting Linkage Disequilibrium
13.6 Linkage Disequilibrium in the Human Genome
13.7 Modeling Gene Frequencies in Populations
13.7.1 The Wright-Fisher Model
13.7.2 The Wright-Fisher Model as a Markov Chain
13.7.3 Including Mutation
13.8 Introduction to the Coalescent
13.8.1 Coalescence for Pairs of Genes
13.8.2 The Number of Differences Between Two DNA Sequences
13.8.3 Coalescence in Larger Samples
13.8.4 Estimating the Mutation Parameter
13.9 Concluding Comments
Chapter 14: Comparative Genomics
14.1 Compositional Measures
14.2 Transposable Elements
14.3 Sequence Organization within Chromosomes
14.3.1 Conservation of Synteny and Segmental Duplication
14.3.2 Identifying Conserved Segments and Segmental Duplications
14.3.3 Genome Evolution by Whole-Genome Duplication
14.4 Gene Content
14.4.1 Gene Prediction from Local Sequence Context
14.4.2 Exon and Intron Statistics
14.4.3 Comparative Methods for Identifying Genes
14.4.4 Gene Numbers
14.5 Predicted Proteome
14.5.1 Assigning Gene Function by Orthology
14.5.2 Assigning Gene Function by Patterns of Occurrence
14.5.3 Gene Content Within and Between Organisms
14.6 New Biological Perspectives from Genomics