Contents

Prefaceix
0Introduction1
0.1Molecular Biology3
0.2Mathematics, Statistics, and Computer Science3
1Some Molecular Biology5
1.1DNA and Proteins6
1.1.1The Double Helix6
1.2The Central Dogma7
1.3The Genetic Code8
1.4Transfer RNA and Protein Sequences12
1.5Genes Are Not Simple16
1.5.1Starting and Stopping16
1.5.2Control of Gene Expression16
1.5.3Split Genes17
1.5.4Jumping Genes18
1.6Biological Chemistry18
2Restriction Maps29
2.1Introduction29
2.2Graphs31
2.3Interval Graphs33
2.4Measuring Fragment Sizes38
3Multiple Maps41
3.1Double Digest Problem42
3.1.1Multiple Solutions in the Double Digest Problem43
3.2Classifying Multiple Solutions48
3.2.1Reflections48
3.2.2Overlap Equivalence48
3.2.3Overlap Size Equivalence51
3.2.4More Graph Theory52
3.2.5From One Path to Another53
3.2.6Restriction Maps and the Border Block Graph56
3.2.7Cassette Transformations of Restriction Maps58
3.2.8An Example61
4Algorithms for DDP65
4.lAlgorithms and Complexity65
4.2DDP is NP-Complete67
4.3Approaches to DDP68
4.3.1Integer Programming68
4.3.2Partition Problems69
4.3.3TSP70
4.4Simulated Annealing: TSP and DDP70
4.4.1Simulated Annealing70
4.4.2Traveling Salesman Problem75
4.4.3DDP76
4.4.4Circular Maps78
4.5Mapping with Real Data79
4.5.1Fitting Data to a Map80
4.5.2Map Algorithms81
5Cloning and Clone Libraries83
5.1A Finite Number of Random Clones85
5.2Libraries by Complete Digestion85
5.3Libraries by Partial Digestion87
5.3.1The Fraction of Clonable Bases88
5.3.2Sampling, Approach 191
5.3.3Designing Partial Digest Libraries92
5.4Genomes per Microgram98
6Physical Genome Maps: Oceans, Islands, and Anchors101
6.1Mapping by Fingerprinting102
6.1.1Oceans and Islands102
6.1.2Divide and Conquer110
6.1.3Two Pioneering Experiments111
6.1.4Evaluating a Fingerprinting Scheme114
6.2Mapping by Anchoring119
6.2.1Oceans, Islands and Anchors119
6.2.2Duality Between Clones and Anchors126
6.3An Overview of Clone Overlap127
6.4Putting It Together129
7Sequence Assembly135
7.1Shotgun Sequencing135
7.1.1SSP is NP-complete137
7.1.2Greedy Is At Most Four Times Optimal138
7.1.3Assembly in Practice143
7.1.4Sequence Accuracy145
7.1.5Expected Progress147
7.2Sequencing by Hybridization148
7.2.1Other SBH Designs154
7.3Shotgun Sequencing Revisited156
8Databases and Rapid Sequence Analysis161
8.1DNA and Protein Sequence Databases162
8.1.1Description of the Entries in a Sequence Data File163
8.1.2Sample Sequence Data File164
8.1.3Statistical Summary166
8.2A Tree Representation of a Sequence167
8.3Hashing a Sequence168
8.3.1A Hash Table169
8.3.2Hashing in Linear Time170
8.3.3Hashing and Chaining170
8.4Repeats in a Sequence171
8.5Sequence Comparison by Hashing172
8.6Sequence Comparison with at most l Mismatches176
8.7Sequence Comparison by Statistical Content180
9Dynamic Programming Alignment of l'wo Sequences183
9.1The Number of Alignments186
9.2Shortest and Longest Paths in a Network .190
9.3Global Distance Alignment192
9.3.1Indel Functions194
9.3.2Position-Dependent Weights197
9.4Global Similarity Alignment198
9.5Fitting One Sequence into Another201
9.6Local Alignment and Clumps202
9.6.1Self-Comparison206
9.6.2Tandem Repeats207
9.7Linear Space Algorithms209
9.8Tracebacks212
9.9Inversions215
9.1Map Alignment219
9.11Parametric Sequence Comparisons223
9.11.1One-dimension Parameter Sets225
9.11.2Into Two-Dimensions228
10Multiple Sequence Alignment233
10.1The Cystic Fibrosis Gene233
10.2Dynamic Programming in r-Dimensions236
10.2.1Reducing the Volume237
10.3Weighted-Average Sequences238
10.3.1Aligning Alignments242
10.3.2Center of Gravity Sequences242
10.4Profile Analysis242
10.4.1Statistical Significance244
10.5Alignment by Hidden Markov Models245
10.6Consensus Word Analysis248
10.6.1Analysis by Words249
10.6.2Consensus Alignment250
10.6.3More Complex Scoring251
11Probability and Statistics for Sequence Alignment253
11.1Global Alignment254
11.1.1Alignment Given254
11.1.2Alignment Unknown255
11.1.3Linear Growth of Alignment Score256
11.1.4The Azuma-Hoeffding Lemma257
11.1.5Large Deviations from the Mean259
11.1.6Large Deviations for Binomials261
11.2Local Alignment263
11.2.1Laws of Large Numbers263
11.3Extreme Value Distributions275
11.4The Chein-Stein Method278
11.5Poisson Approximation and Long Matches280
11.5.1Headruns280
11.5.2Exact Matching Between Sequences282
11.5.3Approximate Matching288
11.6Sequence Alignment with Scores294
11.6.1A Phase Transition294
11.6.2Practical p-Values299
12Probability and Statistics for Sequence Patterns305
12.1A Central Limit Theorem307
12.1.1Generalized Words .313
12.1.2Estimating Probabilities313
12.2Nonoverlapping Pattern Counts314
12.2.1Renewal Theory for One Pattern314
12.2.2Li's Method and Multiple Patterns318
12.3Poisson Approximation321
12.4Site Distributions323
12.4.1Intersite Distances324
13RNA Secondary Structure327
13.1Combinatorics327
13.1.1Counting More Shapes332
13.2Minimum Free-energy Structures334
13.2.1Reduction of Computation Time for Hairpins336
13.2.2Linear Destabilization Functions338
13.2.3Multibranch Loops339
13.3Consensus folding340
14Trees and Sequences345
14.1Trees345
14.1.1Splits347
14.1.2Metrics on Trees351
14.2Distance353
14.2.1AdditiveTrees353
14.2.2Ultrametric Trees357
14.2.3Nonadditive Distances359
14.3Parsimony361
14.4Maximum Likelihood Trees367
14.4.1Continuous Time Markov Chains367
14.4.2Estimating the Rate of Change369
14.4.3Likelihood and Trees372
15Sources and Perspectives377
15.1Molecular Biology377
15.2Physical Maps and Clone Libraries377
15.3Sequence Assembly379
15.4Sequence Comparisons379
15.4.1Databases and Rapid Sequence Analysis379
15.4.2Dynamic Programming for Two Sequences380
15.4.3Multiple Sequence Alignment382
15.5Probability and Statistics382
15.5.1Sequence Alignment382
15.5.2Sequence Patterns383
15.6RNA Secondary Structure384
15.7Trees and Sequences385
References387
IProblem Solutions and Hints401
IIMathematical Notation421
Algorithm Index423
Author Index425
Subject Index428

Previous Level

USC Computational Biology Home Page
http://www-hto.usc.edu/books/msw/msg/toc.html, webmaster@hto.usc.edu, 30 August 1996