Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide.

Print Price: $88.00

Format: Hardback, 176 pp., 156 mm x 234 mm

ISBN-13: 9780192867735

Publication date: July 2023

Imprint: OUP UK

Essential Statistics for Data Science

A Concise Crash Course

Mu Zhu

Essential Statistics for Data Science: A Concise Crash Course is for students entering a serious graduate program or advanced undergraduate curriculum in data science without knowing enough statistics. The three-part text introduces readers to the basics of probability and random variables and guides them toward relatively advanced topics in both frequentist and Bayesian inference in a matter of weeks.

Part I, Talking Probability, explains the statistical approach to analysing data: a probability model is used to describe the data-generating process. Part II, Doing Statistics, shows how the unknown quantities in the model, i.e., its parameters, are estimated from data through statistical inference. Part III, Facing Uncertainty, explains why it is important to describe explicitly how much uncertainty surrounds parameters that have intrinsic scientific meaning, and how to take that uncertainty into account when making decisions.

Essential Statistics for Data Science: A Concise Crash Course provides an in-depth introduction for beginners: it is more focused than a typical undergraduate text, yet lighter and more accessible than an average graduate text.

Readership: Advanced undergraduate and graduate students in quantitative disciplines; software engineers; and various information technology (IT) professionals.

Prologue
I Talking Probability
1. The Eminence of Models
1.A For brave eyes only
2. Building Vocabulary
2.1. Probability
2.1.1 Basic rules
2.2. Conditional probability
2.2.1 Independence
2.2.2 Law of total probability
2.2.3 Bayes law
2.3. Random variables
2.3.1 Summation and integration
2.3.2 Expectations and variances
2.3.3 Two simple distributions
2.4. The bell curve
3. Gaining Fluency
3.1. Multiple random quantities
3.1.1 Higher-dimensional problems
3.2. Two "hard" problems
3.2.1 Functions of random variables
3.2.2 Compound distributions
3.A. Sums of independent random variables
3.A.1 Convolutions
3.A.2 Moment generating functions
3.A.3 Formulae for expectations and variances
II Doing Statistics
4. An Overview of Statistics
4.1. Frequentist approach
4.1.1 Functions of random variables
4.2. Bayesian approach
4.2.1 Compound distributions
4.3. Two more distributions
4.3.1 Poisson distribution
4.3.2 Gamma distribution
4.A. Expectation and variance of the Poisson
4.B. Waiting time in Poisson process
5. The Frequentist Approach
5.1. Maximum likelihood estimation
5.1.1 Random variables that are i.i.d.
5.1.2 Problems with covariates
5.2. Statistical properties of estimators
5.3. Some advanced techniques
5.3.1 EM algorithm
5.3.2 Latent variables
5.A. Finite mixture models
6. The Bayesian Approach
6.1. Basics
6.2. Empirical Bayes
6.3. Hierarchical Bayes
6.A. General sampling algorithms
6.A.1 Metropolis algorithm
6.A.2 Some theory
6.A.3 Metropolis-Hastings algorithm
III Facing Uncertainty
7. Interval Estimation
7.1. Uncertainty quantification
7.1.1 Bayesian version
7.1.2 Frequentist version
7.2. Main difficulty
7.3. Two useful methods
7.3.1 Likelihood ratio
7.3.2 Bootstrap
8. Tests of Significance
8.1. Basics
8.1.1 Relation to interval estimation
8.1.2 The p-value
8.2. Some challenges
8.2.1 Multiple testing
8.2.2 Six degrees of separation
8.A. Intuition of Benjamini-Hochberg
IV Appendices
A. Some Further Topics
A.1 Graphical models
A.2 Regression models
A.3 Data collection
Epilogue
Bibliography
Index

Mu Zhu is Professor in the Department of Statistics & Actuarial Science at the University of Waterloo, and Fellow of the American Statistical Association. He received his AB magna cum laude in applied mathematics from Harvard University, and his PhD in statistics from Stanford University. He is currently Director of the Graduate Data Science Program at Waterloo.

Special Features

A very short (but serious) course that can be taught in just a few weeks
Still goes from "the very beginning" [e.g., P(not A)=1-P(A)] to relatively advanced material (e.g., EM algorithm, Gibbs sampler) despite its lightning pace and introductory nature
Treats Bayesian inference right after frequentist point estimation, but before topics such as interval estimation and significance testing, unlike most other texts