This textbook aims to be an approachable introduction to the world of data science. In this book, we define data science as the practice of generating insight from data through reproducible and auditable processes. If you analyze some data and give your analysis to a friend or colleague, they should be able to re-run the analysis from start to finish and get the same result you did (reproducibility). They should also be able to see and understand all the steps in the analysis, as well as the history of how the analysis developed (auditability). Creating reproducible and auditable analyses allows both you and others to easily double-check and validate your work.
At a high level, this book teaches you how to identify common problems in data science and how to solve them with reproducible and auditable workflows.