
Data Science at the Command Line, 2nd Edition (2021)


This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux.

"Traditional computer and data science curricula all too often mistake the command line as an obsolete relic instead of teaching it as the modern and vital toolset that it is. Only well into my career did I come to grasp the elegance and power of the command line for easily exploring messy datasets and even creating reproducible data pipelines for work."

Dan Nguyen, data scientist, former news application developer at ProPublica, and former Lorry I. Lokey Visiting Professor in Professional Journalism at Stanford University
