Data Science in production: Crossing the chasm

“History doesn’t repeat itself, but it often rhymes”

The debate about "notebooks in production" might seem new, but it has an old precursor: spreadsheets. Both Excel and Jupyter democratized computing in a way that has been absolutely transformative, while at the same time creating some interesting new problems. As a result, "notebooks are bad" has become a running gag among data practitioners, but dismissing entire categories of tools neither empowers data scientists nor helps overcome their limitations.

In this talk we will seek to understand why notebooks (and spreadsheets) remain popular despite their shortcomings, surface some actions we can take to improve the situation, and offer some ideas on how to help bridge the gap between the iterative, experimental nature of data science and the need for structure and predictability in production systems.

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He has a decade of experience as a developer advocate, software engineer, and Python trainer in several industries, and he currently works as Principal Product Manager for Kedro, an open-source Python framework for data science, at QuantumBlack, AI by McKinsey.

He has made significant contributions to the PyData stack and published several open-source packages, the most important one being poliastro, an open-source Python library for orbital mechanics used at space agencies, satellite companies, and universities.

After founding the Python España non-profit and co-organizing the first seven PyCons in Spain, he became a Python Software Foundation Fellow in 2017. Nowadays he is the lead organizer of the PyData Madrid monthly meetups.

