Santiago Valdarrama’s Post

Computer scientist and writer. I teach hard-core Machine Learning at ml.school.

On why you should use Jupyter Notebooks:

"Notebooks are a great way to tell a story, and telling stories is what all fields should be about. Especially computer science."

This is from an interview with Doug Blank, Head of Research at @cometml. He continues:

"To me, a Jupyter notebook is a blank sheet of paper. You can write a story in it. And if you change the name of one of the characters in paragraph one, you have to change the name of the character throughout the whole story."

People often tell me they don't like notebooks because people write bad code in them. Nonsense. Bad developers write bad code. Notebooks have nothing to do with that.

"Some educators feel that it's too open-ended and too flexible, but I disagree; it's a new way of doing computing (...)."

I always recommend that developers learn how to use notebooks. Not as their primary way of writing software but as an alternative tool they can use to experiment, troubleshoot, and become more effective at their work.

The math in my head is simple: A developer who knows how to use notebooks effectively is better than a developer who doesn't.

Notebooks aren't a replacement for what you do. They are a boost.

The rest of the interview is pretty good: https://lnkd.in/e7ezJsx7

Interview with Doug Blank, Head of Research at Comet | ML Contests

mlcontests.com

James Bullock

Sr. Machine Learning Engineer

5mo

My flow: stage 3 folders on root: dev, test, and prod. Each has a .yml, .env, and requirements.txt, plus a .venv and then your ipynbs. Use main.ipynb as an SOP of sorts. Document your project. Put HTML div links in for the table of contents. I will often have DEV and PROD bools so I can easily toggle back and troubleshoot specific blocks.

Once I’m good, I convert the ipynb to a .py and move to test. Copy over the other files. New .venv. Reduce the bloat from the Jupyter kernel. Add async. Unit test here.

Finally, move to prod. New .venv. Reduce more bloat from requirements.txt. Lean. Clean up code. Make it bulletproof. Use dev containers to test running lean in a Linux container or whatever.

Now you have it all: documentation in dev, unit testing and the features that are tricky with an ipynb in the .py, and then your final. Parallelize and async. Then containerize, or at least build some scripts to run your environment and code with an agent. Keep your .env in your ignore files and only pass those values to prod at container build time through arguments. This way, whichever way the project goes, you’re ready. And you go from a 1-2 GB project to a lean, quick, Linux-deployable 200-300 MB one, but still have the others for reference/docs.
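A minimal sketch of the DEV/PROD toggle idea from the comment above, assuming the flags are plain module-level booleans set in a notebook's first cell. The names `pick_settings`, `sample_rows`, and `verbose` are hypothetical and only illustrate how one flag might switch a block between a troubleshooting mode and a lean mode; the commenter's actual setup is not shown in the thread.

```python
# Hypothetical flags, flipped by hand at the top of the notebook.
DEV = True
PROD = not DEV


def pick_settings(dev: bool) -> dict:
    """Return run settings for a DEV (troubleshooting) or PROD (lean) run."""
    if dev:
        # Small sample and chatty output make a single block easy to debug.
        return {"sample_rows": 100, "verbose": True}
    # Full data and quiet output for the lean run.
    return {"sample_rows": None, "verbose": False}


settings = pick_settings(DEV)
if settings["verbose"]:
    print(f"DEV run, limiting input to {settings['sample_rows']} rows")
```

Flipping `DEV` to `False` switches every block that reads `settings` in one place, which is one way to toggle back and forth while troubleshooting.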

With a few best practices, it is easy to write good notebooks.

Owen Price

Data & Analytics, Microsoft MVP

5mo

Notebooks can be incredibly liberating - after all, what are they other than a pre-built framework for exploration? You don't have to worry about how charts are rendered (mostly). You don't have to worry about proper annotation (hi, Markdown). You can actually work on what's important - the data. And when you're comfortable with the data and the workflow, you migrate that code into a more robust set of objects or scripts. That extra time spent migrating is more than made up for by the time saved when using the notebook. The important thing is that you accept that there will be migration later, and structure your notebook accordingly.
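The migration step mentioned in this thread (moving notebook code into a script) can be sketched as below. This is only an illustration that uses the standard library and assumes the notebook follows the nbformat 4 JSON layout, with a top-level `cells` list whose entries carry `cell_type` and `source`. The helper name `notebook_to_script` is hypothetical. In practice `jupyter nbconvert --to script` is the usual tool, and unlike it, this sketch does nothing about IPython magics such as `%matplotlib`.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join the code cells of an nbformat-4 notebook into one script string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # The source field may be a single string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if text.strip():
            chunks.append(text.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook standing in for a real .ipynb file.
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "print(y)"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
print(notebook_to_script(demo))
```

Structuring a notebook so that each code cell runs top to bottom without hidden state makes this kind of extraction much closer to a working script.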

That’s right!! If you dislike people who use notebooks to write bad code, then you don’t dislike notebooks… you dislike those people.

Felipe Aguirre Martinez, PhD

Co-Founder & Chief Data Scientist @ Prediktia | Entrepreneur | Mentor | Community leader | AI Tinkerers

5mo

Santiago Valdarrama, have you tried marimo? It takes the notebook experience to a new level. I still think Jupyter is better when you are exploring initial ideas. It has fewer constraints.

Rivka Boord

Data Analytics | R | SQL | Excel | Tableau | Sports Analytics

5mo

I find Jupyter notebooks annoying when it comes to running multiple lines of code at the same time. They're great for learning, though.

Adam McKinnon, PhD.

Head of People Platforms and Analytics @ Reece Group | People Analytics | HR Tech | Board Director

5mo

Engaging analogy. How do you integrate Jupyter Notebooks effectively in your workflow? 🚀

Ilia Karelin

Data Scientist/Data Engineer | Author of “Prosper” | Building prosperinoss.com

5mo

Agreed Santiago!
