The Notebooks era: How notebooks are changing the way we develop code
By Julien Kervizic
The past, present, & future of notebooks

The age of notebooks

Increased options

With machine learning becoming more mainstream, there is keen interest in notebooks. Besides the open-source Jupyter and Zeppelin notebooks, cloud providers have released their own versions: Amazon has released EMR Notebooks, Google has Colab, and Microsoft has Azure Notebooks. More traditional software vendors such as Oracle have jumped into the fray, and so have startups such as Dataiku.

Jupyter also created two offshoot projects, JupyterHub and JupyterLab, which provide additional functionality compared to the core Jupyter Notebook.

JupyterHub is a multi-user distribution of Jupyter that handles user authentication and authorization and makes it possible to host notebooks on a server rather than on a local machine.

JupyterLab is an evolution of Jupyter Notebook that is moving towards an IDE-like environment. With JupyterLab it is possible to edit multiple files at once, browse the different files within a folder…

An extensible ecosystem

Alongside this flurry of solutions comes enhanced support for, and extensions to, some of these notebooks. Jupyter Notebooks are now supported in VS Code; Papermill allows notebooks to be parametrized; scrapbook is a library for managing Jupyter notebook outputs; ReviewNB is a code review system dedicated to notebooks; and the Binder project aims to set up your notebooks within reproducible environments.

These extensions do not include the numerous widgets that ship as part of other libraries and make use of notebooks’ rich output and interactivity.

Success stories

Notebooks have also benefited from quite a few success stories from companies such as Netflix, which has written extensively about its use of notebooks, including for production use cases.

Pros and cons of leveraging notebooks

Notebooks make for quick prototypes and rich experiences

Notebooks bring the interactivity of a REPL together with a UI for added flexibility and better code editing. This makes for quick feedback during development and allows code to be developed faster than if it had to be run as one monolithic piece of code.

Notebooks organize code in blocks (cells) that can be run independently, so the code is not necessarily executed in a linear fashion. This can make it easier to experiment and trial certain operations without having to re-run the full workflow.

Notebooks also provide a great way to share code in context. Annotations make the code and analysis easier to interpret, and rich outputs make notebooks particularly well suited to analysis tasks that require both numerical and visual outputs. Extra widgets exist that allow a quick deep dive into certain areas of the data.

This richness can be further enhanced by widget components such as entry form widgets, which make it possible to create simple applications.

However, they can raise issues when being set up for production

A number of issues can arise when trying to leverage notebooks for production rather than for an analytical use case.

The interactivity of notebooks makes for quick prototyping, often without the care that is put into production code. Code written for quick prototyping is often not set up with the same abstractions that would be used in a normal development flow.

It is also easy to get tangled up in issues by running code out of sequence while variables are still defined within the kernel. Developing with code run out of sequence can lead to erroneous or non-reproducible results. It takes care and diligence to re-run the notebook from start to finish once the flow has been fully set up.
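The out-of-sequence problem can be illustrated outside of a notebook. The sketch below is a simplified model, not how Jupyter itself is implemented: each "cell" is a string of code executed against one shared dictionary that stands in for the kernel's namespace. The cell contents and the `run_cells` helper are hypothetical.

```python
# Each string stands in for one notebook cell (hypothetical contents).
CELLS = {
    1: "price = 100",
    2: "price = price + 20",  # add a fixed markup
    3: "total = price * 3",
}


def run_cells(order):
    """Run cells in the given order against one shared namespace,
    the way a kernel keeps variables alive between cell executions."""
    kernel_namespace = {}
    for cell_id in order:
        exec(CELLS[cell_id], kernel_namespace)
    return kernel_namespace["total"]


# A clean run from the first cell to the last.
top_to_bottom = run_cells([1, 2, 3])

# Cell 2 is run a second time before cell 3: the markup is applied twice.
out_of_sequence = run_cells([1, 2, 2, 3])

print(top_to_bottom, out_of_sequence)
```

The two runs share the same code but end with different totals, which is why restarting the kernel and running every cell in order is a useful check before trusting a result.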

Furthermore, the notebook environment isn’t well suited to dealing with larger code bases. The full notebook usually needs to be re-run after changes in external files, rather than only the cells that depend on those changes being updated, and you end up having to use both an editor and the notebook.

Notebooks also don’t have the same level of support for code review that you can find with normal .py files, nor an integrated way to manage the code versioning workflow within the code editor or notebook application.

Notebooks benefit from a rich and diverse ecosystem

Jupyter Notebook and, to a lesser extent, Zeppelin benefit from a varied ecosystem of extensions, both official and unofficial. These extensions enhance certain functionalities, making the notebook more interactive or making it easier to push notebooks to production. Besides these extensions, there are also a number of components and tools that extend notebooks’ functionality.

NbExtensions

NbExtensions provides a collection of unofficial extensions for use with Jupyter Notebook. Some of the extensions provided allow the use of LaTeX cells, pushing to a GitHub gist, automatic code formatting …

NbExtensions offers a configuration UI that lets you easily enable or disable specific extensions. Some of the features of the extensions contained are described in this article.

Magic commands

Jupyter/IPython supports “magic commands”, which extend the functionality of the notebook beyond that of the interpreter. There are built-in magic commands such as %%timeit, which outputs the time it takes to execute a particular cell.

Jupyter Notebook also supports the creation of custom magic commands beyond those provided by default. Cython, for instance, offers the magic command %%cython -a, which annotates the compiled cell to show which lines interact with Python rather than running as plain C.
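As a rough illustration of a custom magic, the sketch below registers a cell magic through IPython's `register_cell_magic` decorator. The `%%stopwatch` name and the `time_source` helper are hypothetical, chosen for this example; the magic simply times a single run of the cell, unlike %%timeit, which repeats the code. The registration is skipped when the code does not run inside IPython or Jupyter.

```python
import time


def time_source(source, namespace):
    """Run a block of source code once and return the elapsed seconds."""
    start = time.perf_counter()
    exec(source, namespace)
    return time.perf_counter() - start


try:
    # IPython is only needed for the magic registration itself.
    from IPython import get_ipython
    from IPython.core.magic import register_cell_magic

    shell = get_ipython()  # None when not running inside IPython/Jupyter
except ImportError:
    shell = None

if shell is not None:

    @register_cell_magic
    def stopwatch(line, cell):
        """%%stopwatch: hypothetical magic that times one run of the cell."""
        elapsed = time_source(cell, shell.user_ns)
        print(f"cell ran in {elapsed:.4f}s")
```

Once the cell defining it has been run in a notebook, a later cell that starts with `%%stopwatch` is executed in the user namespace and its duration is printed.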

Papermill

Papermill is an extension to Jupyter that parametrizes notebooks, allowing the code within a notebook to be re-run with different parameters in a systematic way. This setup makes it possible to push notebook code to production without having to export all of it to a Python file, while keeping the richness of the notebook's output.

There are a couple of ways for Papermill to interact with notebooks: through Python code or through its CLI. There is also a dedicated Papermill operator within Airflow, which helps bridge some of the gap in setting up notebooks in production.
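A minimal sketch of the Python route, assuming Papermill is installed and the input notebook contains a cell tagged `parameters` for Papermill to override. The notebook name, the parameter names, the `runs` directory, and both helper functions are hypothetical; only `papermill.execute_notebook` is Papermill's own call.

```python
from pathlib import Path


def output_path_for(input_path, parameters, output_dir="runs"):
    """Build a distinct output notebook path for one parameter set,
    so every executed copy keeps its own rich output."""
    suffix = "_".join(
        f"{key}-{value}" for key, value in sorted(parameters.items())
    )
    return Path(output_dir) / f"{Path(input_path).stem}_{suffix}.ipynb"


def run_notebook(input_path, parameters, output_dir="runs"):
    """Execute a parametrized notebook with Papermill."""
    # Imported lazily so the path helper above works without Papermill.
    import papermill as pm

    output_path = output_path_for(input_path, parameters, output_dir)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(str(input_path), str(output_path), parameters=parameters)
    return output_path


# Hypothetical usage: one executed copy of the notebook per region.
# for region in ["emea", "apac"]:
#     run_notebook("sales_report.ipynb", {"region": region, "year": 2019})
```

The CLI route takes the same pieces as arguments, for example `papermill sales_report.ipynb runs/out.ipynb -p region emea`.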

Binder

Binder is a tool, leveraging Docker, that creates reproducible environments for notebooks. It supports Jupyter Notebook and JupyterLab as well as a few other notebooks or notebook-like interfaces.

Widgets

Widgets provide a way to add interactive functionality to notebooks. A set of default widgets is provided by the ipywidgets library, and some Python libraries such as Bokeh integrate some of these functionalities.

These widgets let you create small interactive applications within your notebook that let you explore a particular aspect of a dataset without having to code for it.
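A small sketch of what such an application can look like with the `interact` function from ipywidgets. The toy dataset and the `top_values` and `explore` helpers are hypothetical; `explore` is meant to be called from a notebook cell with ipywidgets installed.

```python
# Hypothetical toy dataset.
RECORDS = [
    {"price": 12, "quantity": 3},
    {"price": 40, "quantity": 1},
    {"price": 7, "quantity": 9},
    {"price": 25, "quantity": 4},
]


def top_values(records, column, n=3):
    """Return the n largest values found in `column` across the records."""
    values = [row[column] for row in records if column in row]
    return sorted(values, reverse=True)[:n]


def explore(records):
    """Render interactive controls; call this from a notebook cell."""
    from ipywidgets import interact  # imported lazily, only needed here

    # A list of options becomes a dropdown, a (min, max) tuple a slider.
    interact(
        lambda column, n: print(top_values(records, column, n)),
        column=["price", "quantity"],
        n=(1, len(records)),
    )
```

Calling `explore(RECORDS)` in a notebook displays the two controls and re-runs the function each time one of them changes.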

ReviewNB

ReviewNB is a paid application for performing code reviews on Jupyter Notebooks. It integrates with GitHub and allows for the creation of conversation threads on individual notebook cells.

nbgitpuller

nbgitpuller is a utility that lets you distribute the content of a git repository to users who do not need to understand git itself. It provides automatic merging behavior that retains the changes made locally.

Commuter

Commuter provides a way to explore both local and remote directories and to read the notebooks they contain.

Knowledge Repo

Airbnb’s Knowledge Repo is a tool aimed at facilitating the sharing of notebooks and information within an organization. It lets you browse and read the notebooks that have been published within an organization, search for keywords, or filter by tags.

Nbconvert

Nbconvert is a utility that converts a notebook to a different file format, such as PDF. Nbconvert has, for instance, been used alongside Papermill for report generation.
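A hedged sketch of a conversion through nbconvert's Python API, assuming nbconvert and nbformat are installed. It targets HTML rather than PDF because PDF export additionally needs a LaTeX installation. The `html_target` and `notebook_to_html` helpers and the file names are hypothetical.

```python
from pathlib import Path


def html_target(notebook_path):
    """Return the path of the HTML file written next to the notebook."""
    return Path(notebook_path).with_suffix(".html")


def notebook_to_html(notebook_path):
    """Convert one notebook to a standalone HTML report."""
    # Imported lazily so the path helper above works without nbconvert.
    import nbformat
    from nbconvert import HTMLExporter

    node = nbformat.read(str(notebook_path), as_version=4)
    body, _resources = HTMLExporter().from_notebook_node(node)

    target = html_target(notebook_path)
    target.write_text(body, encoding="utf-8")
    return target


# Hypothetical usage on a notebook that has already been executed:
# notebook_to_html("runs/sales_report_region-emea.ipynb")
```

The command-line equivalent is `jupyter nbconvert --to html` followed by the notebook path.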

Summary

Notebooks are a very powerful tool for analysis and prototyping, and they benefit from a large ecosystem of plugins and tools that enhance their functionality. For certain types of work, such as working on a larger codebase, they are not perfectly suited, but we are starting to see evolutions and tools that mitigate some of these issues.

© 2024 WiseAnalytics