3.1. Jupyter Notebook Tips and Tricks

3.1.1. Identify your bottleneck with line_profiler

Want to identify the bottleneck in your Python code?

Try the line_profiler module.

line_profiler profiles your functions line by line, so you can see exactly how much time each line takes.

Below you can see how to use line_profiler within a Jupyter Notebook.

  • Use the %load_ext magic command to load the line_profiler extension.

  • Use the %lprun magic command to run a statement and profile, line by line, every function you pass with -f. The -u 1e-3 option reports times in milliseconds.

!pip install line_profiler
%load_ext line_profiler

def my_function(x):
    for x in range(1, 10000):
      x = x**2
      x = x / 400
    y = x + x
    return y
    
%lprun -u 1e-3 -f my_function my_function(10)
'''
Timer unit: 0.001 s

Total time: 0.0160793 s
File: <ipython-input-18-790da5f104f0>
Function: my_function at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def my_function(x):
     2      9999          2.6      0.0     16.3      for x in range(1, 10000):
     3      9999          6.1      0.0     37.7        x = x**2
     4      9999          7.4      0.0     46.1        x = x / 400
     5         1          0.0      0.0      0.0      y = x + x
     6         1          0.0      0.0      0.0      return y
'''
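To cross-check the split between the two loop statements without the extension, a rough sketch with the standard-library timeit module can help. The operand value 1234 and the repetition count are arbitrary assumptions, and the absolute numbers will not match the %lprun output because line_profiler adds its own per-line overhead.

```python
import timeit

# Time each statement from the loop body in isolation.
# The operand (1234) and repetition count are arbitrary choices for this sketch.
repetitions = 100_000
square_time = timeit.timeit("x ** 2", setup="x = 1234", number=repetitions)
divide_time = timeit.timeit("x / 400", setup="x = 1234", number=repetitions)

print(f"x ** 2 : {square_time:.4f} s for {repetitions} runs")
print(f"x / 400: {divide_time:.4f} s for {repetitions} runs")
```

If the ratio between the two timings is very different from the % Time column above, the profiler overhead is probably dominating the measurement.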

3.1.2. Render Live Loss of Deep Learning Models in Jupyter Notebooks

Plot your live training loss in Jupyter Notebooks with livelossplot.

livelossplot lets you track your model’s training process in real time by adding a single callback.

It is a nice alternative to TensorBoard when you want to train a small model and visualize its progress quickly.

!pip install livelossplot
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

from livelossplot import PlotLossesKeras

(X_train, y_train), (X_test, y_test) = mnist.load_data()

Y_train = to_categorical(y_train)
Y_test = to_categorical(y_test)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.

model = Sequential()

model.add(Flatten(input_shape=(28, 28, 1)))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

plotlosses = PlotLossesKeras()

model.fit(X_train, Y_train,
          epochs=12,
          validation_data=(X_test, Y_test),
          callbacks=[plotlosses],
          verbose=False)
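To make the callback idea concrete, here is a minimal sketch of what a per-epoch callback collects. LossHistory is a hypothetical stand-in written for illustration, not livelossplot's implementation; it only mirrors the on_epoch_end(epoch, logs) hook that Keras calls after every epoch, and the log values are made-up numbers.

```python
class LossHistory:
    """Hypothetical stand-in for a per-epoch callback (not livelossplot's code)."""

    def __init__(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the metrics of the finished epoch as a dict.
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(value)


# Made-up log values, standing in for what model.fit would report.
fake_logs = [
    {"loss": 0.9, "val_loss": 1.0},
    {"loss": 0.6, "val_loss": 0.8},
    {"loss": 0.5, "val_loss": 0.75},
]

recorder = LossHistory()
for epoch, logs in enumerate(fake_logs):
    recorder.on_epoch_end(epoch, logs)

print(recorder.history)
```

A live plot is then a matter of redrawing these lists after each epoch, which is the part PlotLossesKeras takes care of.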

3.1.3. Generate LaTeX Expressions from Python Code

With latexify, you can compile Python source code into a LaTeX expression quickly and easily.

This is useful when you don’t want to write the LaTeX expression yourself.

!pip install latexify-py
import latexify
import math
@latexify.function
def solve(a, b, c):
    return (-b + math.sqrt(b**2 - 4*a*c)) / (2*a)

print(solve)  # prints the generated LaTeX source
solve  # as the last expression of a cell, renders the formula
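For the quadratic-formula function above, the rendered result should correspond to the expression below. This is a hand-written version of the expected formula; the exact markup latexify emits (spacing, styling of the function name) may differ between versions.

```latex
\mathrm{solve}(a, b, c) = \frac{-b + \sqrt{b^{2} - 4 a c}}{2 a}
```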

3.1.4. Display Scikit-Learn Pipelines as HTML

You can display an interactive HTML diagram of scikit-learn pipelines.

Just set the display config to "diagram" with set_config (you can still switch back with set_config(display="text")).

import numpy as np
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn import set_config

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaling", RobustScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("one_hot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = make_pipeline(preprocessor, RandomForestClassifier())
set_config(display="diagram")
pipe

3.1.5. Autoreload your Modules in Jupyter Notebook

When you develop a new Python module and test it in a Jupyter notebook, you have to reload the module every time you change it.

With the autoreload extension, %autoreload 2 reloads all modules automatically before your code runs.

It only takes two lines.

%load_ext autoreload
%autoreload 2

from my_module import my_function1, my_function2
my_function1()

my_function2()
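For comparison, this is roughly the manual step that autoreload saves you, sketched with the standard-library importlib. The module name demo_reload_module and its contents are hypothetical and are written to a temporary directory so the example is self-contained.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

# Create a throwaway module (hypothetical name) in a temporary directory.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "demo_reload_module.py"
module_path.write_text("def greet():\n    return 'v1'\n")
sys.path.insert(0, tmp_dir)

import demo_reload_module

first = demo_reload_module.greet()

# Simulate editing the module on disk, then reload it by hand.
module_path.write_text("def greet():\n    return 'version 2'\n")
importlib.invalidate_caches()
importlib.reload(demo_reload_module)
second = demo_reload_module.greet()

print(first, "->", second)
```

Note that a plain importlib.reload does not update names that were bound earlier with from module import name; handling that case is part of what makes %autoreload 2 more convenient.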

3.1.6. Apply Code Quality Tools to Jupyter Notebooks

Are you looking for a way to enforce code quality in Jupyter Notebooks?

One limitation of widespread tools like black, flake8 or isort is that they are designed for plain .py files rather than notebooks.

With nbqa, you can effortlessly apply code quality tools to notebooks.

See below how to apply black and isort to notebooks.

Say hello to clean(er) notebooks.

!pip install "nbqa[toolchain]"
!nbqa black my_notebook.ipynb
!nbqa isort my_notebook.ipynb --float-to-top

3.1.7. Create and Reuse Jupyter Notebook Templates

If you are a heavy JupyterLab user, do you create a new notebook from scratch every time?

With jupyterlab_templates, you don’t need to.

jupyterlab_templates lets you create notebook templates that you can reuse every time.

Do you have the same structure for every EDA? Create a template and reuse it.

No need to create a notebook and structure it manually.

!pip install jupyterlab_templates
!jupyter labextension install jupyterlab_templates
!jupyter server extension enable --py jupyterlab_templates
# Add the following to jupyter_notebook_config.py

c.JupyterLabTemplates.allowed_extensions = ["*.ipynb"]
c.JupyterLabTemplates.template_dirs = ['list', 'of', 'template', 'directories']
c.JupyterLabTemplates.include_default = True
c.JupyterLabTemplates.include_core_paths = True
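A template is just a regular notebook file, so one way to create it is with the standard-library json module. The sketch below writes a minimal EDA skeleton; the section titles, the file name eda_template.ipynb and the eda subfolder are hypothetical, and a temporary directory stands in for one of your template_dirs. As far as the extension's README describes it, templates are picked up from subdirectories of each template directory, so treat that layout as an assumption to verify.

```python
import json
import pathlib
import tempfile


def make_eda_template():
    """Build a minimal nbformat-4 notebook dict with an EDA skeleton."""

    def markdown(text):
        return {"cell_type": "markdown", "metadata": {}, "source": [text]}

    def code(text):
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": [text],
        }

    return {
        "cells": [
            markdown("# EDA: <dataset name>"),
            markdown("## 1. Load data"),
            code("import pandas as pd"),
            markdown("## 2. Summary statistics"),
            code(""),
            markdown("## 3. Visualizations"),
            code(""),
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Hypothetical location: a temporary directory stands in for a template directory.
template_dir = pathlib.Path(tempfile.mkdtemp()) / "eda"
template_dir.mkdir(parents=True)
template_path = template_dir / "eda_template.ipynb"
template_path.write_text(json.dumps(make_eda_template(), indent=1))

print(template_path)
```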

3.1.8. Remove Output Cells Automatically with nbstripout

Are you tracking outputs of Jupyter Notebooks in Git?

Stop that.

Output cells in notebooks can contain large amounts of data, such as the results of computations or visualizations.

With nbstripout, you can strip them out.

This reduces the size of committed changes and the risk of pushing sensitive data.

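!pip install nbstripout
!nbstripout FILE.ipynb
# Inside a Git repository: install a filter that strips outputs automatically on commit
!nbstripout --install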
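To see what stripping means at the file level, here is a minimal sketch that operates on the notebook JSON directly. strip_outputs is a hypothetical helper written for illustration, not nbstripout's implementation, and the real tool handles more cases (for example selected metadata).

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    stripped = json.loads(json.dumps(notebook))  # simple deep copy via JSON
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny in-memory notebook with one executed code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["secret\n"]}
            ],
            "source": ["print('secret')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][1], indent=1))
```

The source of every cell stays untouched; only the outputs and execution counters are removed, which is what keeps diffs small.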

3.1.9. Bring LLMs Into Your Notebook with jupyter-ai

Bring GenAI into your Jupyter Notebooks with jupyter-ai.

jupyter-ai lets you use LLMs from vendors like OpenAI, Hugging Face and Anthropic within your notebook cells.

You can just ask for a code snippet and the result will be rendered into your Notebook.

!pip install jupyter_ai
%env PROVIDER_API_KEY=YOUR_API_KEY_HERE
%load_ext jupyter_ai
# Run the following in a separate cell: %%ai must be the first line of its cell
%%ai chatgpt
Provide a hello world function in Python

3.1.10. Modern Alternative to Jupyter Notebook

Forget Jupyter Notebooks.

Marimo Notebook is the future.

Marimo Notebooks are a git-friendly, reactive and interactive alternative to Jupyter Notebooks with the following features:

  • Affected cells re-run automatically when you change something

  • Notebooks are executed in a deterministic order, with no hidden state

  • Easily deployable

  • Interactive elements

Give it a try. It’s open-source, too!

!pip install marimo
!marimo edit
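The reactive behaviour listed above can be illustrated with a toy dependency graph. This is a conceptual sketch only, not marimo's implementation: the cells dict, its reads/run fields and the affected_by helper are hypothetical, and the ordering relies on graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Each hypothetical "cell" defines one name and declares the names it reads.
cells = {
    "a": {"reads": [], "run": lambda env: 2},
    "b": {"reads": ["a"], "run": lambda env: env["a"] * 10},
    "c": {"reads": ["b"], "run": lambda env: env["b"] + 1},
    "d": {"reads": [], "run": lambda env: "independent"},
}


def run_order():
    """Deterministic order: every cell runs after the cells it reads from."""
    graph = {name: set(cell["reads"]) for name, cell in cells.items()}
    return list(TopologicalSorter(graph).static_order())


def affected_by(changed):
    """Return the changed cell and all of its dependents, in run order."""
    dirty = {changed}
    order = run_order()
    for name in order:
        if dirty & set(cells[name]["reads"]):
            dirty.add(name)
    return [name for name in order if name in dirty]


# Initial full run.
env = {}
for name in run_order():
    env[name] = cells[name]["run"](env)

# Edit cell "a": only "a" and its dependents re-run, "d" is left alone.
cells["a"]["run"] = lambda env: 3
rerun = affected_by("a")
for name in rerun:
    env[name] = cells[name]["run"](env)

print(rerun, env)
```

Because the order comes from the dependencies rather than from the order in which cells were executed by hand, there is no hidden state to get out of sync.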