Introduction to Jupyter Notebook (Part 1)#

gfbio Winterschool 2022, Sophie Wolf

The name Jupyter refers to the three core programming languages it supports: Julia, Python and R.

Jupyter Notebook (formerly IPython Notebooks) is a web-based open-source interactive computational environment.


A notebook contains a list of input and output cells, which can contain:

# code
1+1
2

Text formatted with Markdown

import matplotlib.pyplot as plt

plt.plot(range(10))
plt.title('Display plot in notebook')
plt.show()
[Figure: line plot of range(10), titled "Display plot in notebook"]

Creating a new notebook#

  • Kernels

We will be using Python 3 and R kernels.

First orientation#

  • user interface tour

  • command mode and edit mode

  • keyboard shortcuts

print("Winterschool 2022")
Winterschool 2022
title = "Winterschool 2022"
title
'Winterschool 2022'
"Winterschool 2022"
'Winterschool 2022'

Suppress output using ;

"Winterschool 2023";

🤖 Try it!#

Take a few minutes to explore the keyboard shortcuts.


Ordering of executions#

Be mindful: You can execute Jupyter cells in any order. The execution order is displayed as numbers to the left of each code cell.

species = "Nepeta cataria"
species
'Theobroma cacao'
species = "Theobroma cacao"
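The displayed value above is 'Theobroma cacao' because the last cell was run before the middle one. A minimal sketch of why this happens, assuming the kernel can be pictured as one shared namespace that every cell reads from and writes to. The `cells` list, the `run_cells` helper and the `shown` variable are made up for illustration and are not part of Jupyter.

```python
# Each string stands in for one notebook code cell (hypothetical stand-ins).
cells = [
    'species = "Nepeta cataria"',   # cell 0: first assignment
    'shown = species',              # cell 1: "display" the current value
    'species = "Theobroma cacao"',  # cell 2: reassignment
]


def run_cells(order):
    """Run the cells in the given order against one shared namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace["shown"]


top_to_bottom = run_cells([0, 1, 2])
out_of_order = run_cells([0, 2, 1])

print(top_to_bottom)
print(out_of_order)
```

Running the cells from top to bottom (Kernel > Restart & Run All in the classic notebook menu) is a simple way to make sure the displayed outputs match the order of the code.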

A few remarks on Markdown#

Markdown is a simple markup language for creating formatted text using a plain-text editor.


Some functions#

  • italics, written as *italics*

  • bold, written as **bold**

  • code, written as `code block here`

You can create things like this nice table:

| Some traits in TRY database | ID | Unit |
| --- | --- | --- |
| Leaf area (in case of compound leaves: leaflet, undefined if petiole is in- or excluded) | 3113 | mm2 |
| Leaf area per leaf dry mass (specific leaf area, SLA or 1/LMA): undefined if petiole is in- or excluded) | 3117 | m2/kg |
| Stem specific density (SSD) or wood density (stem dry mass per stem fresh volume) | 4 | g/cm3 |
| Leaf carbon (C) content per leaf dry mass | 13 | mg/g |
| Leaf nitrogen (N) content per leaf dry mass | 14 | mg/g |
| Leaf phosphorus (P) content per leaf dry mass | 15 | mg/g |
| Plant height vegetative | 3106 | m |
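A Markdown table is plain text with | between columns and a row of dashes under the header. As a rough sketch, the text for such a table could also be generated from Python. The `markdown_table` helper below is a made-up name for illustration, and the rows are three entries copied from the table above.

```python
def markdown_table(header, rows):
    """Return a Markdown pipe table as one string."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


header = ["Some traits in TRY database", "ID", "Unit"]
rows = [
    ["Leaf carbon (C) content per leaf dry mass", 13, "mg/g"],
    ["Leaf nitrogen (N) content per leaf dry mass", 14, "mg/g"],
    ["Plant height vegetative", 3106, "m"],
]

table = markdown_table(header, rows)
print(table)
```

Pasting the printed text into a Markdown cell should render it as a formatted table.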

Or insert an image from your local machine or from a URL.

iNaturalist observation: Fouquieria splendens

This image is a citizen science observation from the project iNaturalist.

Line magic and cell magic#

Jupyter has a whole library of so-called line and cell magics. Among other things, these allow you to switch to other programming languages within a single Jupyter notebook.

bash command#

To run a terminal command from a notebook cell, simply put an ! before your bash command.

!pwd
/Users/sophiewolf/Documents/GitHub/Jupyter_Workshop_Winterschool_2022

For example, you could use it to install a new package (the %pip line magic listed further below does the same for the environment of the running kernel):

!pip install tqdm

There are many built-in line and cell magics. They always start with:

  • % for line magic

  • %% for cell magic

To list all the built-in available magic commands type the following:

%lsmagic
Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Line magic applies only to the line it is on. All code before and after this line within the same cell is interpreted in the kernel language, in our case Python 3. Several line magics mirror common bash commands, such as %ls or %pwd.

%ls

print("Hi! I'm Python code.")
!echo "And I'm bash script."
Data/
Figures/
README.md
Winterschool_2022_Additional_Materials.ipynb
Winterschool_2022_Jupyter_Data_Analysis.ipynb
Winterschool_2022_Jupyter_Introduction.ipynb
Winterschool_2022_R_Kernel.ipynb
requirements.txt
Hi! I'm Python code.
And I'm bash script.

Cell magic applies to the whole cell, so all code in the cell is interpreted by the magic command. Cell magic must always be placed at the top of the cell. As a result, the following cell generates an error: its last line is Python code, which bash cannot interpret.

%%bash

ls
echo "Hi! I'm bash script."
print("And I'm Python code.")
Data
Figures
README.md
Winterschool_2022_Additional_Materials.ipynb
Winterschool_2022_Jupyter_Data_Analysis.ipynb
Winterschool_2022_Jupyter_Introduction.ipynb
Winterschool_2022_R_Kernel.ipynb
requirements.txt
Hi! I'm bash script.
bash: line 4: syntax error near unexpected token `"And I'm Python code."'
bash: line 4: `print("And I'm Python code.")'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [15], line 1
----> 1 get_ipython().run_cell_magic('bash', '', '\nls\necho "Hi! I\'m bash script."\nprint("And I\'m Python code.")\n')

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/interactiveshell.py:2417, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2415 with self.builtin_trap:
   2416     args = (magic_arg_s, cell)
-> 2417     result = fn(*args, **kwargs)
   2418 return result

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'\nls\necho "Hi! I\'m bash script."\nprint("And I\'m Python code.")\n'' returned non-zero exit status 2.
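The traceback ends in a CalledProcessError, which is the exception from Python's subprocess module for a child process that finishes with a non-zero exit status. The sketch below reproduces that situation outside of IPython. It is not how %%bash is implemented in detail, and the child command is a stand-in that simply exits with status 2.

```python
import subprocess
import sys

# Start a child process that exits with status 2, standing in for the
# failing bash script (assumption: any non-zero exit behaves the same way).
command = [sys.executable, "-c", "import sys; sys.exit(2)"]

try:
    subprocess.run(command, check=True)
    exit_status = 0
except subprocess.CalledProcessError as error:
    exit_status = error.returncode
    print(f"Command failed with exit status {exit_status}")
```

Note that in the bash cell above, the lines before the faulty one still ran and printed their output; only the Python line caused the error.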

Since Jupyter cells can be executed in any order, you might need to check your notebook's variables. The following commands can be very useful:

# current variable names

%who
plt	 species	 title	 
# current variables, incl. type and data

%whos
Variable   Type      Data/Info
------------------------------
plt        module    <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
species    str       Theobroma cacao
title      str       Winterschool 2022

Remove a specific variable from environment:

%reset_selective species
Once deleted, variables cannot be recovered. Proceed (y/[n])?  y

Remove all variables from environment:

%reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? n
Nothing done.

How to embed a video:

%%HTML
<iframe width="700" height="500" 
    src="https://www.youtube.com/embed/HW29067qVWk"
    frameborder="0"
    allowfullscreen></iframe>

🤖 Try it!#

Take a few minutes to play around with the line and cell magic commands. Create some variables and remove them again.


Visualize plots#

# Example from matplotlib documentation

import numpy as np # arrays and such
import matplotlib.pyplot as plt # plotting

# Fixing random state for reproducibility
np.random.seed(19680801)


N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2  # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
[Figure: scatter plot of 50 random points with varying size and color]

Execution time#

If you want to test the speed of your code, the %%time magic can be very useful.

%%time
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
[Figure: the same scatter plot as above]
CPU times: user 57.1 ms, sys: 4.01 ms, total: 61.1 ms
Wall time: 59.2 ms
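The two numbers reported by %%time can be approximated in plain Python as well. This is a minimal sketch, assuming `time.process_time()` as the CPU clock (user plus system time of the current process) and `time.perf_counter()` as the wall clock. The `timed` helper is a made-up name, and the summation is a toy workload standing in for the plotting call.

```python
import time


def timed(function, *args, **kwargs):
    """Call function and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = function(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# Toy workload standing in for the plotting call.
result, wall_seconds, cpu_seconds = timed(sum, range(1_000_000))
print(f"CPU time: {cpu_seconds * 1000:.1f} ms")
print(f"Wall time: {wall_seconds * 1000:.1f} ms")
```

A single run can be noisy. The %%timeit cell magic from the list above repeats the code several times and reports an average, which is usually more reliable for short snippets.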

Look up documentation#

If you want to check the documentation of the function you want to use, simply click inside the function and press Shift+Tab. Or, if that doesn’t help, look it up using your favorite search engine!

plt.scatter(x, y);
[Figure: scatter plot of x against y]
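The information shown by Shift+Tab comes from the function's signature and docstring, which can also be read from code. A small sketch using the standard library's inspect module; the `scale` function is a made-up example so that the snippet runs without any extra packages.

```python
import inspect


def scale(values, factor=2.0):
    """Multiply every value by factor and return a new list."""
    return [value * factor for value in values]


signature = inspect.signature(scale)
docstring = inspect.getdoc(scale)

print(f"scale{signature}")
print(docstring)
```

In a notebook, `help(plt.scatter)` or `plt.scatter?` shows the same kind of information for a real function.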

Dataframes#

import pandas as pd #handles dataframes in Python
import numpy as np #arrays and such in Python

# create sample dataframe

df = pd.DataFrame(np.random.randn(100,5), columns=["A","B","C","D","E"])
df
A B C D E
0 0.835275 0.181993 1.232291 -0.996842 -0.804238
1 1.833230 0.084046 -0.466226 -0.458791 -0.623695
2 0.645133 -1.851581 0.843342 1.093867 0.456576
3 0.273131 -1.916821 0.162999 0.920437 -0.667275
4 -0.046662 -0.613771 -0.374934 0.516941 0.538914
... ... ... ... ... ...
95 -1.207959 -0.517363 0.597141 0.588914 -0.872500
96 0.691405 0.009598 -0.211532 -0.821576 0.920173
97 -1.025475 0.269079 1.641999 -1.113975 -0.174968
98 -0.787913 -0.093945 -0.791022 -1.639523 -1.884071
99 -0.512947 0.432264 -1.149004 0.731894 -1.413364

100 rows × 5 columns

# display only the first 5 rows
df.head(5)
A B C D E
0 0.835275 0.181993 1.232291 -0.996842 -0.804238
1 1.833230 0.084046 -0.466226 -0.458791 -0.623695
2 0.645133 -1.851581 0.843342 1.093867 0.456576
3 0.273131 -1.916821 0.162999 0.920437 -0.667275
4 -0.046662 -0.613771 -0.374934 0.516941 0.538914

🤖 Try it!#

Use the scatter plot function above together with data from the sample dataframe df. Look up the plt.scatter() documentation to add features to your plot.

# play around with scatter plot function

Using R within a Python Jupyter notebook#

R Magic#

To use R within a Python Jupyter notebook, that is, a notebook that was started with a Python kernel, we need so-called R magic.

# enables the %R and %%R magics; rpy2 must be installed, and the extension is loaded once per notebook
%load_ext rpy2.ipython

As we’ve seen before, % denotes line magic, while %% denotes cell magic. R magic uses the same syntax.

%R x <- c(1, 2, 3)

x
array([0.7003673 , 0.74275081, 0.70928001, 0.56674552, 0.97778533,
       0.70633485, 0.24791576, 0.15788335, 0.69769852, 0.71995667,
       0.25774443, 0.34154678, 0.96876117, 0.6945071 , 0.46638326,
       0.7028127 , 0.51178587, 0.92874137, 0.7397693 , 0.62243903,
       0.65154547, 0.39680761, 0.54323939, 0.79989953, 0.72154473,
       0.29536398, 0.16094588, 0.20612551, 0.13432539, 0.48060502,
       0.34252181, 0.36296929, 0.97291764, 0.11094361, 0.38826409,
       0.78306588, 0.97289726, 0.48320961, 0.33642111, 0.56741904,
       0.04794151, 0.38893703, 0.90630365, 0.16101821, 0.74362113,
       0.63297416, 0.32418002, 0.92237653, 0.23722644, 0.82394557])

Note that the Python variable x is unchanged: R and Python keep separate environments. R variables can also be used across different cells, as long as you call R magic every time.

%R x
array([1., 2., 3.])
%%R

x <- append(x, c(5,6,7))
plot(x)
[Figure: R plot of the vector x]

Move variables from Python environment to R, and vice versa#

%whos
Variable   Type         Data/Info
---------------------------------
N          int          50
area       ndarray      50: 50 elems, type `float64`, 400 bytes
colors     ndarray      50: 50 elems, type `float64`, 400 bytes
df         DataFrame               A         B   <...>n\n[100 rows x 5 columns]
np         module       <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
pd         module       <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
plt        module       <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
title      str          Winterschool 2022
x          ndarray      50: 50 elems, type `float64`, 400 bytes
y          ndarray      50: 50 elems, type `float64`, 400 bytes
%Rpush df
%R df
A B C D E
0 0.835275 0.181993 1.232291 -0.996842 -0.804238
1 1.833230 0.084046 -0.466226 -0.458791 -0.623695
2 0.645133 -1.851581 0.843342 1.093867 0.456576
3 0.273131 -1.916821 0.162999 0.920437 -0.667275
4 -0.046662 -0.613771 -0.374934 0.516941 0.538914
... ... ... ... ... ...
95 -1.207959 -0.517363 0.597141 0.588914 -0.872500
96 0.691405 0.009598 -0.211532 -0.821576 0.920173
97 -1.025475 0.269079 1.641999 -1.113975 -0.174968
98 -0.787913 -0.093945 -0.791022 -1.639523 -1.884071
99 -0.512947 0.432264 -1.149004 0.731894 -1.413364

100 rows × 5 columns

%%R

correlation <- cor(df$A, df$B)
correlation
[1] -0.04536287
correlation = %Rget correlation
correlation
array([-0.04536287])

Create a new notebook using an R kernel#

See file Winterschool_2022_R_Kernel.ipynb

Practical example with vegetation data#

# import required packages

import pandas as pd #for data frames
import numpy as np #for arrays
from matplotlib import pyplot as plt #for plotting

sPlotOpen#

sPlotOpen (Sabatini et al., 2021) is an open-access, environmentally and spatially balanced subset of the global sPlot vegetation plots data set v2.1 (Bruelheide et al., 2019).

For future reference, sPlotOpen Data is available at the iDiv Data Repository. For this study we used version 52, which you can download using the following link: https://idata.idiv.de/ddm/Data/ShowData/3474

The data is stored in various tab-separated files:

  • sPlotOpen_header(2).txt : contains information on each plot, such as coordinates, date, biome, country, etc.

  • sPlotOpen_DT(1).txt : contains information per plot and species with abundance and relative cover

  • sPlotOpen_CWM_CWV(1).txt : contains information on trait community weighted means and variances for each plot and 18 traits (ln-transformed)

For this example, we will look at the trait community weighted means.

cwm = pd.read_csv("Data/sPlotOpen_CWM_CWV(1).txt", sep= "\t")

View the first 5 rows of the data frame. Note: All trait values are natural-log (ln) transformed.

cwm.head()
PlotObservationID TraitCoverage_cover Species_richness TraitCoverage_pa LeafArea_CWM StemDens_CWM SLA_CWM LeafC_perdrymass_CWM LeafN_CWM LeafP_CWM ... Seed_length_CWV LDMC_CWV LeafNperArea_CWV LeafNPratio_CWV Leaf_delta_15N_CWV Seed_num_rep_unit_CWV Leaffreshmass_CWV Stem_cond_dens_CWV Disp_unit_leng_CWV Wood_vessel_length_CWV
0 16 0.277778 3 0.333333 3.678311 -1.047293 2.890748 6.128157 2.873263 1.114036 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 17 0.038462 2 0.500000 3.678311 -1.047293 2.890748 6.128157 2.873263 1.114036 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 18 0.047619 4 0.250000 3.678311 -1.047293 2.890748 6.128157 2.873263 1.114036 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 20 0.666667 3 0.333333 3.686063 -0.907135 2.903715 6.136791 2.929729 0.739181 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 22 0.538462 7 0.571429 3.899842 -0.900514 2.917708 6.131968 2.955072 0.733698 ... 0.011436 0.041385 0.022313 0.017075 0.186384 1.315851 0.306499 0.163156 0.052239 0.002832

5 rows × 40 columns
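Because the trait values are stored as natural logarithms, they need to be exponentiated before they can be read on the original scale. A minimal sketch with one number copied from the first row above (LeafArea_CWM), using only the standard library:

```python
import math

# ln-transformed value as stored in the data (first row, LeafArea_CWM).
leaf_area_ln = 3.678311

# Undo the natural logarithm to get back to the original scale.
leaf_area = math.exp(leaf_area_ln)

print(f"{leaf_area:.2f}")
```

For a whole column, `np.exp(cwm["LeafArea_CWM"])` should do the same for every plot at once.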

View information on the dataframe.

cwm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95104 entries, 0 to 95103
Data columns (total 40 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   PlotObservationID       95104 non-null  int64  
 1   TraitCoverage_cover     95104 non-null  float64
 2   Species_richness        95104 non-null  int64  
 3   TraitCoverage_pa        95104 non-null  float64
 4   LeafArea_CWM            94622 non-null  float64
 5   StemDens_CWM            94622 non-null  float64
 6   SLA_CWM                 94622 non-null  float64
 7   LeafC_perdrymass_CWM    94622 non-null  float64
 8   LeafN_CWM               94622 non-null  float64
 9   LeafP_CWM               94622 non-null  float64
 10  PlantHeight_CWM         94622 non-null  float64
 11  SeedMass_CWM            94622 non-null  float64
 12  Seed_length_CWM         94622 non-null  float64
 13  LDMC_CWM                94622 non-null  float64
 14  LeafNperArea_CWM        94622 non-null  float64
 15  LeafNPratio_CWM         94622 non-null  float64
 16  Leaf_delta_15N_CWM      94622 non-null  float64
 17  Seed_num_rep_unit_CWM   94622 non-null  float64
 18  Leaffreshmass_CWM       94622 non-null  float64
 19  Stem_cond_dens_CWM      94622 non-null  float64
 20  Disp_unit_leng_CWM      94622 non-null  float64
 21  Wood_vessel_length_CWM  94622 non-null  float64
 22  LeafArea_CWV            92268 non-null  float64
 23  StemDens_CWV            92268 non-null  float64
 24  SLA_CWV                 92268 non-null  float64
 25  LeafC_perdrymass_CWV    92268 non-null  float64
 26  LeafN_CWV               92268 non-null  float64
 27  LeafP_CWV               92268 non-null  float64
 28  PlantHeight_CWV         92268 non-null  float64
 29  SeedMass_CWV            92268 non-null  float64
 30  Seed_length_CWV         92268 non-null  float64
 31  LDMC_CWV                92268 non-null  float64
 32  LeafNperArea_CWV        92268 non-null  float64
 33  LeafNPratio_CWV         92268 non-null  float64
 34  Leaf_delta_15N_CWV      92268 non-null  float64
 35  Seed_num_rep_unit_CWV   92268 non-null  float64
 36  Leaffreshmass_CWV       92268 non-null  float64
 37  Stem_cond_dens_CWV      92268 non-null  float64
 38  Disp_unit_leng_CWV      92268 non-null  float64
 39  Wood_vessel_length_CWV  92268 non-null  float64
dtypes: float64(38), int64(2)
memory usage: 29.0 MB

🤖 Try it!#

  1. Plot histograms of two trait CWMs you are interested in. Extra credit: Plot both histograms inside one graph.

  2. Save the figure(s) as PDF.

  3. Check via your Jupyter notebook if the figure was generated properly.

  4. Plot a scatter plot using one trait as x values and another trait as y values.

  5. Calculate Pearson’s correlation coefficient to quantify the linear relationship of trait x and trait y.

Extra credit: Try and use the keyboard shortcuts to move around the notebook. Note: There are, of course, many different ways to answer these questions.

# plot trait CWM histogram and export as PDF (give it a unique name!)

Check if image was generated properly.

# plot scatterplot of two traits in relation
# calculate Pearson's correlation coefficient r
# Hint 1: Use the pandas function DataFrame.corr() 
# Hint 2: Subsetting a pandas dataframe works like this: df[["variable_1", "variable_2"]]
# Or switch over to R using line or cell magic
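To make the quantity in the last task concrete, here is a rough sketch of Pearson's r written out by hand. The `pearson_r` helper is a made-up name and the values are toy numbers, not sPlotOpen data. Unlike pandas' `DataFrame.corr()`, this version does not handle missing values.

```python
import math


def pearson_r(x_values, y_values):
    """Pearson's r for two equally long sequences of numbers."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Sum of the products of the deviations from the two means.
    cross_products = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values)
    )
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in x_values))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in y_values))
    return cross_products / (spread_x * spread_y)


# Made-up toy values with a perfectly linear relationship.
trait_x = [1.0, 2.0, 3.0, 4.0, 5.0]
trait_y = [2.0, 4.0, 6.0, 8.0, 10.0]

r = pearson_r(trait_x, trait_y)
print(round(r, 3))
```

Values of r close to 1 or -1 indicate a strong linear relationship, values close to 0 a weak one.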

Export your notebook#

Go to File > Download as at the top left in your Jupyter notebook window. You can download your notebook as:

  • HTML, which you could incorporate into your web documentation, for example

  • PDF

  • Jupyter notebook format (.ipynb)

  • Python code (.py)

and many more.

🤖 Try it!#

Export this notebook as an html file and view it in your browser.


Requirements / Packages used in session#

import session_info
session_info.show()
Click to view session information
-----
matplotlib          3.5.1
numpy               1.21.5
pandas              1.4.2
session_info        1.0.0
-----
Click to view modules imported as dependencies
PIL                         9.2.0
appnope                     0.1.3
asttokens                   NA
backcall                    0.2.0
backports                   NA
cffi                        1.15.1
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.6.3
decorator                   5.1.1
defusedxml                  0.7.1
entrypoints                 0.4
executing                   1.2.0
ipykernel                   6.17.1
ipython_genutils            0.2.0
jedi                        0.18.1
jinja2                      3.1.2
kiwisolver                  1.4.4
markupsafe                  2.1.1
matplotlib_inline           0.1.6
mpl_toolkits                NA
packaging                   21.3
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                2.5.2
prompt_toolkit              3.0.32
psutil                      5.9.4
ptyprocess                  0.7.0
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.8.0
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.13.0
pyparsing                   3.0.9
pytz                        2022.6
pytz_deprecation_shim       NA
rpy2                        3.5.1
six                         1.16.0
stack_data                  0.6.0
tornado                     6.2
traitlets                   5.5.0
tzlocal                     NA
wcwidth                     0.2.5
zmq                         24.0.1
-----
IPython             8.6.0
jupyter_client      7.4.4
jupyter_core        5.0.0
notebook            6.5.2
-----
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:16) [Clang 12.0.1 ]
macOS-11.2-arm64-arm-64bit
-----
Session information updated at 2022-11-10 15:58
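If the session_info package is not available, a shorter list of versions can be collected with the standard library. This is a sketch under the assumption that Python 3.8 or newer is used (for `importlib.metadata`). The `package_versions` helper is a made-up name, and packages that are not installed are reported as NA.

```python
from importlib import metadata


def package_versions(names):
    """Map each package name to its installed version, or 'NA' if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "NA"
    return versions


versions = package_versions(["matplotlib", "numpy", "pandas"])
for name, version in versions.items():
    print(f"{name:<20}{version}")
```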