Introduction to Jupyter Notebook (Part 1)#

gfbio Winterschool 2022, Sophie Wolf

The name Jupyter refers to the programming languages Julia, Python and R.

Jupyter Notebook (formerly IPython Notebooks) is a web-based open-source interactive computational environment.

A notebook contains a list of input and output cells, which can contain:

# code
1+1

Text formatted with Markdown

import matplotlib.pyplot as plt

plt.plot(range(10))
plt.title('Display plot in notebook')
plt.show()

_images/1.1_Jupyter_Introduction_5_0.png

Creating a new notebook#

Kernels

We will be using Python 3 and R kernels.

First orientation#

user interface tour
command mode and edit mode
keyboard shortcuts

print("Winterschool 2022")

Winterschool 2022

title = "Winterschool 2022"

title

'Winterschool 2022'

"Winterschool 2022"

'Winterschool 2022'

Suppress output using ;

"Winterschool 2023";

🤖 Try it!#

Take a few minutes to explore the keyboard shortcuts.

Ordering of executions#

Be mindful: You can execute Jupyter cells in any order. The execution order is displayed as numbers to the left of each code cell.

species = "Nepeta cataria"

species

'Theobroma cacao'

species = "Theobroma cacao"
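The cells above illustrate the pitfall: the value displayed for species comes from a cell further down the page that was executed earlier. A minimal sketch of the same effect in plain Python follows. A dictionary stands in for the kernel's memory; the helper run and the cell numbering are made up for illustration.

```python
# Each "cell" is a snippet of code, keyed by its position on the page.
cells = {
    1: 'species = "Nepeta cataria"',
    2: 'shown = species',
    3: 'species = "Theobroma cacao"',
}

def run(order):
    """Execute the cells in the given order inside one shared namespace."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["shown"]

top_to_bottom = run([1, 2, 3])  # cells run in the order they appear
out_of_order = run([1, 3, 2])   # cell 3 is run before cell 2

print(top_to_bottom)
print(out_of_order)
```

Restarting the kernel and running all cells from top to bottom is a simple way to check that a notebook does not depend on such a hidden order.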

A few remarks on Markdown#

Markdown is a simple markup language for creating formatted text using a plain-text editor.

Some functions#

italics
bold
code block here

You can create things like this nice table:

Some traits in TRY database	ID	Unit
Leaf area (in case of compound leaves: leaflet, undefined if petiole is in- or excluded)	3113	mm²
Leaf area per leaf dry mass (specific leaf area, SLA or 1/LMA): undefined if petiole is in- or excluded)	3117	m²/kg
Stem specific density (SSD) or wood density (stem dry mass per stem fresh volume)	4	g/cm³
Leaf carbon (C) content per leaf dry mass	13	mg/g
Leaf nitrogen (N) content per leaf dry mass	14	mg/g
Leaf phosphorus (P) content per leaf dry mass	15	mg/g
Plant height vegetative	3106	m

Or insert an image from your local machine or url.

iNaturalist observation: Fouquieria splendens

This image is a citizen science observation from the project iNaturalist.
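The rendered page no longer shows the Markdown source behind the formatting above. As a small sketch, the snippet below builds that source as plain strings: asterisks for italics and bold, backticks for code, and the pipe-table syntax that Jupyter's Markdown cells render. The helper markdown_table is hypothetical and only three of the table rows are reproduced.

```python
def markdown_table(header, rows):
    """Return the Markdown source for a pipe table as one string."""
    lines = ["| " + " | ".join(header) + " |"]
    # The separator row tells the renderer where the header ends.
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

header = ["Some traits in TRY database", "ID", "Unit"]
rows = [
    ["Leaf carbon (C) content per leaf dry mass", 13, "mg/g"],
    ["Leaf nitrogen (N) content per leaf dry mass", 14, "mg/g"],
    ["Plant height vegetative", 3106, "m"],
]

text_source = "*italics*, **bold**, `inline code`"
table_source = markdown_table(header, rows)

print(text_source)
print(table_source)
```

Pasting the printed text into a Markdown cell and running it should display the formatted text and the table.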

Line magic and cell magic#

Jupyter has a whole library of so-called line and cell magics. Among other things, these allow you to switch to other programming languages within a single Jupyter notebook.

bash command#

To run a terminal (bash) command from a code cell, simply put an ! before the command.

!pwd

/Users/sophiewolf/Documents/GitHub/Jupyter_Workshop_Winterschool_2022

For example, you could use it to install a new package:

!pip install tqdm

There are many in-built options for line and cell magics. They always start with:

% for line magic
%% for cell magic

To list all the built-in available magic commands type the following:

%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Line magic refers only to the one line. So all code before and after this line within the same cell will be interpreted as the kernel language, in our case Python 3. There are many bash line magic commands, such as %ls or %pwd.

%ls

print("Hi! I'm Python code.")
!echo "And I'm bash script."

Data/
Figures/
README.md
Winterschool_2022_Additional_Materials.ipynb
Winterschool_2022_Jupyter_Data_Analysis.ipynb
Winterschool_2022_Jupyter_Introduction.ipynb
Winterschool_2022_R_Kernel.ipynb
requirements.txt
Hi! I'm Python code.
And I'm bash script.

Cell magic applies to the whole cell, so all code in the cell is interpreted according to the magic command. Cell magic must always be placed at the top of the cell. As a result, the following cell generates an error: its last line is Python code, which is not valid bash.

%%bash

ls
echo "Hi! I'm bash script."
print("And I'm Python code.")

Data
Figures
README.md
Winterschool_2022_Additional_Materials.ipynb
Winterschool_2022_Jupyter_Data_Analysis.ipynb
Winterschool_2022_Jupyter_Introduction.ipynb
Winterschool_2022_R_Kernel.ipynb
requirements.txt
Hi! I'm bash script.

bash: line 4: syntax error near unexpected token `"And I'm Python code."'
bash: line 4: `print("And I'm Python code.")'

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [15], line 1
----> 1 get_ipython().run_cell_magic('bash', '', '\nls\necho "Hi! I\'m bash script."\nprint("And I\'m Python code.")\n')

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/interactiveshell.py:2417, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2415 with self.builtin_trap:
   2416     args = (magic_arg_s, cell)
-> 2417     result = fn(*args, **kwargs)
   2418 return result

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File ~/miniforge3/envs/winterschool/lib/python3.8/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'\nls\necho "Hi! I\'m bash script."\nprint("And I\'m Python code.")\n'' returned non-zero exit status 2.
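The traceback above suggests what happens behind a script magic such as %%bash: the cell body is handed to an external bash process, and a non-zero exit status is turned into a Python error. The following is only a rough sketch of that idea using the standard library. The helper run_bash is hypothetical, IPython's real implementation differs in detail, and the example assumes bash is installed.

```python
import shutil
import subprocess

# The same mix of valid bash and stray Python as in the cell above (without ls).
script = 'echo "Hi! I\'m bash script."\nprint("And I\'m Python code.")\n'

def run_bash(source):
    """Feed source to bash on standard input and collect its output."""
    return subprocess.run(["bash"], input=source, text=True, capture_output=True)

# Only attempt the run when bash is available on this machine (an assumption).
result = run_bash(script) if shutil.which("bash") else None

if result is not None:
    print(result.stdout, end="")              # output of the valid echo line
    print("exit status:", result.returncode)  # non-zero: print(...) is not valid bash
```

Passing check=True to subprocess.run would raise CalledProcessError on a non-zero exit status, which is the exception type shown in the traceback.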

Since Jupyter cells can be executed in any order, you might need to check your notebook's variables. The following commands can be very useful:

# current variable names

%who

plt	 species	 title

# current variables, incl. type and data

%whos

Variable   Type      Data/Info
------------------------------
plt        module    <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
species    str       Theobroma cacao
title      str       Winterschool 2022

Remove a specific variable from environment:

%reset_selective species

Once deleted, variables cannot be recovered. Proceed (y/[n])?  y

Remove all variables from environment:

%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? n
Nothing done.
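For intuition, the behaviour of %who and of removing a single variable can be approximated in plain Python. This is a sketch under simplifying assumptions, not IPython's implementation: a dictionary plays the role of the notebook's namespace and the function who is a hypothetical stand-in.

```python
# A dictionary stands in for the notebook's namespace.
namespace = {}
exec('title = "Winterschool 2022"\nspecies = "Theobroma cacao"', namespace)

def who(ns):
    """List user-defined names, skipping internal ones that start with an underscore."""
    return sorted(name for name in ns if not name.startswith("_"))

before = who(namespace)

# Roughly comparable to removing one variable with %reset_selective
del namespace["species"]
after = who(namespace)

print(before)
print(after)
```

In a real notebook, del species has a similar effect for a single variable, without the confirmation prompt.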

How to embed a video:

%%HTML
<iframe width="700" height="500" 
    src="https://www.youtube.com/embed/HW29067qVWk"
    frameborder="0"
    allowfullscreen></iframe>

🤖 Try it!#

Take a few minutes to play around with the line and cell magic commands. Create some variables and remove them again.

Visualize plots#

# Example from matplotlib documentation

import numpy as np # arrays and such
import matplotlib.pyplot as plt # plotting

# Fixing random state for reproducibility
np.random.seed(19680801)


N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2  # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

_images/1.1_Jupyter_Introduction_46_0.png

Execution time#

If you want to test the speed of your code, the %%time magic can be very useful.

%%time
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

_images/1.1_Jupyter_Introduction_49_0.png

CPU times: user 57.1 ms, sys: 4.01 ms, total: 61.1 ms
Wall time: 59.2 ms
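Outside a notebook, or to time only part of a cell, comparable numbers can be collected with the standard library. The sketch below measures wall-clock and CPU time for a single call; the helper timed and the summation workload are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import time

def timed(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

total, wall, cpu = timed(sum, range(1_000_000))

print(f"CPU time: {cpu * 1000:.1f} ms")
print(f"Wall time: {wall * 1000:.1f} ms")
```

A single run can be noisy. The %timeit and %%timeit magics listed by %lsmagic repeat the measurement several times.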

Look up documentation#

If you want to check the documentation of the function you want to use, simply click inside the function and press Shift+Tab. Or, if that doesn’t help, look it up using your favorite search engine!

plt.scatter(x, y);

_images/1.1_Jupyter_Introduction_52_0.png
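The same documentation can also be reached from code. As a small sketch, the snippet below prints the signature and the first docstring line of a function. It uses statistics.mean from the standard library as a stand-in so that it runs without extra packages; the same calls should work for plt.scatter.

```python
import inspect
import statistics

# The call signature, i.e. which parameters the function accepts
signature = inspect.signature(statistics.mean)

# The cleaned-up docstring that the Shift+Tab tooltip is based on
docstring = inspect.getdoc(statistics.mean)

print("mean" + str(signature))
print(docstring.splitlines()[0])
```

The built-in help() function prints the full docstring, and in IPython a question mark after a name opens the same information.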

Dataframes#

import pandas as pd #handles dataframes in Python
import numpy as np #arrays and such in Python

# create sample dataframe

df = pd.DataFrame(np.random.randn(100,5), columns=["A","B","C","D","E"])
df

	A	B	C	D	E
0	0.835275	0.181993	1.232291	-0.996842	-0.804238
1	1.833230	0.084046	-0.466226	-0.458791	-0.623695
2	0.645133	-1.851581	0.843342	1.093867	0.456576
3	0.273131	-1.916821	0.162999	0.920437	-0.667275
4	-0.046662	-0.613771	-0.374934	0.516941	0.538914
...	...	...	...	...	...
95	-1.207959	-0.517363	0.597141	0.588914	-0.872500
96	0.691405	0.009598	-0.211532	-0.821576	0.920173
97	-1.025475	0.269079	1.641999	-1.113975	-0.174968
98	-0.787913	-0.093945	-0.791022	-1.639523	-1.884071
99	-0.512947	0.432264	-1.149004	0.731894	-1.413364

100 rows × 5 columns

# display only the first 5 rows
df.head(5)

	A	B	C	D	E
0	0.835275	0.181993	1.232291	-0.996842	-0.804238
1	1.833230	0.084046	-0.466226	-0.458791	-0.623695
2	0.645133	-1.851581	0.843342	1.093867	0.456576
3	0.273131	-1.916821	0.162999	0.920437	-0.667275
4	-0.046662	-0.613771	-0.374934	0.516941	0.538914

🤖 Try it!#

Use the scatter plot function above together with data from the sample dataframe df. Look up the plt.scatter() documentation to add features to your plot.

# play around with scatter plot function

Using R within a Python Jupyter notebook#

R Magic#

To use R within a Python Jupyter notebook, which was initiated using a Python kernel, we need so-called R magic.

# enables the %R and %%R magics; rpy2 needs to be installed, and the extension loaded only once per notebook
%load_ext rpy2.ipython

As we’ve seen before, % denotes line magic, while %% denotes cell magic. R magic uses the same syntax.

%R x <- c(1, 2, 3)

x

array([0.7003673 , 0.74275081, 0.70928001, 0.56674552, 0.97778533,
       0.70633485, 0.24791576, 0.15788335, 0.69769852, 0.71995667,
       0.25774443, 0.34154678, 0.96876117, 0.6945071 , 0.46638326,
       0.7028127 , 0.51178587, 0.92874137, 0.7397693 , 0.62243903,
       0.65154547, 0.39680761, 0.54323939, 0.79989953, 0.72154473,
       0.29536398, 0.16094588, 0.20612551, 0.13432539, 0.48060502,
       0.34252181, 0.36296929, 0.97291764, 0.11094361, 0.38826409,
       0.78306588, 0.97289726, 0.48320961, 0.33642111, 0.56741904,
       0.04794151, 0.38893703, 0.90630365, 0.16101821, 0.74362113,
       0.63297416, 0.32418002, 0.92237653, 0.23722644, 0.82394557])

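Note that the Python variable x shown above is unchanged: R and Python keep separate environments, so assigning x in R does not overwrite x in Python. R variables can also be used across different cells, as long as you call R magic every time.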

%R x

array([1., 2., 3.])

%%R

x <- append(x, c(5,6,7))
plot(x)

_images/1.1_Jupyter_Introduction_67_0.png

Move variables from Python environment to R, and vice versa#

%whos

Variable   Type         Data/Info
---------------------------------
N          int          50
area       ndarray      50: 50 elems, type `float64`, 400 bytes
colors     ndarray      50: 50 elems, type `float64`, 400 bytes
df         DataFrame               A         B   <...>n\n[100 rows x 5 columns]
np         module       <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
pd         module       <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
plt        module       <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
title      str          Winterschool 2022
x          ndarray      50: 50 elems, type `float64`, 400 bytes
y          ndarray      50: 50 elems, type `float64`, 400 bytes

%Rpush df

%R df

	A	B	C	D	E
0	0.835275	0.181993	1.232291	-0.996842	-0.804238
1	1.833230	0.084046	-0.466226	-0.458791	-0.623695
2	0.645133	-1.851581	0.843342	1.093867	0.456576
3	0.273131	-1.916821	0.162999	0.920437	-0.667275
4	-0.046662	-0.613771	-0.374934	0.516941	0.538914
...	...	...	...	...	...
95	-1.207959	-0.517363	0.597141	0.588914	-0.872500
96	0.691405	0.009598	-0.211532	-0.821576	0.920173
97	-1.025475	0.269079	1.641999	-1.113975	-0.174968
98	-0.787913	-0.093945	-0.791022	-1.639523	-1.884071
99	-0.512947	0.432264	-1.149004	0.731894	-1.413364

100 rows × 5 columns

%%R

correlation <- cor(df$A, df$B)
correlation

[1] -0.04536287

correlation = %Rget correlation
correlation

array([-0.04536287])

Create a new notebook using an R kernel#

See file Winterschool_2022_R_Kernel.ipynb

Practical example with vegetation data#

# import required packages

import pandas as pd #for data frames
import numpy as np #for arrays
from matplotlib import pyplot as plt #for plotting

sPlotOpen#

sPlotOpen (Sabatini et al., 2021) is an open-access, environmentally and spatially balanced subset of the global sPlot vegetation plots data set v2.1 (Bruelheide et al., 2019).

For future reference, sPlotOpen Data is available at the iDiv Data Repository. For this study we used version 52, which you can download using the following link: https://idata.idiv.de/ddm/Data/ShowData/3474

The data is stored in various tab-separated files:

sPlotOpen_header(2).txt : contains information on each plot, such as coordinates, date, biome, country, etc.
sPlotOpen_DT(1).txt : contains information per plot and species with abundance and relative cover
sPlotOpen_CWM_CWV(1).txt : contains information on trait community weighted means and variances for each plot and 18 traits (ln-transformed)

For this example, we will look at the trait community weighted means.

cwm = pd.read_csv("Data/sPlotOpen_CWM_CWV(1).txt", sep= "\t")

View the first 5 rows of the data frame. Note: All trait values are ln-transformed (natural logarithm).

cwm.head()

	PlotObservationID	TraitCoverage_cover	Species_richness	TraitCoverage_pa	LeafArea_CWM	StemDens_CWM	SLA_CWM	LeafC_perdrymass_CWM	LeafN_CWM	LeafP_CWM	...	Seed_length_CWV	LDMC_CWV	LeafNperArea_CWV	LeafNPratio_CWV	Leaf_delta_15N_CWV	Seed_num_rep_unit_CWV	Leaffreshmass_CWV	Stem_cond_dens_CWV	Disp_unit_leng_CWV	Wood_vessel_length_CWV
0	16	0.277778	3	0.333333	3.678311	-1.047293	2.890748	6.128157	2.873263	1.114036	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	17	0.038462	2	0.500000	3.678311	-1.047293	2.890748	6.128157	2.873263	1.114036	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	18	0.047619	4	0.250000	3.678311	-1.047293	2.890748	6.128157	2.873263	1.114036	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	20	0.666667	3	0.333333	3.686063	-0.907135	2.903715	6.136791	2.929729	0.739181	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	22	0.538462	7	0.571429	3.899842	-0.900514	2.917708	6.131968	2.955072	0.733698	...	0.011436	0.041385	0.022313	0.017075	0.186384	1.315851	0.306499	0.163156	0.052239	0.002832

5 rows × 40 columns
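Because the trait values are ln-transformed, exponentiating recovers the original scale. A one-value sketch using the LeafArea_CWM entry of the first row above follows. The resulting unit is presumably mm², the leaf area unit listed in the TRY table earlier, but that is an assumption here and should be checked against the sPlotOpen documentation.

```python
import math

# ln-transformed LeafArea_CWM of the first plot, copied from the table above
ln_leaf_area = 3.678311

# Invert the natural logarithm to get back to the original scale
leaf_area = math.exp(ln_leaf_area)

print(round(leaf_area, 2))
```

For a whole column, np.exp(cwm["LeafArea_CWM"]) applies the same back-transformation element-wise.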

View information on the dataframe.

cwm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95104 entries, 0 to 95103
Data columns (total 40 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   PlotObservationID       95104 non-null  int64
 1   TraitCoverage_cover     95104 non-null  float64
 2   Species_richness        95104 non-null  int64
 3   TraitCoverage_pa        95104 non-null  float64
 4   LeafArea_CWM            94622 non-null  float64
 5   StemDens_CWM            94622 non-null  float64
 6   SLA_CWM                 94622 non-null  float64
 7   LeafC_perdrymass_CWM    94622 non-null  float64
 8   LeafN_CWM               94622 non-null  float64
 9   LeafP_CWM               94622 non-null  float64
 10  PlantHeight_CWM         94622 non-null  float64
 11  SeedMass_CWM            94622 non-null  float64
 12  Seed_length_CWM         94622 non-null  float64
 13  LDMC_CWM                94622 non-null  float64
 14  LeafNperArea_CWM        94622 non-null  float64
 15  LeafNPratio_CWM         94622 non-null  float64
 16  Leaf_delta_15N_CWM      94622 non-null  float64
 17  Seed_num_rep_unit_CWM   94622 non-null  float64
 18  Leaffreshmass_CWM       94622 non-null  float64
 19  Stem_cond_dens_CWM      94622 non-null  float64
 20  Disp_unit_leng_CWM      94622 non-null  float64
 21  Wood_vessel_length_CWM  94622 non-null  float64
 22  LeafArea_CWV            92268 non-null  float64
 23  StemDens_CWV            92268 non-null  float64
 24  SLA_CWV                 92268 non-null  float64
 25  LeafC_perdrymass_CWV    92268 non-null  float64
 26  LeafN_CWV               92268 non-null  float64
 27  LeafP_CWV               92268 non-null  float64
 28  PlantHeight_CWV         92268 non-null  float64
 29  SeedMass_CWV            92268 non-null  float64
 30  Seed_length_CWV         92268 non-null  float64
 31  LDMC_CWV                92268 non-null  float64
 32  LeafNperArea_CWV        92268 non-null  float64
 33  LeafNPratio_CWV         92268 non-null  float64
 34  Leaf_delta_15N_CWV      92268 non-null  float64
 35  Seed_num_rep_unit_CWV   92268 non-null  float64
 36  Leaffreshmass_CWV       92268 non-null  float64
 37  Stem_cond_dens_CWV      92268 non-null  float64
 38  Disp_unit_leng_CWV      92268 non-null  float64
 39  Wood_vessel_length_CWV  92268 non-null  float64
dtypes: float64(38), int64(2)
memory usage: 29.0 MB

🤖 Try it!#

Plot histograms of two trait CWMs you are interested in. Extra credit: Plot both histograms inside one graph.
Save the figure(s) as PDF.
Check via your Jupyter notebook whether the figure was generated properly.
Create a scatter plot using one trait as x values and another trait as y values.
Calculate Pearson’s correlation coefficient to quantify the linear relationship between trait x and trait y.

Extra credit: Try and use the keyboard shortcuts to move around the notebook. Note: There are, of course, many different ways to answer these questions.

# plot trait cwm histogram and export as PDF (give it a unique name!)

Check if image was generated properly.

# plot scatterplot of two traits in relation

# calculate Pearson's correlation coefficient r
# Hint 1: Use the pandas function DataFrame.corr() 
# Hint 2: Subsetting a pandas dataframe works like this: df[["variable_1", "variable_2"]]
# Or switch over to R using line or cell magic
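As background for the last task, Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below spells this out in plain Python on made-up numbers; the function pearson_r and the three small lists are hypothetical and unrelated to the sPlotOpen data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

trait_x = [1.0, 2.0, 3.0, 4.0, 5.0]
trait_y = [2.1, 3.9, 6.2, 8.0, 9.8]  # rises almost linearly with trait_x
trait_z = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls exactly linearly with trait_x

print(pearson_r(trait_x, trait_y))
print(pearson_r(trait_x, trait_z))
```

Values near 1 or -1 indicate a strong linear relationship, values near 0 a weak one. Missing values, which the trait columns contain, would have to be dropped before using a function like this.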

Export your notebook#

Go to File > Download as at the top left in your Jupyter notebook window. You can download your notebook as:

html, which you could incorporate into your web-documentation, for example
as a PDF
in Jupyter notebook format .ipynb
as Python code .py

and many more.
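These exports are possible because an .ipynb file is plain JSON with a list of cells. The following sketch builds a tiny notebook in memory and pulls out its code cells, similar in spirit to the .py export. It is not how Jupyter's export is implemented, and the helper code_cells and the demo notebook are invented for illustration.

```python
import json

# A minimal notebook document in the version 4 format, built in memory
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["title = \"Winterschool 2022\"\n", "print(title)"]},
    ],
})

def code_cells(raw):
    """Return the source text of every code cell in a notebook JSON string."""
    notebook = json.loads(raw)
    return ["".join(cell["source"]) for cell in notebook["cells"]
            if cell["cell_type"] == "code"]

script = "\n\n".join(code_cells(notebook_json))
print(script)
```

Because the file is plain text, notebooks can also be tracked with version control tools such as git.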

🤖 Try it!#

Export this notebook as an html file and view it in your browser.

Requirements / Packages used in session#

import session_info
session_info.show()

Click to view session information

-----
matplotlib          3.5.1
numpy               1.21.5
pandas              1.4.2
session_info        1.0.0
-----

Click to view modules imported as dependencies

PIL                         9.2.0
appnope                     0.1.3
asttokens                   NA
backcall                    0.2.0
backports                   NA
cffi                        1.15.1
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.6.3
decorator                   5.1.1
defusedxml                  0.7.1
entrypoints                 0.4
executing                   1.2.0
ipykernel                   6.17.1
ipython_genutils            0.2.0
jedi                        0.18.1
jinja2                      3.1.2
kiwisolver                  1.4.4
markupsafe                  2.1.1
matplotlib_inline           0.1.6
mpl_toolkits                NA
packaging                   21.3
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                2.5.2
prompt_toolkit              3.0.32
psutil                      5.9.4
ptyprocess                  0.7.0
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.8.0
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.13.0
pyparsing                   3.0.9
pytz                        2022.6
pytz_deprecation_shim       NA
rpy2                        3.5.1
six                         1.16.0
stack_data                  0.6.0
tornado                     6.2
traitlets                   5.5.0
tzlocal                     NA
wcwidth                     0.2.5
zmq                         24.0.1

-----
IPython             8.6.0
jupyter_client      7.4.4
jupyter_core        5.0.0
notebook            6.5.2
-----
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:16) [Clang 12.0.1 ]
macOS-11.2-arm64-arm-64bit
-----
Session information updated at 2022-11-10 15:58
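If session_info is not installed, a reduced version of this report can be assembled with the standard library (Python 3.8 or newer). The sketch below looks up installed versions for a hand-picked list of packages; the helper package_versions and the choice of packages are assumptions, and the output depends on the environment.

```python
import platform
from importlib import metadata

def package_versions(names):
    """Map each package name to its installed version, if any."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

report = package_versions(["matplotlib", "numpy", "pandas"])
report["python"] = platform.python_version()

for name, version in report.items():
    print(f"{name:<12}{version}")
```

Unlike session_info, this does not detect which modules were actually imported; the list of names has to be maintained by hand.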
