This is a fixed-text formatted version of a Jupyter notebook.

Try online You can also contribute with your own notebooks in this GitHub repository. Source files: using_numpy.ipynb | using_numpy.py

# Rapid introduction on using numpy, scipy, matplotlib¶

This is meant to be a very brief reminder. It is strongly suggested to refer to more detailed introductions and tutorials see for instance: - A Whirlwind tour of Python - Python data science handbook - Scipy lectures

## Introduction¶

Here we will look at : - basic features regarding array manipulation and indexing - do a bit of plotting with matplotlib - use a number of useful scipy features - see an example of vectorization with a simple Monte Carlo problem

## numpy: arrays, indexing etc¶

In [1]:

import numpy as np

In [2]:

np.array([3,4,5])

Out[2]:

array([3, 4, 5])

In [3]:

np.array([[1, 2],[3,4]])

Out[3]:

array([[1, 2],
[3, 4]])

In [4]:

### linearly spaced 1D array
np.linspace(1.,10.,10)

Out[4]:

array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

In [5]:

### log spaced 1D array
np.logspace(0.,1.,10)

Out[5]:

array([  1.        ,   1.29154967,   1.66810054,   2.15443469,
2.7825594 ,   3.59381366,   4.64158883,   5.9948425 ,
7.74263683,  10.        ])

In [6]:

### 1D array of zeros
np.zeros(5)

Out[6]:

array([ 0.,  0.,  0.,  0.,  0.])

In [7]:

### 2D array of zeros
np.zeros((3,3))

Out[7]:

array([[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]])


### Types and casts¶

See numpy dtypes

In [8]:

x_int = np.logspace(0.,1.,10).astype('int')   # cast array as int
print(x_int)

[ 1  1  1  2  2  3  4  5  7 10]

In [9]:

x_int[1] = 2.34   # 2.34 is cast as int
print(x_int[1])

2

In [10]:

array_string = np.array(['a','b','c','d'])
array_string.dtype    # 1 character string

Out[10]:

dtype('<U1')

In [11]:

array_string[1]='bbbb'   # 'bbbb' is cast on 1 character string
array_string[1]

Out[11]:

'b'

In [12]:

array_string = np.array(['a','b','c','d'],dtype=np.dtype('S10'))
array_string[1] = 'bbbb'   # 'bbbb' is cast on 10 character string
array_string[1]

Out[12]:

b'bbbb'


### array indexing & slicing¶

In [13]:

x = np.arange(10)

In [14]:

x[-1]   # last element

Out[14]:

9

In [15]:

x[3:6]  # subarray

Out[15]:

array([3, 4, 5])

In [16]:

x[1::2] # stride

Out[16]:

array([1, 3, 5, 7, 9])

In [17]:

x[::-1] # stride

Out[17]:

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [18]:

x = np.array([np.arange(10*i,10*i+5) for i in range(5)])
x

Out[18]:

array([[ 0,  1,  2,  3,  4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])

In [19]:

print("first column : ", x[:,0])
print("last row     : ", x[-1,:])

first column :  [ 0 10 20 30 40]
last row     :  [40 41 42 43 44]

In [20]:

b=x[-1,:]   # This is a view not a copy!
b[:] += 1

print(x) # the initial matrix is changed!

[[ 0  1  2  3  4]
[10 11 12 13 14]
[20 21 22 23 24]
[30 31 32 33 34]
[41 42 43 44 45]]

In [21]:

# Fancy indexing
print(x % 2 == 1)

[[False  True False  True False]
[False  True False  True False]
[False  True False  True False]
[False  True False  True False]
[ True False  True False  True]]

In [22]:

x[x % 2 == 1] = 0
print(x)

[[ 0  0  2  0  4]
[10  0 12  0 14]
[20  0 22  0 24]
[30  0 32  0 34]
[ 0 42  0 44  0]]


In [23]:

x = np.linspace(1, 5, 5) + 4   # 4 is broadcast to 5 element array
x

Out[23]:

array([ 5.,  6.,  7.,  8.,  9.])

In [24]:

y = np.zeros((3, 5)) + x   # x is broadcast to (3,5) array
y

Out[24]:

array([[ 5.,  6.,  7.,  8.,  9.],
[ 5.,  6.,  7.,  8.,  9.],
[ 5.,  6.,  7.,  8.,  9.]])


## Plotting with matplotlib¶

We will see some plotting: - Simple plots - Histograms with matplotlib

In [25]:

# This is for embedding figures in the notebook
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')         # Fancy style


## Vectorization or loops: A very simple MC¶

We want to solve a simple statistical question. Assume a Poisson random process of mean mu. What is the density probability function pdf(n_val) of having at least one realization of the Poisson process out of N larger than n_val?

See for instance this paper

While this problem has an analytical solution we would like to test it with a simple MC.

We will first do it as one would do it with a C code and we will progressively vectorize the problem. We will use a timer to compare performance.

In [26]:

### Define the function
def poisson_sample_maximum(mu, N, Ntrials):
"""
Generate a set of Ntrials random variables defined as the maximum of N
random Poisson R.V. of mean mu
"""
res = np.zeros(Ntrials)
### Do a loop
for i in range(Ntrials):
### Generate N random varslues
Y = np.random.poisson(mu, size=(N))
### Take the maximum
res[i] = np.max(Y)

return res

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum(mu, N, Ntrials)

1 loop, best of 3: 12.1 s per loop


It does work, but no so fast…

To do it in a efficient and pythonic way we have to avoid loops as much as possible.

The idea here will then be to do all trials at once requiring random.poisson to produce a 2D matrix of size Nxtrials

In [27]:

### Define a better function
def poisson_sample_maximum_better(mu, N, Ntrials):
"""
Generate a set of Ntrials random variables defined as the maximum of N
random Poisson R.V. of mean mu
"""
### Generate N*Ntrials random values in N x Ntrials matrix
Y = np.random.poisson(mu,size=(N,Ntrials))
### Return the maximum in each row
return np.max(Y,0)

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum_better(mu, N, Ntrials)

1 loop, best of 3: 1.1 s per loop


We can now compare the distribution of MC simulated values to the actual analytical function.

In [28]:

values = poisson_sample_maximum_better(mu,N,Ntrials)

### Make and plot the normalized histogram
### We define the binning ouselves to have bins for each integer
bins = np.arange(0, 10 * mu)
histo = plt.hist(values, bins=bins, normed=True, log=True)

### Now compare to the analytical solution
from scipy.special import gammaincc

### Define a lambda function to compute analytical solution
proba = lambda nv, Nr, mu_p : gammaincc(nv + 1, mu_p) ** Nr - gammaincc(nv, mu_p) ** Nr

x = 0.5 * (bins[:-1] + bins[1:])
y = proba(bins[:-1], N, mu)
plt.plot(x, y)
plt.ylim(1e-6,1)   # restrict y range

Out[28]:

(1e-06, 1)


## Exercices¶

• write a vectorized function that takes an array of int and returns an array where square integers are replaced by their square root and the others are left unchanged
In [29]:

### A solution
def replace_square(n):
sqrt_n = np.sqrt(n)
return n + (sqrt_n == sqrt_n.astype(int))*(-n + sqrt_n)

print(replace_square(7.0))
print(replace_square(np.arange(26)))

7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]

In [30]:

### or using where
def replace_square2(n):
sqrt_n = np.sqrt(n)
return np.where(sqrt_n == sqrt_n.astype(int),
sqrt_n, n)

print(replace_square2(7.0))
print(replace_square2(np.arange(26)))

7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]