This is a fixed-text formatted version of a Jupyter notebook.

You can contribute with your own notebooks in this GitHub repository.

Source files: using_numpy.ipynb | using_numpy.py

Rapid introduction on using numpy, scipy, matplotlib

This is meant to be a very brief reminder. It is strongly suggested to refer to more detailed introductions and tutorials see for instance: - A Whirlwind tour of Python - Python data science handbook - Scipy lectures

Introduction

Here we will look at : - basic features regarding array manipulation and indexing - do a bit of plotting with matplotlib - use a number of useful scipy features - see an example of vectorization with a simple Monte Carlo problem

numpy: arrays, indexing etc

In [1]:
import numpy as np
In [2]:
np.array([3,4,5])
Out[2]:
array([3, 4, 5])
In [3]:
np.array([[1, 2],[3,4]])
Out[3]:
array([[1, 2],
       [3, 4]])
In [4]:
### linearly spaced 1D array
np.linspace(1.,10.,10)
Out[4]:
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
In [5]:
### log spaced 1D array
np.logspace(0.,1.,10)
Out[5]:
array([  1.        ,   1.29154967,   1.66810054,   2.15443469,
         2.7825594 ,   3.59381366,   4.64158883,   5.9948425 ,
         7.74263683,  10.        ])
In [6]:
### 1D array of zeros
np.zeros(5)
Out[6]:
array([ 0.,  0.,  0.,  0.,  0.])
In [7]:
### 2D array of zeros
np.zeros((3,3))
Out[7]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Types and casts

See numpy dtypes

In [8]:
x_int = np.logspace(0.,1.,10).astype('int')   # cast array as int
print(x_int)
[ 1  1  1  2  2  3  4  5  7 10]
In [9]:
x_int[1] = 2.34   # 2.34 is cast as int
print(x_int[1])
2
In [10]:
array_string = np.array(['a','b','c','d'])
array_string.dtype    # 1 character string
Out[10]:
dtype('<U1')
In [11]:
array_string[1]='bbbb'   # 'bbbb' is cast on 1 character string
array_string[1]
Out[11]:
'b'
In [12]:
array_string = np.array(['a','b','c','d'],dtype=np.dtype('S10'))
array_string[1] = 'bbbb'   # 'bbbb' is cast on 10 character string
array_string[1]
Out[12]:
b'bbbb'

array indexing & slicing

In [13]:
x = np.arange(10)
In [14]:
x[-1]   # last element
Out[14]:
9
In [15]:
x[3:6]  # subarray
Out[15]:
array([3, 4, 5])
In [16]:
x[1::2] # stride
Out[16]:
array([1, 3, 5, 7, 9])
In [17]:
x[::-1] # stride
Out[17]:
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In [18]:
x = np.array([np.arange(10*i,10*i+5) for i in range(5)])
x
Out[18]:
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [19]:
print("first column : ", x[:,0])
print("last row     : ", x[-1,:])
first column :  [ 0 10 20 30 40]
last row     :  [40 41 42 43 44]
In [20]:
b=x[-1,:]   # This is a view not a copy!
b[:] += 1

print(x) # the initial matrix is changed!
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [41 42 43 44 45]]
In [21]:
# Fancy indexing
print(x % 2 == 1)
[[False  True False  True False]
 [False  True False  True False]
 [False  True False  True False]
 [False  True False  True False]
 [ True False  True False  True]]
In [22]:
x[x % 2 == 1] = 0
print(x)
[[ 0  0  2  0  4]
 [10  0 12  0 14]
 [20  0 22  0 24]
 [30  0 32  0 34]
 [ 0 42  0 44  0]]

Broadcasting

In [23]:
x = np.linspace(1, 5, 5) + 4   # 4 is broadcast to 5 element array
x
Out[23]:
array([ 5.,  6.,  7.,  8.,  9.])
In [24]:
y = np.zeros((3, 5)) + x   # x is broadcast to (3,5) array
y
Out[24]:
array([[ 5.,  6.,  7.,  8.,  9.],
       [ 5.,  6.,  7.,  8.,  9.],
       [ 5.,  6.,  7.,  8.,  9.]])

Plotting with matplotlib

We will see some plotting: - Simple plots - Histograms with matplotlib

In [25]:
# This is for embedding figures in the notebook
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')         # Fancy style

Vectorization or loops: A very simple MC

We want to solve a simple statistical question. Assume a Poisson random process of mean mu. What is the density probability function pdf(n_val) of having at least one realization of the Poisson process out of N larger than n_val?

See for instance this paper

While this problem has an analytical solution we would like to test it with a simple MC.

We will first do it as one would do it with a C code and we will progressively vectorize the problem. We will use a timer to compare performance.

In [26]:
### Define the function
def poisson_sample_maximum(mu, N, Ntrials):
    """
    Generate a set of Ntrials random variables defined as the maximum of N
    random Poisson R.V. of mean mu
    """
    res = np.zeros(Ntrials)
    ### Do a loop
    for i in range(Ntrials):
        ### Generate N random varslues
        Y = np.random.poisson(mu, size=(N))
        ### Take the maximum
        res[i] = np.max(Y)

    return res

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum(mu, N, Ntrials)
1 loop, best of 3: 12.1 s per loop

It does work, but no so fast…

To do it in a efficient and pythonic way we have to avoid loops as much as possible.

The idea here will then be to do all trials at once requiring random.poisson to produce a 2D matrix of size Nxtrials

In [27]:
### Define a better function
def poisson_sample_maximum_better(mu, N, Ntrials):
    """
    Generate a set of Ntrials random variables defined as the maximum of N
    random Poisson R.V. of mean mu
    """
    ### Generate N*Ntrials random values in N x Ntrials matrix
    Y = np.random.poisson(mu,size=(N,Ntrials))
    ### Return the maximum in each row
    return np.max(Y,0)

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum_better(mu, N, Ntrials)
1 loop, best of 3: 1.1 s per loop

We can now compare the distribution of MC simulated values to the actual analytical function.

In [28]:
values = poisson_sample_maximum_better(mu,N,Ntrials)

### Make and plot the normalized histogram
### We define the binning ouselves to have bins for each integer
bins = np.arange(0, 10 * mu)
histo = plt.hist(values, bins=bins, normed=True, log=True)

### Now compare to the analytical solution
from scipy.special import gammaincc

### Define a lambda function to compute analytical solution
proba = lambda nv, Nr, mu_p : gammaincc(nv + 1, mu_p) ** Nr - gammaincc(nv, mu_p) ** Nr

x = 0.5 * (bins[:-1] + bins[1:])
y = proba(bins[:-1], N, mu)
plt.plot(x, y)
plt.ylim(1e-6,1)   # restrict y range
Out[28]:
(1e-06, 1)
../_images/notebooks_using_numpy_37_1.png

Exercices

  • write a vectorized function that takes an array of int and returns an array where square integers are replaced by their square root and the others are left unchanged
In [29]:
### A solution
def replace_square(n):
    sqrt_n = np.sqrt(n)
    return n + (sqrt_n == sqrt_n.astype(int))*(-n + sqrt_n)

print(replace_square(7.0))
print(replace_square(np.arange(26)))
7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
  15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]
In [30]:
### or using where
def replace_square2(n):
    sqrt_n = np.sqrt(n)
    return np.where(sqrt_n == sqrt_n.astype(int),
                    sqrt_n, n)

print(replace_square2(7.0))
print(replace_square2(np.arange(26)))
7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
  15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]