Note

You are not reading the most up to date version of Gammapy documentation.
Access the latest stable version v1.3 or the list of Gammapy releases.

This is a fixed-text formatted version of a Jupyter notebook.

You can contribute with your own notebooks in this GitHub repository.

Source files: using_numpy.ipynb | using_numpy.py

Rapid introduction on using numpy, scipy, matplotlib¶

This is meant to be a very brief reminder. It is strongly suggested to refer to more detailed introductions and tutorials see for instance: - A Whirlwind tour of Python - Python data science handbook - Scipy lectures

Introduction¶

Here we will look at : - basic features regarding array manipulation and indexing - do a bit of plotting with matplotlib - use a number of useful scipy features - see an example of vectorization with a simple Monte Carlo problem

numpy: arrays, indexing etc¶

In [1]:

import numpy as np

In [2]:

np.array([3,4,5])

Out[2]:

array([3, 4, 5])

In [3]:

np.array([[1, 2],[3,4]])

Out[3]:

array([[1, 2],
       [3, 4]])

In [4]:

### linearly spaced 1D array
np.linspace(1.,10.,10)

Out[4]:

array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

In [5]:

### log spaced 1D array
np.logspace(0.,1.,10)

Out[5]:

array([  1.        ,   1.29154967,   1.66810054,   2.15443469,
         2.7825594 ,   3.59381366,   4.64158883,   5.9948425 ,
         7.74263683,  10.        ])

In [6]:

### 1D array of zeros
np.zeros(5)

Out[6]:

array([ 0.,  0.,  0.,  0.,  0.])

In [7]:

### 2D array of zeros
np.zeros((3,3))

Out[7]:

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Types and casts¶

See numpy dtypes

In [8]:

x_int = np.logspace(0.,1.,10).astype('int')   # cast array as int
print(x_int)

[ 1  1  1  2  2  3  4  5  7 10]

In [9]:

x_int[1] = 2.34   # 2.34 is cast as int
print(x_int[1])

In [10]:

array_string = np.array(['a','b','c','d'])
array_string.dtype    # 1 character string

Out[10]:

dtype('<U1')

In [11]:

array_string[1]='bbbb'   # 'bbbb' is cast on 1 character string
array_string[1]

Out[11]:

'b'

In [12]:

array_string = np.array(['a','b','c','d'],dtype=np.dtype('S10'))
array_string[1] = 'bbbb'   # 'bbbb' is cast on 10 character string
array_string[1]

Out[12]:

b'bbbb'

array indexing & slicing¶

In [13]:

x = np.arange(10)

In [14]:

x[-1]   # last element

Out[14]:

In [15]:

x[3:6]  # subarray

Out[15]:

array([3, 4, 5])

In [16]:

x[1::2] # stride

Out[16]:

array([1, 3, 5, 7, 9])

In [17]:

x[::-1] # stride

Out[17]:

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [18]:

x = np.array([np.arange(10*i,10*i+5) for i in range(5)])
x

Out[18]:

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [19]:

print("first column : ", x[:,0])
print("last row     : ", x[-1,:])

first column :  [ 0 10 20 30 40]
last row     :  [40 41 42 43 44]

In [20]:

b=x[-1,:]   # This is a view not a copy!
b[:] += 1

print(x) # the initial matrix is changed!

[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [41 42 43 44 45]]

In [21]:

# Fancy indexing
print(x % 2 == 1)

[[False  True False  True False]
 [False  True False  True False]
 [False  True False  True False]
 [False  True False  True False]
 [ True False  True False  True]]

In [22]:

x[x % 2 == 1] = 0
print(x)

[[ 0  0  2  0  4]
 [10  0 12  0 14]
 [20  0 22  0 24]
 [30  0 32  0 34]
 [ 0 42  0 44  0]]

Broadcasting¶

In [23]:

x = np.linspace(1, 5, 5) + 4   # 4 is broadcast to 5 element array
x

Out[23]:

array([ 5.,  6.,  7.,  8.,  9.])

In [24]:

y = np.zeros((3, 5)) + x   # x is broadcast to (3,5) array
y

Out[24]:

array([[ 5.,  6.,  7.,  8.,  9.],
       [ 5.,  6.,  7.,  8.,  9.],
       [ 5.,  6.,  7.,  8.,  9.]])

Plotting with matplotlib¶

We will see some plotting: - Simple plots - Histograms with matplotlib

In [25]:

# This is for embedding figures in the notebook
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')         # Fancy style

Vectorization or loops: A very simple MC¶

We want to solve a simple statistical question. Assume a Poisson random process of mean mu. What is the density probability function pdf(n_val) of having at least one realization of the Poisson process out of N larger than n_val?

See for instance this paper

While this problem has an analytical solution we would like to test it with a simple MC.

We will first do it as one would do it with a C code and we will progressively vectorize the problem. We will use a timer to compare performance.

In [26]:

### Define the function
def poisson_sample_maximum(mu, N, Ntrials):
    """
    Generate a set of Ntrials random variables defined as the maximum of N
    random Poisson R.V. of mean mu
    """
    res = np.zeros(Ntrials)
    ### Do a loop
    for i in range(Ntrials):
        ### Generate N random varslues
        Y = np.random.poisson(mu, size=(N))
        ### Take the maximum
        res[i] = np.max(Y)

    return res

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum(mu, N, Ntrials)

1 loop, best of 3: 12.1 s per loop

It does work, but no so fast…

To do it in a efficient and pythonic way we have to avoid loops as much as possible.

The idea here will then be to do all trials at once requiring random.poisson to produce a 2D matrix of size Nxtrials

In [27]:

### Define a better function
def poisson_sample_maximum_better(mu, N, Ntrials):
    """
    Generate a set of Ntrials random variables defined as the maximum of N
    random Poisson R.V. of mean mu
    """
    ### Generate N*Ntrials random values in N x Ntrials matrix
    Y = np.random.poisson(mu,size=(N,Ntrials))
    ### Return the maximum in each row
    return np.max(Y,0)

mu = 5
N = 10
Ntrials = 1000000

%timeit values = poisson_sample_maximum_better(mu, N, Ntrials)

1 loop, best of 3: 1.1 s per loop

We can now compare the distribution of MC simulated values to the actual analytical function.

In [28]:

values = poisson_sample_maximum_better(mu,N,Ntrials)

### Make and plot the normalized histogram
### We define the binning ouselves to have bins for each integer
bins = np.arange(0, 10 * mu)
histo = plt.hist(values, bins=bins, normed=True, log=True)

### Now compare to the analytical solution
from scipy.special import gammaincc

### Define a lambda function to compute analytical solution
proba = lambda nv, Nr, mu_p : gammaincc(nv + 1, mu_p) ** Nr - gammaincc(nv, mu_p) ** Nr

x = 0.5 * (bins[:-1] + bins[1:])
y = proba(bins[:-1], N, mu)
plt.plot(x, y)
plt.ylim(1e-6,1)   # restrict y range

Out[28]:

(1e-06, 1)

../_images/notebooks_using_numpy_37_1.png

Exercices¶

write a vectorized function that takes an array of int and returns an array where square integers are replaced by their square root and the others are left unchanged

In [29]:

### A solution
def replace_square(n):
    sqrt_n = np.sqrt(n)
    return n + (sqrt_n == sqrt_n.astype(int))*(-n + sqrt_n)

print(replace_square(7.0))
print(replace_square(np.arange(26)))

7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
  15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]

In [30]:

### or using where
def replace_square2(n):
    sqrt_n = np.sqrt(n)
    return np.where(sqrt_n == sqrt_n.astype(int),
                    sqrt_n, n)

print(replace_square2(7.0))
print(replace_square2(np.arange(26)))

7.0
[  0.   1.   2.   3.   2.   5.   6.   7.   8.   3.  10.  11.  12.  13.  14.
  15.   4.  17.  18.  19.  20.  21.  22.  23.  24.   5.]

Navigation