import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
plt.rcParams.update({'font.size':16})
So far we've covered how to start a new notebook and how to add markdown to cells. Now we'll look at how to code in notebooks. In Semester 1 you will have covered the basics of Python using Spyder. Python in notebooks works the same way, with your code in code cells rather than on the command line.
Python routines exist in packages. The first thing to do is to load the packages you need for your analysis using
import numpy
In the next cell we import the packages needed to plot a graph and fit a line to our data. You'll notice that some packages are loaded slightly differently, e.g.
import matplotlib.pyplot as plt
Here we're loading the pyplot
sub-package of matplotlib
. So we don't need to type matplotlib.pyplot
every time we want to use it, we load it as plt
.
You can call packages whatever you want when you load them with import package as name
, but some packages have commonly used short names. It's good to get in the habit of using the common short names; it makes using examples from the internet much easier!
Package name | Short name |
---|---|
numpy |
np |
matplotlib.pyplot |
plt |
pandas |
pd |
The code cells in notebooks work exactly the same way as if you typed something on the command line in Spyder. It's up to you how you split your code up over the cells. It's a good idea to try to have each cell do just one thing, like defining a function, setting variables or plotting a graph. This way if you do get an error it's easier to spot where the problem may be.
It's common practice to do any module importing at the top of your notebook. If you realise later on that you need to import a module that you hadn't thought of before, you can always go back and add it to the top cell. Just remember to rerun the cell.
## In this cell I'm assigning some variables.
a = 1
b = 6
c = 789
d = np.arange(0,10,0.1) ## this is a useful command that we'll cover later
## in this cell I'm doing some maths
## I'm also printing the result here, so I can check that I'm getting what I expect.
result = a*b + c
print(result)
## Now I'm going to plot a graph.
##Â Keeping this separate from the other code cells so I don't have to rerun everything every time I want to change my plot
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
ax.plot(d, b*np.sin(d), 'k-')
ax.set_xlabel('d')
ax.set_ylabel('$b \sin(d)$')
ax.set_title('A simple plot')
plt.show()
The nice thing about using notebooks is that plots are stored in your notebook right alongside the code used to create them. You can still save them to a file in the usual way:
fig.savefig('simple_plot.png', dpi=300)
When you're saving figures you can also specify the resolution by setting the dpi
variable in savefig
. You shold save figures with a good resolution so that they show up clearly if you use them in something like a lab report later.
If you want to change something in your plot, you don't have to re-write the whole code, you can just go back and edit and run the cell with the plotting code. You don't need to rerun the cells above unless you change something in those.
Here we're going to repeat one of the exercises from the C5 worksheet you did in Semester 1. You will need the data_2.dat file you used in that exercise.
Previously you were asked to write a script that did all the steps. Here we're going to break the process down into separate cells.
If you haven't already, add import
statements to the first code cell in your notebook to import numpy
and matplotlib.pyplot
.
We can now read in the data file using numpy.loadtxt
:
time, height, unc = np.loadtxt("./data/data_2.dat", skiprows=2, unpack=True, delimiter=",")
You will need to edit the path "./data/data2_2.dat"
to point to where you have the file saved.
As a reminder, skiprows=2
tells np.loadtxt
to ignore the first two rows of the file, unpack=True
tells the function to unpack the rows into the time
, height
and unc
variables, and delimeter=","
tells it that the columns are separated by commas.
The next thing we should do is check that the data has been read in as we expect. We can do that by printing the variables and checking the shape of each array.
print(time, time.shape)
print(height, height.shape)
print(unc, unc.shape)
We can see that each of the arrays has 13 elements and the numbers look right. It's always a good idea to check that files have been read in as you expect. If you're reading in a lot of data, you can always just print the start of an array rather than the whole thing. In the next cell we're just printing the first 5 elements of the arrays.
print(time[:5], time.shape)
print(height[:5], height.shape)
print(unc[:5], unc.shape)
The next thing we want to do is plot a graph so we can estimate the gravitational constant, $g$.
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
ax.errorbar(time, height, yerr=unc, marker='o', ms=3,c='k', ls='None', capsize=4)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Height (m)')
ax.set_title('Estimating $g$')
plt.show()
We have a nice plot, but how do we estimate $g$? This is where notebooks come in handy. Rather than work this out on a separate piece of paper, we can keep all our calculations in the notebook alongside our code.
Starting from the SUVAT equations:
$$ s = u t + \dfrac{1}{2} a t^2 $$where $s$ is displacement from the intial positoin, $u$ is the initial velocity, $a$ is accelleration and $t$ is time.
In this case, the initial velocity, $u=0$ and, assuming that the only source of acceleration is gravity, $a=g$.
So we can now say
$$ s = \dfrac{1}{2} g t^2 $$Now we can fit a function of this form to our data to get an estimate of $g$.
First we have to define a function to fit, then we can use curve_fit
to fit this to our data. Aha! But to use curve fit, we need to import it, so we should add from scipy.optimize import curve_fit
to our other import statements at the top of our notebook (and remember to rerun that cell).
# defining our function to fit to the data
def displacement(t, g):
return 0.5 * g * t**2
# Now we can use curve_fit to fit this function to the data and get out a value and uncertainty for g
popt, pcov = curve_fit(displacement,time,height)
g = popt[0]
unc_g = np.sqrt(float(pcov[0][0]))
# and print the results to see what we get
print(g, unc_g)
Hmmmmm. This looks weird. I was expecting $g$ to be negative... Let's plot it and see what's going on.
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
ax.plot(time, displacement(time, g), 'b-')
ax.errorbar(time, height, yerr=unc, marker='o', ms=4,c='k', ls='None', capsize=4)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Height (m)')
ax.set_title('Estimating $g$')
plt.show()
There's definately something going wrong here.
Looking back at what we did earlier, we defined
$$ s = \dfrac{1}{2} g t^2 $$But how does it know where the object started from? That must be the mistake, we need to tell the function that the object starts at some position $h_0$ and falls down to the ground. Let's define a new function:
or $$ h = h_0 + \dfrac{1}{2} g t^2 $$
where $h$ is the height and $h_0$ is the starting height of the object.
# defining our function to fit to the data
def position(t, g, h0):
return h0 + 0.5 * g * t**2
# Now we can use curve_fit to fit this function to the data and get out a value and uncertainty for g
popt, pcov = curve_fit(position,time,height)
g = popt[0]
unc_g = np.sqrt(float(pcov[0][0]))
# and print the results to see what we get
print(g, unc_g)
# we should also check that it's made a sensible estimate for the starting height, h0
h0 = popt[1]
print(h0)
This is a much more sensible answer! But we should still check that the function looks like it fits the data.
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
ax.plot(time, position(time, g, height[0]), 'b-')
ax.errorbar(time, height, yerr=unc, marker='o', ms=4,c='k', ls='None', capsize=4)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Height (m)')
ax.set_title('Estimating $g$')
plt.show()
Much better! I am now reassured that I can still do GCSE physics...
This is one of the benefits of using notebooks. I have all my notes for what I was trying to do alongside the code, so it didn't take too long to figure out what the problem is.
To make this plot actually useful we can add some extra information. We need a legend to show what the lines and sybols mean, and want to change the plot title to tell us the value we calculated for $g$.
To make a legend we have to first give the things we've plotted labels. In the cell below, I've added label='Best Fit'
and label='Data'
to the plot
and errorbar
commands that plotted the line and the data. We add the legend to the plot using ax.legend
.
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
# plot and errorbar now have an extra parameter called 'label'
ax.plot(time, position(time, g, height[0]), 'b-', label='Best Fit')
ax.errorbar(time, height, yerr=unc, marker='o', ms=4,c='k', ls='None', capsize=4, label='Data')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Height (m)')
ax.set_title('Estimating $g$')
## This is where we're adding the legend
ax.legend(loc='upper right')
plt.show()
To change the plot title to tell us the value of $g$ we have two options. We could hard-code the value into the ax.set_title
call, e.g.
ax.set_title('$g = -9.7477 m s^-2$')
But what if we re-ran the code and found a different value? We'd have to remember to update the title every time. The better option is to create a new string that takes our value for $g$ and inserts it into the title.
First we create a new string that contains the variable g
and formats it niceley. In the cell below we're using \$ to format the equation, and the parts in curly brackets define the fomatting of the numbers we want to print. `{0:.3f}` means take the first variable in the `.format` brackets and format it as a float with 3 decimal places. `{1:.3f}` does the same for the second variable (because remember Python starts counting from zero). To get the $m s^{-2}$ units formatted correctly we have to use an extra pair of curly brackets so Python knows that we're not searching for the -2'th element in the format
brackets.
title_string = "$g = {0:.3f} \pm {1:.3f}$ m s$^{{-2}}$".format(g, unc_g)
print(title_string)
Now we have our title string we can edit the code for the plot to have it display our new title.
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(1,1,1)
ax.plot(time, position(time, g, height[0]), 'b-', label='Best Fit')
ax.errorbar(time, height, yerr=unc, marker='o', ms=4,c='k', ls='None', capsize=4, label='Data')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Height (m)')
title_string = "$g = {0:.3f} \pm {1:.3f}$ m s$^{{-2}}$".format(g, unc_g)
ax.set_title(title_string)
ax.legend(loc='upper right')
plt.show()
One thing you may have noticed is that I now have 4 versions of the same plot. This is so you can see the steps of adding each of the different elements. While it can be useful to keep different plots at different stages, there's no need to make a new version of the plot each time. You can just go back to the cell and change whatever you want.