In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

plt.rcParams.update({'font.size':16}) 

Further exercises with pandas

Work through the exercises in this section to get more familiar with pandas. Similar to the Introduction to Jupyter Notebooks exercises, I haven't provided full solutions to all of these exercises.

As a change from (astro)physics data, today we're going to be looking at penguins.

[Image: an emperor penguin (Emperor_Penguin_Manchot_empereur.jpg)]

To get started, download the palmerpenguins dataset and save it in the folder you're using for this week's work. Remember, keeping all your data files and notebooks in one place saves you the hassle of messing about with paths every time.

Throughout these exercises you should be writing what you are doing and why you're doing it in the markdown cells alongside your code. I know I keep saying this, but it is important. Could I be giving you a clue to what I'm expecting you to do for your coursework? You could say that, but I couldn't possibly comment.

If you want to know more about where the dataset we're using here comes from, you can check the palmerpenguins website.

Read in your data

First things first. Read the palmerpenguins penguins_raw.csv file into a dataframe.

Check what it looks like. Do the columns have sensible names? If not, change them.
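If you're not sure where to start, here is a minimal sketch of the read, inspect and rename steps. To keep it self-contained it reads a made-up two-row CSV from a string rather than the real file, and the column headers are only my guess at the sort of thing you will see. Check `df.columns` on your own dataframe. The new column names are just one possible choice.

```python
import io
import pandas as pd

# Made-up stand-in for the file on disk. With the real data you would call
# pd.read_csv('penguins_raw.csv') instead (assuming it sits next to your notebook).
csv_text = """Species,Culmen Length (mm),Body Mass (g)
Penguin A,39.1,3750
Penguin B,46.5,5200
"""
df = pd.read_csv(io.StringIO(csv_text))

# Look at the first few rows and the column names
print(df.head())
print(df.columns.tolist())

# Rename awkward headers to short names without spaces or brackets.
# These new names are an arbitrary choice, not a requirement.
df = df.rename(columns={
    'Culmen Length (mm)': 'culmen_length_mm',
    'Body Mass (g)': 'body_mass_g',
})
print(df.columns.tolist())
```

Keeping the units in the column name (`_mm`, `_g`) means you don't have to remember them when you label your axes later.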

Check to see if there are any NaN values. Hint: There is one column that might have a lot of NaNs. We don't necessarily want to get rid of rows that have NaNs only in that column. Look at your dataframe and check which column this is. Should we do anything about it?
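One way to approach the NaN check is sketched below. The dataframe is a made-up toy with hypothetical column names (`notes` plays the role of the column that is mostly NaN), so swap in the names from your own data. The idea is to count NaNs per column, then drop rows only when a measurement you care about is missing.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    'body_mass_g': [3750.0, np.nan, 5000.0, 4200.0],
    'flipper_length_mm': [181.0, 186.0, np.nan, 210.0],
    'notes': [np.nan, np.nan, 'made-up remark', np.nan],
})

# Number of NaN values in each column
nan_counts = df.isna().sum()
print(nan_counts)

# Drop a row only if one of the measurement columns is NaN.
# A NaN in 'notes' on its own does not remove the row.
measurement_cols = ['body_mass_g', 'flipper_length_mm']
cleaned = df.dropna(subset=measurement_cols)
print(cleaned)
```

Calling `df.dropna()` with no arguments would throw away every row that has a NaN anywhere, which here would leave nothing at all. That is why the `subset` argument is worth knowing about.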

solution

Penguins love histograms

It's time to make some more histograms!

Make a histogram showing the distribution of body mass for our penguins.

Your dataframe should have a column called Species. Use the unique() function on that column to get a list of the different species of penguins in the data. Create a new plot with histograms of body mass for each species of penguin. As always, give your plot sensible axis labels, units, a legend, and experiment to find a good bin size so that the histograms can be compared fairly.

Hint: unique() works similarly to the min() and max() functions we used in the previous section.
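As a rough sketch of how this could fit together: the example below invents two groups of random masses (the species labels `A` and `B` and all the numbers are made up), uses `unique()` to get the group names, and gives every histogram the same bin edges so that the bars line up and can be compared fairly. The choice of 20 bins is arbitrary and is exactly the thing you should experiment with.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented data: two groups with different typical masses
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Species': ['A'] * 50 + ['B'] * 50,
    'body_mass_g': np.concatenate([rng.normal(3700, 400, 50),
                                   rng.normal(5000, 500, 50)]),
})

# Array of the distinct values in the Species column
species_list = df['Species'].unique()

# One set of bin edges covering the whole mass range, shared by every species
bins = np.linspace(df['body_mass_g'].min(), df['body_mass_g'].max(), 21)

fig, ax = plt.subplots()
for name in species_list:
    # Select the masses for this species only
    masses = df.loc[df['Species'] == name, 'body_mass_g']
    ax.hist(masses, bins=bins, alpha=0.5, label=name)

ax.set_xlabel('Body mass (g)')
ax.set_ylabel('Number of penguins')
ax.legend()
```

If you let each call to `hist` pick its own bins, the bin widths will differ between species and the heights of the bars stop being comparable.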

solution

What a big bill you have!

I would like to apologise in advance to all of the penguin enthusiasts and/or NatSci students who most likely know a lot more about penguins and biology than I do...

After some googling, I have discovered that the "culmen" is the ridge on top of a penguin's bill. We shall now investigate whether there is a relation between the size of a penguin's bill and its flipper size. Perhaps there is a relation? Perhaps there is not. There's only one way to find out.

  • Plot culmen length and depth as a function of body mass for each species of penguin.
  • Plot flipper length as a function of body mass for each species of penguin.
  • Assume that a penguin's beak can be approximated as a cylinder of length equal to the culmen length and radius equal to half of the culmen depth. Add a new column to your dataframe corresponding to the volume of the penguin's beak.
  • Plot beak volume as a function of body mass for each species. Define a function to relate the two and find the best fit relation. Do heavier penguins have larger beaks?
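For the last two steps, a minimal sketch might look like the following. The volume of a cylinder is V = π r² L, so with r equal to half the culmen depth and L equal to the culmen length the new column is a one-liner. The five rows of data are invented, the column names are hypothetical, and a straight line is only my assumption for the fitting function. You should decide for yourself what form the relation ought to take.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Invented measurements, for illustration only
df = pd.DataFrame({
    'culmen_length_mm': [38.0, 40.0, 45.0, 48.0, 50.0],
    'culmen_depth_mm': [18.0, 18.5, 15.0, 15.5, 16.0],
    'body_mass_g': [3500.0, 3800.0, 4600.0, 5100.0, 5500.0],
})

# Cylinder volume: V = pi * r^2 * L, with r = depth / 2 and L = length
radius = df['culmen_depth_mm'] / 2
df['beak_volume_mm3'] = np.pi * radius**2 * df['culmen_length_mm']

# Assumed model: a straight line relating volume to body mass
def straight_line(mass, slope, intercept):
    return slope * mass + intercept

mass = df['body_mass_g'].to_numpy()
volume = df['beak_volume_mm3'].to_numpy()

# popt holds the best-fit parameters, pcov their covariance matrix
popt, pcov = curve_fit(straight_line, mass, volume)
slope, intercept = popt

# One-sigma uncertainties from the diagonal of the covariance matrix
slope_err, intercept_err = np.sqrt(np.diag(pcov))
print(f'slope = {slope:.3f} +/- {slope_err:.3f} mm^3 per g')
```

With real data you would fit each species separately (filter the dataframe by species first) and overplot `straight_line(mass, *popt)` on the scatter plot to judge whether the fit is any good.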

solution
