Exercises

SPL Top 500 Data Exploration

These exercises explore checkout data from the Seattle Public Library for the Top 500 “Greatest” Novels — the novels most widely held in libraries according to OCLC. For more context about the dataset, see the data essay.

Concepts covered:

Groupby and aggregation (sum of checkouts)
Sorting and ranking (top N values)
Time series line plots (monthly checkouts over time)
String filtering (finding titles that contain a keyword)
Pivot tables
Correlation matrices and heatmaps — a correlation coefficient measures how closely two variables move together, ranging from -1 (perfect inverse relationship) to 1 (perfect positive relationship), with 0 meaning no relationship

Load the data

Code

import pandas as pd
import plotly.express as px

spl_df = pd.read_csv("https://responsible-datasets-in-context.s3.us-west-2.amazonaws.com/top_500_spl_df.csv")
spl_df.head()

Exercise 1

Find the top 10 authors and top 10 books by total checkouts in the SPL Top 500 dataset. Display them as styled tables.

Save the results as top_authors and top_books.

Code

# Your code here

Discuss/consider: Which authors and books are most popular at the Seattle Public Library? Are there any surprises?

Exercise 2

Create a time series line plot of monthly checkouts for “Pride and Prejudice” over time.

Filter the data for “Pride and Prejudice”, group by year and month, and plot the results.

Code

# Your code here

Discuss/consider: What patterns do you notice in the checkout trends? Are there any seasonal patterns or notable changes over time?

Exercise 3

Calculate the correlation between the monthly checkout patterns of Harry Potter books and display the results as a heatmap.

Filter for Harry Potter titles, pivot the data so each book is a column, compute the correlation matrix, and visualize it.

Code

# Your code here

Discuss/consider: Which Harry Potter books have the most correlated checkout patterns? What might explain these correlations?

SPL Top 500 Data Exploration (Exercise)

Other Formats

Exercises

SPL Top 500 Data Exploration

Load the data

Exercise 1

Exercise 2

Exercise 3