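import pandas as pd
import plotly.express as px

# Load the SPL checkout data for the Top 500 novels from the public S3 bucket
spl_df = pd.read_csv("https://responsible-datasets-in-context.s3.us-west-2.amazonaws.com/top_500_spl_df.csv")

# Preview the first five rows
spl_df.head()

February 25, 2026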
These exercises explore checkout data from the Seattle Public Library for the Top 500 “Greatest” Novels — the novels most widely held in libraries according to OCLC. For more context about the dataset, see the data essay.
Concepts covered:
Find the top 10 authors and top 10 books by total checkouts in the SPL Top 500 dataset. Display them as styled tables.
Save the results as top_authors and top_books.
Discuss/consider: Which authors and books are most popular at the Seattle Public Library? Are there any surprises?
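One possible approach is sketched below. It assumes the columns are named `author`, `title`, and `checkouts`; these names are a guess, so check `spl_df.columns` and edit the constants if they differ. The toy DataFrame is invented only so the sketch runs without downloading the CSV, and `top_by_checkouts` and `styled` are hypothetical helper names.

```python
import pandas as pd

# ASSUMED column names; confirm against spl_df.columns and edit if needed.
AUTHOR_COL = "author"
TITLE_COL = "title"
CHECKOUTS_COL = "checkouts"


def top_by_checkouts(df: pd.DataFrame, group_col: str, n: int = 10) -> pd.DataFrame:
    """Total the checkouts for each value of group_col and keep the n largest."""
    return (
        df.groupby(group_col, as_index=False)[CHECKOUTS_COL]
        .sum()
        .sort_values(CHECKOUTS_COL, ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


def styled(table: pd.DataFrame):
    """Hypothetical helper: format counts with commas and add inline bars."""
    return (
        table.style
        .format("{:,}", subset=[CHECKOUTS_COL])
        .bar(subset=[CHECKOUTS_COL], color="lightsteelblue")
    )


# Invented toy rows, only so the sketch runs without the real CSV.
toy = pd.DataFrame({
    AUTHOR_COL: ["Austen, Jane", "Austen, Jane", "Orwell, George", "Tolkien, J. R. R."],
    TITLE_COL: ["Pride and Prejudice", "Emma", "1984", "The Hobbit"],
    CHECKOUTS_COL: [120, 40, 150, 90],
})

# With the real data, replace `toy` with `spl_df`.
top_authors = top_by_checkouts(toy, AUTHOR_COL)
top_books = top_by_checkouts(toy, TITLE_COL)
print(top_authors)
print(top_books)
```

In a notebook, `styled(top_authors)` renders the table with thousands separators and inline bars (pandas' `Styler` needs the optional `jinja2` package). Because totals are summed by exact string, a novel listed under several title spellings would have its checkouts split across rows.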
Create a line plot of monthly checkouts over time for “Pride and Prejudice”.
Filter the data for “Pride and Prejudice”, group by year and month, and plot the results.
Discuss/consider: What patterns do you notice in the checkout trends? Are there any seasonal patterns or notable changes over time?
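The filter, group, and plot steps can be sketched as follows. The column names `title`, `checkout_year`, `checkout_month`, and `checkouts` are assumptions, as are the helper names `monthly_checkouts` and `plot_monthly`; the toy rows are invented. Plotly is imported inside the plotting function so the data step can run on its own.

```python
import pandas as pd

# ASSUMED column names; confirm against spl_df.columns and edit if needed.
TITLE_COL = "title"
YEAR_COL = "checkout_year"
MONTH_COL = "checkout_month"
CHECKOUTS_COL = "checkouts"


def monthly_checkouts(df: pd.DataFrame, title: str) -> pd.DataFrame:
    """Sum one title's checkouts per (year, month) and add a month-start date."""
    subset = df[df[TITLE_COL] == title]
    monthly = subset.groupby([YEAR_COL, MONTH_COL], as_index=False)[CHECKOUTS_COL].sum()
    # pd.to_datetime can assemble dates from columns named year/month/day.
    parts = monthly[[YEAR_COL, MONTH_COL]].rename(
        columns={YEAR_COL: "year", MONTH_COL: "month"}
    ).assign(day=1)
    monthly["date"] = pd.to_datetime(parts)
    return monthly.sort_values("date").reset_index(drop=True)


def plot_monthly(monthly: pd.DataFrame, title: str):
    """Line plot of the monthly totals; plotly is imported only when plotting."""
    import plotly.express as px

    return px.line(monthly, x="date", y=CHECKOUTS_COL,
                   title=f"Monthly checkouts: {title}")


# Invented toy rows with several rows in one month, e.g. if a title has
# separate rows per format.
toy = pd.DataFrame({
    TITLE_COL: ["Pride and Prejudice"] * 3 + ["Emma"],
    YEAR_COL: [2020, 2020, 2020, 2020],
    MONTH_COL: [2, 1, 1, 1],
    CHECKOUTS_COL: [5, 10, 7, 3],
})

monthly = monthly_checkouts(toy, "Pride and Prejudice")
print(monthly)
```

With the real data, `plot_monthly(monthly_checkouts(spl_df, "Pride and Prejudice"), "Pride and Prejudice").show()` would draw the chart. Using real dates instead of "year-month" strings keeps the x-axis in chronological order with correct spacing, even if some months are missing.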
Calculate the pairwise correlations among the monthly checkout patterns of the Harry Potter books and display the results as a heatmap.
Filter for Harry Potter titles, pivot the data so each book is a column, compute the correlation matrix, and visualize it.
Discuss/consider: Which Harry Potter books have the most correlated checkout patterns? What might explain these correlations?
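A sketch of the filter, pivot, correlate, and visualize steps follows. It assumes the same hypothetical column names as above, matches titles by the substring "Harry Potter", and fills months with no recorded row with 0 (an assumption that a missing month means no checkouts). The book titles in the toy data are invented placeholders, and `harry_potter_corr` and `plot_corr_heatmap` are hypothetical helper names.

```python
import pandas as pd

# ASSUMED column names; confirm against spl_df.columns and edit if needed.
TITLE_COL = "title"
YEAR_COL = "checkout_year"
MONTH_COL = "checkout_month"
CHECKOUTS_COL = "checkouts"


def harry_potter_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Correlate monthly checkout totals across Harry Potter titles."""
    hp = df[df[TITLE_COL].str.contains("Harry Potter", case=False, na=False)]
    # One row per (year, month), one column per book. fill_value=0 ASSUMES a
    # missing month means no checkouts for that book.
    wide = hp.pivot_table(
        index=[YEAR_COL, MONTH_COL],
        columns=TITLE_COL,
        values=CHECKOUTS_COL,
        aggfunc="sum",
        fill_value=0,
    )
    return wide.corr()  # Pearson correlation between each pair of books


def plot_corr_heatmap(corr: pd.DataFrame):
    """Heatmap of the correlation matrix; plotly is imported only when plotting."""
    import plotly.express as px

    return px.imshow(corr, text_auto=".2f", zmin=-1, zmax=1,
                     color_continuous_scale="RdBu",
                     title="Correlation of monthly checkouts")


# Invented toy rows: three placeholder titles plus one non-matching title.
months = [1, 2, 3, 4]
toy = pd.DataFrame({
    TITLE_COL: ["Harry Potter A"] * 4 + ["Harry Potter B"] * 4
               + ["Harry Potter C"] * 4 + ["Emma"] * 4,
    YEAR_COL: [2021] * 16,
    MONTH_COL: months * 4,
    CHECKOUTS_COL: [10, 20, 30, 40, 5, 10, 15, 20, 40, 30, 20, 10, 1, 1, 1, 1],
})

corr = harry_potter_corr(toy)
print(corr.round(2))
```

With the real data, `plot_corr_heatmap(harry_potter_corr(spl_df)).show()` would draw the heatmap. Pinning the diverging color scale to [-1, 1] keeps zero at the center, so positive and negative correlations are easy to tell apart. Because the filter is a substring match, it is worth printing the matched titles to confirm that nothing besides the novels slipped in.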