11-13-2025, 01:48 PM
pandas Cheat Sheet — Data Analysis for Science
pandas is one of the most important tools in scientific coding.
It lets you load, analyse, clean, filter, and visualise data quickly.
This sheet gives you everything you need to get started.
-----------------------------------------------------------------------
1. Importing pandas
Code:
import pandas as pd
-----------------------------------------------------------------------
2. Creating DataFrames
From a dictionary:
Code:
data = {
    "Time": [0, 1, 2, 3],
    "Speed": [0, 3, 7, 12]
}
df = pd.DataFrame(data)
From a CSV file:
Code:
df = pd.read_csv("data.csv")
From an Excel file:
Code:
df = pd.read_excel("file.xlsx")  # reading .xlsx needs the openpyxl package installed
-----------------------------------------------------------------------
3. Basic Exploration
Code:
df.head()      # first 5 rows
df.tail()      # last 5 rows
df.shape       # (rows, columns)
df.columns     # column names
df.info()      # column data types and non-null counts
df.describe()  # count, mean, std, min, quartiles, max for numeric columns
-----------------------------------------------------------------------
4. Selecting Data
Single column:
Code:
df["Speed"]
Multiple columns:
Code:
df[["Time", "Speed"]]
Row by position:
Code:
df.iloc[0]  # first row, whatever its label
Row by label:
Code:
df.loc[2]   # row whose index label is 2
Slicing rows:
Code:
df[0:3]     # rows at positions 0, 1, 2 (the end is excluded)
-----------------------------------------------------------------------
5. Filtering Data (Very Important)
Code:
df[df["Speed"] > 5]
df[df["Time"] == 3]
df[(df["Time"] >= 2) & (df["Speed"] < 10)]
-----------------------------------------------------------------------
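Note on filtering (section 5): conditions are combined with & (and), | (or) and ~ (not) rather than Python's and / or / not, and each condition needs its own parentheses because those operators bind more tightly than comparisons like > or ==. A minimal sketch, reusing the Time/Speed table from section 2; the variable names either, slow and picked are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# OR: rows where Time is 0 or Speed is above 10
either = df[(df["Time"] == 0) | (df["Speed"] > 10)]

# NOT: rows where Speed is not above 5
slow = df[~(df["Speed"] > 5)]

# Membership: rows whose Time is one of several values
picked = df[df["Time"].isin([1, 3])]
```

-----------------------------------------------------------------------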
6. Sorting Data
Code:
df.sort_values("Speed")
df.sort_values("Time", ascending=False)
-----------------------------------------------------------------------
7. Adding & Modifying Columns
Add new column:
Code:
df["Acceleration"] = df["Speed"] / df["Time"]
(Row 0 is 0 / 0, so it becomes NaN rather than raising an error. A nonzero Speed at Time = 0 would give inf.)
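Division by zero in pandas does not stop your script, so it is worth checking what ended up in the column. A minimal sketch of the two outcomes; the moving table is hypothetical data made up for this example, and replacing inf with NaN is just one possible way to handle it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})
df["Acceleration"] = df["Speed"] / df["Time"]  # first row is 0 / 0 -> NaN

# Hypothetical data: a nonzero Speed at Time = 0 gives inf, not NaN
moving = pd.DataFrame({"Time": [0, 1], "Speed": [5, 3]})
ratio = moving["Speed"] / moving["Time"]

# One way to treat both cases as missing values
cleaned = ratio.replace([np.inf, -np.inf], np.nan)
```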
Apply a function:
Code:
df["Double"] = df["Speed"].apply(lambda x: x * 2)  # same result as df["Speed"] * 2
-----------------------------------------------------------------------
8. Removing Data
Delete column:
Code:
df.drop(columns=["Double"], inplace=True)
Drop rows with missing data:
Code:
df = df.dropna()   # returns a new DataFrame, so assign the result
Fill missing values:
Code:
df = df.fillna(0)  # also returns a new DataFrame
-----------------------------------------------------------------------
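Note on removing data (section 8): dropna() and fillna() hand back a new DataFrame and leave the original alone, so calling them without keeping the result changes nothing. A small sketch, assuming a hypothetical table with one missing Speed reading.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value
df = pd.DataFrame({"Time": [0, 1, 2], "Speed": [0.0, np.nan, 7.0]})

df.dropna()              # result is discarded; df still has 3 rows
complete = df.dropna()   # keep the result by assigning it
filled = df.fillna(0)    # same idea for fillna
```

-----------------------------------------------------------------------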
9. Grouping & Aggregating (Statistics)
Group by a column:
Code:
df.groupby("Time")["Speed"].mean()
Multiple operations:
Code:
df.groupby("Time").agg({
    "Speed": ["mean", "max", "min"]
})
-----------------------------------------------------------------------
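Note on grouping (section 9): in the small Time/Speed table every Time value appears once, so each group holds a single row. Grouping is more useful when a column repeats. A sketch with hypothetical repeated measurements; the runs table and its Trial column are invented for this example.

```python
import pandas as pd

# Hypothetical repeated measurements: two trials, three readings each
runs = pd.DataFrame({
    "Trial": ["A", "A", "A", "B", "B", "B"],
    "Speed": [2.0, 4.0, 6.0, 1.0, 3.0, 8.0],
})

# One mean per trial
mean_speed = runs.groupby("Trial")["Speed"].mean()

# Several statistics per trial, one column each
summary = runs.groupby("Trial")["Speed"].agg(["mean", "max", "min"])
```

-----------------------------------------------------------------------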
10. Merging & Joining DataFrames
Merge two tables by column:
Code:
merged = pd.merge(df1, df2, on="ID")
-----------------------------------------------------------------------
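Note on merging (section 10): df1 and df2 are not defined anywhere on this sheet, so here is a sketch with two hypothetical tables that share an ID column. By default pd.merge keeps only IDs found in both tables; how="left" keeps every row of the first table and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical tables sharing an "ID" column
df1 = pd.DataFrame({"ID": [1, 2, 3], "Mass": [1.5, 2.0, 2.5]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Speed": [3, 7, 12]})

# Default how="inner": only IDs present in both tables
inner = pd.merge(df1, df2, on="ID")

# how="left": every row of df1, NaN where df2 has no match
left = pd.merge(df1, df2, on="ID", how="left")
```

-----------------------------------------------------------------------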
11. Plotting with pandas + matplotlib
Requires matplotlib installed.
Code:
import matplotlib.pyplot as plt
df.plot(x="Time", y="Speed")
plt.show()
Scatter example:
Code:
df.plot(kind="scatter", x="Time", y="Speed")
plt.show()
-----------------------------------------------------------------------
12. Exporting Data
Code:
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)  # writing .xlsx needs the openpyxl package installed
-----------------------------------------------------------------------
13. Common Mistakes
❌ Using Python lists instead of DataFrames
✔ pandas is built for tabular data: labelled columns and whole-column operations
❌ Forgetting df = pd.read_csv("file.csv")
✔ must load data before analysis
❌ Using wrong column names
✔ check df.columns
❌ Mixing up .loc and .iloc
✔ loc = labels
✔ iloc = positions
❌ Not resetting index after filtering
✔ use df.reset_index(drop=True) if needed
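The last two mistakes tend to show up together: after filtering, the rows keep their original index labels, so .loc and .iloc no longer point at the same rows. A minimal sketch using the Time/Speed table from section 2; the variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})
fast = df[df["Speed"] > 5]   # keeps the original index labels 2 and 3

first_by_position = fast.iloc[0]   # first row of the filtered table
row_by_label = fast.loc[2]         # the same row, reached by its label
# fast.loc[0] would raise KeyError because label 0 was filtered out

renumbered = fast.reset_index(drop=True)   # labels become 0 and 1
```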
-----------------------------------------------------------------------
Summary
pandas gives you:
• DataFrames
• filtering & sorting
• statistics
• grouping
• importing & exporting
• plotting
• data cleaning
It’s one of the most important tools for science, maths, and research.
