
***Leejohnston*** · 11-13-2025, 01:48 PM

pandas Cheat Sheet — Data Analysis for Science

pandas is one of the most important tools in scientific coding.
It lets you load, analyse, clean, filter, and visualise data quickly.

This sheet covers the core operations you need to get started.

-----------------------------------------------------------------------

1. Importing pandas

Code:
import pandas as pd

-----------------------------------------------------------------------

2. Creating DataFrames

From a dictionary:

Code:
data = {
    "Time": [0, 1, 2, 3],
    "Speed": [0, 3, 7, 12]
}

df = pd.DataFrame(data)

From a CSV file:

Code:
df = pd.read_csv("data.csv")

From an Excel file:

Code:
df = pd.read_excel("file.xlsx")    # needs an Excel engine such as openpyxl installed
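
If you want to try read_csv without a file on disk, here is a minimal sketch. The CSV text is made up for illustration (it mirrors the Time/Speed dictionary above), and io.StringIO stands in for "data.csv", since read_csv accepts file-like objects as well as paths.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "data.csv"
csv_text = """Time,Speed
0,0
1,3
2,7
3,12
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types inferred from the text
```

The first line of the text is used as the column names by default.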

-----------------------------------------------------------------------

3. Basic Exploration

Code:
df.head()          # first 5 rows

df.tail()          # last 5 rows

df.shape           # (rows, columns) tuple; an attribute, so no brackets

df.columns         # Index of column names

df.info()          # column dtypes and non-null counts

df.describe()      # summary statistics for numeric columns

-----------------------------------------------------------------------

4. Selecting Data

Single column:

Code:
df["Speed"]

Multiple columns:

Code:
df[["Time", "Speed"]]

Row by index:

Code:
df.iloc[0]    # first row

Row by label:

Code:
df.loc[2]      # row where index = 2

Slicing rows:

Code:
df[0:3]        # rows at positions 0, 1, 2 (end is excluded)
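
With the default index (0, 1, 2, ...) labels and positions happen to match, so .loc and .iloc look the same. The sketch below uses a made-up index of 10, 20, 30, 40 (an assumption, purely to make the difference visible).

```python
import pandas as pd

df = pd.DataFrame(
    {"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]},
    index=[10, 20, 30, 40],  # hypothetical labels, chosen to differ from positions
)

first_by_position = df.iloc[0]   # row at position 0 (label 10)
row_by_label = df.loc[20]        # row whose index label is 20

# Slices differ too: iloc excludes the end, loc includes it
by_position = df.iloc[0:2]       # positions 0 and 1
by_label = df.loc[10:30]         # labels 10, 20 and 30

# Row and column selected together
speed_at_30 = df.loc[30, "Speed"]
print(speed_at_30)
```

Here df.loc[0] would raise a KeyError, because no row has the label 0.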

-----------------------------------------------------------------------

5. Filtering Data (Very Important)

Code:
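df[df["Speed"] > 5]      # rows where Speed is greater than 5

df[df["Time"] == 3]      # rows where Time equals 3

df[(df["Time"] >= 2) & (df["Speed"] < 10)]    # both conditions; each needs its own parentheses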
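
A few more filter patterns, sketched on the same Time/Speed data. The variable names are just for illustration. Use & for "and", | for "or" and ~ for "not"; Python's own and/or/not do not work on columns.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# OR: either condition may hold
slow_or_late = df[(df["Speed"] < 1) | (df["Time"] == 3)]

# NOT: invert a condition with ~
not_fast = df[~(df["Speed"] > 5)]

# Membership: keep rows whose Time is in a given list
chosen = df[df["Time"].isin([1, 2])]

# query() takes the condition as a string; same as the third filter above
in_window = df.query("Time >= 2 and Speed < 10")

print(in_window)
```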

-----------------------------------------------------------------------

6. Sorting Data

Code:
df.sort_values("Speed")                    # ascending by default; returns a new DataFrame

df.sort_values("Time", ascending=False)    # largest Time first
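
Sorting by more than one column works by passing a list. The sketch below uses a made-up "Run" column (not part of the data above) so there is something to break ties on.

```python
import pandas as pd

# Hypothetical data: two runs, A and B, measured at two times each
runs = pd.DataFrame({
    "Run":   ["B", "A", "B", "A"],
    "Time":  [1, 2, 2, 1],
    "Speed": [3, 7, 8, 2],
})

# Sort by Run first, then by Time within each run
ordered = runs.sort_values(["Run", "Time"])

# ignore_index=True renumbers the result 0, 1, 2, ...
fastest_first = runs.sort_values("Speed", ascending=False, ignore_index=True)

print(ordered)
print(fastest_first)
```

sort_values leaves the original runs table untouched; assign the result if you want to keep it.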

-----------------------------------------------------------------------

7. Adding & Modifying Columns

Add new column:

Code:
df["Acceleration"] = df["Speed"] / df["Time"]

(With the sample data the first row is NaN, because 0 / 0 is undefined. A non-zero Speed divided by Time = 0 would give inf instead.)

Apply a function:

Code:
df["Double"] = df["Speed"].apply(lambda x: x * 2)
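
To see the divide-by-zero behaviour in practice, here is a small sketch on the same Time/Speed data. Replacing inf with NaN is one possible clean-up choice, not the only one. For simple arithmetic, a whole-column expression gives the same result as apply and is usually faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# 0 / 0 gives NaN; the other rows divide normally
ratio = df["Speed"] / df["Time"]

# A non-zero value divided by zero gives inf rather than NaN
demo = pd.Series([5, 0]) / pd.Series([0, 0])

# One option: treat inf as missing too, so both cases are handled alike
df["Acceleration"] = ratio.replace([np.inf, -np.inf], np.nan)

# Whole-column arithmetic, equivalent to .apply(lambda x: x * 2)
df["Double"] = df["Speed"] * 2

print(df)
```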

-----------------------------------------------------------------------

8. Removing Data

Delete column:

Code:
df.drop(columns=["Double"], inplace=True)

Drop rows with missing data:

Code:
df.dropna()        # returns a new DataFrame; df itself is unchanged

Fill missing values:

Code:
df.fillna(0)       # returns a new DataFrame; df itself is unchanged
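
A short sketch of the missing-data calls on a table with one gap. The NaN in Speed is made up for illustration. Filling with the column mean is only one possible choice; whether it is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Speed reading
df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0.0, np.nan, 7.0, 12.0]})

# Count the gaps before deciding what to do with them
n_missing = df["Speed"].isna().sum()

# Drop rows with any missing value (df keeps all 4 rows)
cleaned = df.dropna()

# Only look at certain columns when deciding which rows to drop
cleaned_speed = df.dropna(subset=["Speed"])

# Fill every gap with 0, or fill per column with a dictionary
filled = df.fillna(0)
filled_mean = df.fillna({"Speed": df["Speed"].mean()})

print(n_missing, len(cleaned), len(df))
```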

-----------------------------------------------------------------------

9. Grouping & Aggregating (Statistics)

Group by a column:

Code:
df.groupby("Time")["Speed"].mean()

Multiple operations:

Code:
df.groupby("Time").agg({
    "Speed": ["mean", "max", "min"]
})
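
Grouping only becomes interesting when the grouping column has repeated values. The sketch below assumes three made-up trials at each time (hypothetical numbers) and uses named aggregation, which gives flat, readable column names.

```python
import pandas as pd

# Hypothetical repeated measurements: three trials at each time
trials = pd.DataFrame({
    "Time":  [1, 1, 1, 2, 2, 2],
    "Speed": [2.9, 3.0, 3.1, 6.8, 7.0, 7.2],
})

# One mean per distinct Time value
mean_speed = trials.groupby("Time")["Speed"].mean()

# Named aggregation: new_column=(source_column, function)
summary = trials.groupby("Time").agg(
    mean_speed=("Speed", "mean"),
    max_speed=("Speed", "max"),
    n=("Speed", "count"),
).reset_index()  # turn Time back into an ordinary column

print(summary)
```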

-----------------------------------------------------------------------

10. Merging & Joining DataFrames

Merge two tables by column:

Code:
merged = pd.merge(df1, df2, on="ID")    # df1 and df2 must both have an "ID" column; inner join by default
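
Here is a small sketch of what the merge does with IDs that appear in only one table. The Mass and Speed columns and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical tables that share an "ID" column
df1 = pd.DataFrame({"ID": [1, 2, 3], "Mass": [0.5, 1.0, 1.5]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Speed": [3, 7, 12]})

# Default (inner): only IDs present in both tables
inner = pd.merge(df1, df2, on="ID")

# Left: every row of df1; Speed is NaN where df2 has no match
left = pd.merge(df1, df2, on="ID", how="left")

# Outer: every ID from either table; indicator adds a "_merge" column
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)

print(inner)
print(outer)
```

The "_merge" column says whether each row came from the left table, the right table, or both, which helps when checking for unmatched IDs.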

-----------------------------------------------------------------------

11. Plotting with pandas + matplotlib

Requires matplotlib to be installed.

Code:
import matplotlib.pyplot as plt

df.plot(x="Time", y="Speed")

plt.show()

Scatter example:

Code:
df.plot(kind="scatter", x="Time", y="Speed")

plt.show()

-----------------------------------------------------------------------

12. Exporting Data

Code:
df.to_csv("output.csv", index=False)

df.to_excel("output.xlsx", index=False)    # needs an Excel engine such as openpyxl installed
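
To check what index=False actually changes, the sketch below writes to an in-memory buffer instead of "output.csv" (an assumption made so that nothing is written to disk) and reads the text back.

```python
import io
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# Write CSV text into a buffer rather than a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read it back to confirm nothing was lost
restored = pd.read_csv(io.StringIO(csv_text))

# Without index=False the row labels become an extra, unnamed first column
with_index = df.to_csv()  # no path given, so the CSV text is returned

print(csv_text.splitlines()[0])
print(with_index.splitlines()[0])
```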

-----------------------------------------------------------------------

13. Common Mistakes

❌ Using Python lists instead of DataFrames
✔ DataFrames give you labelled columns and whole-column operations

❌ Forgetting df = pd.read_csv("file.csv")
✔ must load data before analysis

❌ Using wrong column names
✔ check df.columns

❌ Mixing .loc and .iloc
✔ loc = labels
✔ iloc = positions

❌ Not resetting index after filtering
✔ use df.reset_index(drop=True) if needed
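
The last point is easy to miss, so here is a minimal sketch on the Time/Speed data from section 2: filtering keeps the original row labels, which matters as soon as you use .loc afterwards.

```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# Filtering keeps the original labels (2 and 3 here)
fast = df[df["Speed"] > 5]

# Position-based access still works; fast.loc[0] would raise a KeyError
first_fast = fast.iloc[0]

# Renumber from 0 and discard the old labels
fast_reset = fast.reset_index(drop=True)

print(list(fast.index))
print(list(fast_reset.index))
```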

-----------------------------------------------------------------------

Summary

pandas gives you:
• DataFrames
• filtering & sorting
• statistics
• grouping
• importing & exporting
• plotting
• data cleaning

It’s one of the most important tools for science, maths, and research.
