pandas Cheat Sheet — Data Analysis for Science

pandas is one of the most important tools in scientific coding. 
It lets you load, analyse, clean, filter, and visualise data quickly.

This sheet gives you everything you need to get started.

-----------------------------------------------------------------------

1. Importing pandas

Code:
import pandas as pd

-----------------------------------------------------------------------

2. Creating DataFrames

From a dictionary:

Code:
data = {
    "Time": [0, 1, 2, 3],
    "Speed": [0, 3, 7, 12]
}
df = pd.DataFrame(data)

From a CSV file:

Code:
df = pd.read_csv("data.csv")

From an Excel file (reading .xlsx files needs the openpyxl package installed):

Code:
df = pd.read_excel("file.xlsx")
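"data.csv" is only a placeholder name, so here is a self-contained sketch that gives read_csv an in-memory string (via io.StringIO) instead of a file on disk. The CSV text is made-up sample data matching the Time/Speed example above.

Code:
```python
import io

import pandas as pd

# Stand-in for the contents of "data.csv" (made-up sample data)
csv_text = """Time,Speed
0,0
1,3
2,7
3,12
"""

# read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the file becomes the column names.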

-----------------------------------------------------------------------

3. Basic Exploration

Code:
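df.head()       # first 5 rows
df.tail()       # last 5 rows
df.shape        # (rows, columns) tuple; an attribute, so no ()
df.columns      # column names (an Index object)
df.info()       # column data types, non-null counts, memory use
df.describe()   # count, mean, std, min, quartiles, max of numeric columns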
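A short sketch of using these results in code rather than just printing them, assuming the small Time/Speed DataFrame from section 2:

Code:
```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

n_rows, n_cols = df.shape          # shape is a plain tuple
print(n_rows, n_cols)              # 4 2
print(list(df.columns))            # ['Time', 'Speed']

stats = df.describe()              # describe() returns a DataFrame
print(stats.loc["mean", "Speed"])  # 5.5
```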

-----------------------------------------------------------------------

4. Selecting Data

Single column: 
Code:
df["Speed"]

Multiple columns: 
Code:
df[["Time", "Speed"]]

Row by position:
Code:
df.iloc[0]     # first row, whatever its label

Row by label:
Code:
df.loc[2]      # row whose index label is 2

Slicing rows:
Code:
df[0:3]        # rows at positions 0, 1, 2 (the end is excluded)
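Labels and positions only match until the rows are reordered. This sketch (using the same sample data) sorts the table and shows .loc and .iloc picking different rows. It also shows that .loc slices include their end label while .iloc slices do not.

Code:
```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# Sorting keeps the original index labels, so labels and positions diverge
by_speed = df.sort_values("Speed", ascending=False)
print(by_speed.index.tolist())        # [3, 2, 1, 0]

first_by_position = by_speed.iloc[0]  # position 0 -> the row labelled 3
row_labelled_0 = by_speed.loc[0]      # label 0 -> now the last row
print(first_by_position["Speed"])     # 12
print(row_labelled_0["Speed"])        # 0

# .loc slices include the end label; .iloc slices exclude the end position
print(len(df.loc[0:2]), len(df.iloc[0:2]))  # 3 2
```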

-----------------------------------------------------------------------

5. Filtering Data (Very Important)

Code:
df[df["Speed"] > 5] 
df[df["Time"] == 3]
df[(df["Time"] >= 2) & (df["Speed"] < 10)]
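pandas uses &, | and ~ (not Python's and/or/not) to combine conditions, and each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on the same sample data:

Code:
```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

both = df[(df["Time"] >= 2) & (df["Speed"] < 10)]     # and
either = df[(df["Time"] == 0) | (df["Speed"] > 10)]   # or
not_fast = df[~(df["Speed"] > 5)]                     # not

# isin() keeps rows whose value appears in the given list
picked = df[df["Time"].isin([1, 3])]

print(both.index.tolist())      # [2]
print(either.index.tolist())    # [0, 3]
print(not_fast.index.tolist())  # [0, 1]
print(picked.index.tolist())    # [1, 3]
```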

-----------------------------------------------------------------------

6. Sorting Data

Code:
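df.sort_values("Speed")                  # smallest Speed first
df.sort_values("Time", ascending=False)  # largest Time first

(Both return a new, sorted DataFrame. Assign the result, e.g. df = df.sort_values("Speed"), to keep it.)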
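sort_values also takes a list of columns, with one ascending flag per column, for breaking ties. The "Trial" column below is a hypothetical addition, used only to create ties.

Code:
```python
import pandas as pd

# Hypothetical data with a repeated "Trial" column to show tie-breaking
df = pd.DataFrame({
    "Trial": ["B", "A", "B", "A"],
    "Speed": [7, 3, 12, 0],
})

# Sort by Trial (A to Z), then by Speed (largest first) within each trial
ordered = df.sort_values(["Trial", "Speed"], ascending=[True, False])
print(ordered)
```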

-----------------------------------------------------------------------

7. Adding & Modifying Columns

Add new column:

Code:
df["Acceleration"] = df["Speed"] / df["Time"]

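(Because Speed is 0 at Time 0, this gives the average acceleration since the start. The first row is NaN because 0 / 0 is undefined; a non-zero value divided by 0 would give inf instead.)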
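Speed / Time only works as acceleration because the data starts from rest. A more general option, sketched here as an alternative (not the method above), is the change in speed divided by the change in time between consecutive rows, using Series.diff():

Code:
```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# Average acceleration since t = 0 (valid here because Speed starts at 0)
df["AvgAccel"] = df["Speed"] / df["Time"]

# Acceleration over each interval: change in speed / change in time
df["Accel"] = df["Speed"].diff() / df["Time"].diff()

print(df)
```

diff() has no previous row to compare with for the first row, so Accel is NaN there too.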

Apply a function:

Code:
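df["Double"] = df["Speed"].apply(lambda x: x * 2)

(For plain arithmetic, df["Speed"] * 2 gives the same result faster; apply is most useful for custom functions.)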

-----------------------------------------------------------------------

8. Removing Data

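Delete column:
Code:
df = df.drop(columns=["Double"])

Drop rows with missing data:
Code:
df = df.dropna()

Fill missing values:
Code:
df = df.fillna(0)

(These methods return a new DataFrame, so assign the result. Calling df.dropna() on its own leaves df unchanged.)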
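Before dropping or filling, it often helps to count the gaps. This sketch uses sample data with one deliberately missing Speed reading. It then drops rows missing a specific column and fills the gap with the column mean instead of 0.

Code:
```python
import numpy as np
import pandas as pd

# Sample data with one missing Speed reading (np.nan)
df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, np.nan, 7, 12]})

# Count missing values per column
missing = df.isna().sum()
print(missing)

# Drop only rows where Speed is missing, ignoring other columns
cleaned = df.dropna(subset=["Speed"])

# Fill gaps per column: here Speed gets its own mean (ignoring NaN)
filled = df.fillna({"Speed": df["Speed"].mean()})
print(filled)
```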

-----------------------------------------------------------------------

9. Grouping & Aggregating (Statistics)

Group by a column (most useful when its values repeat, such as a trial or sample ID):

Code:
df.groupby("Time")["Speed"].mean()

Multiple operations:

Code:
df.groupby("Time").agg({
    "Speed": ["mean", "max", "min"]
})
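In the Time/Speed data every Time is unique, so each group has one row. This sketch uses a hypothetical "Trial" column with repeated readings instead. It uses named aggregation, which gives flat column names rather than the two-level columns that the dictionary form above produces.

Code:
```python
import pandas as pd

# Hypothetical repeated measurements: two trials, two readings each
df = pd.DataFrame({
    "Trial": ["A", "A", "B", "B"],
    "Speed": [3, 5, 7, 11],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("Trial").agg(
    mean_speed=("Speed", "mean"),
    max_speed=("Speed", "max"),
    n=("Speed", "count"),
)
print(summary)
```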

-----------------------------------------------------------------------

10. Merging & Joining DataFrames

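Merge two tables on a shared column (df1 and df2 stand for any two DataFrames that both have an "ID" column; by default only IDs found in both are kept):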

Code:
merged = pd.merge(df1, df2, on="ID")
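Here is a self-contained sketch with two small hypothetical tables. It shows the difference between the default inner join and how="left", which keeps every row of the first table.

Code:
```python
import pandas as pd

# Hypothetical tables sharing an "ID" column
df1 = pd.DataFrame({"ID": [1, 2, 3], "Speed": [3, 7, 12]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Mass": [0.5, 0.8, 1.2]})

inner = pd.merge(df1, df2, on="ID")             # default how="inner": IDs 1, 2
left = pd.merge(df1, df2, on="ID", how="left")  # keep every row of df1

print(inner)
print(left)  # ID 3 has no match in df2, so its Mass is NaN
```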

-----------------------------------------------------------------------

11. Plotting with pandas + matplotlib

Requires matplotlib to be installed (for example, pip install matplotlib).

Code:
import matplotlib.pyplot as plt

df.plot(x="Time", y="Speed")
plt.show()

Scatter example:

Code:
df.plot(kind="scatter", x="Time", y="Speed")
plt.show()

-----------------------------------------------------------------------

12. Exporting Data

Code:
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)

(Writing .xlsx files needs the openpyxl package.)
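To see what index=False changes without writing any files, this sketch calls to_csv with no path. In that case it returns the CSV text as a string. The sketch then reads the text back to check the round trip.

Code:
```python
import io

import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

# With no path, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
print(with_index.splitlines()[0])  # ,Time,Speed  (extra unnamed index column)

csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[0])    # Time,Speed

# Round trip: reading it back reproduces the same table
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))         # True
```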

-----------------------------------------------------------------------

13. Common Mistakes

❌ Keeping tabular data in plain Python lists
✔ load it into a DataFrame; pandas is built for scientific data

❌ Forgetting df = pd.read_csv("file.csv") 
✔ must load data before analysis 

❌ Using wrong column names 
✔ check df.columns 

❌ Mixing up .loc and .iloc
✔ loc = labels 
✔ iloc = positions 

❌ Not resetting index after filtering 
✔ use df.reset_index(drop=True) if needed 
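Why resetting matters: filtering keeps the original labels, so .loc[0] may no longer exist. A sketch on the same sample data:

Code:
```python
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 2, 3], "Speed": [0, 3, 7, 12]})

fast = df[df["Speed"] > 5]
print(fast.index.tolist())          # [2, 3]  (original labels kept)

# fast.loc[0] would raise KeyError here, because label 0 was filtered out
fast = fast.reset_index(drop=True)  # drop=True discards the old labels
print(fast.index.tolist())          # [0, 1]
print(fast.loc[0, "Speed"])         # 7
```

Without drop=True, the old labels are kept as a new "index" column.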

-----------------------------------------------------------------------

Summary

pandas gives you:
• DataFrames 
• filtering & sorting 
• statistics 
• grouping 
• importing & exporting 
• plotting 
• data cleaning 

It’s one of the most important tools for science, maths, and research.