DataFrame Operations

Index Operations

Example:

df.set_index("name", inplace=True) print(df) df.reset_index(inplace=True) # drop=True print(df)

Example:

age_series = df["age"] # Select Age column age_loc = df.loc[:, "age"] # Select age column age_iloc = df.iloc[:, 2] # Select column at index 2 sample_df = df[["name", "age"]] # Select name and age columns sample_loc = df.loc[:, ["name", "age"]] # Select name and age columns sample_iloc = df.iloc[:, [0, 2]] # Select columns at index 0 and 2 age_loc_samp = df.loc[0:5, "age"] # First 5 rows of age column age_iloc_samp = df.iloc[0:5, 2] # First 5 rows of column at index 2 age_samp = df["age"][0:5] # First 5 rows of age column sample_data = df.loc[0:5, ["name", "age"]] # First 5 rows of name, age

Example:

age_filter = df["age"] > 30 filtered_df = df[age_filter] data_filter = (df["age"] > 30) & (df["weight"] < 70) filtered_df = df[data_filter] name_filter = df["name"].str.startswith("A") filtered_df = df[name_filter] country_filter = df["country"].isin(["USA", "Canada"]) filtered_df = df[~country_filter]

Example

df["course"] = "Python" df["age_plus_one"] = df["age"] + 1 df["bmi"] = df["weight"] / df["height"] ** 2 df["is_above_30"] = df["age"] > 30 random_array = np.random.rand(len(df)) df["random_value"] = random_array

Example

sorted_df = df.sort_values(by="age", ascending=False) sorted_df = df.sort_values(by=["age", "weight"], ascending=[True, False]) ranked_df = df.copy() ranked_df["age_rank"] = ranked_df["age"].rank(method="min")

Example

df_dropped = df.drop(columns=["age", "weight"]) df_dropped = df.drop(["age", "weight"], axis=1) df_dropped = df.drop([0, 1]) df_renamed = df.rename(columns={"age": "Age", "weight": "Weight"})

Example

df1 = pd.read_csv("2025-01-02.csv") df2 = pd.read_csv("2025-01-03.csv") df3 = pd.read_csv("2025-01-04.csv") combined_df = pd.concat([df1, df2, df3]) print(df1.shape) print(df2.shape) print(df3.shape) print(combined_df.shape)

pd.merge combines dataframes based on common columns. Like join.
Syntax: pd.merge(left_df, right_df, on="key_col", how="left")
how parameter values: left, right, inner, outer
left_on & right_on is used when dataframes have different column names.
Example: left_df = pd.DataFrame({"id": [1, 2], "name": ["Hari", "Bob"]}) right_df = pd.DataFrame({"id": [2, 3], "age": [25, 30]}) merged_df = pd.merge(left_df, right_df, on="id", how="left") print(merged_df)

Example

df["age"] = df["age"].astype(int) df["weight"] = df["weight"].astype(float) df["name"] = df["name"].astype(str) df["category_col"] = df["category_col"].astype("category") df["date_col"] = pd.to_datetime(df["date_col"])

For aggregating data by groups.
Syntax: df.groupby(["c1","c2"]).agg(c3=("c1","sum), c4=("c2","min")..)
Common functions: sum, mean, count, min, max, std, median, quantile, idxmin, idxmax, unique, nunique, cumsum, prod, mode, var etc.
Example: df["sales"].sum() df.groupby(["category", "product"], as_index=False).agg( total_sales=("sales", "sum") , avg_sales=("sales", "mean"))

Pandas provides powerful tools for working with time series data.
Index has to be set to datetime type for time series operations.
Sampling options: D (daily), W, MS, ME (month end), Q, YS, YE etc.

Example

df["date"] = pd.to_datetime(df["date"]) df.set_index("date", inplace=True) monthly_sales = df.resample("ME").agg({"sales": "sum"}) df["rolling_avg"] = df["sales"].rolling(window=7).mean()

Example:

df["name"] = df["name"].str.upper() df["name"] = df["name"].str.strip() df["name"] = df["name"].str.replace("old", "new") df["year"] = df["date"].dt.year df["day"] = df["date"].dt.day df["day_of_week"] = df["date"].dt.day_name()