Once you have a DataFrame, the next step is usually to select the rows and columns you need for your analysis. pandas offers several ways to do this. The two core indexers are .loc (label-based) and .iloc (position-based), and boolean masks let you filter rows by condition.

.loc[]: Selection by Label

Use .loc[] for label-based indexing. This means you refer to rows and columns by their names (i.e., their index labels and column names).

Syntax: df.loc[row_labels, column_labels]

Python


import pandas as pd

data = {'Math': [85, 90, 78], 'Science': [92, 88, 94], 'History': [80, 85, 82]}
index = ['Alice', 'Bob', 'Charlie']
df = pd.DataFrame(data, index=index)
print("Original DataFrame:")
print(df)

# Select a single row (returns a Series)
alice_scores = df.loc['Alice']
print("\nAlice's scores:")
print(alice_scores)

# Select multiple rows and a single column
bob_charlie_math = df.loc[['Bob', 'Charlie'], 'Math']
print("\nBob and Charlie's Math scores:")
print(bob_charlie_math)

# Select a range of rows and columns (slicing)
# Note: Unlike ordinary Python slices, label slices INCLUDE both endpoints.
all_scores_sci_hist = df.loc['Alice':'Charlie', 'Science':'History']
print("\nAll students' Science and History scores:")
print(all_scores_sci_hist)
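
The shape of what .loc[] returns depends on whether you pass a single label or a list of labels. The sketch below rebuilds the same DataFrame so it runs on its own. The variable names are illustrative only.

```python
import pandas as pd

# Same example data as above, rebuilt so this block is self-contained.
df = pd.DataFrame(
    {'Math': [85, 90, 78], 'Science': [92, 88, 94], 'History': [80, 85, 82]},
    index=['Alice', 'Bob', 'Charlie'],
)

# One row label and one column label return a single value.
bob_science = df.loc['Bob', 'Science']

# One row label on its own returns a Series indexed by column name.
alice_series = df.loc['Alice']

# A one-element list of labels keeps the result a DataFrame.
alice_frame = df.loc[['Alice']]

# A bare colon selects everything along that axis (here: all rows).
math_history = df.loc[:, ['Math', 'History']]

print(bob_science)
print(type(alice_series).__name__, type(alice_frame).__name__)
print(math_history)
```

Passing a list, even a list with one element, is a simple way to guarantee a DataFrame when later code expects two-dimensional data.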

.iloc[]: Selection by Position

Use .iloc[] for integer position-based indexing. This means you refer to rows and columns by their integer position, starting from 0.

Syntax: df.iloc[row_positions, column_positions]

Python


# Using the same DataFrame as above

# Select the first row (index position 0)
first_row = df.iloc[0]
print("\nFirst row (Alice):")
print(first_row)

# Select the last row (index position -1)
last_row = df.iloc[-1]
print("\nLast row (Charlie):")
print(last_row)

# Select rows 0 and 2, and columns 0 and 2
subset = df.iloc[[0, 2], [0, 2]]
print("\nAlice & Charlie's Math & History scores:")
print(subset)

# Select a range of rows and columns
# Note: When slicing with integers, the endpoint is EXCLUDED.
first_two_rows_cols = df.iloc[0:2, 0:2]
print("\nFirst two rows and columns:")
print(first_two_rows_cols)
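
The difference between labels and positions is easiest to see when the index labels are themselves integers. The following is a small sketch using a hypothetical DataFrame (not the one above) whose labels start at 1, so labels and positions no longer line up.

```python
import pandas as pd

# Hypothetical data: integer index labels 1, 2, 3 sit at positions 0, 1, 2.
scores = pd.DataFrame({'Math': [85, 90, 78]}, index=[1, 2, 3])

# .loc treats 1 and 2 as LABELS and includes both endpoints.
by_label = scores.loc[1:2]

# .iloc treats 1 and 2 as POSITIONS and excludes the endpoint.
by_position = scores.iloc[1:2]

print(by_label)      # rows labelled 1 and 2
print(by_position)   # only the row at position 1 (label 2)

# The same integer can therefore point at different rows.
print(scores.loc[1, 'Math'], scores.iloc[1, 0])
```

Here the same slice 1:2 returns two rows with .loc[] and one row with .iloc[], which is why it helps to be explicit about which indexer you mean.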

Boolean Masking: Conditional Selection

Boolean masking filters rows by a condition. The condition is evaluated for every row, producing a "mask" of True/False values, and only the rows where the mask is True are kept.

  1. Create the condition: This results in a Series of booleans.
  2. Apply the mask: Pass this Series into the DataFrame using [] or .loc[].

Python


# Find all students who scored above 90 in Science
# (a threshold of 80 would match every student here, so nothing would be filtered out)
science_mask = df['Science'] > 90
print("\nBoolean mask for Science > 90:")
print(science_mask)

# Apply the mask to the DataFrame: only rows where the mask is True are kept
high_science_scorers = df[science_mask]
print("\nStudents who scored > 90 in Science:")
print(high_science_scorers)

# Combine conditions with & (and) and | (or); Python's `and`/`or` keywords do not work on a Series.
# Wrap each condition in parentheses, because & and | bind more tightly than comparisons such as >.
high_math_and_history = df[(df['Math'] > 80) & (df['History'] > 80)]
print("\nStudents who scored > 80 in both Math and History:")
print(high_math_and_history)
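
A mask can also be passed to .loc[], which lets you filter rows and pick columns in one step. The sketch below rebuilds the same DataFrame so it runs on its own, and also shows | (or) and ~ (not). The variable names and thresholds are illustrative only.

```python
import pandas as pd

# Same example data as above, rebuilt so this block is self-contained.
df = pd.DataFrame(
    {'Math': [85, 90, 78], 'Science': [92, 88, 94], 'History': [80, 85, 82]},
    index=['Alice', 'Bob', 'Charlie'],
)

# Mask as the row selector, column label as the column selector:
# Science scores of students whose Math score is above 80.
science_of_strong_math = df.loc[df['Math'] > 80, 'Science']
print(science_of_strong_math)

# | keeps rows where at least one condition is True.
math_or_history = df[(df['Math'] > 85) | (df['History'] > 81)]
print(math_or_history)

# ~ inverts a mask: students who did NOT score above 80 in Math.
not_high_math = df[~(df['Math'] > 80)]
print(not_high_math)
```

Using .loc[] with a mask avoids a separate column-selection step after filtering.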