What is Pandas?
Pandas is a Python library that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It is one of the most widely used tools for practical, real-world data analysis in Python.
The Pandas Series
A Series is a one-dimensional, labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.). Think of it as a single column in a spreadsheet. It has two main components:
- Data: The actual values.
- Index: A label for each data point. If you don't specify an index, Pandas creates a default integer index from 0 to N-1.
Python
import pandas as pd  # By strong convention, pandas is imported as pd
# Creating a Series from a list
population = pd.Series([990_000, 850_000, 3_400_000], name='Population')
print("--- Default Index ---")
print(population)
# Creating a Series with a custom index
population_labeled = pd.Series(
    [990_000, 850_000, 3_400_000],
    index=['San Jose', 'San Francisco', 'Los Angeles'],
    name='Population'
)
print("\n--- Custom Index ---")
print(population_labeled)
# Accessing data via index label
print(f"\nPopulation of Los Angeles: {population_labeled['Los Angeles']}")
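The two components described above can also be inspected directly. The following is a minimal sketch that reuses the same illustrative population figures; the variable names `default_index`, `by_label`, and `by_position` are made up for this example. It also contrasts label-based access (`.loc`) with position-based access (`.iloc`).

```python
import pandas as pd

population_labeled = pd.Series(
    [990_000, 850_000, 3_400_000],
    index=['San Jose', 'San Francisco', 'Los Angeles'],
    name='Population',
)

# The two components of a Series: the index (labels) and the data (values)
print(population_labeled.index.tolist())
print(population_labeled.to_numpy())

# Without an explicit index, pandas assigns integer labels 0 to N-1
default_index = list(pd.Series([990_000, 850_000, 3_400_000]).index)
print(default_index)

# Label-based access (.loc) and position-based access (.iloc)
by_label = population_labeled.loc['Los Angeles']
by_position = population_labeled.iloc[2]
print(by_label, by_position)
```

Using `.loc` and `.iloc` explicitly makes it clear whether you mean a label or a position, which matters once the index itself contains integers.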
The Pandas DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It's the most commonly used Pandas object. Think of it as a spreadsheet, an SQL table, or a dictionary of Series objects.
A DataFrame has both a row index and a column index.
Python
# Creating a DataFrame from a dictionary
data = {
    'City': ['San Jose', 'San Francisco', 'Los Angeles'],
    'Population': [990_000, 850_000, 3_400_000],
    'State': ['CA', 'CA', 'CA']
}
df = pd.DataFrame(data)
print(df)
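To make the "row index plus column index" idea and the "dictionary of Series" analogy concrete, here is a small sketch built on the same example data. The names `row_labels`, `column_labels`, and `population` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['San Jose', 'San Francisco', 'Los Angeles'],
    'Population': [990_000, 850_000, 3_400_000],
    'State': ['CA', 'CA', 'CA'],
})

# The row index (default integer labels) and the column index
row_labels = list(df.index)
column_labels = list(df.columns)
print(row_labels)
print(column_labels)

# Selecting a single column returns a Series that shares the row index
population = df['Population']
print(type(population).__name__)

# Each column keeps its own data type, which is what "heterogeneous" refers to
print(df.dtypes)
```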
Inspecting Your DataFrame
Once you've loaded your data, you'll want to inspect it. Here are some essential methods:
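- df.head(): View the first 5 rows by default; pass a number, such as df.head(10), to change that.
- df.tail(): View the last 5 rows by default; it accepts a row count in the same way.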
- df.shape: Get the dimensions (rows, columns).
- df.info(): Get a concise summary, including data types and non-null counts.
- df.describe(): Get descriptive statistics (count, mean, std, etc.). By default, only numerical columns are summarized.
Python
# Using the DataFrame created above
print("\n--- DataFrame Info ---")
df.info()  # info() prints its summary directly, so no print() call is needed
print("\n--- Descriptive Stats ---")
print(df.describe())
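The remaining methods from the list, `head()`, `tail()`, and `shape`, work in the same way. The sketch below applies them to the same small example DataFrame; the names `first_two`, `last_row`, `n_rows`, `n_cols`, and `stats` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['San Jose', 'San Francisco', 'Los Angeles'],
    'Population': [990_000, 850_000, 3_400_000],
    'State': ['CA', 'CA', 'CA'],
})

# head() and tail() accept an optional row count (the default is 5)
first_two = df.head(2)
last_row = df.tail(1)
print(first_two)
print(last_row)

# shape is an attribute, not a method: a (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# describe() summarizes only the numerical column here
stats = df.describe()
print(list(stats.columns))
```

Note that `shape` is accessed without parentheses, unlike `head()`, `tail()`, `info()`, and `describe()`.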