Pandas and NumPy in Python

Mastering Pandas: The Ultimate Beginner’s Guide to Data Handling in Python

Data is everywhere — but raw data is messy, inconsistent, and rarely ready for analysis.
This is where Pandas comes to the rescue.

Pandas is a Python library for data manipulation and analysis, built on top of NumPy.
It gives you powerful tools to clean, explore, and transform datasets efficiently — whether they’re in CSV files, SQL tables, JSON APIs, or Excel sheets.

🔹 What is Pandas?

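The name Pandas is derived from “panel data”, an econometrics term for structured multi-dimensional datasets, and is also a play on “Python data analysis”.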
It provides easy-to-use data structures — mainly Series and DataFrame — to work with structured data.

Think of a Series as a single column in Excel, and a DataFrame as a full spreadsheet with rows and columns.

1. Installation & Import

pip install pandas

import pandas as pd

2. Pandas Data Structures

🔹 Series

1-dimensional labeled array.
Can hold any data type.

s = pd.Series([10, 20, 30, 40])
print(s)

🔹 DataFrame

2-dimensional labeled data structure (like a spreadsheet or SQL table).
Can hold heterogeneous data types.

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
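The spreadsheet analogy can be checked directly: picking one column out of a DataFrame gives back a Series that shares the DataFrame's row labels. A minimal sketch (the `labeled` Series and its labels 'a', 'b', 'c' are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

ages = df['Age']                       # selecting one column returns a Series
print(type(ages).__name__)             # Series
print(ages.index.equals(df.index))     # True: the column shares the row labels

# A Series can also carry custom labels instead of 0, 1, 2, ...
labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(labeled['b'])                    # 20
```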

3. Creating DataFrames

From dictionary

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

From list of lists

data = [['Alice', 25], ['Bob', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

From CSV/Excel

df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx')

4. Viewing & Inspecting Data

df.head()       # First 5 rows
df.tail()       # Last 5 rows
df.shape        # returns a tuple like (number_of_rows, number_of_columns)
df.info()       # Summary of dataframe
df.describe()   # Statistical summary for numeric columns
df.columns      # Column names
df.index        # Row labels

5. Selecting Data

🔹 Selecting Columns

df['Name']
df[['Name', 'Age']]

🔹 Selecting Rows

When working with DataFrames, it’s common to select specific rows. Pandas provides two main ways: iloc and loc.

→ iloc – Index-based Selection

iloc stands for integer-location based selection.
You use it when you want to select rows by their integer position (row number).
Syntax: df.iloc[row_index] or df.iloc[start:end] (the end position is excluded, as in Python slicing)

→ loc – Label-based Selection

loc stands for label-based selection.
You use it when you want to select rows by their index label.
Syntax: df.loc[row_label] or df.loc[start_label:end_label] (the end label is included)

df.iloc[0]      # First row by position
df.loc[0]       # Row whose index label is 0
df.iloc[0:3]    # First three rows (positions 0, 1, 2)
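With the default index 0, 1, 2, ... both selectors appear to do the same thing. The difference shows up once the labels are not integers. A minimal sketch, assuming a small made-up DataFrame called `people` with string labels:

```python
import pandas as pd

# Hypothetical data with string labels instead of the default 0, 1, 2 index
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]},
    index=['a', 'b', 'c'],
)

first_by_position = people.iloc[0]      # row at integer position 0
first_by_label = people.loc['a']        # row whose index label is 'a'

# iloc slices exclude the end position; loc slices include the end label
by_position = people.iloc[0:2]          # positions 0 and 1
by_label = people.loc['a':'b']          # labels 'a' and 'b'

print(first_by_position['Name'], first_by_label['Name'])
print(len(by_position), len(by_label))
```

Here `people.loc[0]` would raise a KeyError, because no row carries the label 0.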

🔹 Conditional Selection

df[df['Age'] > 25]
df[(df['Age'] > 20) & (df['Name'] == 'Alice')]
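Each condition produces a boolean Series, and those can be combined with & (and), | (or) and ~ (not). The parentheses around each comparison are required because these operators bind more tightly than > or ==. A short sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

older = df['Age'] > 25                 # boolean Series: False, True, True
is_alice = df['Name'] == 'Alice'       # boolean Series: True, False, False

either = df[older | is_alice]          # OR: rows matching at least one condition
not_older = df[~older]                 # NOT: rows where Age <= 25
in_list = df[df['Name'].isin(['Alice', 'Bob'])]   # membership test

print(either)
print(not_older)
print(in_list)
```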

6. Modifying Data

Add new column

df['Salary'] = [50000, 60000]

Modify existing column

df['Age'] = df['Age'] + 1

Rename columns

df.rename(columns={'Age':'Years'}, inplace=True)

Drop column/row

df.drop('Salary', axis=1, inplace=True)  # Column
df.drop(0, axis=0, inplace=True)         # Row

7. Handling Missing Data

Check for missing values

df.isnull()
df.isnull().sum()

Fill missing values

df.fillna(0, inplace=True)
df['Column'] = df['Column'].fillna(df['Column'].mean())  # assign back; inplace on a selected column is unreliable in recent pandas

Drop missing values

df.dropna(inplace=True)
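To see the three steps on concrete values, here is a small sketch. The `scores` DataFrame and its column names are invented for the example:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [80.0, np.nan, 90.0],
})

missing_per_column = scores.isnull().sum()   # Score has one missing value

# Fill with the column mean (NaN is skipped when the mean is computed)
filled = scores.copy()
filled['Score'] = filled['Score'].fillna(filled['Score'].mean())

dropped = scores.dropna()   # keeps only rows with no missing values

print(missing_per_column)
print(filled)
print(dropped)
```

Filling keeps all three rows (Bob gets 85.0), while dropping leaves only two. Which one is appropriate depends on the analysis.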

8. Data Cleaning

Remove duplicates:

df.drop_duplicates(inplace=True)

Strip whitespace:

df['Name'] = df['Name'].str.strip()

Change case:

df['Name'] = df['Name'].str.upper()
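The order of these steps matters: values that differ only by whitespace or case are not seen as duplicates until they have been normalized. A minimal sketch with made-up names:

```python
import pandas as pd

raw = pd.DataFrame({'Name': [' alice ', 'Bob', 'bob ', ' alice ']})

# Dropping duplicates first only removes the exact repeat of ' alice '
naive = raw.drop_duplicates()

# Normalizing first lets 'Bob' and 'bob ' collapse into one value
cleaned = raw.copy()
cleaned['Name'] = cleaned['Name'].str.strip().str.upper()
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(len(naive), len(cleaned))
print(cleaned)
```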

9. Sorting Data

By column:

df.sort_values('Age', ascending=True, inplace=True)

By index:

df.sort_index(inplace=True)

10. Aggregation & Grouping

Basic statistics

df['Age'].sum()
df['Age'].mean()
df['Age'].max()
df['Age'].min()
df['Age'].std()

Group by

df.groupby('Department')['Salary'].mean()

Multiple aggregations

df.groupby('Department')['Salary'].agg(['mean', 'sum', 'max'])
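The calls above assume a DataFrame with Department and Salary columns. A small sketch with invented figures shows what they return:

```python
import pandas as pd

# Hypothetical staff data; the numbers are for illustration only
staff = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT'],
    'Salary': [40000, 50000, 60000, 70000, 80000],
})

# One value per department, indexed by department name
mean_by_dept = staff.groupby('Department')['Salary'].mean()

# One row per department, one column per aggregation
summary = staff.groupby('Department')['Salary'].agg(['mean', 'sum', 'max'])

print(mean_by_dept)
print(summary)
```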

11. Merging, Joining & Concatenation

Concatenate

pd.concat([df1, df2], axis=0)  # Stack rows
pd.concat([df1, df2], axis=1)  # Place side by side (add columns)

Merge / Join

pd.merge(df1, df2, on='Key', how='inner')  # inner, left, right, outer
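The how argument decides which keys survive the merge. One way to get a feel for it is to merge two small tables whose keys only partly overlap and count the rows. The tables `left` and `right` below are made up:

```python
import pandas as pd

left = pd.DataFrame({'Key': [1, 2, 3], 'Name': ['A', 'B', 'C']})
right = pd.DataFrame({'Key': [2, 3, 4], 'Salary': [60000, 55000, 70000]})

# Keys 2 and 3 appear in both tables; 1 only in left, 4 only in right
row_counts = {
    how: len(pd.merge(left, right, on='Key', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)   # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

Rows that have no match on the other side are kept by left, right and outer merges, with the missing columns filled with NaN.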

12. Applying Functions

Using apply()

df['Age_plus_5'] = df['Age'].apply(lambda x: x + 5)

Vectorized operations

df['Salary'] = df['Salary'] * 1.1
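For simple arithmetic, apply() and a vectorized expression give the same result, so the vectorized form is usually preferred because it is shorter and avoids calling a Python function for every row. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 28]})

with_apply = df['Age'].apply(lambda x: x + 5)   # calls the lambda once per value
vectorized = df['Age'] + 5                      # one operation on the whole column

print(with_apply.equals(vectorized))            # True
```

apply() remains useful when the logic cannot be written as column arithmetic, such as the if/else labelling in the practice code further down.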

13. Working with Dates

df['JoinDate'] = pd.to_datetime(df['JoinDate'])
df['Year'] = df['JoinDate'].dt.year
df['Month'] = df['JoinDate'].dt.month
df['Day'] = df['JoinDate'].dt.day

14. Pivot Tables

df.pivot_table(values='Salary', index='Department', columns='Gender', aggfunc='mean')
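The call above expects Department, Gender and Salary columns. A sketch with invented data shows the shape of the result: one row per index value, one column per columns value, and the aggregated values in the cells.

```python
import pandas as pd

# Hypothetical data, for illustration only
staff = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Salary': [40000, 50000, 60000, 70000],
})

table = staff.pivot_table(
    values='Salary', index='Department', columns='Gender', aggfunc='mean'
)
print(table)
print(table.loc['IT', 'F'])   # mean salary for the IT / F cell
```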

15. Exporting Data

df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
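A quick way to check what an export contains, without touching the disk, is to call to_csv() without a path, which returns the CSV text as a string. The sketch below writes and re-reads a tiny DataFrame in memory:

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

csv_text = df.to_csv(index=False)        # no path given: returns the CSV as a string
header = csv_text.splitlines()[0]        # 'Name,Age' (no index column)

restored = pd.read_csv(io.StringIO(csv_text))   # read it back from memory
print(header)
print(restored)
```

Without index=False the row labels would be written as an extra unnamed first column.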

16. Visualization with Pandas

df['Salary'].plot(kind='hist')        # Histogram
df.plot(x='Age', y='Salary', kind='scatter')  # Scatter plot
df['Department'].value_counts().plot(kind='bar')  # Bar plot

17. Tips for Beginners

Start with small datasets to understand operations.
Use head() and tail() often to inspect data.
Chain operations carefully: df.dropna().groupby('Dept')['Salary'].mean().
Remember Pandas is built on NumPy, so vectorized operations are faster than loops.

✅ Conclusion

Pandas is an essential tool for data cleaning, transformation, and analysis in Python.
Once you master it, tasks that used to take hours in Excel or SQL can be done in a few lines of code.

# 🐼 PANDAS COMPLETE PRACTICE CODE FOR BEGINNERS
# ----------------------------------------------

# 1️⃣ Importing Pandas
import pandas as pd

# 2️⃣ Creating Series
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s, "\n")
# Output:
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64

# 3️⃣ Creating DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 28, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore', 'Chennai']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df, "\n")
# Output:
#       Name  Age       City
# 0    Alice   25      Delhi
# 1      Bob   30     Mumbai
# 2  Charlie   28  Bangalore
# 3    David   22    Chennai

# 4️⃣ Exploring Data
print("Head:\n", df.head(), "\n")
print("Info:")
df.info()  # info() prints its summary itself and returns None
print("\nDescribe:\n", df.describe(), "\n")
print("Columns:", df.columns.tolist(), "\n")
print("Shape:", df.shape, "\n")

# 5️⃣ Selecting Data
print("Select Column:\n", df['Name'], "\n")
# Output:
# 0      Alice
# 1        Bob
# 2    Charlie
# 3      David

print("Select Multiple Columns:\n", df[['Name', 'City']], "\n")

print("Select by Label (loc):\n", df.loc[0], "\n")
# Output:
# Name    Alice
# Age        25
# City    Delhi
# Name: 0, dtype: object

print("Select by Position (iloc):\n", df.iloc[1], "\n")

# Conditional selection
print("Age > 25:\n", df[df['Age'] > 25], "\n")

# 6️⃣ Add, Modify, Delete Columns
df['Country'] = 'India'
df['Age'] = df['Age'] + 1
df.drop('City', axis=1, inplace=True)
print("After Modifications:\n", df, "\n")
# Output:
#       Name  Age Country
# 0    Alice   26   India
# 1      Bob   31   India
# 2  Charlie   29   India
# 3    David   23   India

# 7️⃣ Handling Missing Values
df['Age'] = df['Age'].astype('float64')  # an integer column cannot hold NaN
df.loc[2, 'Age'] = None                  # stored as NaN
print("With Missing Value:\n", df, "\n")
print("Check NaN:\n", df.isnull(), "\n")
print("Fill Missing:\n", df.fillna(0), "\n")
print("Drop Missing:\n", df.dropna(), "\n")

# 8️⃣ Aggregation and Statistics
print("Mean Age:", df['Age'].mean())
print("Sum Age:", df['Age'].sum(), "\n")
# Output:
# Mean Age: 26.666666666666668
# Sum Age: 80.0

# GroupBy example
group_data = {
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai'],
    'Sales': [200, 250, 300, 400]
}
sales_df = pd.DataFrame(group_data)
print("Group By City (Mean Sales):\n", sales_df.groupby('City')['Sales'].mean(), "\n")
# Output:
# City
# Delhi     225.0
# Mumbai    350.0
# Name: Sales, dtype: float64

# 9️⃣ Sorting & Filtering
print("Sorted by Sales Desc:\n", sales_df.sort_values('Sales', ascending=False), "\n")
print("Filter with condition (Sales > 250):\n", sales_df[sales_df['Sales'] > 250], "\n")

# 🔟 Merging & Joining DataFrames
df1 = pd.DataFrame({'id': [1, 2, 3], 'Name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'Salary': [50000, 60000, 55000]})
merged = pd.merge(df1, df2, on='id', how='inner')
print("Merged DataFrame:\n", merged, "\n")
# Output:
#    id Name  Salary
# 0   1    A   50000
# 1   2    B   60000
# 2   3    C   55000

# 1️⃣1️⃣ Concatenating
concat_df = pd.concat([df1, df2], axis=1)
print("Concatenated DataFrame:\n", concat_df, "\n")
# Output (side by side):
#    id Name  id  Salary
# 0   1    A   1   50000
# 1   2    B   2   60000
# 2   3    C   3   55000

# 1️⃣2️⃣ Working with Dates
date_data = pd.DataFrame({
    'Date': ['2024-01-01', '2024-06-15', '2024-10-05']
})
date_data['Date'] = pd.to_datetime(date_data['Date'])
date_data['Year'] = date_data['Date'].dt.year
date_data['Month'] = date_data['Date'].dt.month
print("Date Operations:\n", date_data, "\n")
# Output:
#         Date  Year  Month
# 0 2024-01-01  2024      1
# 1 2024-06-15  2024      6
# 2 2024-10-05  2024     10

# 1️⃣3️⃣ Applying Functions
df = pd.DataFrame({'Age': [15, 22, 35, 45]})
df['AgeGroup'] = df['Age'].apply(lambda x: 'Adult' if x >= 18 else 'Minor')
print("Apply Function Example:\n", df, "\n")
# Output:
#    Age AgeGroup
# 0   15    Minor
# 1   22    Adult
# 2   35    Adult
# 3   45    Adult

# 1️⃣4️⃣ Pivot Table
pivot_df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East'],
    'Sales': [200, 300, 400, 250]
})
pivot = pd.pivot_table(pivot_df, values='Sales', index='Region', aggfunc='sum')
print("Pivot Table:\n", pivot, "\n")
# Output:
#         Sales
# Region       
# East      250
# North     600
# South     300

# 1️⃣5️⃣ Useful Functions
demo_df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore'],
    'Age': [25, 30, 25, 40]
})
print("Value Counts:\n", demo_df['City'].value_counts(), "\n")
print("Check Duplicates:\n", demo_df.duplicated(), "\n")
print("Drop Duplicates:\n", demo_df.drop_duplicates(), "\n")
print("Rename Column:\n", demo_df.rename(columns={'City': 'Location'}), "\n")
print("Random Sample:\n", demo_df.sample(n=2), "\n")

# 1️⃣6️⃣ Export Data
# demo_df.to_csv('final_output.csv', index=False)
# print("Data exported successfully!")

NumPy – Python for Data Science

Introduction

NumPy (Numerical Python) is a fundamental library in Python for scientific computing and data analysis.

It provides:

ndarray → N-dimensional array for storing numbers
Fast operations on arrays (vectorized computations)
Mathematical, statistical, and linear algebra functions

NumPy is the foundation for Pandas and SciPy, and machine learning libraries such as scikit-learn and TensorFlow build on or interoperate with NumPy arrays.

1. Installation & Import

pip install numpy

import numpy as np

Output: Nothing, just imports the library.

2. NumPy Arrays

NumPy arrays are like Python lists but faster and support vectorized operations.

Create 1D Array

arr = np.array([1, 2, 3, 4])
print(arr)

Output:

[1 2 3 4]

Create 2D Array

arr2d = np.array([[1,2,3],[4,5,6]])
print(arr2d)

Output:

[[1 2 3]
 [4 5 6]]

3. Array Attributes

print(arr.shape)    # Shape of array
print(arr2d.shape)
print(arr.ndim)     # Number of dimensions
print(arr2d.ndim)
print(arr.dtype)    # Data type

Output:

(4,)
(2, 3)
1
2
int64    (int32 on some platforms, such as Windows with NumPy versions before 2.0)

4. Creating Arrays with Built-in Functions

np.zeros(5)          # Array of zeros
np.ones((2,3))       # Array of ones
np.arange(0,10,2)    # Numbers from 0 up to (but not including) 10, step 2
np.linspace(0,1,5)   # 5 evenly spaced numbers from 0 to 1, both endpoints included
np.eye(3)            # Identity matrix

Output Examples:

np.zeros(5) → [0. 0. 0. 0. 0.]
np.ones((2,3)) →
[[1. 1. 1.]
 [1. 1. 1.]]
np.arange(0,10,2) → [0 2 4 6 8]
np.linspace(0,1,5) → [0.   0.25 0.5  0.75 1.  ]
np.eye(3) →
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

5. Indexing & Slicing

1D Array

arr = np.array([10,20,30,40,50])
print(arr[0])       # First element
print(arr[1:4])     # Slice

Output:

10
[20 30 40]

2D Array

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr2d[0,1])   # Row 0, Column 1
print(arr2d[1,:])   # Row 1, all columns
print(arr2d[:,2])   # All rows, column 2

Output:

2
[4 5 6]
[3 6 9]

6. Array Operations

NumPy supports element-wise operations:

a = np.array([1,2,3])
b = np.array([4,5,6])

print(a+b)   # [5 7 9]
print(a-b)   # [-3 -3 -3]
print(a*b)   # [ 4 10 18]
print(a/b)   # [0.25 0.4  0.5 ]
print(a**2)  # [1 4 9]
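Element-wise behaviour is the main practical difference from Python lists. The same * operator means repetition for a list but multiplication for an array, as this short sketch shows:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

doubled_list = py_list * 2     # list repetition: [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2          # element-wise multiplication: [2 4 6]
shifted_arr = arr + 10         # the scalar is applied to every element: [11 12 13]

print(doubled_list)
print(doubled_arr)
print(shifted_arr)
```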

7. Universal Functions (ufunc)

NumPy provides fast mathematical functions:

arr = np.array([1,4,9,16])

print(np.sqrt(arr))    # [1. 2. 3. 4.]
print(np.exp(arr))     # Exponentials
print(np.log(arr))     # Natural log
print(np.sin(arr))     # Trigonometric functions

8. Aggregation Functions

arr = np.array([1,2,3,4,5])
print(arr.sum())       # 15
print(arr.mean())      # 3.0
print(arr.std())       # ≈ 1.4142 (population standard deviation, ddof=0)
print(arr.min())       # 1
print(arr.max())       # 5
print(arr.argmin())    # Index of min → 0
print(arr.argmax())    # Index of max → 4
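On a 2D array the same functions accept an axis argument, and std() accepts ddof to switch between the population and sample formulas. A minimal sketch (the array `m` is made up):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()            # all elements: 21
col_sums = m.sum(axis=0)   # collapse the rows, one value per column: [5 7 9]
row_sums = m.sum(axis=1)   # collapse the columns, one value per row: [ 6 15]

arr = np.array([1, 2, 3, 4, 5])
population_std = arr.std()        # ddof=0, divides by n
sample_std = arr.std(ddof=1)      # divides by n - 1

print(total, col_sums, row_sums)
print(population_std, sample_std)
```

The ddof=1 form is what Pandas uses by default, which is why df['Age'].std() and the NumPy std() of the same values can differ.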

9. Reshaping Arrays

arr = np.arange(1,13)
print(arr.reshape(3,4))

Output:

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Flatten back: arr.reshape(3,4).ravel()
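Two details are worth trying out: reshape can infer one dimension when it is given -1, and ravel() and flatten() differ in whether they copy the data. A small sketch:

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements

grid = arr.reshape(3, -1)       # -1 lets NumPy work out the 4
print(grid.shape)               # (3, 4)

# The total number of elements must stay the same
try:
    arr.reshape(5, -1)
except ValueError as exc:
    print("Cannot reshape:", exc)

flat_view = grid.ravel()        # returns a view when the memory layout allows it
flat_copy = grid.flatten()      # always returns a new copy

print(np.shares_memory(flat_view, arr))   # True
print(np.shares_memory(flat_copy, arr))   # False
```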

10. Stacking Arrays

a = np.array([1,2,3])
b = np.array([4,5,6])

np.vstack((a,b))  # Vertical stack
np.hstack((a,b))  # Horizontal stack

Output:

vstack →
[[1 2 3]
 [4 5 6]]

hstack → [1 2 3 4 5 6]

11. Boolean Indexing

arr = np.array([10,20,30,40,50])
print(arr[arr>25])   # [30 40 50]

Select elements based on conditions.
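Conditions can be combined with & and |, and a boolean mask also works on the left side of an assignment. A short sketch using the same values as above:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = (arr > 15) & (arr < 45)     # parentheses are required around each comparison
selected = arr[mask]               # [20 30 40]

capped = arr.copy()
capped[capped > 30] = 30           # overwrite only the elements that match
print(selected)
print(capped)                      # [10 20 30 30 30]
```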

12. Copy vs View

arr = np.array([1,2,3,4])
arr_view = arr.view()
arr_copy = arr.copy()

arr_view[0] = 100
arr_copy[1] = 200

print(arr)       # [100   2   3   4]  (changed through the view)
print(arr_view)  # [100   2   3   4]
print(arr_copy)  # [  1 200   3   4]  (independent copy)
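Views also appear without calling view() explicitly: a basic slice is a view of the original array, while indexing with a list of positions returns a copy. This is a common source of surprises, so here is a small sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

part = arr[1:3]        # basic slicing returns a view
part[0] = 99           # this also changes arr
print(arr)             # [ 1 99  3  4]

picked = arr[[0, 1]]   # indexing with a list of positions returns a copy
picked[0] = -1         # arr is not affected
print(arr)             # [ 1 99  3  4]

print(np.shares_memory(arr, part))     # True
print(np.shares_memory(arr, picked))   # False
```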

13. Random Numbers

np.random.seed(0)
print(np.random.randint(0,10,5))      # Random integers
print(np.random.rand(3,3))            # Uniform random floats
print(np.random.randn(3,3))           # Normal distribution
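The functions above belong to NumPy's older global random interface. Newer NumPy code often uses a Generator object created with np.random.default_rng instead. A sketch of the equivalent calls, with the seed 0 chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # a seeded, self-contained generator

ints = rng.integers(0, 10, size=5)         # random integers in [0, 10)
uniform = rng.random((3, 3))               # uniform floats in [0, 1)
normal = rng.standard_normal((3, 3))       # standard normal distribution

# The same seed reproduces the same sequence
repeat = np.random.default_rng(seed=0).integers(0, 10, size=5)
print(np.array_equal(ints, repeat))        # True
```

The values differ from what np.random.seed(0) produces, because the two interfaces use different underlying algorithms.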

14. Linear Algebra

A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])

print(np.dot(A,B))      # Matrix multiplication
print(np.linalg.inv(A)) # Inverse
print(np.linalg.det(A)) # Determinant
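The results can be checked by hand for these 2×2 matrices. The sketch below also uses the @ operator, which for 2D arrays gives the same result as np.dot, and verifies the inverse by multiplying it back:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B                  # same as np.dot(A, B) for 2D arrays
print(product)                   # [[19 22]
                                 #  [43 50]]

A_inv = np.linalg.inv(A)
is_identity = np.allclose(A @ A_inv, np.eye(2))   # A times its inverse is the identity
det = np.linalg.det(A)           # 1*4 - 2*3 = -2, up to floating-point rounding

print(is_identity)               # True
print(round(det, 6))
```

Comparisons use np.allclose rather than == because the inverse and determinant are computed in floating point and carry small rounding errors.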

15. Tips for Beginners

NumPy arrays are faster than Python lists for numeric operations.
Always try vectorized operations instead of loops.
Use reshape, ravel, and flatten to adjust dimensions.
Boolean indexing is very powerful for filtering data.

Conclusion

NumPy is the foundation of Python data science. Once you master it, you can perform fast numerical computations, array manipulations, and linear algebra operations with ease.
