Pandas and NumPy in Python

Mastering Pandas: The Ultimate Beginner’s Guide to Data Handling in Python
Data is everywhere — but raw data is messy, inconsistent, and rarely ready for analysis.
This is where Pandas comes to the rescue.
Pandas is a Python library for data manipulation and analysis, built on top of NumPy.
It gives you powerful tools to clean, explore, and transform datasets efficiently — whether they’re in CSV files, SQL tables, JSON APIs, or Excel sheets.
🔹 What is Pandas?
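The name Pandas derives from “panel data”, an econometrics term for structured, multidimensional datasets, and it also plays on the phrase “Python data analysis”.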
It provides easy-to-use data structures — mainly Series and DataFrame — to work with structured data.
Think of a Series as a single column in Excel, and a DataFrame as a full spreadsheet with rows and columns.
1. Installation & Import
pip install pandas
import pandas as pd
2. Pandas Data Structures
🔹 Series
1-dimensional labeled array.
Can hold any data type.
s = pd.Series([10, 20, 30, 40])
print(s)

🔹 DataFrame
2-dimensional labeled data structure (like a spreadsheet or SQL table).
Can hold heterogeneous data types.
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

3. Creating DataFrames
- From dictionary
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
- From list of lists
data = [['Alice', 25], ['Bob', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
- From CSV/Excel
df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx')
4. Viewing & Inspecting Data
df.head() # First 5 rows
df.tail() # Last 5 rows
df.shape # returns a tuple like (number_of_rows, number_of_columns)
df.info() # Summary of dataframe
df.describe() # Statistical summary for numeric columns
df.columns # Column names
df.index # Row labels



5. Selecting Data
🔹 Selecting Columns
df['Name']
df[['Name', 'Age']]
🔹 Selecting Rows
When working with DataFrames, it’s common to select specific rows. Pandas provides two main ways: iloc and loc.
→ iloc – Position-based Selection
iloc stands for integer-location based selection. Use it when you want to select rows by their integer position (row number, starting at 0).
Syntax:
df.iloc[row_index] or df.iloc[start:end] (end is excluded)
→ loc – Label-based Selection
loc stands for label-based selection. Use it when you want to select rows by their index label.
Syntax:
df.loc[row_label] or df.loc[start_label:end_label] (end_label is included)
df.iloc[0] # First row by position
df.loc[0] # Row with index label 0
df.iloc[0:3] # First three rows
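The difference between the two only shows up once the index is something other than the default 0, 1, 2. The sketch below illustrates this; the string labels 'a', 'b', 'c' and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]},
    index=['a', 'b', 'c'],
)

by_position = people.iloc[0]       # first row, whatever its label is
by_label = people.loc['a']         # row whose label is 'a'

pos_slice = people.iloc[0:2]       # positions 0 and 1 (end is excluded)
label_slice = people.loc['a':'b']  # labels 'a' through 'b' (end is included)

print(by_position['Name'], by_label['Name'])
print(list(pos_slice.index), list(label_slice.index))
```

Both slices return the same two rows here, even though one is written with an exclusive end and the other with an inclusive end.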
🔹 Conditional Selection
df[df['Age'] > 25]
df[(df['Age'] > 20) & (df['Name'] == 'Alice')]
6. Modifying Data
- Add new column
df['Salary'] = [50000, 60000]
- Modify existing column
df['Age'] = df['Age'] + 1
- Rename columns
df.rename(columns={'Age':'Years'}, inplace=True)
- Drop column/row
df.drop('Salary', axis=1, inplace=True) # Column
df.drop(0, axis=0, inplace=True) # Row
7. Handling Missing Data
- Check for missing values
df.isnull()
df.isnull().sum()
- Fill missing values
df.fillna(0, inplace=True)
df['Column'] = df['Column'].fillna(df['Column'].mean()) # Assign back; inplace=True on a selected column may not update df
- Drop missing values
df.dropna(inplace=True)
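To see these calls act on real gaps, here is a small sketch. The 'Score' and 'City' columns and their values are invented for the example, and the non-inplace forms are used so each result can be inspected separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score and one missing city
scores = pd.DataFrame({
    'Score': [80.0, np.nan, 90.0],
    'City': ['Delhi', None, 'Mumbai'],
})

# Count missing values per column
missing_per_column = scores.isnull().sum()

# Fill the numeric gap with the column mean (the mean skips NaN by default)
filled = scores.copy()
filled['Score'] = filled['Score'].fillna(filled['Score'].mean())

# Drop every row that contains at least one missing value
complete_rows = scores.dropna()

print(missing_per_column)
print(filled)
print(complete_rows)
```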
8. Data Cleaning
- Remove duplicates:
df.drop_duplicates(inplace=True)
- Strip whitespace:
df['Name'] = df['Name'].str.strip()
- Change case:
df['Name'] = df['Name'].str.upper()
9. Sorting Data
- By column:
df.sort_values('Age', ascending=True, inplace=True)
- By index:
df.sort_index(inplace=True)
10. Aggregation & Grouping
- Basic statistics
df['Age'].sum()
df['Age'].mean()
df['Age'].max()
df['Age'].min()
df['Age'].std()
- Group by
df.groupby('Department')['Salary'].mean()
- Multiple aggregations
df.groupby('Department')['Salary'].agg(['mean', 'sum', 'max'])
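The grouping calls above assume a DataFrame with 'Department' and 'Salary' columns, which the earlier examples never create. A minimal sketch with invented employee data:

```python
import pandas as pd

# Hypothetical employee data; column names match the calls above
staff = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT'],
    'Salary': [40000, 50000, 60000, 70000, 80000],
})

# One mean per department
mean_salary = staff.groupby('Department')['Salary'].mean()

# Several statistics per department, one column per statistic
summary = staff.groupby('Department')['Salary'].agg(['mean', 'sum', 'max'])

print(mean_salary)
print(summary)
```

The result of groupby is indexed by the grouping column, so a single value can be read with, for example, mean_salary['IT'].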
11. Merging, Joining & Concatenation
- Concatenate
pd.concat([df1, df2], axis=0) # Stack rows
pd.concat([df1, df2], axis=1) # Place side by side (adds columns, aligned on the index)
- Merge / Join
pd.merge(df1, df2, on='Key', how='inner') # inner, left, right, outer
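The how argument matters only when the two tables do not contain exactly the same keys. The sketch below uses two invented tables that overlap on keys 2 and 3 to show the effect.

```python
import pandas as pd

# Hypothetical tables that share a 'Key' column but only partly overlap
left = pd.DataFrame({'Key': [1, 2, 3], 'Name': ['A', 'B', 'C']})
right = pd.DataFrame({'Key': [2, 3, 4], 'Salary': [60000, 55000, 70000]})

inner = pd.merge(left, right, on='Key', how='inner')     # keys present in both
left_join = pd.merge(left, right, on='Key', how='left')  # every key from left
outer = pd.merge(left, right, on='Key', how='outer')     # keys from either table

print(inner)
print(left_join)
print(outer)
```

Rows without a match on the other side are kept by the left and outer joins, with NaN in the columns that could not be filled.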
12. Applying Functions
- Using apply()
df['Age_plus_5'] = df['Age'].apply(lambda x: x + 5)
- Vectorized operations
df['Salary'] = df['Salary'] * 1.1
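For simple arithmetic the two approaches give the same result, and the vectorized form is usually the better default. A small sketch with invented ages:

```python
import pandas as pd

ages = pd.DataFrame({'Age': [25, 30, 28]})  # hypothetical sample data

# apply() calls the Python function once per element
ages['via_apply'] = ages['Age'].apply(lambda x: x + 5)

# The vectorized form operates on the whole column at once
ages['via_vectorized'] = ages['Age'] + 5

print(ages)
```

apply() remains useful when the logic cannot be expressed as column arithmetic, such as the 'Adult'/'Minor' labelling in the practice code.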
13. Working with Dates
df['JoinDate'] = pd.to_datetime(df['JoinDate'])
df['Year'] = df['JoinDate'].dt.year
df['Month'] = df['JoinDate'].dt.month
df['Day'] = df['JoinDate'].dt.day
14. Pivot Tables
df.pivot_table(values='Salary', index='Department', columns='Gender', aggfunc='mean')
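The call above needs 'Department', 'Gender', and 'Salary' columns. The sketch below builds a tiny invented dataset with those names so the shape of the result is visible: one row per department, one column per gender.

```python
import pandas as pd

# Hypothetical data using the column names from the call above
employees = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Salary': [40000, 50000, 60000, 70000],
})

table = employees.pivot_table(
    values='Salary', index='Department', columns='Gender', aggfunc='mean'
)
print(table)
```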
15. Exporting Data
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
16. Visualization with Pandas
df['Salary'].plot(kind='hist') # Histogram
df.plot(x='Age', y='Salary', kind='scatter') # Scatter plot
df['Department'].value_counts().plot(kind='bar') # Bar plot
17. Tips for Beginners
Start with small datasets to understand operations.
Use head() and tail() often to inspect data.
Chain operations carefully: df.dropna().groupby('Dept')['Salary'].mean()
Remember that Pandas is built on NumPy, so vectorized operations are faster than loops.
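One way to keep a chain readable is to wrap it in parentheses and put each step on its own line. The sketch below assumes an invented DataFrame with 'Dept' and 'Salary' columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'Dept' and 'Salary' follow the names used in the tip
pay = pd.DataFrame({
    'Dept': ['HR', 'HR', 'IT', 'IT'],
    'Salary': [40000, np.nan, 60000, 80000],
})

# One step per line keeps a long chain easy to read and debug
avg_by_dept = (
    pay
    .dropna()                   # remove the row with the missing salary
    .groupby('Dept')['Salary']  # group the remaining salaries by department
    .mean()
)
print(avg_by_dept)
```

Commenting out a single line of the chain is then enough to inspect an intermediate result.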
✅ Conclusion
Pandas is an essential tool for data cleaning, transformation, and analysis in Python.
Once you master it, tasks that used to take hours in Excel or SQL can be done in a few lines of code.
# 🐼 PANDAS COMPLETE PRACTICE CODE FOR BEGINNERS
# ----------------------------------------------
# 1️⃣ Importing Pandas
import pandas as pd
# 2️⃣ Creating Series
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s, "\n")
# Output:
# 0 10
# 1 20
# 2 30
# 3 40
# dtype: int64
# 3️⃣ Creating DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 28, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore', 'Chennai']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df, "\n")
# Output:
# Name Age City
# 0 Alice 25 Delhi
# 1 Bob 30 Mumbai
# 2 Charlie 28 Bangalore
# 3 David 22 Chennai
# 4️⃣ Exploring Data
print("Head:\n", df.head(), "\n")
print("Info:")
df.info() # info() prints its summary directly and returns None
print("\nDescribe:\n", df.describe(), "\n")
print("Columns:", df.columns.tolist(), "\n")
print("Shape:", df.shape, "\n")
# 5️⃣ Selecting Data
print("Select Column:\n", df['Name'], "\n")
# Output:
# 0 Alice
# 1 Bob
# 2 Charlie
# 3 David
print("Select Multiple Columns:\n", df[['Name', 'City']], "\n")
print("Select by Index (loc):\n", df.loc[0], "\n")
# Output:
# Name Alice
# Age 25
# City Delhi
# Name: 0, dtype: object
print("Select by Position (iloc):\n", df.iloc[1], "\n")
# Conditional selection
print("Age > 25:\n", df[df['Age'] > 25], "\n")
# 6️⃣ Add, Modify, Delete Columns
df['Country'] = 'India'
df['Age'] = df['Age'] + 1
df.drop('City', axis=1, inplace=True)
print("After Modifications:\n", df, "\n")
# Output:
# Name Age Country
# 0 Alice 26 India
# 1 Bob 31 India
# 2 Charlie 29 India
# 3 David 23 India
# 7️⃣ Handling Missing Values
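df['Age'] = df['Age'].astype(float) # Integer columns cannot hold NaN, so cast first
df.loc[2, 'Age'] = None # Stored as NaN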
print("With Missing Value:\n", df, "\n")
print("Check NaN:\n", df.isnull(), "\n")
print("Fill Missing:\n", df.fillna(0), "\n")
print("Drop Missing:\n", df.dropna(), "\n")
# 8️⃣ Aggregation and Statistics
print("Mean Age:", df['Age'].mean())
print("Sum Age:", df['Age'].sum(), "\n")
# Output:
# Mean Age: 26.6666666667
# Sum Age: 80.0
# GroupBy example
group_data = {
    'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai'],
    'Sales': [200, 250, 300, 400]
}
sales_df = pd.DataFrame(group_data)
print("Group By City (Mean Sales):\n", sales_df.groupby('City')['Sales'].mean(), "\n")
# Output:
# City
# Delhi 225.0
# Mumbai 350.0
# Name: Sales, dtype: float64
# 9️⃣ Sorting & Filtering
print("Sorted by Sales Desc:\n", sales_df.sort_values('Sales', ascending=False), "\n")
print("Filter with condition (Sales > 250):\n", sales_df[sales_df['Sales'] > 250], "\n")
# 🔟 Merging & Joining DataFrames
df1 = pd.DataFrame({'id': [1, 2, 3], 'Name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'Salary': [50000, 60000, 55000]})
merged = pd.merge(df1, df2, on='id', how='inner')
print("Merged DataFrame:\n", merged, "\n")
# Output:
# id Name Salary
# 0 1 A 50000
# 1 2 B 60000
# 2 3 C 55000
# 1️⃣1️⃣ Concatenating
concat_df = pd.concat([df1, df2], axis=1)
print("Concatenated DataFrame:\n", concat_df, "\n")
# Output (side by side):
# id Name id Salary
# 0 1 A 1 50000
# 1 2 B 2 60000
# 2 3 C 3 55000
# 1️⃣2️⃣ Working with Dates
date_data = pd.DataFrame({
    'Date': ['2024-01-01', '2024-06-15', '2024-10-05']
})
date_data['Date'] = pd.to_datetime(date_data['Date'])
date_data['Year'] = date_data['Date'].dt.year
date_data['Month'] = date_data['Date'].dt.month
print("Date Operations:\n", date_data, "\n")
# Output:
# Date Year Month
# 0 2024-01-01 2024 1
# 1 2024-06-15 2024 6
# 2 2024-10-05 2024 10
# 1️⃣3️⃣ Applying Functions
df = pd.DataFrame({'Age': [15, 22, 35, 45]})
df['AgeGroup'] = df['Age'].apply(lambda x: 'Adult' if x >= 18 else 'Minor')
print("Apply Function Example:\n", df, "\n")
# Output:
# Age AgeGroup
# 0 15 Minor
# 1 22 Adult
# 2 35 Adult
# 3 45 Adult
# 1️⃣4️⃣ Pivot Table
pivot_df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East'],
    'Sales': [200, 300, 400, 250]
})
pivot = pd.pivot_table(pivot_df, values='Sales', index='Region', aggfunc='sum')
print("Pivot Table:\n", pivot, "\n")
# Output:
# Sales
# Region
# East 250
# North 600
# South 300
# 1️⃣5️⃣ Useful Functions
demo_df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore'],
    'Age': [25, 30, 25, 40]
})
print("Value Counts:\n", demo_df['City'].value_counts(), "\n")
print("Check Duplicates:\n", demo_df.duplicated(), "\n")
print("Drop Duplicates:\n", demo_df.drop_duplicates(), "\n")
print("Rename Column:\n", demo_df.rename(columns={'City': 'Location'}), "\n")
print("Random Sample:\n", demo_df.sample(n=2), "\n")
# 1️⃣6️⃣ Export Data
# demo_df.to_csv('final_output.csv', index=False)
# print("Data exported successfully!")
NumPy – Python for Data Science
Introduction
NumPy (Numerical Python) is a fundamental library in Python for scientific computing and data analysis.
It provides:
ndarray → N-dimensional array for storing numbers
Fast operations on arrays (vectorized computations)
Mathematical, statistical, and linear algebra functions
NumPy is the foundation for Pandas, SciPy, and Machine Learning libraries like Scikit-learn and TensorFlow.
1. Installation & Import
pip install numpy
import numpy as np
Output: Nothing, just imports the library.
2. NumPy Arrays
NumPy arrays are like Python lists but faster and support vectorized operations.
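As a rough illustration of what "vectorized" means, the sketch below doubles every element first with plain Python and then with NumPy. The variable names are chosen for this example only.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: an explicit loop (here a list comprehension) over every element
doubled_list = [v * 2 for v in values]

# NumPy: one expression applied to the whole array at once
doubled_array = np.array(values) * 2

# Note that * on a plain list repeats the list instead of multiplying
repeated = values * 2

print(doubled_list)
print(doubled_array)
print(repeated)
```

The speed advantage comes from the loop running inside NumPy's compiled code rather than in the Python interpreter, and it grows with the size of the array.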
Create 1D Array
arr = np.array([1, 2, 3, 4])
print(arr)
Output:
[1 2 3 4]
Create 2D Array
arr2d = np.array([[1,2,3],[4,5,6]])
print(arr2d)
Output:
[[1 2 3]
[4 5 6]]
3. Array Attributes
print(arr.shape) # Shape of array
print(arr2d.shape)
print(arr.ndim) # Number of dimensions
print(arr2d.ndim)
print(arr.dtype) # Data type (int64 on most platforms; int32 on Windows with NumPy 1.x)
Output:
(4,)
(2, 3)
1
2
int64
4. Creating Arrays with Built-in Functions
np.zeros(5) # Array of zeros
np.ones((2,3)) # Array of ones
np.arange(0,10,2) # Numbers from 0 up to (but not including) 10, step 2
np.linspace(0,1,5) # 5 numbers evenly spaced between 0 and 1
np.eye(3) # Identity matrix
Output Examples:
np.zeros(5) → [0. 0. 0. 0. 0.]
np.ones((2,3)) →
[[1. 1. 1.]
[1. 1. 1.]]
np.arange(0,10,2) → [0 2 4 6 8]
np.linspace(0,1,5) → [0. 0.25 0.5 0.75 1. ]
np.eye(3) →
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
5. Indexing & Slicing
1D Array
arr = np.array([10,20,30,40,50])
print(arr[0]) # First element
print(arr[1:4]) # Slice
Output:
10
[20 30 40]
2D Array
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr2d[0,1]) # Row 0, Column 1
print(arr2d[1,:]) # Row 1, all columns
print(arr2d[:,2]) # All rows, column 2
Output:
2
[4 5 6]
[3 6 9]
6. Array Operations
NumPy supports element-wise operations:
a = np.array([1,2,3])
b = np.array([4,5,6])
print(a+b) # [5 7 9]
print(a-b) # [-3 -3 -3]
print(a*b) # [4 10 18]
print(a/b) # [0.25 0.4 0.5]
print(a**2) # [1 4 9]
7. Universal Functions (ufunc)
NumPy provides fast mathematical functions:
arr = np.array([1,4,9,16])
print(np.sqrt(arr)) # [1. 2. 3. 4.]
print(np.exp(arr)) # Exponentials
print(np.log(arr)) # Natural log
print(np.sin(arr)) # Trigonometric functions
8. Aggregation Functions
arr = np.array([1,2,3,4,5])
print(arr.sum()) # 15
print(arr.mean()) # 3.0
print(arr.std()) # 1.4142
print(arr.min()) # 1
print(arr.max()) # 5
print(arr.argmin()) # Index of min → 0
print(arr.argmax()) # Index of max → 4
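Two details are worth knowing when moving between NumPy and Pandas. NumPy's std() divides by n by default (ddof=0), while Pandas' std() divides by n - 1 (ddof=1), so the two libraries report different values for the same data. Aggregations also accept an axis argument on multi-dimensional arrays. A short sketch, with variable names chosen for illustration:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])

population_std = arr.std()         # NumPy default: ddof=0, divide by n
sample_std = arr.std(ddof=1)       # divide by n - 1 instead
pandas_std = pd.Series(arr).std()  # Pandas default: ddof=1

grid = np.array([[1, 2, 3], [4, 5, 6]])
column_sums = grid.sum(axis=0)     # collapse the rows: one sum per column
row_sums = grid.sum(axis=1)        # collapse the columns: one sum per row

print(round(population_std, 4), round(sample_std, 4), round(pandas_std, 4))
print(column_sums, row_sums)
```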
9. Reshaping Arrays
arr = np.arange(1,13)
print(arr.reshape(3,4))
Output:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
- Flatten back:
arr.reshape(3,4).ravel()
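Two related conveniences: passing -1 to reshape lets NumPy work out that dimension, and flatten() differs from ravel() in that it always returns a copy. The following sketch checks both, using np.shares_memory to show which result still points at the original data.

```python
import numpy as np

arr = np.arange(1, 13)

# -1 asks NumPy to infer that dimension from the array's size
grid = arr.reshape(3, -1)

flat_copy = grid.flatten()  # always returns a new copy
flat_view = grid.ravel()    # returns a view when the memory layout allows it

flat_copy[0] = 99           # does not touch grid

print(grid.shape)
print(grid[0, 0])
print(np.shares_memory(grid, flat_view), np.shares_memory(grid, flat_copy))
```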
10. Stacking Arrays
a = np.array([1,2,3])
b = np.array([4,5,6])
np.vstack((a,b)) # Vertical stack
np.hstack((a,b)) # Horizontal stack
Output:
vstack →
[[1 2 3]
[4 5 6]]
hstack → [1 2 3 4 5 6]
11. Boolean Indexing
arr = np.array([10,20,30,40,50])
print(arr[arr>25]) # [30 40 50]
Select elements based on conditions.
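The expression inside the brackets is itself an array of booleans, which can be stored, combined, or passed to np.where. A short sketch, with variable names chosen for illustration:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = arr > 25                         # one True/False per element
between = arr[(arr > 15) & (arr < 45)]  # combine conditions with & and parentheses
capped = np.where(arr > 25, 25, arr)    # replace matching elements, keep the rest

print(mask)
print(between)
print(capped)
```

Use & and | rather than Python's and / or here, since the latter do not work element by element.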
12. Copy vs View
arr = np.array([1,2,3,4])
arr_view = arr.view()
arr_copy = arr.copy()
arr_view[0] = 100
arr_copy[1] = 200
print(arr) # arr affected by view, not copy
print(arr_view)
print(arr_copy)
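The same distinction applies to slicing, which is where it most often surprises beginners: a basic slice is a view, so writing to it changes the original array. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

part_view = arr[1:3]         # basic slicing returns a view
part_copy = arr[1:3].copy()  # an explicit copy owns its own data

part_view[0] = 100           # writes through to arr
part_copy[1] = 200           # leaves arr untouched

print(arr)
print(np.shares_memory(arr, part_view), np.shares_memory(arr, part_copy))
```

Call .copy() whenever a slice needs to be modified independently of its source.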
13. Random Numbers
np.random.seed(0)
print(np.random.randint(0,10,5)) # Random integers
print(np.random.rand(3,3)) # Uniform random floats
print(np.random.randn(3,3)) # Normal distribution
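The functions above belong to NumPy's legacy random interface, which still works. For new code, NumPy's documentation recommends the Generator API created with np.random.default_rng. The sketch below shows rough equivalents of the three calls; the seed value 0 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded generator, independent of np.random.seed

ints = rng.integers(0, 10, size=5)    # integers in [0, 10)
uniform = rng.random((3, 3))          # floats in [0.0, 1.0)
normal = rng.standard_normal((3, 3))  # samples from the standard normal distribution

# Re-creating the generator with the same seed reproduces the same draws
repeat = np.random.default_rng(0).integers(0, 10, size=5)

print(ints)
print(uniform.shape, normal.shape)
print((ints == repeat).all())
```

The two interfaces use different algorithms, so the same seed does not produce the same numbers in both.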
14. Linear Algebra
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print(np.dot(A,B)) # Matrix multiplication
print(np.linalg.inv(A)) # Inverse
print(np.linalg.det(A)) # Determinant
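A quick way to sanity-check these results is to multiply A by its inverse and compare against the identity matrix. The sketch below also uses the @ operator, which performs the same matrix multiplication as np.dot for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B                # same result as np.dot(A, B) for 2D arrays
A_inv = np.linalg.inv(A)

# Floating-point results are compared with a tolerance, not with ==
identity_check = np.allclose(A @ A_inv, np.eye(2))

det = np.linalg.det(A)         # 1*4 - 2*3 = -2, up to floating-point rounding

print(product)
print(identity_check)
print(round(det, 6))
```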
15. Tips for Beginners
NumPy arrays are faster than Python lists for numeric operations.
Always try vectorized operations instead of loops.
Use reshape, ravel, and flatten to adjust dimensions.
Boolean indexing is very powerful for filtering data.
Conclusion
NumPy is the foundation of Python data science. Once you master it, you can perform fast numerical computations, array manipulations, and linear algebra operations with ease.



