Python for Data Science (NumPy & Pandas)
Introduction
Python is widely used in data science, largely because of libraries such as NumPy and Pandas.
NumPy handles fast numerical computation on arrays, and Pandas handles loading, cleaning, and
summarizing tabular data.
NumPy: Numerical Computing in Python
NumPy (Numerical Python) provides a multi-dimensional array type (ndarray) and mathematical
functions that operate on whole arrays at once, without explicit Python loops.
Importing NumPy
import numpy as np
Creating Arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr) # Output: [1 2 3 4 5]
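Arrays do not have to be built from a Python list. As a minimal sketch, NumPy's constructor functions can generate common patterns directly; the variable names (zeros, steps, grid) and the values are illustrative, not part of the original example:

```python
import numpy as np

zeros = np.zeros(3)               # three floats, all 0.0
print(zeros)                      # [0. 0. 0.]

steps = np.arange(0, 10, 2)       # start at 0, stop before 10, step by 2
print(steps)                      # [0 2 4 6 8]

grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, both endpoints included
print(grid)                       # [0.   0.25 0.5  0.75 1.  ]
```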
Array Operations
arr2 = np.array([10, 20, 30, 40, 50])
print(arr + arr2) # Element-wise addition: [11 22 33 44 55]
print(arr * 2) # Scalar multiplication: [ 2  4  6  8 10]
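The NumPy overview mentions multi-dimensional arrays and mathematical functions, while the examples so far are one-dimensional. A minimal sketch of both follows; the array values and variable names (matrix, col_sums, row_means, roots) are assumptions chosen for illustration:

```python
import numpy as np

# A 2x3 array built from nested lists (illustrative values)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)                # (2, 3): two rows, three columns

col_sums = matrix.sum(axis=0)      # axis=0 sums down each column
print(col_sums)                    # [5 7 9]

row_means = matrix.mean(axis=1)    # axis=1 averages across each row
print(row_means)                   # [2. 5.]

roots = np.sqrt(np.array([1, 4, 9]))  # element-wise square root
print(roots)                       # [1. 2. 3.]
```

The axis argument names the dimension that is collapsed, so axis=0 leaves one value per column and axis=1 leaves one value per row.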
Pandas: Data Manipulation in Python
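Pandas handles structured (tabular) data. Its main type is the DataFrame, a table with labeled
rows and columns; a single column of a DataFrame is a Series.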
Importing Pandas
import pandas as pd
Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
Reading and Writing Data
df.to_csv("data.csv", index=False) # Save DataFrame to CSV, omitting the row index
new_df = pd.read_csv("data.csv") # Read the CSV file back into a new DataFrame
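To check that a write followed by a read preserves the data, the round trip can be sketched as below. This version is an assumption-laden variant of the example above: it uses an in-memory buffer instead of data.csv so that nothing is written to disk, and the names csv_text and round_trip are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# With no path argument, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, here an in-memory text buffer
round_trip = pd.read_csv(io.StringIO(csv_text))
print(list(round_trip.columns))      # ['Name', 'Age']
print(round_trip['Age'].tolist())    # [25, 30, 35]
```

Because index=False is passed, the row index is not written, so the file contains only the Name and Age columns and reads back with the same two columns.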
DataFrame Operations
print(df.describe()) # Summary statistics for numeric columns (count, mean, std, min, quartiles, max)
print(df['Age'].mean()) # Average age: 30.0
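Beyond summaries, everyday DataFrame work usually involves filtering rows, adding columns, and sorting. A minimal sketch on the same data follows; the threshold 28 and the column name AgeNextYear are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Boolean mask: keep only the rows where Age is above 28
older = df[df['Age'] > 28]
print(older['Name'].tolist())        # ['Bob', 'Charlie']

# Derived column computed element-wise from an existing one
df['AgeNextYear'] = df['Age'] + 1
print(df['AgeNextYear'].tolist())    # [26, 31, 36]

# Sort by Age, largest first; sort_values returns a new DataFrame
by_age = df.sort_values('Age', ascending=False)
print(by_age['Name'].tolist())       # ['Charlie', 'Bob', 'Alice']
```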
Conclusion
NumPy supplies fast array computation and Pandas supplies labeled, tabular data handling;
together they underpin most data science work in Python. Next, we will explore Machine
Learning with Scikit-Learn.