Week 3
import pandas as pd
# Load dataset
df = pd.read_csv('data.csv') # Read a CSV file into a DataFrame
import numpy as np
import matplotlib.pyplot as plt
# Sample data
x = np.linspace(0, 10, 100) # 100 evenly spaced points from 0 to 10 (inclusive)
y = np.sin(x)
# Line plot
plt.plot(x, y, label='Sine wave', color='b')
plt.title('Sine Wave Example')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
# Scatter plot
x2 = np.random.rand(50)
y2 = np.random.rand(50)
plt.scatter(x2, y2, color='r', alpha=0.7)
plt.title('Random Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
# Histogram
data = np.random.randn(1000)
plt.hist(data, bins=30, color='green', edgecolor='black')
plt.title('Histogram of Random Data')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
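# Make predictions
# (assumes `model` is an already fitted classifier and that X_test, y_test
# come from an earlier train_test_split; those steps are not shown here)
y_pred = model.predict(X_test)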
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))
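The evaluation snippet above uses `model`, `X_test`, and `y_test` without showing where they come from. A minimal end-to-end sketch of the missing steps might look like the following. It is not the dataset from these notes: the synthetic data from `make_classification`, the 75/25 split, and `max_iter=1000` are all assumptions chosen only so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Synthetic stand-in data (assumption: the notes' own data.csv is not available here)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out 25% of the rows for testing (split ratio is an illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier on the training portion only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out portion
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print("Classification Report:\n", classification_report(y_test, y_pred))
```

In the confusion matrix, rows correspond to the true classes and columns to the predicted classes, so the diagonal counts the correct predictions.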
Conclusion
Pandas is essential for data manipulation, cleaning, and
preprocessing. It allows you to efficiently handle large datasets
and perform operations like filtering, grouping, and merging
data.
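As a small illustration of the three operations just named, here is a sketch on two toy tables. The column names (`region`, `amount`, `manager`) and all values are hypothetical and made up for this example; they are not taken from `data.csv`.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "amount": [120, 80, 200, 50],
})
managers = pd.DataFrame({
    "region": ["north", "south", "east"],
    "manager": ["Ana", "Ben", "Cy"],
})

# Filtering: keep only the rows whose amount exceeds 100
large = sales[sales["amount"] > 100]

# Grouping: total amount per region
totals = sales.groupby("region", as_index=False)["amount"].sum()

# Merging: attach each region's manager to its total
report = totals.merge(managers, on="region", how="left")
print(report)
```

Using `how="left"` keeps every region from `totals` even if it has no matching row in `managers`; an inner join would silently drop such regions.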
Matplotlib is crucial for data visualization, making it easy to
plot data distributions, trends, and performance metrics, which
is important for both exploratory data analysis and model
evaluation.
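One way to plot a performance metric with Matplotlib is to draw a confusion matrix as a heatmap. The sketch below assumes a 2x2 matrix with made-up counts (in practice `cm` would come from `confusion_matrix(y_test, y_pred)`), uses the non-interactive `Agg` backend so it runs without a display, and writes to a hypothetical file name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up counts for illustration; rows are true classes, columns are predictions
cm = np.array([[42, 8],
               [5, 45]])

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")

# Label the two classes on each axis
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(["Pred 0", "Pred 1"])
ax.set_yticklabels(["True 0", "True 1"])

# Write each count inside its cell
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

ax.set_title("Confusion Matrix")
fig.colorbar(im, ax=ax)
fig.savefig("confusion_matrix.png")  # hypothetical output file name
```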