1. Python Libraries (NumPy, Pandas, Matplotlib, Seaborn):
NumPy:
import numpy as np
# Basic operations
array = np.array([1, 2, 3])
mean = np.mean(array)
std_dev = np.std(array)
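A small sketch of the NumPy calls above, showing one detail that is easy to miss: np.std computes the population standard deviation by default (ddof=0). Passing ddof=1 gives the sample standard deviation. The variable names pop_std and sample_std are illustrative only.

```python
import numpy as np

array = np.array([1, 2, 3])

mean = np.mean(array)               # arithmetic mean
pop_std = np.std(array)             # population std (ddof=0, the default)
sample_std = np.std(array, ddof=1)  # sample std (divides by n - 1)

print(mean, pop_std, sample_std)
```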
Pandas:
import pandas as pd
# DataFrame operations
df = pd.read_csv('data.csv')
df.head()
df.describe()
# Assign the result back; inplace=True on a selected column is unreliable in recent pandas
df['column'] = df['column'].fillna(df['column'].mean())
Matplotlib:
import matplotlib.pyplot as plt
# Basic plot (x and y are placeholder example data)
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Title')
plt.show()
Seaborn:
import seaborn as sns
# Creating visualizations
sns.scatterplot(x='x_column', y='y_column', data=df)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # correlate numeric columns only
2. Data Preprocessing & Feature Engineering:
Handling Missing Values:
df.ffill(inplace=True)  # forward fill; fillna(method='ffill') is deprecated in recent pandas
df.dropna(subset=['column'], inplace=True)
Encoding Categorical Data:
df = pd.get_dummies(df, columns=['category_column'])  # returns a new DataFrame, so assign it
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['encoded_col'] = le.fit_transform(df['category_col'])
Feature Scaling:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
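# Scale numeric columns only; StandardScaler cannot handle string columns
scaled_data = scaler.fit_transform(df.select_dtypes(include='number'))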
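The preprocessing steps in this section can be chained on a small table. The following is a minimal sketch, assuming a hypothetical toy DataFrame whose column names (age, city) are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 45.0],
    "city": ["A", "B", "A", "C"],
})

# 1. Fill the missing numeric value with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# 2. One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["city"])

# 3. Standardize the numeric column to zero mean and unit variance.
scaler = StandardScaler()
encoded[["age"]] = scaler.fit_transform(encoded[["age"]])

print(encoded)
```

In a modelling workflow the scaler would normally be fitted on the training split only and then applied to the test split with scaler.transform.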
3. Linear Regression:
Model Representation:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
Making Predictions:
predictions = model.predict(X_test)
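The snippets above use X_train, y_train and X_test without defining them. One common way to obtain them is train_test_split. The sketch below assumes synthetic data (y = 3x + 2 plus small noise), chosen only so that the fitted slope and intercept are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R^2 on test set:", r2_score(y_test, predictions))
```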
4. Logistic Regression:
Logistic Function:
import numpy as np
def logistic(x):
    return 1 / (1 + np.exp(-x))
Learning the Model:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression()
log_model.fit(X_train, y_train)
Prediction:
log_predictions = log_model.predict(X_test)
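To connect the logistic function to the fitted model: for a two-class problem, applying the logistic function to the model's linear score (decision_function) should reproduce the class-1 probability from predict_proba. A minimal sketch, assuming a synthetic dataset from make_classification:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def logistic(x):
    return 1 / (1 + np.exp(-x))


# Synthetic two-class data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

log_model = LogisticRegression()
log_model.fit(X_train, y_train)

# Linear score w.x + b for each test row, mapped to a probability by hand.
scores = log_model.decision_function(X_test)
manual_proba = logistic(scores)

# Probability of class 1 as reported by scikit-learn.
sklearn_proba = log_model.predict_proba(X_test)[:, 1]

log_predictions = log_model.predict(X_test)
accuracy = accuracy_score(y_test, log_predictions)

print("probabilities match:", np.allclose(manual_proba, sklearn_proba))
print("accuracy:", accuracy)
```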
5. Naive Bayes:
Implementation:
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)
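A runnable version of the Naive Bayes snippet, assuming the iris dataset bundled with scikit-learn as example data. It also shows predict_proba, which returns one probability per class for each row.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Example data (an assumption): the iris dataset shipped with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

nb_predictions = nb_model.predict(X_test)          # predicted class labels
nb_probabilities = nb_model.predict_proba(X_test)  # one column per class

nb_accuracy = accuracy_score(y_test, nb_predictions)
print("accuracy:", nb_accuracy)
print("first row probabilities:", nb_probabilities[0])
```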
6. Decision Tree & Random Forest:
Decision Tree:
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)
Random Forest:
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)
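A sketch comparing a single tree with a forest on the same split, again assuming the iris dataset as example data. random_state is fixed so repeated runs give the same result, and max_depth=3 is an arbitrary illustrative choice. feature_importances_ gives a rough view of which columns the forest relied on.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (an assumption): the iris dataset shipped with scikit-learn.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# A single, shallow tree.
dt_model = DecisionTreeClassifier(max_depth=3, random_state=0)
dt_model.fit(X_train, y_train)

# An ensemble of 100 trees.
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)

print("tree accuracy:", dt_model.score(X_test, y_test))
print("forest accuracy:", rf_model.score(X_test, y_test))

# Importances sum to 1 across features.
for name, importance in zip(iris.feature_names, rf_model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```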
7. K-Nearest Neighbour (KNN):
Implementation:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
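KNN is distance-based, so features on larger scales can dominate the neighbour search. A minimal sketch, assuming the wine dataset bundled with scikit-learn, that fits KNN with and without StandardScaler so the two accuracies can be compared. The names knn_raw and knn_scaled are illustrative only.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data (an assumption): the wine dataset shipped with scikit-learn.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# KNN on the raw features.
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_raw.fit(X_train, y_train)

# KNN with standardized features; the pipeline fits the scaler on the
# training data only and reuses it at prediction time.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)
knn_predictions = knn_scaled.predict(X_test)

raw_accuracy = knn_raw.score(X_test, y_test)
scaled_accuracy = knn_scaled.score(X_test, y_test)
print("raw features:", raw_accuracy)
print("scaled features:", scaled_accuracy)
```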
8. K-Means Clustering:
Clustering:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible clusters
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
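A runnable sketch of the clustering snippet, assuming synthetic points from make_blobs in place of X. It also shows predict, which assigns new points to the nearest fitted centroid. The point in new_points is a made-up example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points around three centres (an assumption for illustration).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

labels = kmeans.labels_              # cluster index for each row of X
centroids = kmeans.cluster_centers_  # one row of coordinates per cluster

print("cluster sizes:", np.bincount(labels))
print("centroids:\n", centroids)

# Assign a new, unseen point to its nearest centroid.
new_points = np.array([[0.0, 0.0]])
new_labels = kmeans.predict(new_points)
print("new point cluster:", new_labels[0])
```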
9. Loading CSV Files with Pandas:
import pandas as pd
# Load CSV file into a DataFrame
df = pd.read_csv('data.csv')
# Display the first few rows of the DataFrame
print(df.head())
10. Loading Excel Files:
# Load Excel file into a DataFrame
df_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Display the first few rows
print(df_excel.head())
This covers the essential Python syntax for data mining using these popular algorithms
and libraries.
To load and display data from a CSV file, use the pandas library.
Here is a step-by-step guide:
Step 1: Import the Pandas Library
import pandas as pd
Step 2: Load the CSV File into a DataFrame
# Load the CSV file
df = pd.read_csv('data.csv')
Step 3: Display the Data
Show the First Few Rows:
print(df.head()) # Displays the first 5 rows by default
To show a specific number of rows:
print(df.head(10)) # Displays the first 10 rows
Show the Last Few Rows:
print(df.tail()) # Displays the last 5 rows by default
Show the Entire DataFrame:
print(df)
Note: Displaying the entire DataFrame may not be practical for large
datasets. Use head() or tail() for better readability.
Additional Useful Functions:
Display Basic Information:
df.info() # Prints a summary including data types and non-null counts (returns None, so no print() is needed)
View DataFrame Dimensions:
print(df.shape) # Prints the dimensions as a (rows, columns) tuple
Display Column Names:
print(df.columns)
These commands will help you load and inspect your dataset quickly.
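The inspection commands above can be tried without a file on disk. The sketch below assumes a small, made-up CSV held in a string (standing in for 'data.csv') and reads it through io.StringIO.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
Cara,35,Madrid
"""

# read_csv accepts any file-like object, not only a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first 2 rows
print(df.tail(1))        # last row
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names as a plain list
df.info()                # prints dtypes and non-null counts itself
```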