Python Syntax and Functions for Data Mining

The document provides an overview of essential Python libraries for data analysis, including NumPy, Pandas, Matplotlib, and Seaborn, along with their basic functionalities. It also covers data preprocessing techniques, various machine learning algorithms such as linear regression, logistic regression, naive Bayes, decision trees, random forests, KNN, and K-means clustering, as well as methods for loading and displaying CSV and Excel files. Overall, it serves as a guide for performing data mining using popular Python libraries and algorithms.

Uploaded by

imtiaznafiz773

1. Python Libraries (NumPy, Pandas, Matplotlib, Seaborn):

NumPy:
import numpy as np
# Basic operations
array = np.array([1, 2, 3])
mean = np.mean(array)
std_dev = np.std(array)
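
One detail worth knowing about the snippet above: np.std computes the population standard deviation by default (ddof=0), while pandas' Series.std() computes the sample standard deviation (ddof=1). The sketch below compares the two on the same three-element array; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

array = np.array([1, 2, 3])

pop_std = np.std(array)              # population std (divides by n)
sample_std = np.std(array, ddof=1)   # sample std (divides by n - 1)
pandas_std = pd.Series(array).std()  # pandas defaults to ddof=1

print(pop_std, sample_std, pandas_std)
```

If NumPy and Pandas results need to agree, pass ddof explicitly in both places.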

Pandas:
import pandas as pd
# DataFrame operations
df = pd.read_csv('data.csv')
df.head()
df.describe()
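# Fill missing values with the column mean (assign back; chained inplace is deprecated in pandas 2.x)
df['column'] = df['column'].fillna(df['column'].mean())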

Matplotlib:
import matplotlib.pyplot as plt
# Basic plot (x and y are sample values for illustration)
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Title')
plt.show()

Seaborn:
import seaborn as sns
# Creating visualizations
sns.scatterplot(x='x_column', y='y_column', data=df)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # numeric_only=True skips text columns


2. Data Preprocessing & Feature Engineering:
Handling Missing Values:
df = df.ffill()  # forward-fill gaps; fillna(method='ffill') is deprecated in pandas 2.x
df = df.dropna(subset=['column'])  # drop rows where 'column' is missing
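
To make the effect of the two calls concrete, here is a minimal sketch on a toy DataFrame. The data and the column names temp and city are made up for illustration and are not from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps
toy = pd.DataFrame({
    "temp": [20.0, np.nan, 22.0, np.nan],
    "city": ["A", "B", None, "D"],
})

filled = toy.ffill()                    # each gap takes the previous row's value
dropped = toy.dropna(subset=["city"])   # rows with a missing 'city' are removed

print(filled)
print(dropped)
```

Forward fill suits ordered data such as time series; for unordered rows, a mean or median fill is often a safer choice.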

Encoding Categorical Data:
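# One-hot encoding: get_dummies returns a new DataFrame, so assign the result
df = pd.get_dummies(df, columns=['category_column'])
# Label encoding: maps each category to an integer code
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['encoded_col'] = le.fit_transform(df['category_col'])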
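
The two encodings produce quite different outputs. The following sketch, using a made-up color column, shows one-hot encoding creating one column per category and label encoding assigning integer codes in sorted order of the class names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column
colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

one_hot = pd.get_dummies(colors, columns=["color"])  # one column per category

le = LabelEncoder()
codes = le.fit_transform(colors["color"])            # integer code per category

print(one_hot.columns.tolist())
print(le.classes_, codes)
```

Integer codes imply an ordering (blue < green < red here), so one-hot encoding is usually the safer option for input features in linear or distance-based models; scikit-learn documents LabelEncoder as intended for target labels.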

Feature Scaling:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
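# StandardScaler works on numeric data only, so select the numeric columns first
scaled_data = scaler.fit_transform(df.select_dtypes(include='number'))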
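
When the data is split into training and test sets, a common practice is to fit the scaler on the training set only and reuse its statistics on the test set. A minimal sketch with made-up arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # apply the same mean and std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1; test values are expressed relative to the training distribution.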

3. Linear Regression:
Model Representation:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

Making Predictions:
predictions = model.predict(X_test)
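
The snippets in this and the following sections assume that X_train, X_test, y_train, and y_test already exist. One way to obtain them is train_test_split. The sketch below is an end-to-end example on synthetic data generated from y = 3x + 2 plus noise; the data and the chosen split size are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 with a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2 on test set:", r2_score(y_test, predictions))
```

The fitted slope and intercept should land close to the values used to generate the data.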

4. Logistic Regression:
Logistic Function:
import numpy as np
def logistic(x):
    return 1 / (1 + np.exp(-x))
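
A quick check of the logistic function at a few inputs shows its key properties: it maps any real number into (0, 1), returns 0.5 at x = 0, and satisfies logistic(-x) = 1 - logistic(x).

```python
import numpy as np

def logistic(x):
    # Squashes any real-valued input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

values = logistic(np.array([-5.0, 0.0, 5.0]))
print(values)
```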

Learning the Model:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression()
log_model.fit(X_train, y_train)

Prediction:
log_predictions = log_model.predict(X_test)
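
Besides hard class labels, a fitted LogisticRegression can also return class probabilities through predict_proba. The sketch below uses a synthetic binary dataset from make_classification (an assumption for illustration) and shows that, for a binary problem, predict corresponds to thresholding the positive-class probability at 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

log_model = LogisticRegression()
log_model.fit(X_train, y_train)

proba = log_model.predict_proba(X_test)[:, 1]  # probability of class 1
log_predictions = log_model.predict(X_test)    # hard labels (0 or 1)

print(proba[:5])
print(log_predictions[:5])
```

Working with probabilities makes it possible to pick a threshold other than 0.5 when false positives and false negatives have different costs.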

5. Naive Bayes:
Implementation:
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)
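
As a runnable illustration, the sketch below trains GaussianNB on scikit-learn's built-in iris dataset and measures accuracy on a held-out split. The dataset and split parameters are choices made for this example, not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)

nb_accuracy = accuracy_score(y_test, nb_predictions)
print("Naive Bayes accuracy:", nb_accuracy)
```

GaussianNB assumes each feature is normally distributed within a class, which fits continuous measurements such as the iris features reasonably well.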

6. Decision Tree & Random Forest:


Decision Tree:
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)

Random Forest:
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100)
rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)
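
The sketch below fits both models on the iris dataset (chosen here for illustration), compares their test accuracy, and prints the random forest's feature_importances_, which indicate how much each feature contributed to the splits.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

dt_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
rf_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

dt_acc = accuracy_score(y_test, dt_model.predict(X_test))
rf_acc = accuracy_score(y_test, rf_model.predict(X_test))
print("Decision tree accuracy:", dt_acc)
print("Random forest accuracy:", rf_acc)

# Importances are normalized so that they sum to 1
for name, importance in zip(iris.feature_names, rf_model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Setting random_state makes the results reproducible; without it, both models can vary slightly between runs.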

7. K-Nearest Neighbour (KNN):


Implementation:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
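
Because KNN relies on distances, features with large numeric ranges can dominate the result. A common remedy is to scale features before fitting. The sketch below, using scikit-learn's wine dataset as an assumed example, compares KNN with and without a StandardScaler step in a pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# KNN on raw features
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
raw_acc = knn.score(X_test, y_test)

# KNN with standardized features; the pipeline fits the scaler on training data only
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)
scaled_acc = scaled_knn.score(X_test, y_test)

print("Accuracy without scaling:", raw_acc)
print("Accuracy with scaling:", scaled_acc)
```

On datasets like this one, where feature ranges differ widely, the scaled version usually scores higher.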

8. K-Means Clustering:
Clustering:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed and explicit n_init for reproducible results
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
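
For a runnable version, the sketch below clusters synthetic 2-D points generated with make_blobs (an assumption for illustration) and then assigns a new point to its nearest centroid with predict.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 150 points around 3 centers
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

labels = kmeans.labels_              # cluster index for each training point
centroids = kmeans.cluster_centers_  # one row per cluster center

# Assign a new point to the nearest centroid
new_label = kmeans.predict(np.array([[0.0, 0.0]]))[0]

print(centroids)
print("inertia:", kmeans.inertia_)
print("cluster of new point:", new_label)
```

The inertia_ attribute (sum of squared distances to the nearest centroid) is often plotted against n_clusters to choose the number of clusters.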

9. Loading CSV Files with Pandas:

import pandas as pd
# Load CSV file into a DataFrame
df = pd.read_csv('data.csv')
# Display the first few rows of the DataFrame
print(df.head())

10. Loading Excel Files:

# Load Excel file into a DataFrame (reading .xlsx files requires the openpyxl package)
df_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Display the first few rows
print(df_excel.head())

This covers the essential Python syntax for data mining with these popular algorithms and libraries.

● To display data from a CSV file, use the pandas library. Here is a step-by-step guide:

Step 1: Import the Pandas Library


import pandas as pd

Step 2: Load the CSV File into a DataFrame


# Load the CSV file
df = pd.read_csv('data.csv')

Step 3: Display the Data


Show the First Few Rows:
print(df.head()) # Displays the first 5 rows by default

To show a specific number of rows:


print(df.head(10)) # Displays the first 10 rows

Show the Last Few Rows:


print(df.tail()) # Displays the last 5 rows by default

Show the Entire DataFrame:


print(df)


○ Note: Displaying the entire DataFrame may not be practical for large
datasets. Use head() or tail() for better readability.

Additional Useful Functions:


Display Basic Information:
df.info() # Prints a summary including data types and non-null counts (info() prints directly and returns None, so print() is not needed)

View DataFrame Dimensions:


print(df.shape) # Prints the dimensions as a tuple: (rows, columns)

Display Column Names:


print(df.columns)

These commands will help you load and inspect your dataset quickly.
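
To try the inspection commands without a file on disk, the sketch below reads CSV text from an in-memory buffer. The sample rows and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file
csv_text = "name,age,city\nAna,34,Lisbon\nBen,29,Oslo\nCara,41,Rome\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first 2 rows
print(df.tail(1))        # last row
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names as a plain list
df.info()                # dtypes and non-null counts
```

The same calls work unchanged on a DataFrame loaded with pd.read_csv('data.csv').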
