Practical 01 Dms


Dr. Rafiq Zakaria Campus


P.G. Dept. of Computer Science
M.Sc. III Semester
Data mining and data warehousing
Practical 01

Aim: Data mining using Python libraries.

Method:

Data mining with the pandas library in Python involves extracting, transforming, and analyzing
data to discover useful patterns and insights. Pandas provides a rich set of tools for handling
and analyzing data, which makes it well suited to data mining tasks.

Here’s a structured approach to data mining.

1. Setup

First, ensure you have pandas installed. You may also want to use other libraries for specific
tasks such as visualization (matplotlib) or numerical computations (numpy).

pip install pandas matplotlib numpy
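To confirm that the installation worked, a small check like the one below can be run. This is only a sketch: the helper name missing_packages is made up for illustration, and it reports whether each package can be found without importing it.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The three packages installed by the pip command above
missing = missing_packages(["pandas", "matplotlib", "numpy"])
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, re-run the pip command for the packages it names.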

2. Loading Data

You can load data from various sources such as CSV files, Excel files, or databases.

import pandas as pd

# Load data from a CSV file
df = pd.read_csv('data.csv')

# Load data from an Excel file
# df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# Load data from a SQL database
# from sqlalchemy import create_engine
# engine = create_engine('sqlite:///database.db')
# df = pd.read_sql('SELECT * FROM table_name', engine)
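If no data file is at hand, the CSV case can be tried with an in-memory string. The following is a minimal sketch; the column names and values are invented for illustration and are not part of the practical's dataset.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the contents of 'data.csv'
csv_text = """name,category,score
Asha,A,72
Bilal,B,45
Chen,A,88
"""

# read_csv accepts any file-like object, so StringIO behaves like an open file
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas
```

Swapping io.StringIO(csv_text) for a file path gives the same result for a file on disk.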

3. Exploring Data

Get a basic understanding of the dataset using various methods:


# Display the first few rows of the dataframe
print(df.head())

# Get a summary of the dataframe (info() prints directly and returns None)
df.info()

# Get descriptive statistics
print(df.describe())
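A few other quick checks are often useful at this stage. The sketch below uses a small made-up dataframe (the column names category and score are assumptions) to show the table size, the number of missing values per column, and how often each category occurs.

```python
import pandas as pd

# Hypothetical data; None marks a missing score
df = pd.DataFrame({
    "category": ["A", "B", "A", "C"],
    "score": [72.0, None, 88.0, 45.0],
})

rows, cols = df.shape                             # dimensions of the table
missing_per_column = df.isna().sum()              # missing values in each column
category_counts = df["category"].value_counts()   # frequency of each category

print(rows, cols)
print(missing_per_column)
print(category_counts)
```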

4. Cleaning Data

Data cleaning involves handling missing values, correcting data types, and removing duplicates.

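# Handle missing values; both methods return a new dataframe, so assign the result.
# Use one or the other: after dropna() there is nothing left for fillna() to fill.
df = df.dropna()  # Drop rows with missing values
# df = df.fillna(value={'column_name': 0})  # Replace missing values with a specified value

# Convert data types
df['date_column'] = pd.to_datetime(df['date_column'])

# Remove duplicates
df.drop_duplicates(inplace=True)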
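The cleaning steps can be tried end to end on a small example. This is a sketch on invented data (one missing value and one repeated row), using the placeholder column names from the snippet above; it fills the missing value rather than dropping the row.

```python
import pandas as pd

# Hypothetical raw data: row 1 has a missing value, rows 2 and 3 are identical
raw = pd.DataFrame({
    "date_column": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"],
    "column_name": [10.0, None, 30.0, 30.0],
})

clean = raw.drop_duplicates()                    # keep the first of the repeated rows
clean = clean.fillna(value={"column_name": 0})   # fill the missing value with 0
clean["date_column"] = pd.to_datetime(clean["date_column"])  # text -> datetime

print(clean)
print(clean.dtypes)
```

Because each step returns a new dataframe, raw stays unchanged and can be compared with clean afterwards.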

5. Transforming Data

Transform data to fit your needs, such as filtering rows, aggregating data, and creating new
features.

# Filtering data
filtered_df = df[df['column_name'] > 50]

# Aggregating data
grouped_df = df.groupby('category_column').agg({'numeric_column': 'sum'})

# Creating new features
df['new_column'] = df['existing_column'] * 2
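As a worked illustration of the three transformations, the sketch below applies them to a small made-up dataframe. The values are invented, and the new column name doubled is an assumption chosen for the example.

```python
import pandas as pd

# Hypothetical data using the placeholder column names from this section
df = pd.DataFrame({
    "category_column": ["A", "B", "A", "B"],
    "numeric_column": [40, 60, 70, 20],
})

# Filtering: keep only rows where the value exceeds 50 (here 60 and 70)
filtered_df = df[df["numeric_column"] > 50]

# Aggregating: total per category (A = 40 + 70, B = 60 + 20)
grouped_df = df.groupby("category_column").agg({"numeric_column": "sum"})

# Creating a new feature from an existing column
df["doubled"] = df["numeric_column"] * 2

print(filtered_df)
print(grouped_df)
print(df)
```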

Output:

Data mining steps such as loading, exploring, cleaning, and transforming data were
performed using Python libraries.

Prepared By: Khan Shagufta (PG Dept of Comp Sci)
