The document outlines a laboratory exercise for a Data Science and Big Data Analytics course, focusing on data wrangling using the Iris dataset. It details steps including importing libraries, loading the dataset, preprocessing data, checking for missing values, and encoding categorical variables. The exercise emphasizes data manipulation and visualization techniques using Python's pandas, numpy, seaborn, and matplotlib libraries.
Data Wrangling 1
Third Year Engineering (2019 Pattern)
Course Code: 310256
Course Name: Data Science and Big Data Analytics Laboratory
Group A
1) Data Wrangling I

# Step 1: Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Step 2: Locate an open-source dataset
# I'll use the "Iris" dataset from the UCI Machine Learning Repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
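The listing does not show the code for Steps 3 and 4 (loading and preprocessing), although the later steps rely on a dataframe named df. The following is a minimal sketch of what those steps might look like, not the original code. The column names other than species are assumptions based on the dataset documentation, and a six-row inline sample in the same header-less layout as iris.data stands in for the download so the sketch runs offline.

```python
import io

import pandas as pd

# Assumed column names; only 'species' is confirmed by the rest of the listing.
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Illustrative sample in the header-less CSV layout of iris.data.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)

# Step 3: Load the dataset. The file has no header row, so names are supplied.
# With network access, `url` could be passed in place of `sample`.
df = pd.read_csv(sample, header=None, names=column_names)

# Step 4: Data preprocessing
print("\nSummary Statistics:\n")
print(df.describe())

missing = df.isnull().sum()  # count of missing values per column
print("\nMissing Values per Column:\n")
print(missing)

print("\nDataset Dimensions (rows, columns):", df.shape)
```

Passing header=None matters here: without it, pandas would treat the first data row as column headers and silently drop one observation.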
# Step 5: Data Formatting and Normalization
print("\nData Types Before Conversion:\n")
print(df.dtypes)
# Convert categorical variable 'species' to categorical data type
df['species'] = df['species'].astype('category')
print("\nData Types After Conversion:\n")
print(df.dtypes)
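Step 5 is titled "Data Formatting and Normalization", but the listing only checks and converts data types. As a hedged illustration, the sketch below pairs the category conversion with min-max scaling of one numeric column. Min-max scaling is one common choice and is an assumption here, not something the exercise specifies. The small dataframe and the column name petal_length are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Iris dataframe.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Formatting: store the repeated string labels as a categorical column.
df["species"] = df["species"].astype("category")
print(df["species"].cat.categories)

# Normalization (assumed technique): rescale the column to the range [0, 1].
col = df["petal_length"]
df["petal_length_scaled"] = (col - col.min()) / (col.max() - col.min())
print(df)
```

Min-max scaling divides by the column's range, so it would fail on a constant column; a real pipeline would need to guard against that case.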
# Step 6: Convert Categorical Variables into Quantitative Variables
print("\nEncoding Categorical Variable 'species':\n")
df['species_encoded'] = df['species'].cat.codes
print(df.head())

Explanation of Each Step:
1. Import Libraries:
   - pandas: Handles dataframes and data manipulation.
   - numpy: Supports numerical operations.
   - seaborn & matplotlib: Used for visualization.
2. Dataset Selection:
   - The Iris dataset is a well-known dataset for classification tasks.
   - It is sourced from the UCI Machine Learning Repository: Iris Dataset.
3. Loading the Dataset:
   - Read the dataset directly from the web into a pandas dataframe.
   - Assign column names based on dataset documentation.
4. Data Preprocessing:
   - Use .describe() to get summary statistics.
   - Check for missing values using .isnull().sum().
   - Print dataset dimensions.
5. Data Formatting and Normalization:
   - Check data types using .dtypes.
   - Convert the categorical column species into a categorical data type.
6. Encoding Categorical Variables:
   - Convert the species categorical column into a numerical format using .cat.codes.
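To make the encoding in Step 6 easier to interpret, the following sketch shows how the integer codes produced by .cat.codes relate to the original labels. The four-value series is a hypothetical sample, and the names code_to_label and decoded are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample of species labels, deliberately out of alphabetical order.
species = pd.Series(
    ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"],
    name="species",
).astype("category")

# Each value is replaced by the position of its label in cat.categories.
codes = species.cat.codes

# Build a lookup from integer code back to the original label.
code_to_label = dict(enumerate(species.cat.categories))
print(code_to_label)

# Decoding the codes recovers the original labels.
decoded = codes.map(code_to_label)
print(decoded.tolist())
```

When a string column is converted with astype('category'), the categories are sorted, so the codes follow alphabetical order rather than order of appearance. Missing values receive the code -1. The integers carry no real ordering between species, which is worth keeping in mind before feeding them to a model that treats inputs as ordinal.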