Data Cleaning and Preparation
using Python
swatikulkarni24/
Data cleaning and preparation are crucial
steps in the data analysis process. They
involve transforming raw data into a clean,
structured format that is suitable for
analysis.
2. Loading Data:
Load your data into a pandas DataFrame,
which provides a powerful data structure
for working with structured data.
import pandas as pd
# Replace 'data.csv' with your file path or URL
df = pd.read_csv('data.csv')
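After loading, it usually helps to inspect the result before cleaning anything. The sketch below is only an illustration: it reads a small in-memory CSV (hypothetical columns `id`, `name`, `score`) instead of a real file so that it runs on its own.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for 'data.csv'.
csv_text = "id,name,score\n1,Ann,90\n2,Bob,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Basic checks: size, inferred column types, and missing values per column.
shape = df.shape
dtypes = df.dtypes
missing = df.isna().sum()

print(shape)
print(dtypes)
print(missing)
```

With a real file, the same checks apply after `pd.read_csv('data.csv')`.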
3. Removing Duplicates:
Duplicates can skew analysis results, so it's
important to identify and remove them if
necessary.
df.drop_duplicates(inplace=True)
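Before dropping anything, it can be useful to count the duplicates and decide which columns define one. A minimal sketch, assuming a hypothetical table with `name` and `city` columns:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Ann', 'Cid'],
    'city': ['Pune', 'Delhi', 'Pune', 'Pune'],
})

# Count rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())

# Keep the first occurrence of each duplicated row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)

# Optionally treat rows as duplicates when only selected columns match.
one_per_city = df.drop_duplicates(subset=['city'], keep='first')
```

Whether to match on all columns or on a subset depends on what counts as "the same record" in your data.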
4. Handling Missing Values:
Missing values are common in datasets and
can cause issues during analysis. You can
handle missing values in various ways, such
as dropping rows or columns with missing
values, imputing missing values with mean
or median, or using more sophisticated
techniques.
# Drop rows with missing values
df.dropna(inplace=True)
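Dropping rows is only one of the options mentioned above. The sketch below shows mean and median imputation on hypothetical `age` and `income` columns; which statistic fits best is an assumption that depends on the data (the median is less sensitive to outliers).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({
    'age': [25.0, np.nan, 35.0, 40.0],
    'income': [50000.0, 60000.0, np.nan, 1000000.0],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()

# Work on a copy so the original data stays available for comparison.
imputed = df.copy()

# Fill 'age' with the column mean and 'income' with the column median.
imputed['age'] = imputed['age'].fillna(imputed['age'].mean())
imputed['income'] = imputed['income'].fillna(imputed['income'].median())
```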
6. Handling Inconsistent Data:
Deal with inconsistencies in your data, such as
inconsistent capitalization or spelling errors.
# Convert text to lowercase (assign the result back; str.lower() returns a new Series)
df['column'] = df['column'].str.lower()
# Replace specific values
df['column'] = df['column'].replace({'old_value': 'new_value'})
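As a worked illustration of the two steps combined, the sketch below normalizes a hypothetical `city` column that mixes capitalization, stray whitespace, and one misspelling. The values and the spelling fix are made up for the example.

```python
import pandas as pd

# Hypothetical sample data with inconsistent spelling and capitalization.
df = pd.DataFrame({'city': [' Pune', 'pune ', 'PUNE', 'Dehli', 'Delhi']})

# Trim surrounding whitespace and convert to lowercase.
cleaned = df['city'].str.strip().str.lower()

# Map a known misspelling to its correct form.
cleaned = cleaned.replace({'dehli': 'delhi'})

# Check the result: each city should now appear under a single label.
counts = cleaned.value_counts()
```

Running `value_counts()` before and after is a quick way to spot labels that still need merging.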
7. Text Cleaning and Regular Expressions:
Clean text data using regular expressions
(regex) to remove special characters and
unwanted symbols, or to extract specific
patterns.
import re
# Replace non-alphabetic characters with a space
df['text_column'] = df['text_column'].apply(lambda x: re.sub(r'[^a-zA-Z]', ' ', x))
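The text above also mentions extracting patterns. The sketch below uses pandas' own string methods, which skip missing values instead of raising an error as `re.sub` would on a non-string. The sample strings and the `#digits` order-number pattern are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({
    'text_column': ['Order #123: shipped!', 'Order #45 - pending', None],
})

# Replace each run of non-alphabetic characters with one space, then trim.
letters_only = (
    df['text_column']
    .str.replace(r'[^a-zA-Z]+', ' ', regex=True)
    .str.strip()
)

# Extract the digits that follow '#'; rows without a match stay missing.
order_id = df['text_column'].str.extract(r'#(\d+)', expand=False)
```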
8. Correcting Data Types:
Ensure that columns have the correct data
types for analysis.
# Convert a column to an integer type
df['column'] = df['column'].astype('int')
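A plain `astype('int')` raises an error if the column holds missing values or text that cannot be parsed. One possible way around this, sketched on hypothetical `quantity` and `order_date` columns, is `pd.to_numeric` with `errors='coerce'` followed by pandas' nullable `Int64` type:

```python
import pandas as pd

# Hypothetical sample data stored as text, with one unparseable entry.
df = pd.DataFrame({
    'quantity': ['10', '20', 'unknown'],
    'order_date': ['2024-01-05', '2024-02-10', '2024-03-15'],
})

# Turn unparseable entries into NaN instead of raising an error.
df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')

# Use the nullable integer type so the missing entry can stay missing.
df['quantity'] = df['quantity'].astype('Int64')

# Parse date strings into datetime values.
df['order_date'] = pd.to_datetime(df['order_date'])

print(df.dtypes)
```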
10. Feature Engineering:
Feature engineering involves creating new
features or modifying existing ones to
improve the predictive power of the
dataset.
# Create a new feature
df['new_feature'] = df['feature1'] + df['feature2']
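New features are not limited to sums of columns. The sketch below shows three common patterns on hypothetical `price`, `quantity`, and `order_date` columns; the bin edges and labels are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    'price': [100.0, 250.0, 40.0],
    'quantity': [2, 1, 5],
    'order_date': pd.to_datetime(['2024-01-05', '2024-02-10', '2024-03-15']),
})

# Combine two columns into a new numeric feature.
df['total'] = df['price'] * df['quantity']

# Derive a calendar feature from a datetime column.
df['order_month'] = df['order_date'].dt.month

# Bucket a continuous column into labeled ranges (right edge inclusive).
df['price_band'] = pd.cut(
    df['price'],
    bins=[0, 50, 200, float('inf')],
    labels=['low', 'mid', 'high'],
)
```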
Follow me for more content like this:
https://www.linkedin.com/in/swatikulkarni24/