
Data Transformation Techniques

Uploaded by

Priya Elango
Copyright: © All Rights Reserved

Data Transformation Techniques
Choosing the best chart

 It is important to understand what type of data you have.
 If the data is a continuous variable, a histogram is a good choice.
 If you want to show a ranking, an ordered bar chart is a good choice.
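As a rough illustration of the two cases above, the sketch below prepares the numbers that sit behind each chart: binned counts for a histogram and sorted values for an ordered bar chart. The data and variable names (heights, sales) are made up for this example, and the plotting step itself is left out.

```python
import pandas as pd

# Hypothetical continuous variable: heights in cm (invented data).
heights = pd.Series([150.2, 161.5, 158.0, 172.3, 165.1, 180.4, 169.9, 175.0])

# A histogram counts how many values fall into each bin.
# Here the range is split into 3 equal-width bins.
hist_counts = pd.cut(heights, bins=3).value_counts(sort=False)

# Hypothetical category totals for a ranking (invented data).
sales = pd.Series({"North": 120, "South": 340, "East": 210, "West": 90})

# An ordered bar chart shows the categories sorted by value.
ranked = sales.sort_values(ascending=False)

print(hist_counts)
print(ranked)
```

If a plotting library is available, hist_counts and ranked could be passed to its bar-plotting function.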
Data Transformation

 One of the fundamental steps of Exploratory Data Analysis (EDA) is data wrangling.
 Data wrangling is the process of preparing raw data for analysis by transforming it into a more accessible format.
 Typical data wrangling operations include:
 merging database-style data frames,
 merging on the index,
 concatenating along an axis,
 combining data with overlap,
 reshaping with hierarchical indexing, and
 pivoting from long to wide format.
We will cover the following topics:

Background
Merging database-style data frames
Transformation techniques
Benefits of data transformation
Background

 Data transformation is a set of techniques used to convert data from one format or structure to another.

1. Data deduplication involves identifying duplicates and removing them.
2. Key restructuring involves transforming any keys with built-in meanings into generic keys.
3. Data cleansing involves deleting out-of-date, inaccurate, and incomplete information to improve the accuracy of the source data without changing its meaning.
4. Data validation is the process of formulating rules or algorithms that help validate different types of data against known issues.
5. Format revisioning involves converting data from one format to another.
6. Data derivation consists of creating a set of rules to generate more information from the data source.
7. Data aggregation involves searching, extracting, summarizing, and preserving important information in different types of reporting systems.
8. Data integration involves converting different data types and merging them into a common structure or schema.
9. Data filtering involves identifying the information relevant to a particular user.
10. Data joining involves establishing a relationship between two or more tables.
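A few of the techniques listed above can be sketched in pandas as follows. This is only an illustration: the table, the column names, and the pass mark of 40 are assumptions made for the example, not part of the original material.

```python
import pandas as pd

# Hypothetical raw records (invented data); scores arrive as strings
# and one record is duplicated.
raw = pd.DataFrame({
    "StudentID": [1, 2, 2, 3, 4],
    "Course": ["SE", "SE", "SE", "ML", "ML"],
    "Score": ["89", "39", "39", "72", "95"],
})

# Data deduplication: drop rows that are exact duplicates.
deduped = raw.drop_duplicates().reset_index(drop=True)

# Format revisioning: convert the scores from strings to integers.
deduped["Score"] = deduped["Score"].astype(int)

# Data validation: a simple rule, every score must lie in 0..100.
is_valid = deduped["Score"].between(0, 100).all()

# Data derivation: generate a new column from a rule
# (the pass mark of 40 is an assumption).
deduped["Passed"] = deduped["Score"] >= 40

# Data filtering: keep only the rows relevant to a given question.
passed_only = deduped[deduped["Passed"]]

# Data aggregation: summarize the scores per course.
mean_by_course = deduped.groupby("Course")["Score"].mean()

print(deduped)
print(mean_by_course)
```

Data joining and data integration correspond to the merge and concat operations discussed in the sections that follow.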
Merging database-style data frames

 This section covers working with pandas data frames, especially when to use append, concat, merge, or join.
 Assume that you are a university professor teaching a Software Engineering course and an Introduction to Machine Learning course, and that there are enough students to split each course into two classes.
 The examination for each class was held in two separate buildings and graded by two different professors. They gave you two different data frames.
 In the first example, let's consider only one subject: the Software Engineering course.
In each dataset, the first column contains the student identifiers and the second column contains their respective scores in the subject.

The structure of the dataframes is the same in both cases, so we need to concatenate them. This can be done with the pandas concat() function:
    dataframe = pd.concat([dataFrame1, dataFrame2], ignore_index=True)
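A minimal runnable version of this call is sketched below. The student identifiers and scores are invented for illustration; only the names dataFrame1, dataFrame2, and dataframe come from the slide.

```python
import pandas as pd

# Hypothetical scores from the two Software Engineering classes (invented data).
dataFrame1 = pd.DataFrame({"StudentID": [1, 3, 5], "Score": [89, 39, 50]})
dataFrame2 = pd.DataFrame({"StudentID": [2, 4, 6], "Score": [98, 93, 44]})

# Stack the rows of both dataframes; ignore_index=True renumbers
# the rows 0..n-1 instead of keeping each frame's own index.
dataframe = pd.concat([dataFrame1, dataFrame2], ignore_index=True)

# Without ignore_index, the original row labels are kept and repeat.
kept_index = pd.concat([dataFrame1, dataFrame2])

print(dataframe)
print(kept_index.index.tolist())
```

Note that DataFrame.append was removed in pandas 2.0, so pd.concat is the safer choice for stacking rows in current versions.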
Concatenating along an axis

 The code for combining the dataframes of both subjects is as follows:

    # Option 1
    dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
    dfML = pd.concat([df1ML, df2ML], ignore_index=True)
    df = pd.concat([dfML, dfSE], axis=1)

 Here, the dataframes are concatenated with axis=1 to place them side by side.
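The sketch below runs the same steps on small invented dataframes. The column names (StudentID, ScoreSE, ScoreML) and all values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical per-class results for each course (invented data).
df1SE = pd.DataFrame({"StudentID": [9, 11], "ScoreSE": [22, 66]})
df2SE = pd.DataFrame({"StudentID": [2, 4], "ScoreSE": [98, 93]})
df1ML = pd.DataFrame({"StudentID": [1, 3], "ScoreML": [39, 49]})
df2ML = pd.DataFrame({"StudentID": [2, 4], "ScoreML": [98, 93]})

# First stack the two classes of each course row-wise.
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)

# Then place the two course tables side by side.
# axis=1 aligns rows by index label, not by StudentID.
df = pd.concat([dfML, dfSE], axis=1)

print(df)
```

Because axis=1 aligns on the row index, the first row here pairs student 1 (ML) with student 9 (SE), and the result has two StudentID columns. When rows should be matched on the student identifier instead, a merge on that column is usually the better fit.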
Using df.merge with an inner join
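As a hedged sketch of what an inner join with df.merge could look like on this kind of data: the dataframes, the column names, and the values below are invented for illustration and are not taken from the slides.

```python
import pandas as pd

# Hypothetical course tables keyed by student identifier (invented data).
dfSE = pd.DataFrame({"StudentID": [1, 2, 3], "ScoreSE": [22, 66, 31]})
dfML = pd.DataFrame({"StudentID": [2, 3, 4], "ScoreML": [98, 93, 44]})

# An inner join keeps only the students that appear in both tables.
dfBoth = dfSE.merge(dfML, on="StudentID", how="inner")

print(dfBoth)
```

Students 1 and 4 are dropped here because each appears in only one of the two tables.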
