Data Preparation & Analysis Practicals

The document outlines the advantages and disadvantages of data preparation and analysis, highlighting benefits such as improved data quality and insights, while noting challenges like time consumption and complexity. It also discusses various data analytical tools, including Tableau, Power BI, Hadoop, Apache Spark, and TensorFlow, detailing their features, applications, and example use cases. Overall, it serves as a practical guide for understanding data preparation, analysis, and the tools available for these processes.

Hiren Patel IU2241230273 6th CSE

Data Preparation
& Analysis
(CS0602)
Practical File

Hiren Patel
B.Tech CSE
Semester 6
IU2241230273

Data Preparation & Analysis 1



PRACTICAL - 1
AIM:
Study the advantages and disadvantages of data preparation and analysis.

Data Preparation

Advantages:

1. Data Cleaning:
○​ Improves Data Quality: Removing duplicates, handling missing values, and
detecting outliers lead to more accurate and reliable data.
○​ Enhances Analysis: Clean data ensures that the analysis is based on true
representations, resulting in better insights.
2.​ Data Transformation:
○​ Consistency: Normalization/standardization ensures that data is on a common
scale, making it easier to compare.
○​ Usability: Encoding categorical variables makes them usable in machine learning
algorithms.
3.​ Data Integration:
○​ Comprehensive Data: Combining datasets provides a more complete picture.
○​ Multimodal Analysis: Integrating different types of data can lead to richer
insights.
4.​ Data Reduction:
○​ Efficiency: Reducing the dimensionality or size of the data can speed up the
analysis process.
○​ Focus on Key Features: Helps in focusing on the most important features that
have the most impact.
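
The cleaning and transformation steps listed above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the sample records, the field names (`name`, `age`, `city`), and the helper names `clean`, `min_max_scale`, and `one_hot` are all assumptions made for the example. Filling missing values with the median is one common choice among several.

```python
from statistics import median

def clean(rows):
    """Drop exact duplicate rows, then fill missing ages with the median age."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill = median(known_ages)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique

def min_max_scale(values):
    """Rescale numbers to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 indicator lists, one column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical sample records, invented for illustration.
records = [
    {"name": "A", "age": 20, "city": "Pune"},
    {"name": "A", "age": 20, "city": "Pune"},     # exact duplicate
    {"name": "B", "age": None, "city": "Surat"},  # missing age
    {"name": "C", "age": 40, "city": "Pune"},
]

cleaned = clean(records)
ages_scaled = min_max_scale([r["age"] for r in cleaned])
city_encoded = one_hot([r["city"] for r in cleaned])
```

After these steps the duplicate row is gone, the missing age is filled, the ages lie on a common 0 to 1 scale, and the city column is in a numeric form that a machine learning algorithm could consume.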

Disadvantages:

1.​ Time-Consuming: The process of cleaning, transforming, integrating, and reducing data
can take a significant amount of time and effort.
2.​ Complexity: Requires expertise and careful handling to avoid introducing errors or bias.
3.​ Resource Intensive: Involves the use of various tools and skilled personnel, which can be
costly.

Data Analysis

Advantages:

1. Descriptive Statistics:
○ Summarizes Data: Provides a quick summary of the dataset's main
characteristics.
○​ Easy to Interpret: Simple measures like mean, median, and mode are easy to
understand.
2.​ Exploratory Data Analysis (EDA):
○​ Identifies Patterns: Helps in discovering hidden patterns and trends.
○​ Visual Insights: Data visualization makes it easier to understand complex data.
3.​ Inferential Statistics:
○ Generalizes from Samples: Allows inferences about a population to be drawn
from a sample.
○​ Hypothesis Testing: Helps in determining the statistical significance of findings.
4.​ Predictive Analytics:
○​ Forecasting: Provides predictions about future events based on historical data.
○​ Decision Making: Supports better decision-making by providing actionable
insights.
5.​ Clustering:
○​ Group Identification: Identifies groups or clusters within data.
○​ Market Segmentation: Useful for segmenting markets or customer bases.
6.​ Association Rule Learning:
○​ Discover Relationships: Identifies associations between different variables.
○​ Market Basket Analysis: Helps in understanding purchase patterns.
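
Two of the techniques above are easy to illustrate with the Python standard library: descriptive statistics (mean, median, mode) and the support and confidence measures used in market basket analysis. The sales figures, the baskets, and the helper names `support` and `confidence` are hypothetical and exist only for this sketch.

```python
from statistics import mean, median, mode

# Hypothetical daily sales figures, invented for illustration.
sales = [12, 15, 15, 18, 40]

summary = {
    "mean": mean(sales),
    "median": median(sales),
    "mode": mode(sales),
}

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

bread_butter_support = support(baskets, {"bread", "butter"})
bread_to_butter_conf = confidence(baskets, {"bread"}, {"butter"})
```

The single large value (40) pulls the mean above the median, which shows why the summary measures are best read together. In the basket data, bread and butter appear together in half of the transactions, and two of the three bread purchases also include butter.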

Disadvantages:

1.​ Complexity: Analysis techniques can be complex and require specialized knowledge.
2.​ Data Quality Dependency: Results are heavily dependent on the quality of the input
data.
3.​ Resource Intensive: Requires computational resources and expertise to perform
effectively.
4.​ Potential for Misinterpretation: Incorrect analysis or misinterpretation of results can
lead to wrong conclusions.


PRACTICAL - 2
AIM:
Study different data analysis tools.

1. Tableau

Overview:

Tableau is a Business Intelligence (BI) and data visualization tool that helps users analyze and
present data through interactive dashboards and reports.

Key Features:

● Tableau Prep for data cleaning and transformation.
● Drag-and-drop interface for data visualization.
● Connects to multiple data sources (databases, cloud, Excel, etc.).
● Analytics on live connections (up-to-date data) or on extracted snapshots.

Applications:

● Business intelligence and decision-making.
●​ Market trend analysis.
●​ Customer insights and reporting.

Example Use Cases:

● Retail: Analyzing sales trends and customer demographics.
● Finance: Monitoring real-time financial transactions and risk assessment.
● Healthcare: Visualizing patient data for better diagnosis and treatment planning.

2. Power BI

Overview:

Power BI is Microsoft's BI tool for data visualization, reporting, and analytics,
widely used in enterprise environments.

Key Features:

● Power Query for data extraction, transformation, and loading (ETL).
● Integrates seamlessly with Excel, SQL Server, and Azure.
● AI-driven insights and automation capabilities.
● Real-time dashboard updates.

Applications:

● Enterprise reporting and dashboards.
●​ Data-driven business decision-making.
●​ Financial planning and forecasting.

Example Use Cases:

● Corporate Performance: Analyzing company-wide KPIs and operational metrics.
● Sales & Marketing: Tracking customer engagement and campaign performance.
● Supply Chain: Monitoring inventory and logistics efficiency.

3. Hadoop

Overview:

Hadoop is an open-source framework for distributed data storage and processing, primarily used
for handling big data.

Key Features:

● HDFS (Hadoop Distributed File System) for storing large datasets.
●​ MapReduce for parallel processing of massive data.
●​ Works with Hive and Pig for querying and transforming data.
●​ Supports structured, semi-structured, and unstructured data.
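
The MapReduce model mentioned above can be sketched on a single machine in plain Python. This is only an illustration of the map, shuffle, and reduce phases using the classic word-count task. It does not use Hadoop's actual API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. In a real cluster the mappers and reducers would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input lines, invented for illustration.
counts = word_count(["big data big", "data tools"])
```

Because each mapper looks at only one line and each reducer at only one key, the work can be split across many machines without the pieces needing to coordinate, which is what makes the model suitable for very large datasets.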

Applications:

● Big data storage and processing.
●​ Large-scale data analytics.
●​ Enterprise data warehousing.

Example Use Cases:

● E-commerce: Batch processing of millions of customer transactions.
● Healthcare: Storing and analyzing large volumes of patient health records.
● Social Media: Analyzing user-generated data across platforms.


4. Apache Spark

Overview:

Apache Spark is an in-memory big data processing engine designed for batch and
near-real-time (streaming) data analytics.

Key Features:

● Spark SQL for querying structured data.
●​ Spark Streaming for real-time data processing.
●​ MLlib for machine learning and data science applications.
●​ GraphX for graph processing.
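
Spark Streaming handles a live stream by cutting it into small batches and processing each batch in turn. The idea can be sketched in plain Python as below. This is not Spark's API: the helper names `micro_batches` and `running_counts` and the sensor IDs are hypothetical, and batches here are cut by event count for simplicity, whereas Spark Streaming cuts them by time interval.

```python
from collections import Counter

def micro_batches(events, batch_size):
    """Split an event sequence into consecutive fixed-size batches."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]

def running_counts(events, batch_size):
    """Update per-key totals after each batch and record a snapshot."""
    totals = Counter()
    snapshots = []
    for batch in micro_batches(events, batch_size):
        totals.update(batch)            # incremental update, one batch at a time
        snapshots.append(dict(totals))  # state visible after this batch
    return snapshots

# Hypothetical sensor IDs arriving over time.
events = ["s1", "s2", "s1", "s3", "s1"]
snapshots = running_counts(events, batch_size=2)
```

Each snapshot reflects everything seen so far, so results are available shortly after the data arrives instead of only after the whole dataset has been collected.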

Applications:

● Real-time big data analytics.
●​ AI and machine learning pipelines.
●​ Streaming data processing.

Example Use Cases:

● Banking & Finance: Detecting fraud in real-time transactions.
● Telecommunications: Analyzing customer call data for better service.
● IoT (Internet of Things): Processing sensor data in real-time.

5. TensorFlow

Overview:

TensorFlow is an open-source machine learning and AI framework used for building and training
deep learning models.

Key Features:

● tf.data API for building scalable input and data preparation pipelines.
● Supports deep learning models such as CNNs and RNNs, including models for NLP.
● TensorFlow Lite for mobile and embedded AI applications.
● TensorFlow Serving for deploying trained models in production.

Applications:

● Artificial intelligence and deep learning.
● Natural language processing (NLP).
● Image and speech recognition.

Example Use Cases:

● Healthcare: Predicting diseases using AI-powered diagnostics.
● Autonomous Vehicles: AI-driven object detection and path planning.
● Finance: Fraud detection using deep learning models.
