
Unit 4

The document provides an overview of data analytics and data science, highlighting their definitions, similarities, and differences. It discusses key concepts such as data types, data cleaning, and the significance of databases, as well as the applications of big data across various fields like healthcare and finance. Additionally, it addresses challenges and advantages of big data, emphasizing its impact on decision-making and operational efficiency.

Uploaded by

PISD Doha
Copyright: © All Rights Reserved

Unit 4: Data Analysis

Define data analytics and data science. Are they similar or different? Give a reason.

Data Analytics: Data analytics is the process of inspecting, cleansing, transforming, and modeling data with
the goal of discovering useful information, informing conclusions, and supporting decision-making.

Data Science: Data science is an interdisciplinary field that uses mathematics, statistics, data analysis,
and machine learning to analyze data and to extract knowledge and insights from it.

What is data?

Data refers to facts, statistics, or pieces of information collected for reference, analysis, or calculation.

What is the difference between data and information?

Data is raw, unprocessed facts, while information is processed data that has meaning and context.

What is the importance of data cleaning?

Data cleaning removes errors, inconsistencies, and inaccuracies from datasets to ensure the quality and
reliability of the data used for analysis.
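
The cleaning idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`name`, `age`, `city`) and the `clean` helper are hypothetical, and real projects usually rely on a dedicated library for this work.

```python
# Hypothetical raw records: extra spaces, a missing age, a duplicate, an invalid age.
raw_records = [
    {"name": " Ali ", "age": "34", "city": "Doha"},
    {"name": "Sara", "age": "", "city": "doha"},
    {"name": " Ali ", "age": "34", "city": "Doha"},    # exact duplicate
    {"name": "Omar", "age": "abc", "city": "Lahore"},  # age is not a number
]


def clean(records):
    """Trim text, fix capitalization, mark bad ages as None, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip()
        city = rec["city"].strip().title()
        age_text = rec["age"].strip()
        # Inconsistency or error: keep the record but mark the age as missing.
        age = int(age_text) if age_text.isdigit() else None
        key = (name, age, city)
        if key in seen:  # duplicate of a record already kept
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


cleaned_records = clean(raw_records)
```

Here the duplicate row is dropped, "doha" becomes "Doha", and the two unusable ages are stored as `None` so that later steps can decide how to treat them.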

What are the steps involved in data analysis?

The steps include data collection, data cleaning, data transformation, data modelling, and interpretation
of results.
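
The five steps can be pictured as a small chain of functions, one per step. The sketch below is an assumption-laden toy: the sales figures and the function names are hypothetical, and the "model" is nothing more than an average of day-to-day changes.

```python
from statistics import mean


def collect():
    # Step 1, data collection: hypothetical daily sales; None is a missing reading.
    return [120, 135, None, 150, 160]


def clean(values):
    # Step 2, data cleaning: drop the missing readings.
    return [v for v in values if v is not None]


def transform(values):
    # Step 3, data transformation: change from each reading to the next.
    return [b - a for a, b in zip(values, values[1:])]


def model(changes):
    # Step 4, data modelling: a deliberately simple model, the average change.
    return mean(changes)


def interpret(avg_change):
    # Step 5, interpretation: turn the number into a plain statement.
    if avg_change > 0:
        return "rising"
    if avg_change < 0:
        return "falling"
    return "flat"


raw = collect()
cleaned = clean(raw)
changes = transform(cleaned)
avg_change = model(changes)
summary = interpret(avg_change)
```

Each function takes the output of the previous one, which mirrors how the steps follow each other in practice.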

What is a dataset?

A dataset is a structured or processed collection of data, usually associated with a unique body of work.
The data in the collection is related in some way and can be used to evaluate patterns or trends common
to the entire dataset.

What do you know about machine learning?

Machine learning is a branch of artificial intelligence and computer science that emphasizes the use of
data and algorithms to replicate human learning in computers.

What do you know about deep learning?

Deep learning is a subset of machine learning, with emphasis on the simulation or imitation of the human
brain's behavior by using artificial neural networks.

What do you know about data mining?

Data mining is a subset of data science which primarily focuses on discovering patterns and relationships
in existing datasets. Data mining uses a narrower range of techniques and tools than data science.

Explain data visualization.

Data visualization is the graphical representation of data using common charts, plots, infographics, and
animations. These visual displays of information communicate complex data relationships and data-driven
insights in a way that is easy to understand.

What is Big Data?

Big data refers to handling large volumes of data. Data scientists use big data to find patterns and trends
in datasets and to obtain more accurate and reliable results. The huge size of the data provides more
opportunities for machine learning and can lead to better results.

What is predictive analysis?

Predictive analysis is the use of historical data to predict future trends and events.
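
One simple way to see this idea is to fit a straight line to past values and extend it forward. The sketch below uses ordinary least squares written out by hand. The monthly sales figures are hypothetical and perfectly linear, and `fit_line` is an illustrative helper, not a method taken from the textbook.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Historical data (hypothetical): sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, sales)

# Prediction: extend the fitted line to month 6.
forecast_month_6 = slope * 6 + intercept
```

Real data is rarely this tidy, so a forecast of this kind is an estimate, and its quality depends on how well a straight line describes the history.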

What is Natural Language Processing (NLP)?

NLP is the study of the interaction between human language and computers. Common uses of NLP are
chatbots, language translators, and sentiment analysis.
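
To give a feel for sentiment analysis, here is a toy word-counting sketch. It is an assumption for illustration only: the word lists and the `sentiment` function are hypothetical, and practical NLP systems generally use trained models rather than fixed word lists.

```python
# Hypothetical word lists, far too small for real use.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label_1 = sentiment("I love this great phone!")
label_2 = sentiment("Terrible service, bad food.")
label_3 = sentiment("The box arrived today.")
```

A sentence with no listed words is labelled neutral, which shows one weakness of the approach: it knows nothing about words outside its lists.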

Analyze the terms dataset and database.

Dataset: A dataset is a structured or organized collection of data, which is usually associated with a unique
body of work.

Database: A database is an organized collection of data stored in multiple datasets or tables. These tables
can be accessed electronically from the computer system for further manipulation and update.

What is the significance of understanding data types in data preprocessing?

Understanding data types helps in data cleaning, normalization, and transformation during preprocessing.

What is a numerical data type?

Numerical data types represent numbers and include integers (whole numbers) and floats (numbers with
decimals). E.g. Age, represented as an integer, is a numerical data type.

What are categorical data types?

Categorical data types represent discrete categories or labels and can be nominal (unordered) or ordinal
(ordered). E.g. Gender, categorized as male or female, is a categorical data type.

What is the purpose of text data type in data science?

Text data type is used for handling textual information, such as documents, emails, or social media posts.

What does Boolean data type represent?

Boolean data type represents true/false or binary values and is often used for logical operations and
filtering.
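
The data types described above can be seen side by side in one small record. The field names and values below are hypothetical. Note that Python stores both categorical labels and free text as `str`, so the difference between them lies in how the values are used, not in the Python type.

```python
# One hypothetical customer record, with the data type of each field noted.
record = {
    "age": 34,                    # numerical: integer
    "height_m": 1.72,             # numerical: float
    "blood_group": "O+",          # categorical, nominal (no natural order)
    "satisfaction": "neutral",    # categorical, ordinal (has an order)
    "review": "Quick delivery and friendly staff.",  # text
    "is_member": True,            # Boolean
}

# Ordinal categories can be ranked by listing them in order.
SATISFACTION_ORDER = ["dissatisfied", "neutral", "satisfied"]
rank = SATISFACTION_ORDER.index(record["satisfaction"])

# The Python type actually used to store each field.
python_types = {field: type(value).__name__ for field, value in record.items()}
```

Knowing the type tells you which operations make sense: averaging `age` is meaningful, while averaging `blood_group` is not.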

Q1: Define data analytics and data science. Are they similar or different? Give a reason.

Data Analytics:

The practice of exploring and analyzing data to uncover patterns, draw conclusions and derive meaningful
insights.

Data Science:

An interdisciplinary area that employs mathematics, statistics, data analysis and machine learning to
extract valuable insights and knowledge from data.

Similarities:

• Both fields analyze data to gather insights and guide decision-making.
• Both utilize statistical and mathematical methods to uncover patterns and trends.
• Both depend on programming languages like Python, and tools such as SQL.

Differences:

Scope: Data Science involves data processing, modeling, and machine learning; Data Analytics focuses on
interpreting and reporting insights.

Tools: Data scientists use Python and AI frameworks; data analysts use SQL, Excel, and BI tools like
Tableau.

Outcome: Data Science predicts future trends; Data Analytics explains current patterns.

Q2: Can you relate how data science is helpful in solving business problems?

Data science enables businesses to address issues by examining data to uncover insights, trends, and
patterns. It supports better decision-making, improves customer experiences, identifies fresh
opportunities, forecasts outcomes, and minimizes risks.

Q3: Database is useful in the field of data science. Defend this statement.

A database is essential in data science as it stores, organizes, and manages large datasets efficiently. It
enables quick data retrieval, supports analysis, and ensures data integrity, providing a solid foundation for
building models and deriving insights.

Q4: Compare machine learning and deep learning in the context of formal and informal education.

Formal Education: Machine learning is widely taught as an introductory topic in structured courses, while
deep learning is covered at advanced levels due to its complexity and reliance on neural networks.

Informal Education: Machine learning tutorials and tools are more accessible for beginners online,
whereas deep learning requires specialized resources like frameworks (e.g., TensorFlow) and high
computational power.

Q5: What is meant by sources of data? Give three sources of data excluding those mentioned in the
book.

Sources of data refer to the origins or channels from which data is collected for analysis.

Three examples of data sources (excluding websites, surveys, and sensors):

1. Transaction Records: Data from purchases, sales, and financial transactions stored in databases or
record systems.

2. Social Media: Insights into consumer preferences, sentiment, and brand perception collected from
social media platforms.

3. Government Databases: Data from government records used for research, policy-making, and decision-
making in fields like healthcare, education, and public administration.

Q6: Differentiate between database and dataset.

Database: A structured collection of data organized for efficient storage, retrieval, and management.
Databases are designed to handle ongoing updates, inserts and deletions of data.

Dataset: A collection of data, structured or unstructured, typically used for analysis, research, or
building machine learning models.
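
The difference can be shown with Python's built-in `sqlite3` module. In this sketch the database is a store that accepts inserts, updates, and deletions, while the dataset is the fixed set of rows pulled out of it for analysis. The table name, column names, and student records are hypothetical.

```python
import sqlite3

# Database: an organized store that handles ongoing changes.
conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Ali", 78), ("Sara", 91), ("Omar", 64)],
)
conn.execute("UPDATE students SET marks = 70 WHERE name = 'Omar'")  # update
conn.execute("DELETE FROM students WHERE name = 'Ali'")             # deletion
conn.commit()

# Dataset: a collection of rows extracted for analysis.
dataset = conn.execute(
    "SELECT name, marks FROM students ORDER BY name"
).fetchall()
conn.close()
```

After the connection is closed, `dataset` is still available as an ordinary Python list, which is the form most analysis code works with.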

Q7: Argue about the trends, outliers, and distribution of values in a dataset.

Trends: Trends highlight patterns or directions in data over time or categories, enabling forecasting and
strategic planning.

Outliers: Outliers are extreme values that may signal errors, rare events, or unique cases, potentially
skewing statistical analysis.

Distribution of Values: Distribution shows how data points spread across ranges, aiding in understanding
variability, central tendencies, and outcome probabilities.
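
Outliers and distribution can both be checked with a few lines of Python. The sketch below assumes the common 1.5 × IQR rule for flagging outliers, which is a widely used convention and not something stated in these notes. The values are hypothetical.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical measurements; 95 is far from the rest.
values = [12, 14, 15, 15, 16, 17, 18, 95]

# Outliers: flag values more than 1.5 * IQR outside the middle half of the data.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

# Distribution: count how many values fall into each bin of width 5.
bins = Counter((v // 5) * 5 for v in values)
```

The bin counts show most values clustered between 15 and 19, while the outlier check singles out the one extreme value that would otherwise pull the mean upward.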

Q8: Why are summary statistics needed?

Summary statistics simplify data interpretation by providing key insights like central tendency and
variability. They help in decision-making and identifying patterns or anomalies without the need to analyze
raw data.
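
As a small illustration, Python's `statistics` module can produce the usual summary figures for a list of numbers. The marks below are hypothetical.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical exam marks for seven students.
marks = [55, 60, 60, 70, 75, 80, 90]

summary = {
    "count": len(marks),
    "mean": mean(marks),        # central tendency
    "median": median(marks),    # central tendency, less affected by extremes
    "mode": mode(marks),        # most frequent value
    "min": min(marks),
    "max": max(marks),
    "range": max(marks) - min(marks),  # variability
    "std_dev": pstdev(marks),          # variability (population standard deviation)
}
```

Seven raw numbers are reduced to a handful of figures that describe the centre and the spread of the data at a glance.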

Q9: Express big data in your own words. Explain the three V's of big data with reference to email data.

Big data refers to vast amounts of data that are too large, complex, and fast-changing for traditional data
processing methods to handle efficiently. It often includes structured, semi-structured, and unstructured
data from diverse sources.

Volume: Refers to the massive amount of email data, including billions of messages and attachments
generated daily.

Velocity: Describes the fast-paced flow of email data, with messages arriving in real-time, requiring swift
processing.

Variety: Refers to the diverse formats of email data, including text, attachments, and metadata, each
needing different processing methods.

Q10: Illustrate the purpose of data storage.

The purpose of data storage is to securely and efficiently keep data for future use, analysis, and retrieval.
It acts as a centralized system for organizing, managing, and preserving data over time. Data storage plays
a crucial role in managing and utilizing information effectively.

Q1: Sketch the key concepts of data science in your own words.

Data science involves extracting meaningful insights from vast amounts of data using techniques from
statistics, programming, and domain knowledge. Key concepts include:

• Data Collection: Gathering relevant data from various sources (e.g., databases, sensors).

• Data Cleaning: Preparing the data by handling missing values, outliers, and inconsistencies.

• Data Analysis: Applying statistical and machine learning methods to identify patterns and trends.

• Modeling: Creating algorithms that can predict or classify outcomes based on data.

• Visualization: Presenting data findings through charts and graphs for easy interpretation.

• Interpretation: Drawing actionable insights from data to inform decision-making.

Q2: Develop your own thinking on the various data types used in data science.

In data science, understanding various data types is crucial as they determine how data is processed,
analyzed, and interpreted. Key data types include:

Numerical Data: Represents measurable quantities like age or salary, used for statistical analysis and
predictive models.

Categorical Data: Represents non-numeric information like gender or product category, useful in
classification tasks.

Text Data: Consists of strings like emails or reviews, analyzed using natural language processing (NLP).

Time-Series Data: Data collected over time, such as stock prices, used for trend prediction.

Image and Audio Data: Complex data types for visual and sound information, analyzed with deep learning
techniques.

Boolean Data: Binary data (true or false), often used in logic operations and binary classification.
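
For time-series data, a moving average is one simple way to make a trend easier to see. The sketch below is a minimal illustration. The prices are hypothetical and `moving_average` is an illustrative helper, not a function from the textbook.

```python
def moving_average(values, window):
    """Return the average of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical closing prices over six days.
prices = [10, 12, 11, 13, 15, 14]
smoothed = moving_average(prices, 3)
```

The raw prices move up and down from day to day, but the smoothed values rise steadily, which makes the upward trend clearer.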

Q3: Compare how big data applies to various fields of life. Illustrate your answer with suitable examples.

Big data applies to numerous fields, transforming how industries operate and make decisions. Here are
some examples:

Healthcare: Big data analyzes medical records and health trends to predict diseases and improve
treatment plans.

Finance: It helps detect fraud, assess risks, and optimize trading strategies by analyzing transaction data.

Retail: Retailers use big data to understand customer behavior and personalize marketing through
purchase history.

Transportation: Big data aids in traffic management and route optimization, like ride-sharing services
predicting demand.

Education: It tracks student performance and offers personalized learning experiences through data
analysis.

Agriculture: Big data helps optimize crop yields by analyzing soil, weather, and crop health data for
precision farming.

Q4: Relate the advantages and challenges of big data.

Advantages:

• Informed Decision-Making: Big data enables data-driven decisions, improving efficiency.

• Improved Customer Experience: It allows personalized marketing and better product
recommendations.

• Predictive Analytics: Big data helps forecast trends, such as customer demand or disease
outbreaks.

• Innovation: It drives new products, services, and business models based on data insights.

Challenges:

• Data Privacy and Security: Ensuring privacy and protecting sensitive data are major concerns.

• Data Quality: Large datasets often contain errors, inconsistencies, and missing values.

• Storage and Processing: Managing and processing big data requires significant computational
resources.

• Skill Gaps: A shortage of qualified professionals hinders effective data analysis.


Q5: Design a case study about how data science and big data have revolutionized the field of
healthcare.

Case Study: Revolutionizing Healthcare with Data Science and Big Data

Introduction:
Data science and big data have revolutionized healthcare by enhancing diagnosis, treatment, and patient
management, enabling informed decisions that improve outcomes and efficiency.

Applications:

1. Predictive Analytics: Big data helps hospitals forecast patient admissions, optimizing resource
allocation.

2. Personalized Medicine: Patient data drives tailored treatments, especially for cancer therapies.

3. Disease Monitoring and Prevention: Wearable devices track health metrics, enabling early
disease detection.

4. Improved Diagnostics: Machine learning enhances diagnostic accuracy for conditions like tumors
and fractures.

5. Operational Efficiency: Data science streamlines hospital operations, reducing wait times and
optimizing resource use.

Impact:
Patients experience better outcomes, while healthcare providers benefit from cost savings, fewer errors,
and improved decision-making.

Conclusion:
Data science and big data are reshaping healthcare, improving care delivery and operational efficiency,
leading to a more patient-centered approach.

Multiple Choice Questions

1. ______ is a structured or processed collection of data usually associated with a unique body of work.

A. Database

B. Dataset

C. Data and Information

D. Information

2. ______ refers to carefully examining and studying data to identify patterns, draw conclusions, or make
the data meaningful.

A. Data analytics

B. Data Predictions

C. Dataset

D. Database

3. ______ is the graphical representation of data through the use of common charts, plots, infographics, and
animations.

A. Data cleaning

B. Missing values

C. Data visualization

D. Data hiding

4. ______ is a subset of Machine Learning, with emphasis on the simulation or imitation of the human
brain's behavior by using artificial neural networks.

A. Data visualization

B. Computer vision

C. Deep learning

D. Big Data

5. ______ is the use of data to predict future trends and events based on historical data.

A. Statistical analysis

B. Predictive analysis

C. Graphical analysis

D. Deep learning

6. ______ is the fast rate at which data is received and acted on.

A. Volume

B. Velocity

C. Variety

D. Vision

7. ______ includes data that can only take certain values and cannot be further subdivided into
smaller units.

A. Discrete data

B. Continuous data

C. Ordinal data

D. Referral data

8. ______ is a limitation of Big Data.

A. Statistical data

B. Unlimited growth of data

C. Predictive maintenance

D. Referred data

9. Customer satisfaction levels such as satisfied, dissatisfied, and neutral are examples of the ______ type.

A. Ordinal data

B. Continuous data

C. Numeric data

D. Discrete data

10. ______ is a method of collecting information from individuals.

A. Survey

B. Data hiding

C. Data Visualization

D. Data finding
