
CAE1 - 2 - Set1 Key

This document contains details of a data science examination including course outcomes, exam questions, and explanations of key data science concepts. It begins with listing the course outcomes being assessed. The exam then consists of three parts - Part A contains 5 short answer questions testing basic recall of data science terms and concepts. Part B has two longer answer questions explaining data preparation processes and the data science process in detail. Part C is a single long answer question explaining levels of measurement, types of variables, and providing examples.

Uploaded by

JANILA J.

Reg.No.

Mar Ephraem College of Engineering and Technology, Elavuvilai
B.E. DEGREE CONTINUOUS ASSESSMENT EXAMINATION I – September 2022
Third Semester
Department of Computer Science and Engineering
CS3352 Foundations of Data Science

Time: 1.30 hrs. Maximum: 50 marks


Course Outcomes (COs) for Assessment in this Examination
CO1 Define the data science process
CO2 Describe different types of data description for data science process
CL - Cognitive Level; R - Remember; Un - Understand; Ap - Apply; An - Analyze; Ev - Evaluate; Cr - Create

PART A – (5 x 2 = 10 marks)


1. List the benefits of data science R CO1
Data science benefits many sectors: commercial, human resources, finance,
government, non-governmental organizations, and education.
2. What do you mean by unstructured data? R CO1
Unstructured data refers to datasets (typically large collections of files)
that aren't stored in a structured database format. Unstructured data has an
internal structure, but it is not predefined through data models. It may be
human-generated or machine-generated, in a textual or a non-textual format.
3. What is statistics? R CO2
Statistics is a set of mathematical methods and tools that enable us to answer
important questions about data.
4. List the types of data based on statistical analysis R CO2
Nominal data.
Ordinal data.
Discrete data.
Continuous data.

5. Give the difference between Descriptive statistics and inferential statistics. R CO2
Descriptive statistics summarize the raw data at hand by describing its
features. Inferential statistics, on the other hand, draw inferences about a
population by using a sample drawn from that population.
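The distinction can be illustrated with a minimal sketch. The exam scores below are hypothetical, and the interval uses the normal critical value 1.96 as a simplifying assumption (a t critical value would suit a sample this small better). The first part only describes the sample; the second part uses the sample to estimate something about the wider population.

```python
import math
import statistics

# Hypothetical sample of 10 exam scores drawn from a larger population.
sample = [62, 71, 68, 75, 80, 66, 73, 69, 77, 70]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: estimate the population mean from the sample.
# Approximate 95% confidence interval using the normal critical value 1.96
# (an assumption made for simplicity in this sketch).
std_error = stdev / math.sqrt(len(sample))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean={mean:.2f}, stdev={stdev:.2f}")
print(f"approx. 95% CI for population mean: ({ci_low:.2f}, {ci_high:.2f})")
```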

PART B – (2 x 13 = 26 marks)


6.a Explain in detail about data preparation process Un CO1
Data preparation steps

The specifics of the data preparation process vary by industry, organization, and
need, but the workflow remains largely the same.

1. Gather data
The data preparation process begins with finding the right data. The data can
come from an existing data catalog, or data sources can be added ad hoc.

2. Discover and assess data

After collecting the data, it is important to discover each dataset. This step is
about getting to know the data and understanding what has to be done before the
data becomes useful in a particular context.

3. Cleanse and validate data

Cleaning up the data is traditionally the most time-consuming part of the data
preparation process, but it’s crucial for removing faulty data and filling in gaps.
Important tasks here include:

• Removing extraneous data and outliers
• Filling in missing values
• Conforming data to a standardized pattern
• Masking private or sensitive data entries

Once data has been cleansed, it must be validated by testing for errors in the data
preparation process up to this point. Often, an error in the system will become
apparent during this validation step and will need to be resolved before moving
forward.
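The cleansing tasks and the validation step can be sketched in a few lines of plain Python. This is only an illustration: the records, the plausible age range, the median fill, and the phone-masking format are all assumptions chosen for the example, not part of any prescribed method.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"name": " Alice ", "age": 34, "phone": "555-123-4567"},
    {"name": "BOB", "age": None, "phone": "555-987-6543"},
    {"name": "carol", "age": 29, "phone": "555-000-1111"},
    {"name": "dave", "age": 420, "phone": "555-222-3333"},  # implausible age
]

def cleanse(records, min_age=0, max_age=120):
    # Remove outliers: drop rows whose age is outside the plausible range.
    kept = [r for r in records
            if r["age"] is None or min_age <= r["age"] <= max_age]
    # Fill missing values with the median of the known ages.
    median_age = statistics.median(r["age"] for r in kept if r["age"] is not None)
    cleaned = []
    for r in kept:
        cleaned.append({
            # Conform names to one pattern: trimmed, title case.
            "name": r["name"].strip().title(),
            "age": r["age"] if r["age"] is not None else median_age,
            # Mask sensitive data: keep only the last four digits.
            "phone": "***-***-" + r["phone"][-4:],
        })
    return cleaned

def validate(records, min_age=0, max_age=120):
    # Return a list of problems found; an empty list means the data passed.
    errors = []
    for i, r in enumerate(records):
        if r["age"] is None or not (min_age <= r["age"] <= max_age):
            errors.append(f"row {i}: bad age {r['age']!r}")
        if r["name"] != r["name"].strip().title():
            errors.append(f"row {i}: name not standardized")
        if not r["phone"].startswith("***-***-"):
            errors.append(f"row {i}: phone not masked")
    return errors

cleaned = cleanse(raw)
print(cleaned)
print(validate(cleaned))
```

Running `validate` on the raw records instead would report the missing age, the implausible age, and the unmasked phone numbers, which is the kind of error the validation step is meant to surface.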

4. Transform and enrich data

Data transformation is the process of updating the format or value entries in
order to reach a well-defined outcome, or to make the data more easily
understood by a wider audience. Enriching data refers to adding and connecting
data with other related information to provide deeper insights.
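As a small illustration of the two ideas, the sketch below first transforms hypothetical order records (converting date strings to ISO format and amount strings to numbers) and then enriches them by joining a hypothetical customer lookup table. The field names and formats are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical orders with dates as DD/MM/YYYY strings and amounts as text.
orders = [
    {"customer_id": 1, "date": "05/09/2022", "amount": "1,250.50"},
    {"customer_id": 2, "date": "17/09/2022", "amount": "80.00"},
]

# Hypothetical lookup table used for enrichment.
customers = {1: {"region": "South"}, 2: {"region": "North"}}

def transform(order):
    # Update formats: ISO 8601 dates and numeric amounts.
    return {
        "customer_id": order["customer_id"],
        "date": datetime.strptime(order["date"], "%d/%m/%Y").date().isoformat(),
        "amount": float(order["amount"].replace(",", "")),
    }

def enrich(order, lookup):
    # Connect each order with related customer information.
    extra = lookup.get(order["customer_id"], {})
    return {**order, "region": extra.get("region", "unknown")}

prepared = [enrich(transform(o), customers) for o in orders]
print(prepared)
```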

5. Store data

Once prepared, the data can be stored or channeled into a third-party
application, such as a business intelligence tool, clearing the way for
processing and analysis to take place.

7.a Explain in detail about data science process. Un CO1


Certain steps are necessary for any task in the field of data science in order
to derive useful results from the data at hand.
• Data Collection – After formulating a problem statement, the main task is
to collect data that can help in the analysis. Sometimes data is collected
by conducting a survey, and at other times by web scraping.
• Data Cleaning – Most real-world data is not structured and requires
cleaning and conversion into structured data before it can be used for any
analysis or modeling.
• Exploratory Data Analysis – This is the step in which we try to find the
hidden patterns in the data at hand. We also analyze the different factors
that affect the target variable and the extent to which they do so. How the
independent features are related to each other, and what can be done to
achieve the desired results, can also be learned from this step. It also
gives us a direction in which to work when starting the modeling process.
• Model Building – Many machine learning algorithms and techniques have been
developed that can identify complex patterns in the data, a task that would
be very tedious for a human.
• Model Deployment – After a model is developed and gives good results on
the holdout or real-world dataset, we deploy it and monitor its
performance. This is the stage where what was learned from the data is
applied in real-world applications and use cases.
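The five steps above can be walked through on a toy problem. The sketch below is only an illustration under stated assumptions: the (hours studied, exam score) pairs are hypothetical, a hand-written least-squares line stands in for the "model", and "deployment" is reduced to a plain prediction function.

```python
import math

# 1. Data collection: hypothetical (hours studied, exam score) pairs.
collected = [(1, 52), (2, 55), (3, None), (4, 63),
             (5, 66), (6, 70), (7, 74), (8, 79)]

# 2. Data cleaning: drop rows with a missing score.
clean = [(x, y) for x, y in collected if y is not None]

# Keep the last two rows aside as a holdout set for evaluation.
train, holdout = clean[:-2], clean[-2:]

# 3. Exploratory data analysis: how strongly are hours and score related?
xs = [x for x, _ in train]
ys = [y for _, y in train]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
sxy = sum((x - mx) * (y - my) for x, y in train)
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

# 4. Model building: fit score = intercept + slope * hours by least squares.
slope = sxy / sxx
intercept = my - slope * mx

# 5. Model deployment (simplified): expose the model as a function, then
#    check its mean absolute error on the holdout rows before relying on it.
def predict(hours):
    return intercept + slope * hours

mae = sum(abs(predict(x) - y) for x, y in holdout) / len(holdout)

print(f"correlation={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
print(f"holdout mean absolute error={mae:.2f}")
```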

PART C – (1 x 14 = 14 marks)
8.a Explain in detail about levels of measurement and types of variables Un CO2
In ascending order of precision, the four different levels of measurement are:

Nominal – Latin for name only (Republican, Democrat, Green, Libertarian)

Ordinal – Think ordered levels or ranks (small–8oz, medium–12oz, large–32oz)

Interval – Equal intervals among levels (1 dollar to 2 dollars is the same
interval as 88 dollars to 89 dollars)

Ratio – Let the “o” in ratio remind you of a zero in the scale (Day 0, day 1, day
2, day 3, …)
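One practical consequence of the four levels is that each one permits more summary statistics than the one before it. The sketch below encodes a common textbook convention (mode for nominal, plus median for ordinal, plus mean for interval, plus ratios for ratio data). The `ALLOWED` table, the `summarize` helper, and the sample values, including the Celsius temperatures used for the interval case, are hypothetical and chosen only for illustration.

```python
import statistics

# Summary statistics commonly treated as meaningful at each level
# (a textbook convention, stated here as an assumption).
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean", "ratio"},
}

def summarize(values, level, order=None):
    """Compute only the statistics permitted for the given level.

    For ordinal data, `order` lists the categories from lowest to highest.
    """
    allowed = ALLOWED[level]
    result = {"mode": statistics.mode(values)}
    if "median" in allowed:
        if level == "ordinal":
            # Rank the labels, take the middle rank, map back to a label.
            ranks = sorted(order.index(v) for v in values)
            result["median"] = order[statistics.median_low(ranks)]
        else:
            result["median"] = statistics.median(values)
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    if "ratio" in allowed:
        # Ratios are meaningful only when the scale has a true zero.
        result["max_to_min"] = max(values) / min(values)
    return result

parties = ["Green", "Democrat", "Republican", "Democrat"]
sizes = ["small", "large", "medium", "medium", "medium"]
temps_c = [10, 20, 20, 30]
days = [1, 2, 2, 4]

print(summarize(parties, "nominal"))
print(summarize(sizes, "ordinal", order=["small", "medium", "large"]))
print(summarize(temps_c, "interval"))
print(summarize(days, "ratio"))
```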

Types of variables:
Categorical variables. A categorical variable (also called a qualitative
variable) refers to a characteristic that can't be quantified. ...
Nominal variables. ...
Ordinal variables. ...
Numeric variables. ...
Continuous variables. ...
Discrete variables.

Prepared by Verified by
