Mtech Final

The document outlines a course on Data Science and its applications, detailing various topics such as definitions, programming in R, statistical concepts, data visualization, and machine learning algorithms. It includes explanations of key concepts like Data Science, KNN, Naïve Bayes, and Principal Component Analysis, along with practical programming exercises in R. Additionally, it discusses the importance of data visualization, web scraping, and the MapReduce framework in data engineering.

Uploaded by

hajeera.nk


Mtech 1st semester

Course Title: Data Science and application Course Code:

Scheme & Solution

1a Define Data science. Explain the Venn diagram of Data Science. 10

Definition of Data Science (2M)
Data Science is an interdisciplinary field that uses scientific methods, algorithms, and
systems to extract insights from structured and unstructured data. It combines elements of
statistics, mathematics, programming, and domain expertise to analyze and interpret
complex data.

Venn Diagram of Data Science (4M)

The Venn diagram of Data Science consists of three overlapping components:
1. Mathematics & Statistics
2. Computer Science (Programming & Data Engineering)
3. Domain Knowledge (Business/Industry Expertise)
Intersections in the Venn Diagram 4M
 Machine Learning (Mathematics + Computer Science)
 Traditional Research (Mathematics + Domain Knowledge)
 Software Development (Computer Science + Domain Knowledge)
 Data Science (Core Area)

b Write a short note about basics of R. Write an R program to print the Fibonacci Series. 10
R is a powerful programming language and environment primarily used for statistical computing,
data analysis, and graphical representation. It is widely used in data science, machine learning,
and bioinformatics. (2M)
Basic Features of R: (2M)
• Open-source and free to use.
• Provides built-in functions for statistical analysis.
• Supports data visualization using packages like ggplot2.
• Uses vectors, matrices, and data frames for data manipulation.
• Offers extensive libraries such as dplyr, tidyr, and caret for data science applications.
Code with sample output: (5 + 1 M)
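# Print the first n terms of the Fibonacci series
fibonacci <- function(n)
{
  a <- 0
  b <- 1
  for (i in seq_len(n))
  {
    cat(a, "")   # print the current term followed by a space
    c <- a + b   # next term is the sum of the previous two
    a <- b
    b <- c
  }
  cat("\n")
}

n_terms <- 10
cat("Fibonacci Series:\n")
fibonacci(n_terms)

# Sample output:
# Fibonacci Series:
# 0 1 1 2 3 5 8 13 21 34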

2a Explain the following concepts with examples: 10

i. Statistical Inference
ii. Population
iii. Samples
iv. Types of data
v. Big Data
i. Statistical Inference (2M)
• Statistical inference is the process of drawing conclusions about a population based on a
sample of data. It involves techniques like estimation, hypothesis testing, and confidence
intervals.
Any suitable Example:
• A researcher collects the test scores of 100 students from a university and uses statistical
methods to infer the average test score of all students at the university.
ii. Population
• A population is the entire group of individuals or observations that a study aims to analyze. It
can be finite or infinite.
Any suitable Example:
All citizens of a country when conducting a national census.
iii. Samples
A sample is a subset of the population selected for analysis. It is used when studying the entire
population is impractical.
Example:
Surveying 1,000 voters (sample) to predict the outcome of a national election.
iv. Types of Data
Data can be categorized into different types based on its nature and usage:
Quantitative Data (Numerical) – Represents measurable quantities.
Example: Heights of students (in cm), annual income (in dollars).
Qualitative Data (Categorical) – Represents characteristics or categories.
Example: Gender (Male/Female), Car brands (Toyota, Ford).
Further Classification:
Discrete Data: Countable values (e.g., Number of students in a class).
Continuous Data: Measurable values (e.g., Temperature, Weight).
v. Big Data
Big Data refers to extremely large datasets that cannot be processed using traditional data
management tools. It is characterized by the 5 Vs: Volume, Velocity, Variety, Veracity, and
Value.
Example:
Data generated by social media platforms like Facebook and Twitter.
Real-time transaction data from e-commerce websites like Amazon.
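Returning to items i to iii above, the link between a population, a sample, and statistical inference can be sketched in R. This is only an illustration under stated assumptions: the population is simulated (10,000 hypothetical test scores), and the variable names are made up for this example.

```r
# Assumption: a simulated population of 10,000 hypothetical test scores
set.seed(42)
population <- rnorm(10000, mean = 70, sd = 10)

# Sample: a random subset of 100 scores drawn from the population
sample_scores <- sample(population, 100)

# Inference: estimate the population mean from the sample
estimate <- mean(sample_scores)

# 95% confidence interval for the population mean (one-sample t interval)
ci <- t.test(sample_scores, conf.level = 0.95)$conf.int

cat("Sample mean:", round(estimate, 2), "\n")
cat("95% CI:", round(ci[1], 2), "to", round(ci[2], 2), "\n")
```

The sample mean is a point estimate, and the interval expresses the uncertainty that comes from observing only 100 of the 10,000 values.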
b Write a program to find the Sum, Mean and Product of the Vector in R programming. 3 1 10
# Define a vector
vec <- c(2, 4, 6, 8, 10)

# Calculate Sum
sum_vec <- sum(vec)

# Calculate Mean
mean_vec <- mean(vec)

# Calculate Product
product_vec <- prod(vec)

# Print results
cat("Vector:", vec, "\n")
cat("Sum:", sum_vec, "\n")
cat("Mean:", mean_vec, "\n")
cat("Product:", product_vec, "\n")

3a Briefly explain Data Science process with a neat diagram. 2 2 10

Explanation of Data Science Process 5
Neat and Clear Diagram 5
Data Science Process
The Data Science process consists of several stages that transform raw data into valuable insights.
Below are the key steps:
Problem Definition
Understanding the problem and defining objectives.
Example: Predicting customer churn for a telecom company.
Data Collection
Gathering relevant data from various sources (databases, APIs, web scraping).
Example: Collecting transaction data from an e-commerce platform.
Data Cleaning & Preprocessing
Handling missing values, duplicates, and outliers.
Example: Removing inconsistent records from customer datasets.
Exploratory Data Analysis (EDA)
b What is Matplotlib? Write an R program to plot a line chart by assuming own data. 3 1 10
Matplotlib is a popular data visualization library in Python used for creating static, animated, and
interactive plots. It provides various plotting functions like line charts, bar charts, histograms, and
scatter plots.
Matplotlib itself is a Python library. Since the question asks for an R program, the line chart
below is plotted with ggplot2, a comparable plotting package in R.
# Load necessary library
library(ggplot2)
# Create a sample dataset
data <- data.frame(
Year = c(2015, 2016, 2017, 2018, 2019, 2020),
Sales = c(500, 700, 900, 1100, 1300, 1500)
)
# Create a line chart
ggplot(data, aes(x = Year, y = Sales)) +
geom_line(color = "blue", size = 1) +
geom_point(color = "red", size = 3) +
ggtitle("Yearly Sales Growth") +
xlab("Year") +
ylab("Sales") +
theme_minimal()
4a Explain the concept of KNN algorithm. What are the modelling assumptions of KNN algorithm with example. 2 2 10
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm used for both
classification and regression tasks. It is a non-parametric and instance-based learning algorithm,
meaning it does not make strong assumptions about the underlying data distribution and directly
stores training data for making predictions.
Similarity-Based Learning
Assumes similar data points are close to each other in feature space.
Example: If most nearby points are apples, a new point near them is likely an apple.
Choice of K Value Matters
A small K (e.g., 1 or 3) makes predictions sensitive to noise.
A large K (e.g., 20 or 50) may oversmooth and ignore local variations.
Feature Scaling is Important
Distance calculations are affected by different feature scales.
Example: If height is in cm and weight in kg, the feature with the larger numeric range
dominates the distance. Normalization (Min-Max Scaling or Standardization) is required.
Assumes a Meaningful Distance Metric
Uses distance measures like Euclidean, Manhattan, or Cosine similarity.
Example: Euclidean distance works well for continuous data, while Hamming distance is used for
categorical data.
Non-Parametric Nature
Does not assume an underlying distribution (unlike linear regression).
Learns only when making predictions (lazy learning).
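The assumptions above (similarity in feature space, choice of K, feature scaling, Euclidean distance) can be sketched in a few lines of base R. This is a minimal illustration, not a production classifier: the training data, the labels "small" and "large", and the function name knn_predict are all hypothetical.

```r
# Hypothetical training data: height (cm), weight (kg), and a class label
train <- data.frame(
  height = c(150, 155, 160, 175, 180, 185),
  weight = c(50, 55, 58, 75, 80, 85),
  label  = c("small", "small", "small", "large", "large", "large"),
  stringsAsFactors = FALSE
)

knn_predict <- function(train, new_point, k = 3) {
  features <- c("height", "weight")
  lo <- sapply(train[features], min)
  hi <- sapply(train[features], max)

  # Min-max scaling so that no feature dominates the distance
  scaled_train <- sweep(sweep(as.matrix(train[features]), 2, lo), 2, hi - lo, "/")
  scaled_new <- (new_point[features] - lo) / (hi - lo)

  # Euclidean distance from the new point to every stored training point
  dists <- sqrt(rowSums(sweep(scaled_train, 2, scaled_new)^2))

  # Majority vote among the k nearest neighbours
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train$label[nearest])
  names(votes)[which.max(votes)]
}

prediction_tall  <- knn_predict(train, c(height = 178, weight = 77), k = 3)
prediction_short <- knn_predict(train, c(height = 152, weight = 52), k = 3)
print(c(prediction_tall, prediction_short))
```

Note that nothing is learned ahead of time: all of the work happens inside knn_predict at prediction time, which is what "lazy learning" refers to.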
b What is ggplot2 in R? Write an R program to plot a bar chart using some sample data. 3 1 10
ggplot2 is a powerful and widely used data visualization package in R. It is based on the Grammar
of Graphics and allows users to create complex visualizations easily by layering different
elements like axes, colors, and labels.
Features of ggplot2:
Provides high-quality, customizable plots.
Supports multiple chart types (bar charts, line charts, histograms, etc.).
Uses a layered approach for plotting.
Works well with dplyr and tidyverse for data manipulation.
# Load ggplot2 library
library(ggplot2)
# Create sample data
data <- data.frame(
Category = c("A", "B", "C", "D", "E"),
Value = c(10, 25, 15, 30, 20)
)
# Create a bar chart
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
geom_bar(stat = "identity") +
ggtitle("Bar Chart Example") +
xlab("Categories") +
ylab("Values") +
theme_minimal()

5a Explain Naïve Bayes Algorithm for filtering Spam with example. 2 3 10

The Naïve Bayes algorithm is a probabilistic machine learning model used for classification tasks,
including spam filtering. It is based on Bayes' Theorem and assumes that features are independent
of each other (hence the term "naïve").
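Bayes' Theorem Formula:
$$P(\text{Spam} \mid \text{Words}) = \frac{P(\text{Words} \mid \text{Spam}) \, P(\text{Spam})}{P(\text{Words})}$$
Under the independence assumption, P(Words | Spam) is the product of the per-word probabilities P(w_i | Spam).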
How Naïve Bayes is Used for Spam Filtering
Training Phase:
Collect a dataset of emails labeled as spam or ham (not spam).
Extract features like word frequency (e.g., "free", "win", "money").
Calculate probabilities of each word appearing in spam and non-spam emails.
Prediction Phase:
For a new email, check the presence of words and compute probabilities using Bayes' theorem.
If the probability of spam is higher than ham, classify the email as spam, otherwise classify it as
ham.
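The training and prediction phases above can be illustrated with a small base R sketch. The counts below are invented for the example (40 spam and 60 ham emails), and classify_email is a hypothetical helper. As a simplification, only the words present in the email are scored, and log-probabilities are summed to avoid numerical underflow.

```r
# Assumed training summary: number of emails of each class containing each word
n_spam <- 40
n_ham  <- 60
word_in_spam <- c(free = 30, win = 20, meeting = 2)
word_in_ham  <- c(free = 6,  win = 3,  meeting = 30)

classify_email <- function(words) {
  # Prior probabilities of each class
  p_spam <- n_spam / (n_spam + n_ham)
  p_ham  <- n_ham / (n_spam + n_ham)

  # Naive independence: multiply per-word likelihoods (sum of logs)
  score_spam <- log(p_spam) + sum(log(word_in_spam[words] / n_spam))
  score_ham  <- log(p_ham)  + sum(log(word_in_ham[words] / n_ham))

  if (score_spam > score_ham) "spam" else "ham"
}

result_promo   <- classify_email(c("free", "win"))
result_meeting <- classify_email("meeting")
print(c(result_promo, result_meeting))
```

For the email containing "free" and "win", the unnormalized spam score is 0.4 × 0.75 × 0.5 = 0.15 against 0.6 × 0.1 × 0.05 = 0.003 for ham, so it is classified as spam.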
b Compare Naïve Bayes Algorithm with KNN Algorithm. Explain Laplace smoothing. 2 3 10
Why is Laplace Smoothing Needed?
In Naïve Bayes, if a word (feature) never appears in training data for a class, its probability
becomes zero, making the entire probability calculation zero. This is called the zero-frequency
problem.
Laplace (add-one) smoothing adds 1 to every word count before the probabilities are computed. It:
• Prevents zero probabilities.
• Makes the model more robust, especially for text classification.
• Works well for handling rare words or unseen features in test data.
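The effect of smoothing can be seen in a short base R sketch. The word counts are hypothetical and laplace_prob is an illustrative helper name; alpha = 1 corresponds to add-one smoothing, with the vocabulary size taken as the number of distinct words in the count vector.

```r
# Hypothetical word counts in the spam class; "meeting" never occurs in spam
spam_counts <- c(free = 3, win = 2, meeting = 0)

# Unsmoothed estimate: "meeting" gets probability zero
raw_prob <- spam_counts / sum(spam_counts)

# Laplace smoothing: (count + alpha) / (total + alpha * vocabulary size)
laplace_prob <- function(word_counts, alpha = 1) {
  (word_counts + alpha) / (sum(word_counts) + alpha * length(word_counts))
}

smoothed_prob <- laplace_prob(spam_counts)
print(raw_prob)       # free 0.6, win 0.4, meeting 0.0
print(smoothed_prob)  # free 0.5, win 0.375, meeting 0.125
```

The smoothed probabilities still sum to 1, but no word has probability zero, so a single unseen word can no longer wipe out the whole product.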
6a Why Linear Regression and KNN are poor choices for filtering spam? Discuss. 2 3 10
Why Linear Regression and KNN are Poor Choices for Spam Filtering?
Spam filtering is a classification problem where emails are categorized as spam or ham (not
spam). While algorithms like Naïve Bayes are well-suited for text classification, Linear
Regression and K-Nearest Neighbors (KNN) have significant drawbacks when applied to spam
filtering.
Why Linear Regression is a Poor Choice?
+ Not Designed for Classification
Linear Regression is meant for continuous output (regression tasks), not categorical classification.

It produces continuous values instead of distinct class labels (spam or ham).

+ Decision Boundary Issues

A linear model assumes a straight-line relationship between inputs and output, which does not fit
complex decision boundaries in text classification.
Emails have non-linear relationships between words and spam probability.

+ Probabilities May Go Out of Range

Linear Regression may predict values outside the valid probability range (0 to 1).
Example: It may assign an email a spam score of 1.5, which is not a valid probability.
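The out-of-range problem is easy to reproduce with lm() in base R. The data below is a made-up toy set (number of spam-like words per email against a 0/1 spam label), used only to show the behaviour.

```r
# Toy data (assumed): count of spam-like words and a 0/1 spam label
spam_word_count <- c(0, 1, 2, 3, 4, 5)
is_spam <- c(0, 0, 0, 1, 1, 1)

# Fit ordinary linear regression to a binary target
fit <- lm(is_spam ~ spam_word_count)

# Predict for an email with no spam words and one with ten spam words
scores <- predict(fit, newdata = data.frame(spam_word_count = c(0, 10)))
print(scores)
```

The fitted line is roughly -0.14 + 0.26 × count, so the first prediction falls below 0 and the second is well above 1. Neither can be read as a probability.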
Why KNN is a Poor Choice?
+ High Computational Cost
KNN is a lazy learner, meaning it stores all training data and makes predictions by computing
distances between new data and all stored emails.
With a large dataset, computing distances between emails is slow and inefficient.

+ Curse of Dimensionality
In spam filtering, emails are represented as high-dimensional feature vectors (e.g., thousands of
words).

KNN performs poorly in high dimensions because distances become less meaningful.

+ Does Not Handle Text Data Well

KNN requires a distance metric (e.g., Euclidean distance), which is not ideal for categorical/text
data.
Text data should be treated probabilistically (like in Naïve Bayes) rather than relying on distances.

Alternative: Naïve Bayes is better since it uses word probabilities instead of distances to classify
emails.
b Explain scraping the web with APIs and other tools. 2 3 10
Scraping the Web with APIs and Other Tools
Web scraping is the process of extracting data from websites. It can be done using APIs or web
scraping tools when APIs are not available.

Scraping the Web with APIs

APIs (Application Programming Interfaces) allow structured access to web data without parsing
HTML. Many websites provide APIs to fetch data efficiently.
Steps for Web Scraping Using APIs
Find the API – Check if the website provides a public API (e.g., Twitter API, OpenWeather API).
Get API Key – Many APIs require authentication with an API key.
Send Requests – Use GET or POST requests to fetch data.
Process JSON/XML Response – Extract and analyze data from the response.
With a suitable example.
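The request-building steps above can be sketched with base R alone. Everything specific here is an assumption for illustration: the endpoint https://api.example.com/v1/weather, the parameter names q and appid, and the placeholder key are hypothetical, and the helper names are made up. The fetch function is defined but not called, since it needs a real endpoint and network access.

```r
# Build a GET request URL from a base URL and a named list of query parameters
build_request_url <- function(base_url, params) {
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(base_url, "?", query)
}

# Send the GET request and return the raw response body as one string.
# The body would typically be JSON or XML and still needs to be parsed.
fetch_raw <- function(request_url) {
  paste(readLines(request_url, warn = FALSE), collapse = "\n")
}

# Hypothetical endpoint and placeholder API key
request_url <- build_request_url(
  "https://api.example.com/v1/weather",
  list(q = "New Delhi", appid = "YOUR_API_KEY")
)
print(request_url)
```

In practice, packages such as httr and jsonlite are commonly used for the request and for parsing the JSON response; they are left out here to keep the sketch dependency-free.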
7a Explain and construct a Decision Tree with an example. 3 3 10
b Explain the concept of Principal Component Analysis (PCA) and evaluate its significance in dimensionality reduction. 2 4 10
What is PCA?
Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction
while retaining the most important information in a dataset. It transforms the original features into
a new set of uncorrelated variables called Principal Components (PCs), which capture the
maximum variance in the data.
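A short sketch with prcomp() from base R shows the idea. The data is simulated for this example: columns a and b are strongly correlated by construction, so most of the variance should be captured by fewer components than there are features.

```r
# Simulated data (assumption): b is nearly a multiple of a, c is independent noise
set.seed(1)
x <- rnorm(100)
data <- data.frame(
  a = x,
  b = 2 * x + rnorm(100, sd = 0.3),
  c = rnorm(100)
)

# PCA on centered and scaled features
pca <- prcomp(data, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Dimensionality reduction: keep only the first two components
reduced <- pca$x[, 1:2]
```

The columns of pca$x are the principal component scores. They are uncorrelated with each other and ordered by decreasing variance, which is what makes dropping the trailing components a reasonable way to reduce dimensionality.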

8a Write short notes on: 3 3 10

i) Feature selection criteria
ii) Random Forest
iii) The three primary methods of Regression.
iv) The Kaggle model
b Discuss feature generation. Describe feature selection using SVD algorithm. 2 4 10
9a Explain the concept of data visualization. What are its key objectives? Discuss the various techniques used to visualize time series data. 2 5 10
Data visualization is the graphical representation of data to help users understand patterns, trends,
and insights. It transforms raw data into visual formats like charts, graphs, and maps, making
complex data easier to interpret and analyze.
Key Objectives of Data Visualization
Simplify Data Interpretation – Makes complex data more accessible and understandable.
Identify Trends and Patterns – Helps detect trends, correlations, and anomalies in data.
Support Decision-Making – Assists businesses and researchers in making informed decisions.
Enhance Communication – Provides a clear way to present data-driven insights.
Detect Outliers – Helps in identifying unusual data points or inconsistencies.
Techniques for Visualizing Time Series Data
Line Chart – Best for showing trends over time.
Area Chart – Similar to a line chart but with the area filled, emphasizing volume.
b Explain the MapReduce framework in data engineering. How can it be applied to solve the word frequency problem in large text datasets? 3 5 10
MapReduce is a distributed data processing framework used in big data engineering to process
large datasets across multiple machines in a parallel manner. It is primarily used in Hadoop
ecosystems and follows a divide-and-conquer approach to process data efficiently.
Components of MapReduce
Map Phase
Splits the input data into smaller chunks and processes them in parallel.
Each chunk is processed by a mapper function, which transforms the input into key-value pairs.
Shuffle & Sort Phase
The intermediate key-value pairs from the mappers are grouped by key.
Data is then sorted and sent to the reducers.
Reduce Phase
The reducers process grouped data and aggregate results.
The final output is stored in HDFS (Hadoop Distributed File System) or another storage system.
Example
The word frequency problem involves counting occurrences of words in a large text dataset.
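The three phases can be imitated on a single machine in base R to show how the word frequency problem maps onto them. This is a conceptual sketch only: there is no Hadoop cluster involved, the two input "documents" are made up, and map_fn is an illustrative name.

```r
# Hypothetical input split into two chunks (one per mapper)
documents <- c("big data is big", "data science uses big data")

# Map phase: each mapper emits a (word, 1) pair for every word in its chunk
map_fn <- function(doc) {
  words <- strsplit(tolower(doc), "\\s+")[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}
mapped <- do.call(rbind, lapply(documents, map_fn))

# Shuffle & sort phase: group all values by key (split() also sorts the keys)
grouped <- split(mapped$value, mapped$key)

# Reduce phase: sum the values for each key to get the word frequency
word_counts <- vapply(grouped, sum, integer(1))
print(word_counts)
```

In a real MapReduce job the mappers and reducers run on different machines and the framework performs the shuffle, but the key-value flow is the same.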
10a Discuss the essential characteristics of a social network with a suitable example. Explain the concept of a social graph and its relevance in the field of data science. 2 5 10
Essential Characteristics of a Social Network
A social network is a structure made up of individuals or organizations that are connected through
relationships such as friendship, professional connections, or shared interests. The key
characteristics of social networks include:
Nodes (Entities) – Individuals, groups, or organizations that form the network.
Edges (Connections) – Relationships between nodes, which can be directed (one-way) or
undirected (mutual).
Community Structure – Groups of highly connected nodes within the network.
Degree Centrality – The number of direct connections a node has.
Homophily (Similarity) – The tendency of similar individuals to form connections (e.g., people
with common interests forming clusters).
Influence & Virality – Information spreads quickly through key influencers (viral marketing,
trends, etc.).
Dynamic Nature – Networks evolve as new connections form and old ones disappear.
Concept of a Social Graph
A social graph is a graphical representation of relationships between users in a social network. It
consists of:
Nodes (Users/Entities) representing people or groups.
Edges (Connections) depicting friendships, follows, or interactions.
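A social graph of this kind can be stored as an adjacency matrix, from which degree centrality follows directly. The sketch below uses base R and a made-up friendship network of four users; the names and edges are purely illustrative.

```r
# Hypothetical undirected friendships (edges) between users (nodes)
edges <- data.frame(
  from = c("Asha", "Asha", "Ravi", "Meena"),
  to   = c("Ravi", "Meena", "Meena", "John"),
  stringsAsFactors = FALSE
)

# Adjacency matrix: 1 if two users are connected, 0 otherwise
users <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, length(users), length(users), dimnames = list(users, users))
for (i in seq_len(nrow(edges))) {
  adj[edges$from[i], edges$to[i]] <- 1L
  adj[edges$to[i], edges$from[i]] <- 1L  # undirected: mirror the edge
}

# Degree centrality: number of direct connections per user
degree <- rowSums(adj)
print(degree)
```

Here Meena has the highest degree (3), so she is the best-connected node in this small network.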
Relevance of Social Graph in Data Science
b Explain Girvan-Newman algorithm with example 3 5 10
The Girvan-Newman algorithm is a community detection algorithm used in social network
analysis to identify clusters (communities) by iteratively removing edges with the highest
betweenness centrality.
Steps of the Girvan-Newman Algorithm
Calculate Edge Betweenness
Betweenness centrality measures how often an edge appears in the shortest paths between nodes.
Higher betweenness means the edge acts as a "bridge" between communities.
Remove the Edge with Highest Betweenness
The most influential edge is deleted.
The network may split into smaller groups.
Recalculate Betweenness
After removing an edge, betweenness is recalculated for the remaining edges.
Repeat Until Communities Emerge
The process continues until a clear separation of clusters is visible.
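The steps above can be sketched in base R on a small example graph: two triangles (A, B, C and D, E, F) joined by a single bridge C-D. This is a minimal illustration, not an optimized implementation. Edge betweenness is computed with a breadth-first search from every node, and the loop stops as soon as the graph splits into more than one component. The graph and the helper names are assumptions made for this example.

```r
nodes <- c("A", "B", "C", "D", "E", "F")
edge_list <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"),
                   c("D", "E"), c("D", "F"), c("E", "F"))
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
for (k in seq_len(nrow(edge_list))) {
  adj[edge_list[k, 1], edge_list[k, 2]] <- 1
  adj[edge_list[k, 2], edge_list[k, 1]] <- 1
}

# Step 1: edge betweenness from shortest-path counts (BFS from each source)
edge_betweenness <- function(adj) {
  n <- nrow(adj)
  eb <- matrix(0, n, n, dimnames = dimnames(adj))
  for (s in seq_len(n)) {
    dist <- rep(-1, n); sigma <- rep(0, n)
    dist[s] <- 0; sigma[s] <- 1
    visit_order <- integer(0); queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visit_order <- c(visit_order, v)
      for (w in which(adj[v, ] > 0)) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) sigma[w] <- sigma[w] + sigma[v]
      }
    }
    # Walk back from the farthest nodes, sharing path credit among predecessors
    delta <- rep(0, n)
    for (w in rev(visit_order)) {
      for (v in which(adj[w, ] > 0)) {
        if (dist[v] == dist[w] - 1) {
          credit <- sigma[v] / sigma[w] * (1 + delta[w])
          eb[v, w] <- eb[v, w] + credit
          eb[w, v] <- eb[w, v] + credit
          delta[v] <- delta[v] + credit
        }
      }
    }
  }
  eb / 2  # every node pair was counted once from each endpoint
}

# Label connected components with a simple graph traversal
components <- function(adj) {
  n <- nrow(adj); label <- rep(0, n); current <- 0
  for (s in seq_len(n)) {
    if (label[s] == 0) {
      current <- current + 1
      label[s] <- current
      stack <- s
      while (length(stack) > 0) {
        v <- stack[1]; stack <- stack[-1]
        nb <- which(adj[v, ] > 0 & label == 0)
        label[nb] <- current
        stack <- c(stack, nb)
      }
    }
  }
  setNames(label, rownames(adj))
}

# Steps 2 to 4: remove the highest-betweenness edge, recalculate, repeat
g <- adj
initial_eb <- edge_betweenness(g)
while (max(components(g)) == 1 && sum(g) > 0) {
  eb <- edge_betweenness(g)
  top <- which(eb == max(eb), arr.ind = TRUE)[1, ]
  g[top[1], top[2]] <- 0
  g[top[2], top[1]] <- 0
}
communities <- components(g)
print(communities)
```

In this graph every shortest path between the two triangles runs through C-D, so that edge has the highest betweenness (9) and is removed first, leaving the communities {A, B, C} and {D, E, F}.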
Applications of the Girvan-Newman Algorithm
Social Network Analysis – Finding friend groups in social media.
Biological Networks – Identifying functional modules in protein interactions.
Fraud Detection – Discovering fraudulent activity groups.
Marketing & Targeting – Understanding customer clusters for better recommendations.
