DSV Assignment - SJCIT

Uploaded by

vinutha k

SJCIT Assignment

Estd: 1986

Department of Computer Science & Engineering


ASSIGNMENT
SUBJECT TITLE Data Science and Visualization

SUBJECT TYPE Professional Elective

SUBJECT CODE 21CS644

ACADEMIC YEAR 2023-24 (Even) BATCH 2021

SCHEME 2021

SEMESTER VI
FACULTY NAME and DESIGNATION Prof. Ajay N, Assistant Professor & Dr. Shrihari M R, Associate Professor

Module -1

Q. No. Questions Bloom’s LL COs
1 Distinguish between Big Data and Data Science. L3 CO1
2 What is data science? Is it new, or is it just statistics or analytics rebranded? Is it real, or is it pure hype? And if it’s new and if it’s real, what does that mean? L3 CO1
3 Analyze Drew Conway’s Venn diagram of data science. L3 CO1
4 Compare the role of the data scientist in academia and in industry. L4&L5 CO1
5 Illustrate the Data Science Profile. L4&L5 CO1

Module -2

Q. No. Questions Bloom’s LL COs
1 Simulate a fake dataset in the R programming language, build a regression model on it, and check that the model recovers the true values of the βs (hint: Ch. 3, pg. 70). L3 CO2
2 From Q# 1, simulate another fake variable x2 that has a Gamma distribution with parameters you pick. Now make the truth be that y is a linear combination of both x1 and x2. Fit a model that depends only on x1. Fit a model that depends only on x2. Fit a model that uses both. Vary the sample size and plot the mean squared error of the training set and of the test set versus sample size. L3 CO2
3 From Q# 1, create a new variable, z, that is equal to x1^2. Include this as one of the predictors in your model. See what happens when you fit a model that depends on x1 only and then also on z. Vary the sample size and plot the mean squared error of the training set and of the test set versus sample size. L3 CO2

4 There are 31 datasets named nyt1.csv, nyt2.csv, …, nyt31.csv, which you can find here: https://github.com/oreillymedia/doing_data_science. Each one represents one (simulated) day’s worth of ads shown and clicks recorded on the New York Times home page in May 2012. Each row represents a single user. There are five columns: age, gender (0=female, 1=male), number of impressions, number of clicks, and logged-in. L4&L5 CO2
Once you have the data loaded, it’s time for some EDA:
1. Create a new variable, age_group, that categorizes users as "<18", "18-24", "25-34", "35-44", "45-54", "55-64", and "65+".
2. For a single day:
• Plot the distributions of number of impressions and click-through rate (CTR = # clicks / # impressions) for these age categories.
• Define a new variable to segment or categorize users based on their click behavior.
• Explore the data and make visual and quantitative comparisons across user segments/demographics (<18-year-old males versus <18-year-old females, or logged-in versus not, for example).
• Create metrics/measurements/statistics that summarize the data. Examples of potential metrics include CTR, quantiles, mean, median, variance, and max, and these can be calculated across the various user segments. Be selective. Think about what will be important to track over time: what will compress the data, but still capture user behavior.
5 From Q# 4: L4&L5 CO2
i. Now extend your analysis across days. Visualize some metrics and distributions over time.
ii. Describe and interpret any patterns you find.
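
For the EDA in Q# 4 of Module 2, the age_group variable and a CTR per age group could be sketched as below. This is a minimal, standard-library Python sketch and not a prescribed solution. The row keys `age`, `impressions`, and `clicks`, the helper names, and the sample rows are all assumptions made for illustration. The CTR shown is the pooled ratio (total clicks over total impressions within a group), which is only one of several reasonable definitions.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of every group after the first, matching the labels in the question.
AGE_EDGES = [18, 25, 35, 45, 55, 65]
AGE_LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def age_group(age):
    """Map a numeric age to one of the seven age-group labels."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def ctr_by_age_group(rows):
    """Return pooled CTR (total clicks / total impressions) per age group.

    `rows` is an iterable of dicts with the assumed keys
    'age', 'impressions', and 'clicks'. Groups with zero
    impressions are left out, since their CTR is undefined.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [clicks, impressions]
    for row in rows:
        label = age_group(row["age"])
        totals[label][0] += row["clicks"]
        totals[label][1] += row["impressions"]
    return {
        label: clicks / impressions
        for label, (clicks, impressions) in totals.items()
        if impressions > 0
    }


if __name__ == "__main__":
    # Hypothetical rows standing in for one day's file.
    sample = [
        {"age": 17, "impressions": 4, "clicks": 1},
        {"age": 30, "impressions": 5, "clicks": 0},
        {"age": 30, "impressions": 5, "clicks": 1},
        {"age": 70, "impressions": 0, "clicks": 0},
    ]
    for label, ctr in ctr_by_age_group(sample).items():
        print(label, round(ctr, 3))
```

Keeping the bin edges in one list makes it easy to reuse the same grouping for every day's file when the analysis is extended across days in Q# 5.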
Module -3

Q. No. Questions Bloom’s LL COs
1 What are the ethical implications of a robo-grader? Analyse the claim that human graders aren’t always fair. L3 CO3
2 From Q# 1, are machines making things more structured, and is this inhibiting creativity? L3 CO3
3 From Q# 1, is the goal of a test to write a good essay or to do well in a standardized test? L3 CO3
4 Taking the example of a college student, implement a decision tree (hint: Ch. 7, pg. 184). L4&L5 CO3
5 Implement a recommendation system on a relatively small dataset (hint: Ch. 8, pg. 214). L4&L5 CO3
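
For Q# 4 of Module 3, one common way to choose the split at each node of a decision tree is to pick the attribute with the largest information gain, that is, the largest reduction in entropy. The sketch below assumes that criterion and shows only the split-selection step, not a full tree. The student records and the attribute names `attends_class`, `part_time_job`, and `passes` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of `target` minus its weighted entropy after splitting on `feature`."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


def best_split(rows, features, target):
    """Return the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, f, target))


if __name__ == "__main__":
    # Hypothetical college-student records, invented for illustration only.
    students = [
        {"attends_class": "yes", "part_time_job": "no", "passes": "yes"},
        {"attends_class": "yes", "part_time_job": "yes", "passes": "yes"},
        {"attends_class": "no", "part_time_job": "no", "passes": "no"},
        {"attends_class": "no", "part_time_job": "yes", "passes": "no"},
    ]
    for feature in ("attends_class", "part_time_job"):
        print(feature, information_gain(students, feature, "passes"))
    print("split on:", best_split(students, ["attends_class", "part_time_job"], "passes"))
```

A full tree would apply `best_split` recursively to each resulting group until the labels in a group are pure or no attributes remain.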
Module -4

Q. No. Questions Bloom’s LL COs
1 Compare Line Chart and Bar Chart. L3 CO4
2 Write a Python program to demonstrate how to draw a pie chart using Matplotlib. L3 CO4
3 Write a Python program to illustrate linear plotting with line formatting using Matplotlib. L3 CO4
4 Write a Python program that demonstrates customizing Seaborn plots with aesthetic functions. L4&L5 CO4
5 Create a 3D scatter plot. L4&L5 CO4
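
As a possible starting point for Q# 3 of Module 4, the sketch below plots two straight lines with Matplotlib and formats them in two ways: with explicit keyword arguments and with a format string. It assumes Matplotlib is installed. The function name `draw_formatted_lines` and the data are made up for illustration.

```python
import matplotlib.pyplot as plt


def draw_formatted_lines():
    """Plot two linear functions with different line formatting."""
    x = [0, 1, 2, 3, 4, 5]
    fig, ax = plt.subplots()

    # Formatting through keyword arguments: dashed blue line with circle markers.
    ax.plot(x, [2 * v for v in x], color="tab:blue", linestyle="--",
            linewidth=2, marker="o", label="y = 2x")

    # Formatting through a format string: 'r' red, ':' dotted, 's' square markers.
    ax.plot(x, [2 * v + 3 for v in x], "r:s", label="y = 2x + 3")

    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Linear plots with line formatting")
    ax.legend()
    ax.grid(True)
    return fig, ax


if __name__ == "__main__":
    draw_formatted_lines()
    plt.show()
```

The format string is compact for quick plots, while keyword arguments are easier to read when several properties are set at once.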

Module -5

Q. No. Questions Bloom’s LL COs
1 Write a Python program to demonstrate how to draw a bar plot using Matplotlib. L3 CO5
2 Write a Python program to demonstrate how to draw a scatter plot using Matplotlib. L3 CO5
3 Write a Python program to demonstrate how to draw a histogram using Matplotlib. L3 CO5
4 Create a Bokeh line graph to visualize the historical stock prices of two companies (Company A and Company B) and annotate important events on the graph. Additionally, include legends to identify each company's stock price. L4&L5 CO5
5 Write a Python program to create an interactive map using the Plotly library. L4&L5 CO5
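
As a possible starting point for Q# 1 of Module 5, the sketch below draws a bar plot with Matplotlib and writes each bar's value above it. It assumes Matplotlib is installed. The function name `draw_bar_plot`, the axis labels, and the sample categories and counts are hypothetical.

```python
import matplotlib.pyplot as plt


def draw_bar_plot(categories, values):
    """Draw one bar per category and annotate each bar with its value."""
    fig, ax = plt.subplots()
    bars = ax.bar(categories, values, color="steelblue", edgecolor="black")

    # Place the value just above the top centre of each bar.
    for bar, value in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                str(value), ha="center", va="bottom")

    ax.set_xlabel("Category")
    ax.set_ylabel("Count")
    ax.set_title("Bar plot using Matplotlib")
    return fig, ax, bars


if __name__ == "__main__":
    # Made-up counts, used only to produce a figure.
    draw_bar_plot(["A", "B", "C"], [3, 5, 2])
    plt.show()
```

Returning the figure, axes, and bar container lets the caller restyle or save the plot afterwards instead of fixing those choices inside the function.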
