This document is an examination paper for the II Semester of the M.Tech program in Computer Science & Engineering, focusing on Data Preparation and Analysis. It consists of five questions, each with multiple parts covering topics such as scalability issues, Big Data, Hadoop, data cleaning methods, and data visualization. Students are required to attempt all questions, with equal marks allocated to each part.
Data Preparation and Analysis (MCST-231)
Sub Code: MCST-231 ROLL NO……………..……………..
II SEMESTER EXAMINATION, 2022 – 23
Year: 1st
Programme: M.Tech
Branch: Computer Science & Engineering
Subject: Data Preparation and Analysis
Duration: 3:00 hrs
Max Marks: 100
Note: Attempt all questions. All questions carry equal marks. In case of any ambiguity or missing data, make a suitable assumption and state it in the answer.
Q 1. Answer any four parts of the following. 5x4=20
a) Explain the scalability issues in data preparation.
b) Write an overview of preparing data tables.
c) Explain regression ANOVA.
d) Discuss Big Data and its importance.
e) Explain the working of Hadoop.
f) Explain predictive analysis with a suitable example.

Q 2. Answer any four parts of the following. 5x4=20
a) Explain the 4 V’s of Big Data.
b) Describe converting continuous data to categories.
c) What are the data cleaning methods?
d) Discuss EDA.
e) Explain data visualization with a suitable example.
f) Differentiate between correlation and simple linear regression.

Q 3. Answer any two parts of the following. 10x2=20
a) Explain creating the components of Hadoop MapReduce jobs.
b) Discuss installing and running HiveQL.
c) Explain Oracle Big Data in detail.

Q 4. Answer any two parts of the following. 10x2=20
a) Explain investigating the Hadoop Distributed File System and selecting appropriate execution modes: local, pseudo-distributed, fully distributed.
b) Explain inter- and trans-firewall analytics. Also explain information management.
c) Describe geolocated data visualization with an example.

Q 5. Answer any two parts of the following. 10x2=20
a) Explain how to deal with missing data in the data cleaning process.
b) Explain distributing data processing across server farms in detail, with an example.
c) Explain how to visualize similarities between social network groups using multidimensional scaling (MDS).