
ACADEMIC PLANNER

FOR
ACADEMIC YEAR
2023-24

COURSE : IV YEAR B.TECH CSM- I - SEM - R20

SUBJECT : Big Data Analytics [BDA]


[AI732PE]

PRESENTED BY : Dr. S.RASHEEDUDDIN


CONTENTS
S.No | Content
1 | Preamble / Introduction
2 | Prerequisites
3 | Objectives and Outcomes
4 | Syllabus: 1. JNTU/R20-CMREC  2. GATE
5 | List of Expert Details (Local/National/International, with contact details / profile link / blogs / their research contribution towards the subject)
6 | Journals with a minimum of 5 reference papers
7 | Subject (Lesson) Plan
8 | Suggested Books (prescribed and reference)
9 | Websites for self-learning resources (e.g. www.geeksforgeeks.org, www.schools.com, Coursera, edX, Udemy, Khan Academy, NPTEL) along with registration procedures
10 | Question Banks: 1. JNTU model papers  2. GATE
11 | Two case-study presentations with Project / Product / Model / Prototypes / Industrial applications
12 | Assignment Questions / Innovative Assignment sets
13 | List of topics for student seminars, with guidelines
14 | STEP / Course material in softcopy
15 | Expert lectures with topics & schedules (if any)
1. COURSE INTRODUCTION: Big Data Analytics
 Big Data is a collection of data that is huge in volume, yet growing exponentially with time.

 It is data of such large size and complexity that no traditional data management tool can store or process it efficiently.

 Big data analytics is the often complex process of examining big data to uncover information, such as hidden patterns, correlations, market trends and customer preferences, that can help organizations make informed business decisions.

2. PREREQUISITES:
1. SQL,
2. Any one Programming Language

3. OBJECTIVES & OUTCOMES

 Understand Big Data and its analytics in the real world.

 Analyze Big Data frameworks like Hadoop and NoSQL to efficiently store and process Big Data and generate analytics.

 Design algorithms to solve data-intensive problems using the MapReduce paradigm (a minimal sketch follows this list).

 Design and implement Big Data analytics using Pig and Spark to solve data-intensive problems and generate analytics.

 Implement Big Data activities using Hive.
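
The MapReduce objective above can be made concrete with a small, self-contained Python sketch. This is a toy, in-memory illustration of the map -> shuffle -> reduce flow only, not the Hadoop API; every function and variable name below is my own.

# Toy illustration of the MapReduce paradigm (word count), run in-memory.
# This is not the Hadoop API; it only mimics the map -> shuffle -> reduce flow.
from collections import defaultdict
from typing import Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterator[Tuple[str, int]]) -> dict:
    """Group intermediate values by key, as the framework would between phases."""
    groups: dict = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for a single word."""
    return key, sum(values)


if __name__ == "__main__":
    lines = ["big data is big", "data analytics on big data"]
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())
    print(counts)  # {'big': 3, 'data': 3, 'is': 1, 'analytics': 1, 'on': 1}

On a real cluster the shuffle and the parallel execution are handled by the framework (Hadoop MapReduce or Spark); the programmer supplies only the map and reduce logic, as above.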

4.1 SYLLABUS – CMREC AUTONOMOUS

AI732PE : BIG DATA ANALYTICS


(Professional Elective - III)

L T P C
3 0 0 3
Unit – I

Data Management (NOS 2101): Design data architecture and manage the data for analysis, understand various sources of data like sensors/signals/GPS etc., Data Management, Data Quality (noise, outliers, missing values, duplicate data) and Data Preprocessing. Export all the data onto the cloud, e.g. AWS/Rackspace etc.
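
As a small illustration of the data quality and preprocessing topics listed above, the following pandas sketch profiles a table for missing values, duplicates and rough outliers. The file name, column handling and the 3-standard-deviation screen are assumptions chosen for illustration only.

# Quick data-quality profile (missing values, duplicates, rough outlier count)
# for a table loaded from any of the sources listed above. The file name and
# the 3-standard-deviation rule are illustrative assumptions.
import pandas as pd

df = pd.read_csv("sensor_readings.csv")   # hypothetical raw extract

quality = {
    "rows": len(df),
    "missing_per_column": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
}
print(quality)

# Rough outlier screen on numeric columns: values more than 3 std devs from the mean.
numeric = df.select_dtypes("number")
z_scores = (numeric - numeric.mean()) / numeric.std()
print("suspected outliers per column:")
print((z_scores.abs() > 3).sum())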

Maintain Healthy, Safe & Secure Working Environment (NOS 9003): Introduction, workplace safety, report accidents & emergencies, protect health & safety at your work, course conclusion, assessment.

Unit – II

Big Data Tools (NOS 2101): Introduction to Big Data tools like Hadoop, Spark, Impala etc., Data ETL process, identify gaps in the data and follow up for decision making.

Provide Data/Information in Standard Formats (NOS 9004): Introduction, Knowledge Management, standardized reporting & compliances, Decision Models, course conclusion, assessment.
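
As a minimal, hedged illustration of the Data ETL process named in this unit, the following pandas sketch extracts a CSV, identifies gaps, applies simple transformations and loads the cleaned result. The file names, column names and fill rule are assumptions, not part of the syllabus or of any particular tool.

# Minimal ETL sketch with pandas. File names, column names and the fill rule
# are illustrative assumptions, not part of the syllabus or any specific tool.
import pandas as pd

# Extract: read raw records from a CSV source (hypothetical file).
raw = pd.read_csv("sales_raw.csv")

# Transform: standardise column names, flag and fill gaps, drop duplicates.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
gap_report = raw.isna().sum()                 # identify gaps per column for follow-up
raw["amount"] = raw["amount"].fillna(0.0)     # assumed business rule: missing amount -> 0
raw = raw.drop_duplicates()

# Load: write the cleaned data to a target the downstream tools can read.
raw.to_csv("sales_clean.csv", index=False)
print(gap_report)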

Unit – III

Big Data Analytics: Run descriptive statistics to understand the nature of the available data, collate all the data sources to meet the business requirement, run descriptive statistics for all the variables and observe the data ranges, outlier detection and elimination.
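
A short pandas sketch of this unit's workflow: run descriptive statistics, observe the data ranges, then detect and eliminate outliers. The sample data is invented, and the 1.5 * IQR rule used here is one common convention rather than the only acceptable one.

# Descriptive statistics and simple IQR-based outlier elimination with pandas.
# The DataFrame and column name are invented; the 1.5*IQR rule is one common
# convention, not the only acceptable one.
import pandas as pd

df = pd.DataFrame({"revenue": [120, 135, 128, 131, 900, 127, 133, 125, -5, 130]})

print(df["revenue"].describe())        # count, mean, std, min, quartiles, max

q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df["revenue"] < lower) | (df["revenue"] > upper)]
clean = df[(df["revenue"] >= lower) & (df["revenue"] <= upper)]
print("outliers detected:", outliers["revenue"].tolist())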

Unit – IV

Machine Learning Algorithms (NOS 9003): Hypothesis testing and determining the multiple analytical methodologies, train model on 2/3 sample data using various statistical/machine learning algorithms, test model on 1/3 sample for prediction etc.
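
The 2/3 train / 1/3 test protocol in this unit can be sketched with scikit-learn as follows. The synthetic data and the choice of logistic regression are illustrative assumptions only.

# Sketch of the 2/3 train / 1/3 test protocol from this unit, using scikit-learn.
# The synthetic data and the choice of logistic regression are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                      # 300 samples, 4 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # simple synthetic label

# Train on 2/3 of the sample, hold out 1/3 for prediction/testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))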

Unit – V

Data Visualization (NOS 2101, NOS 9004): Prepare the data for visualization, use tools like Tableau, QlikView and D3, draw insights out of the visualization tool. Product Implementation.
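
Tableau, QlikView and D3 are GUI or JavaScript tools rather than Python libraries, so as a stand-in sketch of the same workflow (prepare the data, then draw an insight from it) here is an assumed aggregation plotted with matplotlib.

# The unit's tools (Tableau, QlikView, D3) are not Python libraries; this is a
# stand-in sketch of the same workflow: aggregate the data, then plot it.
# Data values and column names are made up for illustration.
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "amount": [250, 310, 190, 405, 220, 380],
})

# Prepare the data for visualization: aggregate to one row per region.
by_region = sales.groupby("region", as_index=False)["amount"].sum()

# Draw a simple insight: which region contributes the most revenue.
plt.bar(by_region["region"], by_region["amount"])
plt.xlabel("Region")
plt.ylabel("Total amount")
plt.title("Sales by region")
plt.tight_layout()
plt.savefig("sales_by_region.png")   # or plt.show() in an interactive session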

TEXT BOOK
1. Student's Handbook for Associate Analytics.

REFERENCE BOOKS
1. Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
2. Data Mining and Analysis: Fundamental Concepts and Algorithms, M. Zaki and W. Meira (the authors have kindly made an online version available): http://www.dataminingbook.info/uploads/book.pdf
3. Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.): http://www.vistrails.org/index.php/Course:_Big_Data_Analysis

4.2 SYLLABUS - GATE


NOT APPLICABLE

4.3 SYLLABUS - IES


NOT APPLICABLE

5 EXPERT DETAILS:
The experts listed below are only a few of the eminent ones known internationally, nationally and locally; there are many others as well.

International :
1.
2.

National:
1. Prof. D.Lakshmi,
Professor,
Dept. of CSE
VIT Bhopal - INDIA

2. Ms.Palak Gupta
Assistant Professor
[email protected]
Regional:

1. Prof. Shameem Fathima


Professor Dept. Of CSE,
O.U – Hyderabad
E-mail:[email protected]
Phone :+91 9848519860
040-27097125

2. Prof. C Krishna Mohan


Professor
Department of Computer Science and Engineering &
Dean (Public and Corporate Relations)
Email : [email protected]
Mobile :94917 12312
Website : www.iith.ac.in/~ckm/

6 Journals:
International
1. International Journal of Data Science and Analytics | Volumes and issues (springer.com)
2. International Journal of Big Data and Analytics in Healthcare (IJBDAH): 2379-738X, 2379-7371: Medicine & Healthcare Journals | IGI Global (igi-global.com)
3. International Journal of Data Science and Advanced Analytics (ijdsaa.com)
4. International Journal of Data Science and Analytics | Impact Factor | Indexing | Acceptance rate | Abbreviation - Open access journals (openacessjournal.com)
5. Big Data Analytics | Home page (biomedcentral.com)

National:
1. Journal of Big Data | Call for papers: Big Data in Human Behaviour Research: a Contextual Turn?
(springeropen.com)
2. Frontiers | Segmentation of Multi-Regional Skeletal Muscle in Abdominal CT Image for Cirrhotic
Sarcopenia Diagnosis (frontiersin.org)
3. JBD | A Survey of Machine Learning for Big Data Processing (techscience.com)
4. Big Data and Information Analytics (aimspress.com)
5. International Journal of Big Data Management (IJBDM) Inderscience Publishers - linking academia,
business and industry through research
7. SUBJECT (LESSON) PLAN :

S.No | Unit | Topic / Sub-Topic | Lecture No. | Suggested Books | Method of Teaching
1 | UNIT-I | Data Management (NOS 2101): Design Data Architecture and manage the data for analysis | L1 | T1, R2 | M4
2 | UNIT-I | Understand various sources of data like Sensors/Signals/GPS etc. | L2 | T1, R1 | M1
3 | UNIT-I | Data Management, Data Quality (noise, outliers, missing values, duplicate data) | L3 | T1, R1 | M1
4 | UNIT-I | Data Preprocessing | L4 | T1, R1 | M1
5 | UNIT-I | Export all the data onto cloud, e.g. AWS/Rackspace etc. | L5 | T1, R3 | M1
6 | UNIT-I | Maintain Healthy, Safe & Secure Working Environment (NOS 9003) | L6 | T1, R2 | M1
7 | UNIT-I | Introduction, workplace safety | L7 | T1, R3 | M1
8 | UNIT-I | Report accidents & emergencies | L8 | T1, R2 | M4
9 | UNIT-I | Protect health & safety at your work | L9 | T1, R3 | M4
10 | UNIT-I | Course conclusion & assessment | L10 | T1, R1 | -
11 | UNIT-II | Big Data Tools (NOS 2101) | L11 | T1 | M1 & M4
12 | UNIT-II | Introduction to Big Data tools like Hadoop, Spark, Impala etc. | L12 | T1, R2 | M1 & M4
13 | UNIT-II | Data ETL process | L13 | T1, R1 | M1 & M4
14 | UNIT-II | Identify gaps in the data and follow up for decision making | L14 | T1, R1 | M1 & M4
15 | UNIT-II | Provide Data/Information in Standard Formats (NOS 9004) | L15 | T1, R1 | M1 & M4
16 | UNIT-II | Introduction, Knowledge Management | L16 | T1, R3 | M1 & M4
17 | UNIT-II | Standardized reporting & compliances | L17 | T1, R2 | M1 & M4
18 | UNIT-II | Decision Models, course conclusion, assessment | L18 | T1, R3 | M1 & M4
19 | UNIT-III | Big Data Analytics: run descriptive statistics to understand the nature of the available data | L19 | T1, R2 | M4
20 | UNIT-III | Big Data Analytics: run descriptive statistics to understand the nature of the available data (contd.) | L20 | T1, R2 | M1
21 | UNIT-III | Collate all the data sources to meet the business requirements | L21 | T1, R1 | M1
22 | UNIT-III | Run descriptive statistics for all the variables and observe the data ranges | L22 | T1, R1 | M1
23 | UNIT-III | Outlier detection & elimination | L23 | T1, R1 | M1
24 | UNIT-III | Outlier detection & elimination (contd.) | L24 | T1, R3 | M1
25 | UNIT-IV | Machine Learning Algorithms (NOS 9003): Introduction | L25 | T1, R1 | M1 & M4
26 | UNIT-IV | Hypothesis testing | L26 | T1, R3 | M1 & M4
27 | UNIT-IV | Hypothesis testing (contd.) | L27 | T1, R2 | M1 & M4
28 | UNIT-IV | Hypothesis testing (contd.) | L28 | T1, R3 | M4
29 | UNIT-IV | Analyzing the multiple analytical methodologies | L29 | T1, R2 | M4
30 | UNIT-IV | Analyzing the multiple analytical methodologies (contd.) | L30 | T1, R3 | M4 & M5
31 | UNIT-IV | Train model on 2/3 sample data using various statistical/machine learning algorithms | L31 | T1, R1 | M1 & M4
32 | UNIT-IV | Test model on 1/3 sample for prediction etc. | L32 | T1 | M1 & M4
33 | UNIT-IV | Seminar | L33 | - | M5
34 | UNIT-IV | Test | L34 | - | M5
35 | UNIT-V | Data Visualization (NOS 2101) | L35 | T1, R1 | M1
36 | UNIT-V | Prepare the data for visualization | L36 | T1, R3 | M1
37 | UNIT-V | Using tools like Tableau | L37 | T1, R2 | M1
38 | UNIT-V | QlikView and D3 | L38 | T1, R3 | M1
39 | UNIT-V | Draw insights out of the visualization tool | L39 | T1, R2 | M4
40 | UNIT-V | Product Implementation | L40 | T1, R3 | M4
41 | UNIT-V | Revision-1 | L41 | T1, R1 | M5
42 | UNIT-V | Revision-2 | L42 | T1, R2 | M5
43 | UNIT-V | Revision-3 | L43 | T1 | M5

METHODS OF TEACHING:

M1 : Lecture Method M6 : Tutorial

M2 : Demo Method M7 : Assignment

M3 : Guest Lecture M8 : Industry Visit

M4 : Presentation /PPT M9 : Project Based

M5 : Lab/Practical /Activity M10 : Charts / OHP


8 Suggested Books:

TEXT BOOKS:

T1. Student's Handbook for Associate Analytics.


REFERENCE BOOKS:

R1. Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.

R2. Data Mining and Analysis: Fundamental Concepts and Algorithms, M. Zaki and W. Meira (the authors have kindly made an online version available): http://www.dataminingbook.info/uploads/book.pdf

R3. Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.): http://www.vistrails.org/index.php/Course:_Big_Data_Analysis

9. Websites
Do not confine yourself to the websites listed here; the list is not exhaustive, so be cognizant and keep yourself abreast of others too.

1. Mastering Big Data Analytics - Great Learning (mygreatlearning.com)


2. https://www.sciencedirect.com/
3. www.towardsdatascience.com
4. http://www.analyticsvidhya.com/
5. http://www.dataversity.net/
6. http://www.smartdatacollective.com/
7. http://www.datasciencecentral.com/
8. http://www.planetbigdata.com/
9. http://www.dataroundtable.com/
10. https://gigaom.com/

MOOC - NPTEL
https://nptel.ac.in/courses/
GeeksforGeeks
https://www.geeksforgeeks.org/
Guru99
https://www.guru99.com/

11. CASE STUDY:

https://www.bernardmarr.com/img/bigdata-case-studybook_final.pdf

https://techvidvan.com/tutorials/top-10-big-data-case-studies/

https://nap.nationalacademies.org/read/23654/chapter/5#37

2. Google: Big data and big business go hand in hand

This is the first in a series where I will examine the different uses that the world's leading corporations are making of the endless amount of digital information the world is producing every day. Google has not only significantly influenced the way we can now analyse big data (think MapReduce, BigQuery, etc.), but they are probably more responsible than anyone else for making it part of our everyday lives. I believe that many of the innovative things Google is doing today, most companies will do in years to come. Many people, particularly those who didn't get online until this century had started, will have had their first direct experience of manipulating big data through Google.

Although these days Google's big data innovation goes well beyond basic search, it's still their core business. They process 3.5 billion requests per day, and each request queries a database of 20 billion web pages. This is refreshed daily, as Google's bots crawl the web, copying down what they see and taking it back to be stored in Google's index database.

What pushed Google in front of other search engines has been its ability to analyse wider data sets for their search. Initially it was PageRank, which included information about sites that linked to a particular site in the index, to help take a measure of that site's importance in the grand scheme of things. Previously, leading search engines worked almost entirely on the principle of matching relevant keywords in the search query to sites containing those words. PageRank revolutionized search by incorporating other elements alongside keyword analysis.

Their aim has always been to make as much of the world's information available to as many people as possible (and get rich trying, of course...) and the way Google search works has been constantly revised and updated to keep up with this mission. Moving further away from keyword-based search and towards semantic search is the current aim. This involves analysing not just the "objects" (words) in the query, but the connections between them, to determine what it means as accurately as possible. To this end, Google throws a whole heap of other information into the mix. Starting in 2007 it launched Universal Search, which pulls in data from hundreds of sources including language databases, weather forecasts and historical data, financial data, travel information, currency exchange rates, sports statistics and a database of mathematical functions. It continued to evolve in 2012 into the Knowledge Graph, which displays information on the subject of the search from a wide range of resources directly in the search results. It then mixes in what it knows about you from your previous search history (if you are signed in), which can include information about your location, as well as data from your Google+ profile and Gmail messages, to come up with its best guess at what you are looking for. The ultimate aim is undoubtedly to build the kind of machine we have become used to seeing in science fiction for decades: a computer which you can have a conversation with in your native tongue, and which will answer you with precisely the information you want.

Search is by no means all of what Google does, though. After all, it's free, right? And Google is one of the most profitable businesses on the planet. That profit comes from what it gets in return for its searches: information about you. Google builds up vast amounts of data about the people using it. Essentially it then matches up companies with potential customers, through its AdSense algorithm. The companies pay handsomely for these introductions, which appear as adverts in the customers' browsers. In 2010 it launched BigQuery, its commercial service for allowing companies to store and analyse big data sets on its cloud platforms. Companies pay for the storage space and computer time taken in running the queries.

Another big data project Google is working on is the self-driving car. Using and generating massive amounts of data from sensors, cameras and tracking devices, and coupling this with on-board and real-time data analysis from Google Maps, Streetview and other sources, allows the Google car to safely drive on the roads without any input from a human driver.

Perhaps the most astounding use Google have found for their enormous data, though, is predicting the future. In 2008 the company published a paper in the science journal Nature claiming that their technology had the capability to detect outbreaks of flu with more accuracy than current medical techniques for detecting the spread of epidemics. The results were controversial: debate continues over the accuracy of the predictions. But the incident unveiled the possibility of "crowd prediction", which in my opinion is likely to be a reality in the future as analytics becomes more sophisticated. Google may not quite yet be ready to predict the future, but its position as a main player and innovator in the big data space seems like a safe bet.
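
To make the PageRank idea described above concrete, here is a toy power-iteration sketch over a hypothetical four-page link graph. It illustrates the principle only; Google's production ranking uses far more signals and runs at an entirely different scale.

# Toy PageRank by power iteration on a tiny hypothetical link graph.
# This illustrates the idea described above, not Google's production system.
links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
damping = 0.85
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):                      # iterate until the ranks stabilise
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))   # C, with the most inbound links, ranks highest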

3. GE: General Electric, a literal powerhouse of a corporation involved in virtually every area of industry, has been laying the foundations of what it grandly calls the Industrial Internet for some time now.

But what exactly is it? Here's a basic overview of the ideas which they are hoping will transform industry, and how it's all built around big data.
If you've heard about the Internet of Things, which I've written about previously, a simple way to think of the Industrial Internet is as a subset of that which includes all the data gathering, communicating and analysis done in industry.
In essence, the idea is that all the separate machines and tools which make an industry possible will be "smart": connected, data-enabled and constantly reporting their status to each other in ways as creative as their engineers and data scientists can devise. This will increase efficiency by allowing every aspect of an industrial operation to be monitored and tweaked for optimal performance, and reduce down-time: machinery will break down less often if we know exactly the best time to replace a worn part.

Data is behind this transformation, specifically the new tools that technology is giving us to record and analyse every aspect of a machine's operation. And GE is certainly not data poor: according to Wikipedia, its 2005 tax return extended across 24,000 pages when printed out. Pioneering is also deeply ingrained in its corporate culture: it was established by Thomas Edison, and it was the first private company in the world to own its own computer system, in the 1960s. So of all the industrial giants of the pre-online world, it isn't surprising that they are blazing a trail into the brave new world of big data.

GE generates power at its plants which is used to drive the manufacturing that goes on in its factories, and its financial divisions enable the multi-million transactions involved when they are bought and sold. With fingers in this many pies, it's clearly in the position to generate, analyse and act on a great deal of data.

Sensors embedded in their power turbines, jet engines and hospital scanners will collect the data: it's estimated that one typical gas turbine will generate 500 GB of data every day. And if that data can be used to improve efficiency by just 1% across five of the key sectors they sell to, those sectors stand to make combined savings of $300 billion. With those kinds of savings within sight, it isn't surprising that GE is investing heavily. In 2012 they announced that $1 billion was being invested over four years in their state-of-the-art analytics centre in San Ramon, California, in order to attract pioneering data talent to lay the software foundations of the Industrial Internet.

In aviation, they are aiming to improve fuel economy and maintenance costs, reduce delays and cancellations and optimize flight scheduling, while also improving safety. Abu Dhabi-based Etihad Airways was the first to deploy their Taleris Intelligent Operations technology, developed in partnership with Accenture. Huge amounts of data are recorded from every aircraft and every aspect of ground operations, which is reported in real time and targeted specifically at recovering from disruption and returning to the regular schedule. And last year it launched its Hadoop-based database system to allow its industrial customers to move their data to the cloud. It claims it has built the first infrastructure which is solid enough to meet the demands of big industry, and it works with its GE Predictivity service to allow real-time automated analysis. This means machines can order new parts for themselves and expensive downtime can be minimized: GE estimates that its contractors lose an average of $8 million per year due to unplanned downtime.

Green industries are benefitting too: its 22,000 wind turbines across the globe are rigged with sensors which stream constant data to the cloud, which operators can use to remotely fine-tune the pitch, speed and direction the blades are facing, to capture as much of the energy from the wind as possible. Each turbine will speak to others around it, too, allowing automated responses such as adapting their behaviour to mimic more efficient neighbours, and pooling of resources (i.e. wind speed monitors) if the device on one turbine should fail. Their data gathering extends into homes too: millions are fitted with their smart meters which record data on power consumption, which is analysed together with weather and even social media data to predict when power cuts or shortages will occur.

GE has come further and faster into the world of big data than most of its old-school tech competitors. It's clear they believe the financial incentive is there: chairman and CEO Jeff Immelt estimates that they could add $10 trillion to $15 trillion to the world's economy over the next two decades. In industry, where everything including resources is finite, efficiency is of utmost importance, and GE is demonstrating with the Industrial Internet that it believes big data is the key to unlocking its potential.
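
To make the kind of automated sensor monitoring described above a little more concrete, here is a hedged Python sketch that flags turbine readings drifting away from a rolling baseline. The column name, threshold and data are invented for illustration; this is not GE's Predictivity logic.

# Hedged sketch of the kind of sensor monitoring described above: flag turbine
# readings that drift far from their recent rolling baseline. Column names,
# thresholds and data are assumptions, not GE's actual Predictivity logic.
import pandas as pd

readings = pd.DataFrame({
    "vibration_mm_s": [2.1, 2.0, 2.2, 2.1, 2.3, 6.8, 2.2, 2.1, 2.0, 7.1],
})

window = 5
baseline = readings["vibration_mm_s"].rolling(window, min_periods=1).median()
deviation = (readings["vibration_mm_s"] - baseline).abs()

# Flag any reading that deviates from the rolling median by more than 2 mm/s;
# in a real pipeline such a flag might trigger a maintenance work order.
readings["alert"] = deviation > 2.0
print(readings[readings["alert"]])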
