Assignment 1 Based On Unit 1

The document discusses the roles, skills, and challenges associated with big data and analytics. It defines big data and big data analytics, and describes the different types of analytics including basic, operationalized, advanced, and monetized analytics. It also covers the data analytics lifecycle and issues related to big data such as scale, security, schemas, availability, consistency, and data quality.
Assignment 1 Based on Unit 1

Unit I

1. What are the key roles for the New Big Data Ecosystem?
The Big Data ecosystem demands three categories of roles, as shown in Figure

1) Deep Analytical Talent: This group is technically savvy, with strong analytical skills.
Members possess a combination of skills to handle raw, unstructured data and to apply
complex analytical techniques at massive scale.
This group has advanced training in quantitative disciplines, such as mathematics,
statistics, and machine learning. To do their jobs, members need access to a robust
analytic sandbox or workspace where they can perform large-scale analytical data
experiments.
Examples of current professions fitting into this group include statisticians, economists,
mathematicians, and the new role of the Data Scientist.
2) Data Savvy Professionals: This group has less technical depth but has a basic knowledge
of statistics or machine learning and can define key questions that can be answered using
advanced analytics.
These people tend to have a base knowledge of working with data, or an appreciation
for some of the work being performed by data scientists and others with deep analytical
talent.
Examples of data savvy professionals include financial analysts, market research analysts,
life scientists, operations managers, and business and functional managers.
3) Technology and Data Enablers: This group represents people providing technical
expertise to support analytical projects, such as provisioning and administering
analytical sandboxes, and managing large-scale data architectures that enable
widespread analytics within companies and other organizations.
This role requires skills related to computer engineering, programming, and database
administration.
2. What are key skill sets and behavioural characteristics of a data scientist?
Data scientists are generally thought of as having five main sets of skills and behavioural
characteristics, as shown in Figure

Quantitative skill: such as mathematics or statistics.
Technical aptitude: namely, software engineering, machine learning, and programming
skills.
Skeptical mind-set and critical thinking: It is important that data scientists can examine
their work critically rather than in a one-sided way.
Curious and creative: Data scientists are passionate about data and finding creative ways
to solve problems and portray information.
Communicative and collaborative: Data scientists must be able to articulate the
business value in a clear way and work collaboratively with other groups, including
project sponsors and key stakeholders.
Data scientists are generally comfortable using this blend of skills to acquire, manage,
analyze, and visualize data and tell compelling stories about it.

3. Define Big data. What is big data analytics? Explain in detail with its example.
i) Big data is high-volume, high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision making.
Big data refers to datasets whose size is typically beyond the storage capacity of, and
too complex for, traditional database software tools.
ii) Big Data analytics is the process of collecting, organizing and analyzing a large amount
of data to discover hidden patterns, correlations and other meaningful insights.
Big Data Analytics is...
4. Write a short note on Classification of Analytics.
There are basically two schools of thought:
Those that classify analytics into basic, operationalized, advanced and Monetized.
Those that classify analytics into analytics 1.0, analytics 2.0, and analytics 3.0.
i) First School of Thought
It includes Basic analytics, Operationalized analytics, Advanced analytics and Monetized
analytics.
a) Basic analytics: This is primarily slicing and dicing of data to help with basic business
insights. It is about reporting on historical data, basic visualization, etc.
b) Operationalized analytics: Analytics is operationalized when it gets woven into the
enterprise's business processes.
c) Advanced analytics: This is largely about forecasting the future by way of predictive
and prescriptive modelling.
d) Monetized analytics: This is analytics used to derive direct business revenue.
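
As a rough illustration of the gap between basic and advanced analytics, the Python sketch
below first totals made-up historical sales by region (basic analytics) and then fits a
least-squares trend line to extrapolate one month ahead (a very simplified stand-in for
predictive modelling). The data, field names, and function names are assumptions invented
for this example, not part of the assignment.

```python
from collections import defaultdict

# Hypothetical historical sales records (made-up data for illustration).
SALES = [
    {"region": "North", "month": 1, "amount": 100.0},
    {"region": "North", "month": 2, "amount": 120.0},
    {"region": "North", "month": 3, "amount": 140.0},
    {"region": "South", "month": 1, "amount": 80.0},
    {"region": "South", "month": 2, "amount": 70.0},
    {"region": "South", "month": 3, "amount": 90.0},
]


def total_by_region(records):
    """Basic analytics: slice historical data and report totals per region."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def forecast_next(records, region):
    """Advanced analytics (simplified): fit a least-squares line to one
    region's monthly amounts and extrapolate one month ahead."""
    points = sorted((r["month"], r["amount"]) for r in records if r["region"] == region)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    next_month = points[-1][0] + 1
    return slope * next_month + intercept


print(total_by_region(SALES))          # report on the past
print(forecast_next(SALES, "North"))   # estimate for the next month
```

The first function only describes what already happened; the second makes a statement about
a period for which no data exists yet, which is what separates advanced from basic analytics
in this classification.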

ii) Second School of Thought

Figure shows the subtle growth of analytics from Descriptive → Diagnostic → Predictive
→ Prescriptive analytics.
5. Describe the Challenges of Big Data.
There are mainly seven challenges of big data:
Scale: Storage (RDBMS (Relational Database Management System) or NoSQL (Not only
SQL)) is one major concern that needs to be addressed to handle the need for scaling
rapidly and elastically. The need of the hour is storage that can best withstand the
onslaught of the large volume, velocity and variety of big data. Should you scale
vertically or should you scale horizontally?
Security: Most of the NoSQL big data platforms have poor security mechanisms (lack of
proper authentication and authorization mechanisms) when it comes to safeguarding big
data. This is a gap that cannot be ignored, given that big data carries credit card
information, personal information and other sensitive data.
Schema: Rigid schemas have no place. We want the technology to be able to fit our big
data and not the other way around. The need of the hour is a dynamic schema. Static
(pre-defined) schemas are obsolete.
Continuous availability: The big question here is how to provide 24/7 support because
almost all RDBMS and NoSQL big data platforms have a certain amount of downtime
built in.
Consistency: Should one opt for consistency or eventual consistency?
Partition tolerance: How to build partition-tolerant systems that can take care of both
hardware and software failures?
Data quality: How to maintain data quality (data accuracy, completeness, timeliness,
etc.)? Do we have appropriate metadata in place?
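
To make the data quality challenge more concrete, the following Python sketch measures two
of the dimensions named above, completeness and timeliness, on a handful of made-up records.
The record layout, the 30-day freshness threshold, and the function names are all
assumptions chosen for illustration; real data quality checks would depend on the dataset
and its metadata.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"id": 3, "email": "c@example.com", "updated": date(2023, 6, 1)},
    {"id": 4, "email": "d@example.com", "updated": None},
]


def completeness(records, field):
    """Fraction of records in which `field` is present (not None)."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


def timeliness(records, field, as_of, max_age_days):
    """Fraction of records whose `field` date is at most max_age_days
    older than as_of. Records with a missing date count as not timely."""
    fresh = sum(
        1 for r in records
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(records)


print(completeness(RECORDS, "email"))
print(timeliness(RECORDS, "updated", date(2024, 1, 31), 30))
```

Accuracy is deliberately left out: checking it needs a trusted reference to compare against,
which this small sketch does not assume.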

6. What is big data analytics? Also write and explain importance of big data.
Reactive - Business Intelligence: What does Business Intelligence (BI) help us with? It
allows businesses to make faster and better decisions by providing the right
information to the right person at the right time in the right format. It is about analysis
of past or historical data and then displaying the findings of the analysis or reports in
the form of enterprise dashboards, alerts, notifications, etc.
Reactive - Big Data Analytics: Here the analysis is done on huge datasets, but the
approach is still reactive as it is still based on static data.
Proactive - Analytics: This supports forward-looking decision making through data mining,
predictive modelling, text mining, and statistical analysis. This analysis is not on big
data, as it still relies on traditional database management practices and therefore
has severe limitations on storage capacity and processing capability.
Proactive - Big Data Analytics: This is sifting through terabytes, petabytes, and exabytes
of information to filter out the relevant data to analyze. This also includes high
performance analytics to gain rapid insights from big data and the ability to solve
complex problems using more data.
7. Write a short note on Soft state eventual consistency.
Soft state: The state of the system could change over time, so even during times without
input there may be changes going on due to ‘eventual consistency,’ thus the state of the
system is always ‘soft.’
Eventual consistency: The system will eventually become consistent once it stops
receiving input. The data will propagate to everywhere it should sooner or later, but the
system will continue to receive input and is not checking the consistency of every
transaction before it moves on to the next one. Werner Vogels’ article “Eventually
Consistent – Revisited” covers this topic in much greater detail.
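
The behaviour described above can be sketched with a toy in-memory store in Python. This is
a minimal illustration under stated assumptions, not how any particular NoSQL product works:
the class names, the single "primary" replica, and the explicit `propagate` step are all
hypothetical simplifications. A write is acknowledged after reaching one replica, the other
replicas catch up later, and the system's state keeps changing during propagation even
though no new input arrives (the "soft state").

```python
from collections import deque


class Replica:
    """One copy of the data, holding a simple key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write is applied to one replica immediately and
    queued for the others, so replicas can disagree for a while."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # (target replica, key, value) not yet applied

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value  # acknowledged without waiting for the rest
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self, steps=None):
        """Apply up to `steps` queued updates (all of them if steps is None)."""
        count = len(self.pending) if steps is None else min(steps, len(self.pending))
        for _ in range(count):
            replica, key, value = self.pending.popleft()
            replica.data[key] = value

    def is_consistent(self):
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas[1:])


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("x", 1)
print(store.read(0, "x"), store.read(1, "x"))  # replica B has not seen the write yet
print(store.is_consistent())                   # replicas disagree for now
store.propagate()                              # no new input, yet state still changes
print(store.is_consistent())                   # replicas agree once updates have spread
```

A reader who queries replica B between the write and the propagation gets a stale answer,
which is the trade-off accepted in exchange for not checking every transaction before moving
on to the next one.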

8. What are different phases of the Data Analytics Lifecycle? Explain each in detail.
