
What is a Data Analytics Lifecycle?

Data is crucial in today’s digital world. As it gets created, consumed, tested, processed, and reused, data goes through several phases during its life. A data analytics architecture maps out these steps for data science professionals. It is a cyclic structure that encompasses all the phases of the data life cycle, where each stage has its own significance and characteristics.

The lifecycle’s circular form means data professionals are not locked into moving in a single direction; they can proceed forward or move backward through the stages. Based on newly received information, professionals can even scrap their research and return to the initial step to redo the complete analysis as per the lifecycle diagram.

However, while experts talk about the data analytics lifecycle, there is still no single defined structure for its stages. You’re unlikely to find a concrete data analytics architecture that is uniformly followed by every data analysis expert. This ambiguity leaves room to add extra phases when necessary and to remove basic steps; it is also possible to work on different stages at once or to skip a phase entirely.

Yet whenever the stages of the data lifecycle are discussed, the phases listed below are likely to come up, as they represent the fundamentals of almost every data analysis process. upGrad follows these basic steps to determine a data professional’s overall work and the data analysis results.

Phases of Data Analytics Lifecycle

The data analytics architecture is a scientific method that gives the data analysis process a structured framework; it is divided into six phases.

Phase 1: Data Discovery and Formation

Everything begins with a defined goal. In this phase, you’ll define your data’s purpose and how to
achieve it by the time you reach the end of the data analytics lifecycle.

The initial stage consists of mapping out the potential use of and requirement for data: where the information is coming from, what story you want your data to convey, and how your organization benefits from the incoming data. Basically, as a data analysis expert, you’ll need to focus on enterprise requirements related to data rather than on the data itself. Your work also includes assessing the tools and systems that are necessary to read, organize, and process all the incoming data.

Essential activities in this phase include structuring the business problem in the form of an analytics
challenge and formulating the initial hypotheses (IHs) to test and start learning the data. The
subsequent phases are then based on achieving the goal that is drawn in this stage.

Phase 2: Data Preparation and Processing

This stage covers everything that has to do with the data itself. In phase 2, the attention of experts moves from business requirements to information requirements.

The data preparation and processing step involves collecting, processing, and cleansing the accumulated data. One of the essential parts of this phase is making sure that the data you need is actually available to you for processing. The earliest step of the data preparation phase is to collect valuable information and proceed with the data analytics lifecycle in a business ecosystem. Data is collected using the methods below (a minimal code sketch follows the list):

 Data Acquisition: accumulating information from external sources.
 Data Entry: creating new data points using digital systems or manual data-entry techniques within the enterprise.
 Signal Reception: capturing information from digital devices, such as control systems and the Internet of Things.
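
As a rough illustration of these three collection methods, here is a minimal Python sketch (the URL, file names, and column layout are hypothetical, and it assumes the pandas and requests libraries):

import pandas as pd
import requests

# Data acquisition: pull records from an external source
# (hypothetical REST endpoint returning a JSON list of records).
response = requests.get("https://api.example.com/v1/sales")
acquired = pd.DataFrame(response.json())

# Data entry: load records entered within the enterprise and
# exported as a CSV file (hypothetical file name).
entered = pd.read_csv("manual_entries.csv")

# Signal reception: read measurements captured from IoT devices
# (hypothetical device export).
signals = pd.read_json("iot_readings.json")

# Combine all sources into one raw dataset for the preparation phase;
# this assumes the sources share a compatible column layout.
raw_data = pd.concat([acquired, entered, signals], ignore_index=True)
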
Phase 3: Design a Model

After mapping out your business goals and collecting a glut of data (structured, unstructured, or semi-
structured), it is time to build a model that utilizes the data to achieve the goal.

There are several techniques available to load data into the system and start studying it:

 ETL (Extract, Transform, and Load) transforms the data first, using a set of business rules, before loading it into a sandbox.
 ELT (Extract, Load, and Transform) first loads raw data into the sandbox and then transforms it.
 ETLT (Extract, Transform, Load, Transform) is a mixture; it applies two levels of transformation.
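
To make the ETL/ELT distinction concrete, here is a minimal Python sketch, assuming pandas and a SQLite database standing in for the analytics sandbox (file, table, and column names are hypothetical):

import pandas as pd
import sqlite3

conn = sqlite3.connect("sandbox.db")   # hypothetical sandbox database
raw = pd.read_csv("raw_orders.csv")    # hypothetical extracted data

# ETL: apply business rules first, then load the transformed data.
transformed = raw.dropna(subset=["order_id"])
transformed["amount"] = transformed["amount"].astype(float)
transformed.to_sql("orders_etl", conn, if_exists="replace", index=False)

# ELT: load the raw data as-is, then transform it inside the sandbox.
raw.to_sql("orders_raw", conn, if_exists="replace", index=False)
cleaned = pd.read_sql(
    "SELECT order_id, CAST(amount AS REAL) AS amount "
    "FROM orders_raw WHERE order_id IS NOT NULL",
    conn,
)
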
This step also includes teamwork to determine the methods, techniques, and workflow for building the model in the subsequent phase. Model building starts with identifying the relations between data points in order to select the key variables and, eventually, a suitable model.

Phase 4: Model Building

This step of the data analytics architecture comprises developing data sets for testing, training, and production purposes. The data analytics experts meticulously build and operate the model that they designed in the previous step. They rely on tools and techniques such as decision trees, regression (for example, logistic regression), and neural networks to build and execute the model. The experts also perform a trial run of the model to check whether it corresponds to the datasets.
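
As an example of one of the techniques named above, here is a minimal model-building sketch using scikit-learn's logistic regression (the file, feature, and label names are hypothetical placeholders):

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset prepared in the earlier phases.
data = pd.read_csv("prepared_customers.csv")
X = data[["age", "income", "visits"]]   # hypothetical feature columns
y = data["churned"]                     # hypothetical binary label

# Develop separate training and testing sets, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Build the model designed in phase 3 and perform a trial run.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Trial-run accuracy:", accuracy_score(y_test, model.predict(X_test)))
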

Phase 5: Result Communication and Publication

Remember the goal you had set for your business in phase 1? Now is the time to check if those criteria
are met by the tests you have run in the previous phase.

The communication step starts with collaboration with major stakeholders to determine whether the project results are a success or a failure. The project team is required to identify the key findings of the analysis, measure the business value associated with the result, and produce a narrative to summarize and convey the results to the stakeholders.

Phase 6: Measuring Effectiveness

As your data analytics lifecycle draws to a conclusion, the final step is to provide stakeholders with a detailed report containing key findings, code, briefings, and technical papers/documents.

Additionally, to measure the analysis’s effectiveness, the data is moved from the sandbox to a live environment and monitored to observe whether the results match the expected business goal. If the findings are in line with the objective, the reports and results are finalized. If, however, the outcome deviates from the intent set out in phase 1, you can move backward in the data analytics lifecycle to any of the previous phases to change your input and get a different output.
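
As a sketch of what this check might look like in code (the file, columns, and target threshold are hypothetical), monitoring boils down to comparing live results against the phase 1 goal:

import pandas as pd

# Hypothetical log of predictions collected from the live environment.
live = pd.read_csv("live_predictions.csv")

# Hypothetical business goal set in phase 1: at least 80% of
# predictions should match the observed outcome.
TARGET_ACCURACY = 0.80
live_accuracy = (live["predicted"] == live["actual"]).mean()

if live_accuracy >= TARGET_ACCURACY:
    print(f"Goal met ({live_accuracy:.1%}): finalize reports and results.")
else:
    print(f"Goal missed ({live_accuracy:.1%}): revisit an earlier phase.")
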

Conclusion

The data analytics lifecycle is a circular process consisting of six basic stages that define how information is created, gathered, processed, used, and analyzed for business goals. However, the lack of a standard set of phases for the data analytics architecture does make it harder for data experts to work with the information consistently. Still, mapping out a business objective first and working toward achieving it helps in drawing out the rest of the stages.

What is big data analytics?

Big data analytics is the often complex process of examining big data to uncover information -- such as hidden patterns, correlations, market trends and customer preferences -- that can help organizations make informed business decisions.

On a broad scale, data analytics technologies and techniques give organizations a way to analyze data
sets and gather new information. Business intelligence (BI) queries answer basic questions about
business operations and performance.

Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.

Why is big data analytics important?

Organizations can use big data analytics systems and software to make data-driven decisions that can improve
business-related outcomes. The benefits may include more effective marketing, new revenue opportunities,
customer personalization and improved operational efficiency. With an effective strategy, these benefits can
provide competitive advantages over rivals.


How does big data analytics work?

Data analysts, data scientists, predictive modelers, statisticians and other analytics professionals collect, process,
clean and analyze growing volumes of structured transaction data as well as other forms of data not used by
conventional BI and analytics programs.

Here is an overview of the four steps of the data preparation and analysis process (a condensed code sketch follows the list):

1. Data professionals collect data from a variety of different sources. Often, it is a mix of semi-structured and unstructured data. While each organization will use different data streams, some common sources include:

 internet clickstream data;
 web server logs;
 cloud applications;
 mobile applications;
 social media content;
 text from customer emails and survey responses;
 mobile phone records; and
 machine data captured by sensors connected to the internet of things (IoT).

2. Data is processed. After data is collected and stored in a data warehouse or data lake, data professionals must organize, configure and partition the data properly for analytical queries. Thorough data processing makes for higher performance from analytical queries.

3. Data is cleansed for quality. Data professionals scrub the data using scripting tools or enterprise software. They look for any errors or inconsistencies, such as duplications or formatting mistakes, and organize and tidy up the data.

4. The collected, processed and cleaned data is analyzed with analytics software. This includes tools for:

 data mining, which sifts through data sets in search of patterns and relationships
 predictive analytics, which builds models to forecast customer behavior and other future developments
 machine learning, which taps algorithms to analyze large data sets
 deep learning, which is a more advanced offshoot of machine learning
 text mining and statistical analysis software
 artificial intelligence (AI)
 mainstream business intelligence software
 data visualization tools
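
A condensed sketch of the four steps in Python with pandas (the source files and column names are hypothetical):

import pandas as pd

# Step 1: collect data from a variety of sources (hypothetical files).
clicks = pd.read_csv("clickstream.csv")
logs = pd.read_csv("web_server_logs.csv")

# Step 2: process -- organize and partition the data for analytical queries.
data = pd.concat([clicks, logs], ignore_index=True)
data["date"] = pd.to_datetime(data["timestamp"]).dt.date

# Step 3: cleanse -- remove duplications and formatting mistakes.
data = data.drop_duplicates()
data["page"] = data["page"].str.strip().str.lower()

# Step 4: analyze -- a simple aggregation standing in for analytics software.
daily_visits = data.groupby("date")["page"].count()
print(daily_visits.head())
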
Key big data analytics technologies and tools

Many different types of tools and technologies are used to support big data analytics processes. Common
technologies and tools used to enable big data analytics processes include:

 Hadoop, which is an open source framework for storing and processing big data sets. Hadoop can handle
large amounts of structured and unstructured data.
 Predictive analytics hardware and software, which process large amounts of complex data, and use machine
learning and statistical algorithms to make predictions about future event outcomes. Organizations use
predictive analytics tools for fraud detection, marketing, risk assessment and operations.
 Stream analytics tools, which are used to filter, aggregate and analyze big data that may be stored in many
different formats or platforms.
 Distributed storage, in which data is replicated, generally on a non-relational database. This can serve as a measure against independent node failures or lost and corrupted big data, or provide low-latency access.
 NoSQL databases, which are non-relational data management systems that are useful when working with
large sets of distributed data. They do not require a fixed schema, which makes them ideal for raw and
unstructured data.
 A data lake, which is a large storage repository that holds native-format raw data until it is needed. Data lakes use a flat architecture.
 A data warehouse, which is a repository that stores large amounts of data collected by different sources.
Data warehouses typically store data using predefined schemas.
 Knowledge discovery/big data mining tools, which enable businesses to mine large amounts of structured
and unstructured big data.
 In-memory data fabric, which distributes large amounts of data across system memory resources. This helps
provide low latency for data access and processing.
 Data virtualization, which enables data access without technical restrictions.
 Data integration software, which enables big data to be streamlined across different platforms, including Apache Hadoop, MongoDB and Amazon EMR.
 Data quality software, which cleanses and enriches large data sets.
 Data preprocessing software, which prepares data for further analysis. Data is formatted and unstructured
data is cleansed.
 Spark, which is an open source cluster computing framework used for batch and stream data processing.
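
As an example of the last item, a minimal PySpark job might look like this (the file path and column name are hypothetical):

from pyspark.sql import SparkSession

# Start a local Spark session; in production this would run on a cluster.
spark = SparkSession.builder.appName("big-data-sketch").getOrCreate()

# Batch-read a large CSV file (hypothetical HDFS path) into a
# distributed DataFrame.
events = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)

# A simple distributed aggregation: count events per type.
events.groupBy("event_type").count().show()

spark.stop()
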

Big data analytics applications often include data from both internal systems and external sources, such as weather
data or demographic data on consumers compiled by third-party information services providers. In addition,
streaming analytics applications are becoming common in big data environments as users look to perform real-
time analytics on data fed into Hadoop systems through stream processing engines, such as Spark, Flink and
Storm.

Early big data systems were mostly deployed on premises, particularly in large organizations that collected,
organized and analyzed massive amounts of data. But cloud platform vendors, such as Amazon Web Services
(AWS), Google and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud. The same
goes for Hadoop suppliers such as Cloudera, which supports the distribution of the big data framework on the
AWS, Google and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as
they need and then take them offline with usage-based pricing that doesn't require ongoing software licenses.

Big data has become increasingly beneficial in supply chain analytics. Big supply chain analytics utilizes big data
and quantitative methods to enhance decision-making processes across the supply chain. Specifically, big supply
chain analytics expands data sets for increased analysis that goes beyond the traditional internal data found on
enterprise resource planning (ERP) and supply chain management (SCM) systems. Also, big supply chain
analytics implements highly effective statistical methods on new and existing data sources.
[Image: Big data analytics is a form of advanced analytics, which has marked differences compared with traditional BI.]
Big data analytics uses and examples

Here are some examples of how big data analytics can be used to help organizations:

 Customer acquisition and retention. Consumer data can help the marketing efforts of companies, which can
act on trends to increase customer satisfaction. For example, personalization engines for Amazon, Netflix and
Spotify can provide improved customer experiences and create customer loyalty.
 Targeted ads. Personalization data from sources such as past purchases, interaction patterns and product
page viewing histories can help generate compelling targeted ad campaigns for users on the individual level
and on a larger scale.
 Product development. Big data analytics can provide insights that inform product viability, development decisions and progress measurement, and steer improvements toward what fits a business' customers.
 Price optimization. Retailers may opt for pricing models that use and model data from a variety of data
sources to maximize revenues.
 Supply chain and channel analytics. Predictive analytical models can help with preemptive replenishment,
B2B supplier networks, inventory management, route optimizations and the notification of potential delays to
deliveries.
 Risk management. Big data analytics can identify new risks from data patterns for effective risk management
strategies.
 Improved decision-making. Insights business users extract from relevant data can help organizations make
quicker and better decisions.
Big data analytics benefits

The benefits of using big data analytics include:

 Quickly analyzing large amounts of data from different sources, in many different formats and types.
 Rapidly making better-informed decisions for effective strategizing, which can benefit and improve the supply
chain, operations and other areas of strategic decision-making.
 Cost savings, which can result from new business process efficiencies and optimizations.
 A better understanding of customer needs, behavior and sentiment, which can lead to better marketing
insights, as well as provide information for product development.
 Improved, better informed risk management strategies that draw from large sample sizes of data.

[Image: Big data analytics involves analyzing structured and unstructured data.]
Big data analytics challenges

Despite the wide-reaching benefits that come with using big data analytics, its use also comes with challenges:

 Accessibility of data. With larger amounts of data, storage and processing become more complicated. Big
data should be stored and maintained properly to ensure it can be used by less experienced data scientists
and analysts.
 Data quality maintenance. With high volumes of data coming in from a variety of sources and in different
formats, data quality management for big data requires significant time, effort and resources to properly
maintain it.
 Data security. The complexity of big data systems presents unique security challenges. Properly addressing
security concerns within such a complicated big data ecosystem can be a complex undertaking.
 Choosing the right tools. Selecting from the vast array of big data analytics tools and platforms available on
the market can be confusing, so organizations must know how to pick the best tool that aligns with users'
needs and infrastructure.
 Lack of analytics skills. With a potential lack of internal analytics skills and the high cost of hiring experienced data scientists and engineers, some organizations are finding it hard to fill the gaps.
History and growth of big data analytics

The term big data was first used to refer to increasing data volumes in the mid-1990s. In 2001, Doug Laney, then
an analyst at consultancy Meta Group Inc., expanded the definition of big data. This expansion described the
increasing:

 Volume of data being stored and used by organizations;
 Variety of data being generated by organizations; and
 Velocity, or the speed at which that data was being created and updated.

Those three factors became known as the 3Vs of big data. Gartner popularized this concept after acquiring Meta
Group and hiring Laney in 2005.

Another significant development in the history of big data was the launch of the Hadoop distributed processing framework. Hadoop was launched as an Apache open source project in 2006. This planted the seeds for a clustered platform built on top of commodity hardware that could run big data applications. The Hadoop framework of software tools is widely used for managing big data.

By 2011, big data analytics began to take a firm hold in organizations and the public eye, along with Hadoop and
various related big data technologies.

Initially, as the Hadoop ecosystem took shape and started to mature, big data applications were primarily used by
large internet and e-commerce companies such as Yahoo, Google and Facebook, as well as analytics and
marketing services providers.

More recently, a broader variety of users have embraced big data analytics as a key technology driving digital
transformation. Users include retailers, financial services firms, insurers, healthcare organizations, manufacturers,
energy companies and other enterprises.
