
UNIT 1

(R. Evans, Business Analytics 2nd Edition)

2 MARK

What is data?
Data in business analytics refers to the information that is collected, processed, and analyzed
to gain insights and make informed decisions. It is the collective information related to a
company and its operations. This can include any statistical information, raw analytical data,
customer feedback data, sales numbers and other sets of information.

Describe the role of a business analyst.


Business analysts use data to suggest ways that organizations can operate more efficiently. They gather and analyze data to develop and investigate potential solutions. They often work closely with others throughout the business hierarchy to communicate their findings and help implement changes for the organization or business development.

Define Data Science?


Data Science is a combination of mathematics, statistics, machine learning, and computer science. Data Science is collecting, analyzing, and interpreting data to gather insights that can help decision-makers make informed decisions. Data Science is used in almost every industry today to predict customer behavior and trends and to identify new opportunities.

What is exploratory analysis?


Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It analyzes the data to discover trends and patterns, or to check assumptions, with the help of statistical summaries and graphical representations.

What is a business problem statement?


A problem statement defines the problem faced by a business. It is usually created on the basis of solving problems that exist amongst consumers. The problem statement for your startup or business should state what the problem is, who it affects, and what its impact is.
DATA ANALYTICS
What Is Data Analytics?
Data analytics is the science of analyzing raw data to draw conclusions about that information. Data analytics helps a business optimize its performance, perform more efficiently, maximize profit, and make more strategically guided decisions.
Data analytics relies on a variety of software tools, ranging from spreadsheets, data visualization and reporting tools, and data mining programs to open-source languages for more advanced data manipulation.

Understanding Data Analytics


Data analytics is a broad term that encompasses many diverse types of data analysis.
Any type of information can be subjected to data analytics techniques to get insight that can
be used to improve things. Data analytics techniques can reveal trends and metrics that would
otherwise be lost in the mass of information. This information can then be used to optimize
processes to increase the overall efficiency of a business or system.
Data analytics is important because it helps businesses optimize their performance. Implementing it into the business model means companies can help reduce costs by identifying more efficient ways of doing business and by storing large amounts of data. A company can also use data analytics to make better business decisions and help analyze customer trends and satisfaction, which can lead to new—and better—products and services.

Data Analysis Steps


Data analysis involves several steps (a short pandas sketch of steps 3 and 4 follows the list):
1. The first step is to determine the data requirements or how the data is grouped. Data
may be separated by age, demographic, income, or gender. Data values may be
numerical or be divided by category.
2. The second step in data analytics is the process of collecting it. This can be done
through a variety of sources such as computers, online sources, cameras,
environmental sources, or through personnel.
3. Once the data is collected, it must be organized so it can be analyzed. This may take
place on a spreadsheet or other form of software that can take statistical data.
4. The data is then cleaned up before analysis. This means it is scrubbed and checked to
ensure there is no duplication or error, and that it is not incomplete. This step helps
correct any errors before it goes on to a data analyst to be analyzed.
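As a concrete illustration of steps 3 and 4, here is a minimal, hedged sketch using pandas (assumed installed); the file name and column names are hypothetical.

```python
# Organize and clean collected data with pandas before analysis.
# "sales_data.csv" and its columns are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("sales_data.csv")        # load the collected data into a table
df = df.drop_duplicates()                 # remove duplicated records
df = df.dropna(subset=["customer_id"])    # drop rows missing a key field
df["income"] = pd.to_numeric(df["income"], errors="coerce")  # turn bad values into NaN
print(df.describe())                      # quick statistical summary before analysis
```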

Types of Data Analytics


Data analytics is broken down into four basic types.
1. Descriptive analytics: This describes what has happened over a given period of time.
Have the number of views gone up? Are sales stronger this month than last?
2. Diagnostic analytics: This focuses more on why something happened. This involves
more diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales?
Did that latest marketing campaign impact sales?
3. Predictive analytics: This moves to what is likely going to happen in the near term.
What happened to sales the last time we had a hot summer? How many weather models
predict a hot summer this year?
4. Prescriptive analytics: This suggests a course of action. If the likelihood of a hot summer, measured as the average of these five weather models, is above 58%, we should add an evening shift to the brewery and rent an additional tank to increase output (see the sketch after this list).
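A minimal sketch of that prescriptive rule; the five forecast values are invented for illustration.

```python
# Prescriptive rule from the brewery example: act if the average likelihood
# of a hot summer across five weather models exceeds 58%.
model_probs = [0.62, 0.55, 0.60, 0.57, 0.64]  # hypothetical model outputs

likelihood = sum(model_probs) / len(model_probs)  # 0.596 here

if likelihood > 0.58:
    print("Add an evening shift and rent an additional tank.")
else:
    print("Keep the current production plan.")
```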

Data Analytics Techniques


There are several different analytical methods and techniques data analysts can use to process
data and extract information. Some of the most popular methods are listed below; a short regression sketch follows the list.
• Regression analysis entails analyzing the relationship between a dependent variable and one or more independent variables to determine how a change in one may affect the other.
• Factor analysis entails taking a large data set and shrinking it to a smaller data set. The
goal is to attempt to discover hidden trends that would otherwise have been more
difficult to see.
• Cohort analysis is the process of breaking a data set into groups of similar data, often
broken into a customer demographic. This allows data analysts and other users of data
analytics to further dive into the numbers relating to a specific subset of data.
• Monte Carlo simulations model the probability of different outcomes happening.
Often used for risk mitigation and loss prevention, these simulations incorporate
multiple values and variables and often have greater forecasting capabilities than other
data analytics approaches.
• Time series analysis tracks data over time and solidifies the relationship between the
value of a data point and the occurrence of the data point. This data analysis technique
is usually used to spot cyclical trends or to project financial forecasts.
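To make the first technique concrete, here is a hedged regression sketch using scikit-learn (assumed installed); the spend and sales figures are invented.

```python
# Regression analysis: estimate how a change in ad spend (independent
# variable) relates to a change in sales (dependent variable).
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # e.g. thousands of dollars
sales = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(ad_spend, sales)
print(model.coef_[0])    # estimated change in sales per unit of spend
print(model.intercept_)  # estimated baseline sales at zero spend
```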

Data Analytics Tools


In addition to a broad range of mathematical and statistical approaches to crunching numbers,
data analytics has rapidly evolved in technological capabilities. Today, data analysts have a
broad range of software tools to help acquire data, store information, process data, and report
findings.
Data analytics has always had loose ties to spreadsheets and Microsoft Excel. Now,
data analysts also often interact with raw programming languages to transform and manipulate
databases. Open-source languages such as Python are often utilized. More specific tools for
data analytics like R can be used for statistical analysis or graphical modelling.
Data analysts also have help when reporting or communicating findings. Both Tableau
and Power BI are data visualization and analysis tools to compile information, perform data
analytics, and distribute results via dashboards and reports.
Other tools are also emerging to assist data analysts. SAS is an analytics platform that
can assist with data mining, while Apache Spark is an open-source platform useful for
processing large sets of data. Data analysts now have a broad range of technological
capabilities to further enhance the value they deliver to their company.

Who Is Using Data Analytics?


Data analytics has been adopted by several sectors, such as the travel and hospitality industry,
where turnarounds can be quick. This industry can collect customer data and figure out where
the problems, if any, lie and how to fix them. Healthcare is another sector that combines the
use of high volumes of structured and unstructured data and data analytics can help in making
quick decisions. Similarly, the retail industry uses copious amounts of data to meet the ever-changing demands of shoppers. The information retailers collect and analyze can help them identify trends, recommend products, and increase profits.

What is data analytics in business?


Data analytics is the practice of examining data to answer questions, identify trends, and extract
insights. When data analytics is used in business, it’s often called business analytics.
You can use tools, frameworks, and software to analyze data, such as Microsoft Excel and
Power BI, Google Charts, Data Wrapper, Infogram, Tableau, and Zoho Analytics. These can
help you examine data from different angles and create visualizations that illuminate the story
you’re trying to tell.
Algorithms and machine learning also fall into the data analytics field and can be used to gather,
sort, and analyze data at a higher volume and faster pace than humans can. Writing algorithms
is a more advanced data analytics skill, but you don’t need deep knowledge of coding and
statistical modeling to experience the benefits of data-driven decision-making.
Who needs data analytics?
Professionals who can benefit from data analytics skills include:
• Marketers, who utilize customer data, industry trends, and performance data from
past campaigns to plan marketing strategies
• Product managers, who analyze market, industry, and user data to improve their
companies’ products
• Finance professionals, who use historical performance data and industry trends to
forecast their companies’ financial trajectories
• Human resources and diversity, equity, and inclusion professionals, who gain insights into employees' opinions, motivations, and behaviors and pair them with industry trend data to make meaningful changes within their organizations

TYPES OF DATA ANALYTICS


The four types of data analysis are:
• Descriptive Analysis
• Diagnostic Analysis
• Predictive Analysis
• Prescriptive Analysis
Below, we will introduce each type and give examples of how they are utilized in business.
1. Descriptive Analytics
Descriptive analytics is the simplest type of analytics and the foundation the other types are
built on. It allows you to pull trends from raw data and succinctly describe what happened or
is currently happening.
Descriptive analytics answers the question, “What happened?”
For example, imagine you’re analyzing your company’s data and find there’s a seasonal surge
in sales for one of your products: a video game console. Here, descriptive analytics can tell
you, “This video game console experiences an increase in sales in October, November, and
early December each year.”
Data visualization is a natural fit for communicating descriptive analysis because charts,
graphs, and maps can show trends in data—as well as dips and spikes—in a clear, easily
understandable way.
2. Diagnostic Analytics
Diagnostic analytics addresses the next logical question, “Why did this happen?”
Taking the analysis a step further, this type includes comparing coexisting trends or movement,
uncovering correlations between variables, and determining causal relationships where
possible.
Continuing the aforementioned example, you may dig into video game console users’
demographic data and find that they’re between the ages of eight and 18. The customers,
however, tend to be between the ages of 35 and 55. Analysis of customer survey data reveals
that one primary motivator for customers to purchase the video game console is to gift it to
their children. The spike in sales in the fall and early winter months may be due to the holidays
that include gift-giving.
Diagnostic analytics is useful for getting at the root of an organizational issue.
3. Predictive Analytics
Predictive analytics is used to make predictions about future trends or events and answers the
question, “What might happen in the future?”
By analyzing historical data in tandem with industry trends, you can make informed predictions
about what the future could hold for your company.
For instance, knowing that video game console sales have spiked in October, November, and
early December every year for the past decade provides you with ample data to predict that the
same trend will occur next year. Backed by upward trends in the video game industry as a
whole, this is a reasonable prediction to make.
Making predictions for the future can help your organization formulate strategies based on
likely scenarios.
4. Prescriptive Analytics
Finally, prescriptive analytics answers the question, “What should we do next?”
Prescriptive analytics takes into account all possible factors in a scenario and suggests
actionable takeaways. This type of analytics can be especially useful when making data-driven
decisions.
The final type of data analysis is the most sought after, but few organizations are truly equipped
to perform it. Prescriptive analysis is the frontier of data analysis, combining the insight from
all previous analyses to determine the course of action to take in a current problem or decision.
Prescriptive analysis utilizes state-of-the-art technology and data practices. It is a huge organizational commitment, and companies must be sure that they are ready and willing to put forth the effort and resources.
Artificial Intelligence (AI) is a perfect example of prescriptive analytics. AI systems consume
a large amount of data to continuously learn and use this information to make informed
decisions. Well-designed AI systems are capable of communicating these decisions and even
putting those decisions into action. With artificial intelligence, business processes can be performed and optimized daily without human intervention.
Currently, most of the big data-driven companies (Apple, Facebook, Netflix, etc.) are utilizing
prescriptive analytics and AI to improve decision making. For other organizations, the jump to
predictive and prescriptive analytics can be insurmountable. As technology continues to
improve and more professionals are educated in data, we will see more companies entering the
data-driven realm.

Using data to drive decision-making


The four types of data analysis should be used in tandem to create a full picture of the story
data tells and make informed decisions. To understand your company’s current situation, use
descriptive analytics. To figure out how your company got there, leverage diagnostic analytics.
Predictive analytics is useful for determining the trajectory of a situation—will current trends
continue? Finally, prescriptive analytics can help you consider all aspects of current and future
scenarios and plan actionable strategies.
As we have shown, each of these types of data analysis is connected and relies on the others to a certain degree. Each serves a different purpose and provides varying insights. Moving from descriptive analysis toward predictive and prescriptive analysis requires much more technical ability, but it also unlocks more insight for your organization.

DATA ANALYTICS LIFECYCLE


The data analytics lifecycle is a structure for doing data analytics that has business
objectives at its core. Following this structure will help you better understand your data and
improve the effectiveness of your data analytics work. In addition to showing you a data
analytics lifecycle tailored to meeting business objectives, this article describes the six phases
in this lifecycle, gives further information about each phase, and explains the benefits of
following the data analytics lifecycle.

What is the data analytics lifecycle?


The data analytics lifecycle is a series of six phases that have each been identified as
vital for businesses doing data analytics. This lifecycle is based on the popular CRISP-DM analytics process model, an open-standard process model originally developed by a consortium of companies including SPSS (now part of IBM) and NCR.
The phases of the data analytics lifecycle include defining your business objectives, cleaning
your data, building models, and communicating with your stakeholders.
This lifecycle runs from identifying the problem you need to solve, to running your
chosen models against some sandboxed data, to finally operationalizing the output of these
models by running them on a production dataset. This will enable you to find the answer to
your initial question and use this answer to inform business decisions.

Why is the data analytics lifecycle important?


The data analytics lifecycle allows you to better understand the factors that affect
successes and failures in your business. It’s especially useful for finding out why customers
behave a certain way. These customer insights are extremely valuable and can help inform your
growth strategy.
The prescribed phases of the data analytics lifecycle cover all the important parts of a successful analysis of your data. While you can deviate from the order, you should follow all six steps, as skipping one could lead to a less effective data analysis.
For example, you need a hypothesis to give your study clarity and direction, your data will be
easier to analyze if it has been prepared and transformed in advance, and you will have a higher
chance of working with an effective model if you have spent time and care selecting the most
appropriate one for your particular dataset.
Following the data analytics lifecycle ensures you can recognize the full value of your
data and that all stakeholders are informed of the results and insights derived from analysis, so
they can be actioned promptly.

Phases of the data analytics lifecycle


Each phase in the data analytics lifecycle is influenced by the outcome of the preceding phase.
Because of this, it usually makes sense to perform each step in the prescribed order so that data
teams can decide how to progress: whether to continue to the next phase, redo the phase, or
completely scrap the process. By enforcing these steps, the analytics lifecycle helps guide the
teams through what could otherwise become a convoluted and directionless process with
unclear outcomes.

1. Discovery
This first phase involves getting the context around your problem: you need to know what
problem you are solving and what business outcomes you wish to see.
You should begin by defining your business objective and the scope of the work. Work out
what data sources will be available and useful to you (for example, Google Analytics,
Salesforce, your customer support ticketing system, or any marketing campaign information
you might have available), and perform a gap analysis comparing the data required to solve your business problem with the data you have available, working out a plan to get any data you still need.
Once your objective has been identified, you should formulate an initial hypothesis. Design
your analysis so that it will determine whether to accept or reject this hypothesis. Decide in
advance what the criteria for accepting or rejecting the hypothesis will be to ensure that your
analysis is rigorous and follows the scientific method.

2. Data preparation
In the next stage, you need to decide which data sources will be useful for the analysis, collect
the data from all these disparate sources, and load it into a data analytics sandbox so it can be
used for prototyping.
When loading your data into the sandbox area, you will need to transform it. The two
main types of transformations are preprocessing transformations and analytics
transformations. Preprocessing means cleaning your data to remove things like nulls, defective
values, duplicates, and outliers. Analytics transformations can mean a variety of things, such
as standardizing or normalizing your data so it can be used more effectively with certain
machine learning algorithms, or preparing your datasets for human consumption (for example,
transforming machine labels into human-readable ones, such as “sku123” → “T-Shirt,
brown”).
Depending on whether your transformations take place before or after the loading stage,
this whole process is known as either ETL (extract, transform, load) or ELT (extract, load,
transform). You can set up your own ETL pipeline to deal with all of this, or use an integrated
customer data platform to handle the task all within a unified environment.
It is important to note that the sub-steps detailed here don’t have to take place in separate
systems. For example, if you have all data sources in a data warehouse already, you can simply
use a development schema to perform your exploratory analysis and transformation work in
that same warehouse.

3. Model planning
A model in data analytics is a mathematical or programmatic description of the
relationship between two or more variables. It allows us to study the effects of different
variables on our data and to make statistical assumptions about the probability of an event
happening.
The main categories of models used in data analytics are SQL models, statistical
models, and machine learning models. A SQL model can be as simple as the output of a SQL
SELECT statement, and these are often used for business intelligence dashboards. A statistical
model shows the relationship between one or more variables (a feature that some data
warehouses incorporate into more advanced statistical functions in their SQL processing), and
a machine learning model uses algorithms to recognize patterns in data and must be trained on
other data to do so. Machine learning models are often used when the analyst doesn’t have
enough information to try to solve a problem using easier steps.
You need to decide which models you want to test, operationalize, or deploy. To choose
the most appropriate model for your problem, you will need to do an exploration of your
dataset, including some exploratory data analysis to find out more about it. This will help guide
you in your choice of model because your model needs to answer the business objective that
started the process and work with the data available to you.
Do you want the outcome to be qualitative or quantitative? If your question expects a
quantitative answer (for example, “How many sales are forecast for next month?” or “How
many customers were satisfied with our product last month?”) then you should use a regression
model. However, if you expect a qualitative answer (for example, “Is this email spam?”, where
the answer can be Yes or No, or “Which of our five products are we likely to have the most
success in marketing to customer X?”), then you may want to use a classification or clustering
model.
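For the qualitative case, a minimal classification sketch with scikit-learn (assumed installed); the toy emails and labels are invented for illustration.

```python
# Toy spam classifier for the qualitative "Is this email spam?" question.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting agenda for monday",
          "cheap pills online now", "quarterly report attached"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(emails)                  # bag-of-words features
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free prize online"])))  # likely ['spam']
```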
Is accuracy or speed of the model particularly important? If so, check whether your
chosen model will perform well. The size of your dataset will be a factor when evaluating the
speed of a particular model.
Is your data unstructured? Unstructured data cannot be easily stored in either relational or graph
databases and includes free text data such as emails or files. This type of data is most suited to
machine learning.
Have you analyzed the contents of your data? Analyzing the contents of your data can
include univariate analysis or multivariate analysis (such as factor analysis or principal
component analysis). This allows you to work out which variables have the largest effects and
to identify new factors (that are a combination of different existing variables) that have a big
impact.

4. Building and executing the model


Once you know what your models should look like, you can build them and begin to draw
inferences from your modeled data.
The steps within this phase of the data analytics lifecycle depend on the model you've chosen
to use.
SQL model
You will first need to find your source tables and the join keys. Next, determine where to build
your models. Depending on the complexity, building your model can range from saving SQL
queries in your warehouse and executing them automatically on a schedule, to building more
complex data modeling chains using tooling like dbt or Dataform. In that case, you should first
create a base model, and then create another model to extend it, so that your base model can be
reused for other future models. Now you need to test and verify your extended model, and then
publish the final model to its destination (for example, a business intelligence tool or reverse
ETL tool).
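To make this concrete, here is a hedged miniature of a SQL model using Python's built-in sqlite3 module; the table, columns, and values are hypothetical.

```python
# A "SQL model" in miniature: the saved SELECT (a view) is the base model,
# and a second view extends it. Schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('north', 120.0), ('south', 80.0), ('north', 50.0);

    -- Base model: revenue per region, reusable by later models.
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region;

    -- Extended model built on top of the base model.
    CREATE VIEW top_regions AS
        SELECT region FROM revenue_by_region WHERE revenue > 100;
""")
print(conn.execute("SELECT * FROM revenue_by_region").fetchall())
print(conn.execute("SELECT * FROM top_regions").fetchall())
```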
Statistical model
You should start by developing a dataset containing exactly the information required for the
analysis, and no more. Next, you will need to decide which statistical model is appropriate for
your use case. For example, you could use a correlation test, a linear regression model, or an
analysis of variance (ANOVA). Finally, you should run your model on your dataset and publish
your results.
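For instance, a correlation test takes only a few lines with SciPy (assumed installed); the paired measurements are invented.

```python
# Minimal statistical model: Pearson correlation between two variables.
from scipy import stats

x = [1, 2, 3, 4, 5]   # e.g. ad spend
y = [2, 4, 5, 4, 6]   # e.g. sales
r, p = stats.pearsonr(x, y)
print(r, p)           # correlation coefficient and p-value
```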
Machine learning model
There is some overlap between machine learning models and statistical models, so you must
begin the same way as when using a statistical model and develop a dataset containing exactly
the information required for your analysis. However, machine learning models require you to
create two samples from this dataset: one for training the model, and another for testing the
model.
There are many types of machine learning algorithms to choose from (for example, linear regression, decision trees, or support vector machines), so you may want to try multiple models to see which produces the best result.
If you are using a machine learning model, it will need to be trained. This involves executing
your model on your training dataset, and tuning various parameters of your model so you get
the best predictive results. Once this is working well, you can execute each candidate model on your test dataset. You can now work out which model gave the
most accurate result and use this model for your final results, which you will then need to
publish.
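A hedged sketch of that train/test workflow with scikit-learn, using its bundled Iris data as a stand-in for a real dataset:

```python
# Split data into training and test samples, train a model, then evaluate
# it on the held-out test sample. Iris is a stand-in for your real dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)  # train and tune
print(accuracy_score(y_test, model.predict(X_test)))  # accuracy on the test sample
```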
Once you have built your models and are generating results, you can communicate these results
to your stakeholders.

5. Communicating results
You must communicate your findings clearly, and it can help to use data visualizations to
achieve this. Any communication with stakeholders should include a narrative, a list of key
findings, and an explanation of the value your analysis adds to the business. You should also
compare the results of your model with your initial criteria for accepting or rejecting your
hypothesis to explain to them how confident they can be in your analysis.

6. Operationalizing
Once the stakeholders are happy with your analysis, you can execute the same model outside
of the analytics sandbox on a production dataset.
You should monitor the results of this to check if they lead to your business goal being
achieved. If your business objectives are being met, deliver the final reports to your
stakeholders, and communicate these results more widely across the business.

Data analytics lifecycle improves your outcomes


Following the six phases of the data analytics lifecycle will help improve your business
decisions, as each phase is integral to an effective data analytics project. In particular,
understanding your business objectives and your data upfront can be extremely helpful, as can
ensuring it is cleaned and in a useful format for analysis. Communicating with your
stakeholders is also key before moving on to regularly running your model on production
datasets. An effective data analytics project will give useful business insights, such as the
ability to improve your product or marketing strategy, identify avenues to lower costs, or
increase audience numbers.
A customer data platform (CDP) will vastly improve your data handling practices and
can be integrated into your data analytics lifecycle to assist with the data preparation phase. It
will transform and integrate your data into a structured format for easy analysis and exploration, ensuring that no data is wasted and the full value of your data investment is realized.

DATA SCIENCE
Data Science is a combination of mathematics, statistics, machine learning, and
computer science. Data Science is collecting, analyzing and interpreting data to gather insights
into the data that can help decision-makers make informed decisions.
Data Science is used in almost every industry today to predict customer behavior and trends and to identify new opportunities. Businesses can use it to make informed decisions
about product development and marketing. It is used as a tool to detect fraud and optimize
processes. Governments also use Data Science to improve efficiency in the delivery of public
services.
Nowadays, organizations are overwhelmed with data. Data Science helps extract meaningful insights from it by combining various methods, technologies, and tools. In the fields of e-commerce, finance, medicine, human resources, etc., businesses come across huge amounts of data. Data Science tools and technologies help them process all of it.
In simple terms, Data Science helps to analyze data and extract meaningful insights from it by
combining statistics & mathematics, programming skills, and subject expertise.

Data Science – Requirements


• Statistics
Data science relies on statistics to capture and transform data patterns into usable evidence
through the use of complex machine-learning techniques.
• Programming
Python, R, and SQL are the most common programming languages. To successfully execute a data science project, it is important to have some level of programming knowledge.
• Machine Learning
Making accurate forecasts and estimates is made possible by Machine Learning, which is a
crucial component of data science. You must have a firm understanding of machine learning if
you want to succeed in the field of data science.
• Databases
A clear understanding of the functioning of Databases, and skills to manage and extract data is
a must in this domain.
• Modeling
You may quickly calculate and predict using mathematical models based on the data you
already know. Modeling helps in determining which algorithm is best suited to handle a certain
issue and how to train these models.

What is the Data Science process?


• Obtaining the data
The first step is to identify what type of data needs to be analyzed; this data often needs to be exported to an Excel or CSV file.
• Scrubbing the data
It is essential because before you can read the data, you must ensure it is in a perfectly readable
state, without any mistakes, with no missing or wrong values.
• Exploratory Analysis
It is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It analyzes the data to discover trends and patterns, or to check assumptions, with the help of statistical summaries and graphical representations (a brief sketch follows this list).
• Modeling or Machine Learning
A data engineer or scientist writes down instructions for the Machine Learning algorithm to
follow based on the Data that has to be analyzed. The algorithm iteratively uses these
instructions to come up with the correct output.
• Interpreting the data
In this step, you uncover your findings and present them to the organization. The most critical
skill in this would be your ability to explain your results.
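As noted above, a brief exploratory-analysis sketch with pandas and matplotlib (both assumed installed); the file and column names are hypothetical.

```python
# Quick exploratory data analysis: summary statistics plus one visualization.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")     # hypothetical dataset
print(df.describe(include="all"))     # main characteristics of each column
df["age"].hist(bins=20)               # distribution of a single variable
plt.title("Customer age distribution")
plt.show()
```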

What are the different Data Science tools?


Here are a few examples of tools that will assist Data Scientists in making their job easier.
• Data Analysis – Informatica PowerCenter, Rapidminer, Excel, SAS
• Data Visualization – Tableau, QlikView, RAW, Jupyter
• Data Warehousing – Apache Hadoop, Informatica/Talend, Microsoft HDInsight
• Data Modelling – H2O.ai, Datarobot, Azure ML Studio, Mahout

Benefits of Data Science in Business


• Improves business predictions
• Interpretation of complex data
• Better decision making
• Product innovation
• Improves data security
• Development of user-centric products

Applications of Data Science


• Product Recommendation
The product recommendation technique can influence customers to buy similar products. For
example, a salesperson of Big Bazaar is trying to increase the store’s sales by bundling the
products together and giving discounts. So he bundled shampoo and conditioner together and gave a discount on them. As a result, customers buy them together at a discounted price.
• Future Forecasting
It is one of the most widely applied techniques in Data Science. Weather forecasting and other kinds of future forecasting are done on the basis of various types of data collected from various sources.
• Fraud and Risk Detection
It is one of the most logical applications of Data Science. Since online transactions are booming, the risk of fraud is growing with them. For example, credit card fraud detection depends on the amount, merchant, location, time, and other variables. If any of them looks unnatural, the transaction will be automatically canceled, and the card will be blocked for 24 hours or more.
• Image Recognition
When you want to recognize some images, data science can detect the object and classify it.
The most famous example of image recognition is face recognition: when you ask your smartphone to unlock, it will scan your face. So first, the system will detect the face, then classify your face as a human face, and after that, it will decide if the phone belongs to the actual owner or not.
• Speech-to-Text Conversion
Speech recognition is a process of understanding natural language by the computer. We are
quite familiar with virtual assistants like Siri, Alexa, and Google Assistant.
• Healthcare
Data Science helps in various branches of healthcare such as Medical Image Analysis,
Development of new drugs, Genetics and Genomics, and providing virtual assistance to
patients.
• Search Engines
Google, Yahoo, Bing, Ask, etc. provide us with a lot of results within a fraction of a second. This is made possible by various data science algorithms.

DATA COLLECTION
Data collection is the methodological process of gathering information about a specific
subject. It’s crucial to ensure your data is complete during the collection phase and that it’s
collected legally and ethically. In the data life cycle, data collection is the second step. After
data is generated, it must be collected to be of use to your team. After that, it can be processed, stored, managed, analyzed, and visualized to aid in your organization's decision-making.

Primary and secondary methods of data collection are two approaches used to gather
information for research or analysis purposes. Let's explore each method in detail:
1. Primary Data Collection:
Primary data collection involves the collection of original data directly from the source or
through direct interaction with the respondents. This method allows researchers to obtain
firsthand information specifically tailored to their research objectives. There are various techniques for primary data collection, such as surveys, experiments, interviews, and observation.
2. Secondary Data Collection:
Secondary data collection involves using existing data collected by someone else for a purpose
different from the original intent. Researchers analyze and interpret this data to extract relevant
information. Secondary data can be obtained from various sources such as published sources, government records, and past research studies.

7 Data collection methods used in business analytics


1. Surveys
Surveys are physical or digital questionnaires that gather both qualitative and
quantitative data from subjects. One situation in which you might conduct a survey is gathering
attendee feedback after an event. This can provide a sense of what attendees enjoyed, what they
wish was different, and areas in which you can improve or save money during your next event
for a similar audience.
While physical copies of surveys can be sent out to participants, online surveys present the
opportunity for distribution at scale. They can also be inexpensive; running a survey can cost
nothing if you use a free tool. If you wish to target a specific group of people, partnering with
a market research firm to get the survey in front of that demographic may be worth the money.
Something to watch out for when crafting and running surveys is the effect of bias, including:
• Collection bias: It can be easy to accidentally write survey questions with a biased
lean. Watch out for this when creating questions to ensure your subjects answer
honestly and aren’t swayed by your wording.
• Subject bias: Because your subjects know their responses will be read by you, their
answers may be biased toward what seems socially acceptable. For this reason,
consider pairing survey data with behavioral data from other collection methods to
get the full picture.
2. Transactional Tracking
Each time your customers make a purchase, tracking that data can allow you to make
decisions about targeted marketing efforts and understand your customer base better.
Often, e-commerce and point-of-sale platforms allow you to store data as soon as it’s generated,
making this a seamless data collection method that can pay off in the form of customer insights.
3. Interviews and Focus Groups
Interviews and focus groups consist of talking to subjects face-to-face about a specific
topic or issue. Interviews tend to be one-on-one, and focus groups are typically made up of
several people. You can use both to gather qualitative and quantitative data.
Through interviews and focus groups, you can gather feedback from people in your target
audience about new product features. Seeing them interact with your product in real-time and
recording their reactions and responses to questions can provide valuable data about which
product features to pursue.
As is the case with surveys, these collection methods allow you to ask subjects anything
you want about their opinions, motivations, and feelings regarding your product or brand. It
also introduces the potential for bias. Aim to craft questions that don’t lead them in one
particular direction.
One downside of interviewing and conducting focus groups is they can be time-
consuming and expensive. If you plan to conduct them yourself, it can be a lengthy process.
To avoid this, you can hire a market research facilitator to organize and conduct interviews on
your behalf.
4. Observation
Observing people interacting with your website or product can be useful for data
collection because of the candor it offers. If your user experience is confusing or difficult, you
can witness it in real-time.
Yet, setting up observation sessions can be difficult. You can use a third-party tool to record
users’ journeys through your site or observe a user’s interaction with a beta version of your site
or product.
While less accessible than other data collection methods, observations enable you to see
firsthand how users interact with your product or site. You can leverage the qualitative and
quantitative data gleaned from this to make improvements and double down on points of
success.
5. Online Tracking
To gather behavioral data, you can implement pixels and cookies. These are both tools
that track users’ online behavior across websites and provide insight into what content they’re
interested in and typically engage with.
You can also track users’ behavior on your company’s website, including which parts are of
the highest interest, whether users are confused when using it, and how long they spend on
product pages. This can enable you to improve the website’s design and help users navigate to
their destination.
It’s important to note: Tracking online behavior can have legal and ethical privacy implications.
Before tracking users’ online behavior, ensure you’re in compliance with local and
industry data privacy standards.
6. Forms
Online forms are beneficial for gathering qualitative data about users, specifically
demographic data or contact information. They’re relatively inexpensive and simple to set up,
and you can use them to gate content or registrations, such as webinars and email newsletters.
You can then use this data to contact people who may be interested in your product, build out
demographic profiles of existing customers, and in remarketing efforts, such as email
workflows and content recommendations.
7. Social Media Monitoring
Monitoring your company’s social media channels for follower engagement is an
accessible way to track data about your audience’s interests and motivations. Many social
media platforms have analytics built in, but there are also third-party social platforms that give
more detailed, organized insights pulled from multiple channels.
You can use data collected from social media to determine which issues are most important to
your followers. For instance, you may notice that the number of engagements dramatically
increases when your company posts about its sustainability efforts.

The importance of data collection


Collecting data is an integral part of a business’s success; it can enable you to ensure
the data’s accuracy, completeness, and relevance to your organization and the issue at hand.
The information gathered allows organizations to analyze past strategies and stay informed on
what needs to change. By ensuring accurate data collection, business professionals can feel
secure in their business decisions. Understanding the variety of data collection methods
available can help you decide which is best for your timeline, budget, and the question you’re
aiming to answer. When stored together and combined, multiple data types collected through
different methods can give an informed picture of your subjects and help you make better
business decisions.

DATA PREPARATION
Data preparation is the sorting, cleaning, and formatting of raw data so that it can be
better used in business intelligence, analytics, and machine learning applications.
Data comes in many formats, but for the purpose of this guide we’re going to focus on data
preparation for the two most common types of data: numeric and textual.
Numeric data preparation is a common form of data standardization. A good example would
be if you had customer data coming in and the percentages are being submitted as both
percentages (70%, 95%) and decimal amounts (.7, .95) – smart data prep, much like a smart
mathematician, would be able to tell that these numbers are expressing the same thing, and
would standardize them to one format.
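A minimal sketch of that standardization, assuming inputs arrive either as percentage strings or as decimals (the function name is illustrative):

```python
# Standardize mixed percentage formats ("70%", 0.7) to one decimal format.
def standardize_percentage(value):
    if isinstance(value, str) and value.strip().endswith("%"):
        return float(value.strip().rstrip("%")) / 100
    return float(value)

print([standardize_percentage(v) for v in ["70%", "95%", 0.7, 0.95]])
# -> [0.7, 0.95, 0.7, 0.95]
```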
Textual data preparation addresses a number of grammatical and context-specific text
inconsistencies so that large archives of text can be better tabulated and mined for useful
insights.
Text tends to be noisy, as sentences, and the words they are made up of, vary with language, context, and format (an email vs. a chat log vs. an online review). So, when preparing our text data, it is useful to 'clean' our text by removing repetitive words and standardizing meaning.
For example, if you receive a text input of:
'My vacuum's battery died earlier than I expected this Saturday morning'
A very basic text preparation algorithm would omit the unnecessary and repetitive words, leaving you with:
'Vacuum's [subject] died [active verb] earlier [problem] Saturday morning [time]'
This stripped-down sentence format is now primed to be tabulated analytically, as the sketch below illustrates.
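A minimal version of that cleanup; the stop-word list here is a tiny illustrative subset, not a standard one.

```python
# Lowercase a sentence, strip punctuation, and drop common filler words.
import re

STOP_WORDS = {"my", "than", "i", "this", "the", "a", "very"}

def clean_text(sentence):
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_text("My vacuum's battery died earlier than I expected this Saturday morning"))
# -> ["vacuum's", 'battery', 'died', 'earlier', 'expected', 'saturday', 'morning']
```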

Data preparation steps


The specifics of the data preparation process vary by industry, organization, and need, but the
workflow remains largely the same.

1. Gather data
The data preparation process begins with finding the right data. This can come from an existing data catalog, or data sources can be added ad hoc.

2. Discover and assess data


After collecting the data, it is important to discover each dataset. This step is about getting to
know the data and understanding what has to be done before the data becomes useful in a
particular context.

3. Cleanse and validate data


Cleaning up the data is traditionally the most time-consuming part of the data preparation
process, but it’s crucial for removing faulty data and filling in gaps.

Important tasks here include (two are sketched in code after this step):

• Removing extraneous data and outliers
• Filling in missing values
• Conforming data to a standardized pattern
• Masking private or sensitive data entries

Once data has been cleansed, it must be validated by testing for errors in the data preparation
process up to this point. Often, an error in the system will become apparent during this
validation step and will need to be resolved before moving forward.
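Two of the tasks above in a short pandas sketch; the column names and masking rule are hypothetical.

```python
# Fill missing values and mask a sensitive column before analysis.
import pandas as pd

df = pd.DataFrame({"email": ["a@x.com", "b@y.com"], "age": [34.0, None]})
df["age"] = df["age"].fillna(df["age"].median())                     # fill gaps
df["email"] = df["email"].str.replace(r"^[^@]+", "***", regex=True)  # mask PII
print(df)
```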

4. Transform and enrich data


Data transformation is the process of updating the format or value entries in order to reach a
well-defined outcome, or to make the data more easily understood by a wider audience.
Enriching data refers to adding and connecting data with other related information to provide
deeper insights.
5. Store data
Once prepared, the data can be stored or channeled into a third party application — such as a
business intelligence tool — clearing the way for processing and analysis to take place.

Data Preparation Tools


Data preparation is a very important process, but it also requires an intense investment of
resources. Data scientists and data analysts report that 80% of their time is spent doing data
prep, rather than analysis.

That’s where self-service data preparation tools like Talend Data Preparation come in. Cloud-
native platforms with machine learning capabilities simplify the data preparation process. This
means that data scientists and business analysts can focus on analyzing data instead of just
cleaning it.
But it also allows business professionals who may lack advanced IT skills to run the process themselves. This makes data preparation more of a team sport, rather than one that wastes valuable resources and cycles with IT teams.
To get the best value out of a self-service data preparation tool, look for a platform with design and productivity features like automatic documentation, versioning, and operationalization into ETL processes. A few popular options:
1. Talend
Talend's self-service data preparation tool is a fast and accessible first step for any business seeking to improve its data prep approach, and it offers a series of informative basic guides to data prep.
2. OpenRefine
Combining a powerful no-code GUI with easy Python compatibility, OpenRefine is a favorite of no-code users and Python literates alike. Regardless of your coding skill level, its complex data filtering capacity can be a boon to any business. Plus, it's free.
3. Paxata
Alternatively, Paxata offers a sophisticated, 'data governing' approach to data preparation, promising to clean and effectively govern datasets at scale.

Data Preparation Benefits


Recent studies showed that data preparation comprised a whopping 80% of the entire data
analysis process.
Couple this with the growing need for effective customer analysis in the current competitive
online market and you’re well on your way to understanding the importance of good data
preparation.
Eliminating Dirty Data
Data preparation helps catch errors before processing. After data has been removed from its
original source, these errors become more difficult to understand and correct. Cleaning and reformatting datasets ensures that all data used in analysis is of high quality.
Future-Proofing Your Results
According to Talend, a cloud-native self-service data preparation tool, data preparation will
gain even greater importance for businesses as storage standards move to cloud-based models.
The most significant benefits of combining data preparation with the cloud include improved scalability, future-proofing, and easier access and collaboration.
1. Improved Scalability - Unhampered by a need for physical storage, your data
preparation process can be developed to custom fit the now unlimited scale that your
data occupies.
2. Future-Proofing - also known as backward compatibility, meaning any upgrades to your data preparation process can be applied in real time to all incoming and previously collected data.
3. Easier Access and Collaboration - Keeping your data on the cloud will allow for more
intuitive data prep requiring less hard-coding and no manual technical installation,
improving accessibility and thus allowing for greater collaboration.

The future of data preparation


Initially focused on analytics, data preparation has evolved to address a much broader set of use cases and is applicable to a larger range of users. Although it improves the
personal productivity of whoever uses it, it has evolved into an enterprise tool that fosters
collaboration between IT professionals, data experts, and business users. And with the growing
popularity of machine learning models and machine learning algorithms, having high-quality,
well-prepared data is crucial, especially as more processes involve automation, and human
intervention and oversight may exist along fewer points in data pipelines.
