Big Data - Notes
Volume:
● The vast amount of data generated every second from different sources like social media,
sensors, and business transactions.
● Example: Twitter generates terabytes of data from millions of tweets and interactions
daily.
Velocity:
● The speed at which data is created, processed, and analyzed, in real time or near real time.
● Example: Tweets and trends are processed and displayed in real time.
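Velocity can be illustrated with a minimal sketch of near real-time trend counting: each incoming (timestamp, hashtag) event updates a running count, and events older than a sliding window are evicted so the counts always reflect recent activity. The event format, the 60-second window, and the SlidingWindowCounter class are hypothetical, and the sketch assumes events arrive in timestamp order; it is not how Twitter actually computes trends.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: counts hashtags seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag) pairs in arrival order
        self.counts = Counter()  # hashtag -> count inside the current window

    def add(self, timestamp: float, hashtag: str) -> None:
        self.events.append((timestamp, hashtag))
        self.counts[hashtag] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window (assumes ordered timestamps)
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def top(self, n: int):
        # Most frequent hashtags in the current window
        return self.counts.most_common(n)


# Hypothetical stream of (seconds, hashtag) events
stream = [(0, "#ai"), (1, "#bigdata"), (2, "#ai"), (65, "#bigdata"), (66, "#bigdata")]
counter = SlidingWindowCounter(window_seconds=60)
for ts, tag in stream:
    counter.add(ts, tag)
print(counter.top(1))
```

Because the early "#ai" events fall outside the last 60 seconds by the end of the stream, only the recent "#bigdata" activity is counted as trending.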
Variety:
● The different types of data, including structured, semi-structured, and unstructured data
(e.g., text, images, videos).
● Example: Twitter handles diverse data like text, images, videos, and user interactions.
Veracity:
● The quality or accuracy of the data, addressing the trustworthiness of the data sources and
the data itself.
● Example: Twitter must filter out inaccurate content to ensure data reliability.
Value:
● The usefulness of the data in making decisions or gaining insights, turning raw data into
something meaningful.
● Example: Businesses analyze Twitter data for insights into public sentiment and trends.
2. What are the applications of Big Data?
1. Healthcare: Big Data helps doctors give better care and predict health issues early.
2. Telecommunication: It improves network quality and predicts future service needs.
3. Financial Firms: Big Data helps spot fraud and make smarter financial decisions.
4. Retail: Stores use Big Data to personalize offers and manage stock efficiently.
5. Law Enforcement: Police use Big Data to prevent crime by analyzing patterns.
6. New Product Development: Companies use Big Data to create products customers want.
7. Banking: Big Data helps banks detect fraud and offer personalized services.
8. Insurance: Insurers use Big Data to set prices and detect fake claims.
9. Energy Utilities: Big Data helps manage energy use and predict equipment issues.
10. Marketing: Big Data helps marketers target the right audience and create personalized
campaigns.
3. What is analytics? Explain the phases of the data analytics lifecycle.
Definition: "Analytics is the science of extracting insights from raw data. Many techniques and processes
in analytics have been automated using algorithms that work with raw data for human understanding."
Data analytics techniques can uncover trends and metrics that might be overlooked in large amounts of
data. This information can then be used to improve processes and increase overall efficiency in a business
or system.
1. Discovery: Identify the problem and what you want to achieve.
2. Data Preparation: Gather and clean the data to make it ready for analysis.
3. Model Planning: Choose the methods and tools you will use for analysis.
4. Model Building: Create and test your analysis model with the prepared data.
5. Communicate Results: Share your findings and insights clearly with others.
6. Operationalize: Put the insights into action so they can be used in real situations.
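The six phases can be illustrated with a deliberately tiny, hypothetical example in which each function stands for one phase. The question, the raw sales rows, and the threshold used in the last step are all invented for illustration; real projects usually loop back between phases rather than running them once in order.

```python
from statistics import mean


def discovery():
    # Phase 1: state the question and how success is measured (hypothetical goal)
    return {"question": "Which regions have a low average order value?", "metric": "mean"}


def prepare(raw_rows):
    # Phase 2: keep only rows whose amount can be parsed as a number
    clean = []
    for region, amount in raw_rows:
        try:
            clean.append((region, float(amount)))
        except (TypeError, ValueError):
            continue  # skip missing or malformed amounts
    return clean


def plan_model(goal):
    # Phase 3: choose the method; here simply a per-group mean
    return mean if goal["metric"] == "mean" else None


def build_model(method, rows):
    # Phase 4: apply the chosen method to the prepared data
    groups = {}
    for region, amount in rows:
        groups.setdefault(region, []).append(amount)
    return {region: method(values) for region, values in groups.items()}


def communicate(results):
    # Phase 5: present the findings in a readable form
    for region, value in sorted(results.items()):
        print(f"{region}: {value:.2f}")


def operationalize(results, threshold):
    # Phase 6: turn the insight into an action rule (hypothetical threshold)
    return [region for region, value in results.items() if value < threshold]


raw = [("North", "120"), ("South", "80"), ("North", "100"),
       ("South", None), ("South", "abc"), ("East", "90")]
goal = discovery()
rows = prepare(raw)
method = plan_model(goal)
results = build_model(method, rows)
communicate(results)
print("Regions to target:", operationalize(results, threshold=100))
```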
Data Analytics is the process of examining and interpreting raw data to uncover useful information,
identify trends, and support decision-making. It involves various techniques and tools to analyze data,
transforming it into actionable insights.
Descriptive Analytics:
● Definition: Summarizes historical data to provide insights about what has happened.
● Example: Analyzing past sales data to understand seasonal trends and performance.
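Descriptive analytics is often just grouping and summarizing past records. A minimal sketch, assuming hypothetical monthly unit sales and a Northern-Hemisphere month-to-season mapping, totals sales per season to expose the seasonal pattern:

```python
from collections import defaultdict

# Hypothetical monthly sales figures (month number -> units sold)
sales = {1: 200, 2: 180, 3: 220, 4: 260, 5: 300, 6: 340,
         7: 360, 8: 350, 9: 280, 10: 240, 11: 300, 12: 420}

# Assumed Northern-Hemisphere season for each month
SEASON = {12: "Winter", 1: "Winter", 2: "Winter",
          3: "Spring", 4: "Spring", 5: "Spring",
          6: "Summer", 7: "Summer", 8: "Summer",
          9: "Autumn", 10: "Autumn", 11: "Autumn"}


def seasonal_summary(monthly_sales):
    """Total units sold per season (a purely historical summary)."""
    totals = defaultdict(int)
    for month, units in monthly_sales.items():
        totals[SEASON[month]] += units
    return dict(totals)


summary = seasonal_summary(sales)
# Report seasons from strongest to weakest
for season, total in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{season}: {total}")
```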
Predictive Analytics:
● Definition: Uses statistical models and machine learning techniques to forecast future outcomes
based on historical data.
● Example: Predicting customer churn by analyzing patterns in customer behavior.
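Predictive analytics typically learns from labelled history and applies the result to new cases. As a minimal sketch, k-nearest neighbours (chosen here only because it is short; the notes do not name a specific algorithm) predicts churn from two hypothetical behaviour features: days since last login and support tickets last month.

```python
import math
from collections import Counter

# Hypothetical history: (days since last login, support tickets) -> churned?
history = [
    ((2, 0), False), ((5, 1), False), ((3, 0), False), ((1, 1), False),
    ((30, 3), True), ((25, 2), True), ((40, 4), True), ((35, 2), True),
]


def predict_churn(customer, data, k=3):
    """Majority vote among the k most similar past customers (Euclidean distance)."""
    nearest = sorted(data, key=lambda row: math.dist(customer, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_churn((28, 3), history))  # → True (resembles past churners)
print(predict_churn((4, 0), history))   # → False (resembles loyal customers)
```

In practice the features would normally be scaled first, since here "days since last login" dominates the distance simply because its values are larger.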
Exploratory Analytics:
● Definition: Involves analyzing data sets to discover patterns, trends, and relationships without a
specific hypothesis in mind.
● Example: Exploring social media data to identify emerging trends or sentiment around a brand.
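Exploratory analysis often starts by scanning every pair of variables for relationships instead of testing one hypothesis. A minimal sketch, using hypothetical daily brand metrics from social media and a hand-written Pearson correlation, ranks all metric pairs by the strength of their correlation:

```python
import math
from itertools import combinations

# Hypothetical daily metrics for a brand over one week
metrics = {
    "mentions": [120, 150, 90, 200, 170, 60, 130],
    "positive": [80, 100, 55, 140, 120, 30, 85],
    "retweets": [300, 310, 305, 290, 315, 300, 295],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Check every pair of metrics, strongest relationship first
pairs = sorted(
    ((a, b, pearson(metrics[a], metrics[b])) for a, b in combinations(metrics, 2)),
    key=lambda t: abs(t[2]),
    reverse=True,
)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With this data, mentions and positive posts move together very closely, while retweets show almost no linear relationship with either; that is the kind of lead exploratory analysis surfaces for further study.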
Prescriptive Analytics:
● Definition: Recommends actions based on data analysis and predictive models, helping
businesses decide on the best course of action.
● Example: Using optimization techniques to recommend inventory levels based on demand
forecasts.
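As a minimal sketch of prescriptive analytics, the code below takes an assumed demand forecast (three demand scenarios with probabilities) and searches candidate order quantities for the one with the highest expected profit. This is a brute-force version of the classic newsvendor inventory problem; the prices, costs, and scenarios are hypothetical.

```python
# Hypothetical demand forecast: demand level -> probability
demand_scenarios = {80: 0.2, 100: 0.5, 120: 0.3}
UNIT_COST = 6    # assumed purchase cost per unit
UNIT_PRICE = 10  # assumed selling price per unit
SALVAGE = 2      # assumed value recovered for each unsold unit


def expected_profit(order_qty):
    """Probability-weighted profit of ordering `order_qty` units."""
    total = 0.0
    for demand, prob in demand_scenarios.items():
        sold = min(order_qty, demand)
        leftover = order_qty - sold
        profit = sold * UNIT_PRICE + leftover * SALVAGE - order_qty * UNIT_COST
        total += prob * profit
    return total


# Evaluate candidate order quantities and recommend the best one
candidates = range(60, 141, 10)
best_qty = max(candidates, key=expected_profit)
print(best_qty, round(expected_profit(best_qty), 2))
```

With these numbers the recommendation is to stock 100 units, for an expected profit of 368.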
5. Explain Statistical Inference with a diagram
Definition:
Statistical inference is the process of using data analysis to draw conclusions about a
larger population based on a sample of data. It allows researchers to infer properties of
the population, test hypotheses, and make estimates regarding population parameters.
OR
Statistical inference is a method for drawing conclusions about a population, and measuring
how reliable those conclusions are, based on information obtained from a sample of that population.
In summary, statistical inference is a powerful tool that helps in making informed
decisions and understanding the characteristics of larger populations based on limited
data.
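A common concrete form of statistical inference is estimating a population mean from a sample with a confidence interval. The sketch below uses Python's standard `statistics` module and a normal (z) critical value; the sample values are hypothetical, and for a sample this small a t critical value, which gives a somewhat wider interval, would usually be preferred.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample: heights (cm) of 10 students drawn from a larger population
sample = [162, 170, 168, 175, 160, 172, 166, 169, 174, 164]


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    m = mean(data)                                  # point estimate from the sample
    se = stdev(data) / math.sqrt(n)                 # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


low, high = mean_confidence_interval(sample)
print(f"Sample mean: {mean(sample):.1f} cm")
print(f"95% CI for the population mean: ({low:.1f}, {high:.1f}) cm")
```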
6. Difference Between Structured and Unstructured Digital Data
Steps in Regression Analysis:
1. Define the Problem: Clearly state the problem and what you want to achieve
with the analysis. Understanding the problem is key to getting accurate results.
2. Choose Variables: Select the important factors (variables) that affect the
outcome. For example, in agriculture, this might include fertilizer, rainfall, and
temperature.
3. Collect Data: Gather information on the chosen variables. Decide how you will
measure them (e.g., exact age or general categories).
4. Specify the Model: Decide on the structure of the regression model based on
your understanding of the data and the problem.
5. Select Fitting Method: Choose a statistical method to estimate the model's
parameters based on the collected data.
6. Fit the Model: Use the chosen method to create the regression model, which
allows you to make predictions.
7. Validate the Model: Check if the model meets necessary assumptions. This
helps ensure your results are reliable.
8. Use the Model: Apply the regression equation to understand relationships
between variables and make forecasts based on new data.
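The regression steps can be followed end to end with a small, hypothetical agricultural example (fertilizer applied vs. crop yield), fitting a simple linear model y = b0 + b1 * x by ordinary least squares. The data and the choice of a straight-line model are assumptions for illustration; the residual and R^2 checks are only a basic part of validation.

```python
# Steps 1-3 (hypothetical): does fertilizer (kg per plot) affect crop yield (kg per plot)?
fertilizer = [10, 20, 30, 40, 50]
crop_yield = [25, 45, 62, 85, 103]


# Steps 4-5: assume a straight-line model y = b0 + b1 * x, fitted by ordinary least squares
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Step 6: fit the model
b0, b1 = fit_simple_ols(fertilizer, crop_yield)

# Step 7: basic checks; residuals should show no obvious pattern, R^2 measures fit
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(fertilizer, crop_yield)]
mean_y = sum(crop_yield) / len(crop_yield)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in crop_yield)
r_squared = 1 - ss_res / ss_tot

# Step 8: use the model to forecast yield for a new fertilizer level
prediction = b0 + b1 * 35
print(f"yield = {b0:.2f} + {b1:.2f} * fertilizer, R^2 = {r_squared:.3f}")
print("residuals:", [round(r, 2) for r in residuals])
print(f"forecast at 35 kg: {prediction:.1f}")
```

With these numbers the fitted line is yield = 5.20 + 1.96 * fertilizer, giving a forecast of 73.8 kg at 35 kg of fertilizer.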
Summary
These steps guide you through the process of regression analysis to effectively analyze
relationships and make predictions.