SMA Unit III

Data Analysis:

Data analysis is the process of inspecting, cleaning, transforming and modelling
data to discover useful information. This process also helps to enhance
decision-making and to extract meaningful insights from raw data. Analysing data
is an important step in understanding trends, making predictions and ensuring
reliable, consistent results.
Key Steps in Data Analysis:
Step 1: Data Collection- Gathering relevant data from various sources (e.g.,
databases, surveys, or sensors) for analysis.
Step 2: Data Cleaning- Removing errors, inconsistencies, or missing values in the
dataset to ensure high-quality data.
Step 3: Data Transformation- Converting data into a suitable format or structure
for analysis, including normalization or aggregation.
Step 4: Data Integration- Combining data from multiple sources into a unified
view, ensuring consistency and completeness.
Step 5: Data Modelling- Applying statistical, machine learning, or other
techniques to build models that describe or predict patterns in the data.
Step 6: Data Interpretation- Analysing model outputs to draw meaningful insights
and conclusions from the data.
Step 7: Data Visualisation- Presenting data insights through charts, graphs, or
other visual tools to make the information easier to understand.
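To make these steps concrete, here is a minimal Python sketch using pandas; the inline sales table and its column names are hypothetical stand-ins for a real data source.

```python
import pandas as pd

# Step 1 (collection): a real pipeline would pull from a database, survey
# export, or API; this hypothetical inline table stands in for that source.
raw = pd.DataFrame({
    "date":   ["2024-01-05", "2024-01-05", "2024-02-11", None],
    "region": ["north", "north", "south", "south"],
    "sales":  [120.0, 120.0, 95.5, 80.0],
})

clean = raw.drop_duplicates().dropna(subset=["date"]).copy()  # Step 2: cleaning
clean["date"] = pd.to_datetime(clean["date"])                 # Step 3: transformation
monthly = clean.groupby(clean["date"].dt.to_period("M"))["sales"].sum()  # aggregation
print(monthly)  # Steps 6-7: interpret and present the summarised result
```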
Dimensions of Analysis Taxonomy:
The analysis taxonomy categorises different aspects of data analysis, with each
dimension helping to systematically approach the analysis process and ensure
comprehensive coverage of all relevant factors or attributes.
The common dimensions of the analysis taxonomy are:
1. Purpose of Analysis:
i. Descriptive Analysis - Summarizes past data to explain what has
happened (e.g., reports, dashboards).
ii. Diagnostic Analysis - Identifies the causes behind events or trends
in data (e.g., root cause analysis).
iii. Prescriptive Analysis - Provides recommendations for actions to
achieve specific outcomes (e.g., optimization models).
iv. Predictive Analysis - Uses historical data to forecast future trends
(e.g., demand forecasting).

2. Type of Data:
i. Qualitative - Non-numerical data, often descriptive, such as
opinions or text (e.g., interviews, surveys).
ii. Quantitative - Numerical data used for statistical analysis (e.g.,
sales figures, ratings).
3. Methods and Techniques:
i. Statistical Analysis - Applies statistical tools like mean, variance,
or hypothesis testing to interpret data.
ii. Machine Learning Algorithms - Uses models that learn from data
to predict or classify outcomes (e.g., regression, classification).
iii. Text Analysis - Analyses unstructured text data to extract
meaning or patterns (e.g., sentiment analysis).
iv. Geospatial Analysis - Evaluates data based on geographic
locations (e.g., mapping trends, spatial patterns).

4. Levels of Analysis:
i. Univariate - Examines one variable at a time (e.g., distribution of
sales).
ii. Bivariate - Analyses relationships between two variables (e.g.,
correlation between age and income).
iii. Multivariate - Involves multiple variables to understand complex
relationships (e.g., factors affecting customer satisfaction); a short
sketch contrasting these three levels follows this list.

5. Complexity:
i. Simple Analysis- Basic analytical methods like mean, median or
mode (e.g., calculating mean sales).
ii. Advanced Analysis- Involves complex methods like clustering or
regression to uncover deeper insights (e.g., customer segmentation).

6. Temporal Aspect:
i. Static Analysis- Analysing data from a single point in time (e.g., a
yearly report).
ii. Dynamic Analysis- Studies data across time periods to identify
trends or changes (e.g., sales growth over quarters).

7. Scope:
i. Micro level analysis- Focuses on individual or detailed-level data
(e.g., transaction-level analysis).
ii. Macro level analysis- Analyses aggregate or high-level data for
broader insights (e.g., market trends).

8. Outcome:
i. Operational Analysis- Supports day-to-day decision-making (e.g.,
daily sales reports).
ii. Strategic Analysis- Helps in long-term planning and high-level
decision-making (e.g., market expansion strategies).
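As a concrete illustration of the "Levels of Analysis" dimension above, here is a minimal Python sketch on hypothetical data; the variable names and values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 35, 41, 29, 52, 47],
    "income": [28000, 42000, 51000, 35000, 63000, 58000],
    "spend":  [1200, 1900, 2400, 1500, 2800, 2600],
})

print(df["income"].describe())       # univariate: one variable's distribution
print(df["age"].corr(df["income"]))  # bivariate: relationship between two variables
print(df.corr())                     # multivariate: correlations across all variables
```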

Applying Dimensions of Taxonomy to the Analytical Steps:
Applying the dimensions of taxonomy to analytical steps provides a structured
approach to data analysis. First, the purpose of the analysis must be clearly
defined, determining whether it will be descriptive, diagnostic, prescriptive, or
predictive. Once the purpose is established, the next step is to decide the type of
data to be used, either quantitative or qualitative, based on the nature of the
analysis. Depending on the data type, appropriate methods and techniques, such
as statistical tools or machine learning algorithms, should be applied. The level
of analysis—whether univariate, bivariate, or multivariate—will then guide the
depth of insights required. The complexity of the analysis, whether simple or
advanced, is chosen based on the project’s needs. Furthermore, the temporal
aspect must be considered, deciding between static analysis (a snapshot in time)
or dynamic analysis (data over time). Finally, setting the appropriate scope—
micro-level for detailed insights or macro-level for broad trends—and focusing on
operational or strategic outcomes ensures that the analysis is aligned with the
goals of the project and enhances its overall effectiveness.
The steps in Social Media Analysis:
Step 1: Identify and Define the Problem
The first step involves clearly identifying the problem or objective you wish to
address through the analysis. Defining the problem is crucial as it sets the
direction for the entire analysis process.
Step 2: Locate Relevant Data Sources
Once the problem is defined, the next step is to find the data sources that are
relevant to this problem.
• If the data precisely matches the problem identified, you can proceed to
the next steps.
• If the data doesn't align with the problem, you may need to redefine the
problem or adjust it to fit the available data.
Step 3: Expand or Identify the Scope of Data
With relevant data collected, it’s important to assess the scope of the data to
ensure it’s sufficient for addressing the problem.
• If the scope is insufficient, additional data must be gathered to expand the
dataset.
• If expanding the data isn't feasible, the problem may need to be redefined
to match the scope of the available data, similar to the waterfall model,
where you revisit previous steps based on new requirements.
Step 4: Create a Data Model
Once the data is properly scoped, a data model is created to structure and
analyze the information.
• The model is implemented using suitable algorithms and tools to derive
insights.
• If the implementation is unsuccessful, it may be necessary to revisit
earlier steps, such as data collection or problem definition, and make
adjustments before proceeding.
Step 5: Running Analytics
At this stage, you run analytics on the data using augmentation tools to enhance
efficiency and accuracy. These tools help optimize the analysis by identifying
potential errors and improving performance. It's important to ensure that the
augmentation tools chosen are compatible with the previously established data
model for smooth integration.
Step 6: Interpret Results
After running the analysis, the next step is to interpret the results. This involves
analyzing the outputs of the data model and understanding the implications of
the findings.
Step 7: Develop Insights
Finally, you develop actionable insights based on the interpreted results. These
insights should directly address the original problem and provide valuable
information for decision-making or strategy development.

EXAMPLE
Let’s take a hypothetical beauty brand, GlowBeauty, and walk through the
steps of social media analytics for the brand, correlating each step with an
example to show how they are interconnected.
Step 1: Identify and Define the Problem
The first task is to identify the specific business problem GlowBeauty wants to
solve using social media data. For example, GlowBeauty might want to
understand why engagement on their Instagram posts has decreased
over the past few months. Defining this problem clearly helps focus the analysis.
Example: GlowBeauty defines the problem as "a decrease in user engagement
on Instagram posts despite regular content updates." The goal is to uncover the
reasons behind the decline and develop strategies to boost engagement.
Step 2: Locate Relevant Data Sources
Next, GlowBeauty needs to identify where relevant data for this problem can be
found. In this case, the brand would focus on social media platforms, specifically
Instagram. Data sources could include metrics like likes, comments, shares,
follower growth, hashtag performance, and insights into audience demographics.
• Step 2.1: If GlowBeauty already has access to Instagram Insights or social
listening tools (e.g., Hootsuite, Sprout Social), they can directly use this
data.
• Step 2.2: If they lack certain data (e.g., deeper sentiment analysis or
competitor insights), they may redefine the problem to include the
collection of broader data sources like Twitter mentions, customer
reviews, or influencer interactions.
Example: GlowBeauty accesses Instagram Insights to examine post-level
metrics (likes, comments, etc.). If they find that influencer engagement is
missing from their data, they may redefine the problem to explore data sources
like influencer performance and social mentions.
Step 3: Expand or Identify the Scope of Data
Once the data is collected, the next step is to assess whether the scope of the
data is sufficient to address the problem. GlowBeauty must ensure that the data
covers enough depth and breadth, including timeframes, audience segments,
and competitor comparisons.
• Step 3.1: If the data is too limited (e.g., only covering one week of
engagement), they should expand it to include data from the past 3-6
months to capture more significant trends.
• Step 3.2: If expanding data isn't feasible (e.g., limited access to third-
party tools), they may adjust the problem definition, narrowing the focus
to more specific metrics like follower activity patterns or post types that
work best.
Example: GlowBeauty initially examines data from the last 2 weeks, but this
scope is too narrow to detect seasonal trends or changes in engagement. They
expand their data collection to 6 months and include competitor activity to
better understand where they might be falling short.
Step 4: Create a Data Model
At this stage, GlowBeauty develops a data model to structure and analyze the
data. This might involve segmenting the audience based on demographics,
engagement behavior, or interaction patterns. Machine learning algorithms or
statistical models could also be applied to predict future engagement trends.
• Step 4.1: The implementation of the model includes using algorithms to
determine the factors that are influencing engagement (e.g., content type,
posting time, hashtags).
• Step 4.2: If the initial analysis doesn't provide actionable insights (e.g.,
the model fails to identify clear engagement drivers), they might revisit
earlier steps to adjust their data or redefine the problem.
Example: GlowBeauty builds a model using sentiment analysis tools to assess
how positive or negative customer feedback is around their posts. They also
analyze hashtags and post timing to find correlations with engagement. If the
results don’t reveal significant patterns, they might adjust the data inputs or
rework the model.
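As a hedged illustration of the sentiment-scoring idea in this step (not GlowBeauty's actual tooling), a minimal sketch using NLTK's VADER analyser on invented comment text might look like this:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

comments = [
    "Love this glow serum, my skin has never looked better!",
    "Shipping was slow and the bottle arrived leaking.",
]
for text in comments:
    # 'compound' ranges from -1 (very negative) to +1 (very positive)
    print(text, "->", sia.polarity_scores(text)["compound"])
```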
Step 5: Running Analytics
Now, GlowBeauty runs analytics on the data using augmentation tools, such as
Hootsuite Analytics or Google Data Studio, to identify performance gaps and
errors in their social media strategy. These tools help improve data
interpretation and efficiency, making it easier to spot key patterns and optimize
content.
Example: GlowBeauty uses an augmentation tool to analyze the effectiveness of
specific post elements, such as hashtags or imagery. The tool identifies that
posts with certain hashtags (e.g., #GlowBeauty) perform better than others,
while product-focused posts have lower engagement compared to tutorial
videos.
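A minimal sketch of this kind of comparison, assuming hypothetical post-level data rather than any specific augmentation tool's output:

```python
import pandas as pd

posts = pd.DataFrame({
    "hashtag":    ["#GlowBeauty", "#GlowBeauty", "#skincare", "#skincare"],
    "post_type":  ["tutorial", "product", "tutorial", "product"],
    "engagement": [480, 210, 390, 150],
})

print(posts.groupby("hashtag")["engagement"].mean())    # which hashtags perform better
print(posts.groupby("post_type")["engagement"].mean())  # tutorials vs. product posts
```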
Step 6: Interpret Results
After running the analytics, the next step is interpreting the results to uncover
meaningful insights. GlowBeauty analyzes the patterns discovered—such as
which content resonates most with their audience, what times of day get the
most engagement, or whether influencer partnerships positively impact post
reach.
Example: GlowBeauty interprets the results and finds that video content,
especially tutorials, drives the most engagement. They also notice that posts
made during evening hours have higher interaction, and that their audience
prefers behind-the-scenes content over product posts.
Step 7: Develop Insights
Finally, GlowBeauty develops actionable insights based on the analysis. These
insights should directly inform future social media strategies and guide decisions
about content, timing, and audience targeting.
Example: Based on the insights, GlowBeauty decides to shift their social media
strategy to focus more on tutorial videos, post consistently in the evening, and
use hashtags that reflect their brand identity. They also explore more
partnerships with influencers who align with their target demographic.

In summary, by systematically following these steps, GlowBeauty can effectively
use social media analytics to address declining engagement, refine their content
strategy, and better connect with their audience. Each step, from identifying the
problem to developing insights, flows logically, allowing for a targeted and
data-driven approach to social media optimization.

Machine Capacity
Machine capacity can be described as the amount of CPU power needed to process
datasets in a given time period. It refers not only to the processing speed of
the CPU but also to the network traffic, which must be kept low for faster
retrieval of data.
Machine capacity refers to the total computational power of a system,
encompassing more than just the processing speed of the CPU. It involves the
system’s ability to process, store, and retrieve data efficiently within a given
time frame. The CPU, or Central Processing Unit, is central to machine capacity,
as it performs the calculations and runs the operations required to process
datasets. CPU power is influenced by several factors, such as the clock speed,
which measures how many cycles per second the processor can execute, and
the number of cores, which determine how many tasks can be processed in
parallel. The more cores a CPU has, the more efficiently it can handle multiple
operations simultaneously, thus increasing overall capacity.
However, machine capacity goes beyond the CPU. Network traffic plays a crucial
role in how quickly data can be retrieved or transmitted. For systems reliant on
large datasets, especially those stored in cloud environments or across
distributed systems, the speed at which data can travel between servers and
machines directly impacts processing efficiency. If network latency or congestion
is high, it can bottleneck the system’s ability to retrieve and process data, even if
the CPU is highly capable. Thus, an efficient network setup, with low latency and
high bandwidth, is critical for maximizing machine capacity.
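To illustrate why core count matters, here is a small Python sketch that runs the same CPU-bound task serially and then across a process pool; the task and pool size are invented, and exact timings will vary by machine.

```python
import time
from multiprocessing import Pool

def busy(n):
    # stand-in for any CPU-bound dataset-processing task
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [5_000_000] * 4

    t0 = time.perf_counter()
    results = [busy(n) for n in tasks]   # one core, tasks run one after another
    print("serial:  ", time.perf_counter() - t0)

    t0 = time.perf_counter()
    with Pool(processes=4) as pool:      # up to four tasks run in parallel
        results = pool.map(busy, tasks)
    print("parallel:", time.perf_counter() - t0)
```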
Depth of Analysis
Depth of analysis refers to the thoroughness or comprehensiveness with which a
domain, dataset or subject is analysed. It can range from a surface-level
understanding to an in-depth understanding of the complexities and underlying
factors that might influence the outcome of the analysis.
It is important to achieve depth in analysis in order to identify different
patterns and obtain better insights into the data or domain. Identifying these
patterns and insights can further help in making informed, evidence-based
decisions.
A detailed examination of the data can include the following:
1. Data Quality Assessment:
Data quality assessment checks the accuracy, completeness, and reliability of
data. High-quality data ensures meaningful insights and faster processing.
Inaccurate or incomplete data leads to flawed conclusions and inefficient
processes. Trustworthy data comes from reliable sources, ensuring consistency.
Example: A beauty brand reviews whether customer data, such as product
ratings, is accurate and complete, ensuring reliable decisions on product
improvements.
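A minimal sketch of such a check in pandas, with a hypothetical ratings table (the out-of-range value and missing product name are planted deliberately):

```python
import pandas as pd

ratings = pd.DataFrame({
    "product": ["serum", "serum", "mask", None],
    "rating":  [5, 5, 7, 4],  # the 7 is outside the valid 1-5 range
})

print(ratings.isna().sum())                       # completeness: missing values per column
print(ratings.duplicated().sum())                 # consistency: duplicate rows
print(ratings[~ratings["rating"].between(1, 5)])  # accuracy: ratings outside 1-5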
2. Granularity of Data:
Granularity refers to the level of detail in data. Highly granular data provides
specific insights, while less granular data is more aggregated. Granular analysis
uncovers finer patterns and trends.
Example: A beauty brand analyzing detailed sales data (e.g., time of purchase,
product details) identifies specific customer preferences.
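The following sketch shows the same hypothetical sales data at two levels of granularity: individual transactions versus monthly aggregates.

```python
import pandas as pd

sales = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-03 10:15", "2024-01-21 18:40", "2024-02-02 09:05"]),
    "product":   ["serum", "mask", "serum"],
    "amount":    [29.0, 15.5, 29.0],
})

print(sales)  # fine-grained: every individual transaction
print(sales.groupby(sales["timestamp"].dt.to_period("M"))["amount"].sum())  # coarse: monthly totals
```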
3. Data Exploration:
Data exploration examines patterns, structure, and trends in data. Key
components include descriptive statistics, data visualization, trend analysis, and
correlation analysis, often done through Exploratory Data Analysis (EDA).
Example: A beauty brand uses EDA to analyze customer reviews, identifying
trends in ratings and product preferences, and uncovering relationships between
customer demographics and reviews.
Key Components of EDA:
• Descriptive Statistics: This involves summarizing the central tendency
and distribution of the data using measures like mean, median, and mode.
These statistics provide a basic understanding of the data's overall
structure and can highlight outliers or skewed distributions.
• Data Visualization: Tools like Tableau or Matplotlib are used to create
charts, graphs, and other visuals to represent the data. Visualization is
essential for identifying trends, patterns, or anomalies that might not be
obvious in raw data.
• Trend Analysis: This focuses on identifying popular or recurring patterns
within the data. For instance, a beauty brand might use trend analysis to
track the seasonal popularity of certain products or ingredients.
• Correlation Analysis: This involves assessing the statistical relationship
between two or more variables to see how they relate to each other. A
high correlation between two variables may indicate a strong relationship,
which can be valuable in predicting future behavior.
Example: A beauty brand conducting EDA on customer reviews might use
descriptive statistics to find the average rating, visualize product ratings using
bar charts, analyze trends in product preferences over time, and use correlation
analysis to determine if customer age is related to the type of products they
review positively.
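A hedged sketch tying the four EDA components together on invented review data (pandas for statistics and correlation, Matplotlib for visualisation):

```python
import pandas as pd
import matplotlib.pyplot as plt

reviews = pd.DataFrame({
    "age":    [19, 24, 31, 38, 45, 52],
    "rating": [4.5, 4.0, 3.8, 3.5, 3.0, 2.8],
})

print(reviews.describe())                      # descriptive statistics
print(reviews["age"].corr(reviews["rating"]))  # correlation analysis

reviews.plot.bar(x="age", y="rating")          # data visualisation
plt.savefig("ratings_by_age.png")              # inspect the chart for trends
```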
Analytical Tools and Techniques for Data Analytics
To achieve depth in analysis, appropriate analytical techniques and processing
methods must be incorporated:
1. Statistical Methods – use of statistical models like regression and
hypothesis testing; regression analysis in particular can enhance
predictive outcomes (a short sketch follows this list).
2. Machine Learning and Artificial Intelligence – ML algorithms and AI can
help in deciphering intricate patterns in data, making accurate
classifications and overcoming biases that may exist in conventional
algorithms.
3. Scenario and Simulative Analysis – scenario analysis and simulation can
help in exploring different outcomes and possibilities and in assessing the
impact of various factors under different conditions.
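As a minimal sketch of the first item, here is a simple linear regression with a built-in hypothesis test using SciPy; the ad-spend and sales figures are invented.

```python
from scipy import stats

ad_spend = [10, 20, 30, 40, 50]       # hypothetical advertising spend
sales    = [120, 180, 260, 310, 390]  # hypothetical resulting sales

result = stats.linregress(ad_spend, sales)
print("slope:", result.slope, "intercept:", result.intercept)
print("p-value:", result.pvalue)  # tests the null hypothesis of zero slope
print("prediction at spend=60:", result.intercept + result.slope * 60)
```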

Critical Thinking and Synthesis
Depth of analysis is characterised by critical thinking, which involves:
1. Identifying Assumptions – recognising and questioning the underlying
parameters and how they might influence the results.
2. Evaluating Evidence – critically assessing the evidence and the data
supporting the conclusions. This involves distinguishing between
correlation and causation, thereby establishing explanations for the
findings.
3. Synthesising Information – integrating insights from different data sources,
theoretical frameworks and analytical methods to form a coherent and
comprehensive understanding of the subject or domain.
Identification of Underlying Root Causes
Moving beyond symptoms or surface-level issues requires a deeper analysis of
the following:
1. Root Cause Analysis – investigating the underlying causes of a problem
and applying techniques to systematically extract root causes can help
in overcoming the majority of larger problems that exist.
2. Pattern Recognition – identifying trends and patterns and establishing
anomalies in the data can indicate existing challenges and
opportunities. This process also involves looking beyond the usual
correlative measures to derive deeper drivers of data behaviour, which
include ratings, sales/dd, etc.
3. Contextualisation – placing findings within a broader context to
understand their implications and relevance, drawing on historical
trends, broader industry dynamics and socio-economic factors.
Domain of Analysis
Domain of analysis refers to the specific area or field of study that an
analysis focuses on. It defines the boundaries within which the analysis is
conducted and determines the scope, context and relevance of the data, methods
and conclusions. The domain of analysis is crucial as it influences the
objectives, methodologies and outcomes of the analytical process. Understanding
the domain is essential because it guides the selection of appropriate data
sources, analytical techniques and the interpretation of results.
Web Scrapers - The term "web scraper" refers to a software tool that visits
websites, collects the relevant pages and extracts usable data from them. By
automating this procedure, these tools can gather large amounts of data
quickly, which is a clear advantage in a digital era where data collection is
continually evolving and plays such a significant role.
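A minimal scraper sketch using the requests and BeautifulSoup libraries; the URL and the h2 selector are placeholders, and any real scraping should respect a site's robots.txt and terms of service.

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com", timeout=10)  # fetch the page
soup = BeautifulSoup(resp.text, "html.parser")          # parse the HTML

for heading in soup.find_all("h2"):  # extract whichever elements are relevant
    print(heading.get_text(strip=True))
```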
Types of Domains
Domain analysis can broadly be categorised into several types depending on the
context and application:
1. Scientific Domain – this domain includes fields that involve empirical
data analysis to draw conclusions about natural phenomena.
2. Business and Economic Domains – areas pertaining to finance, marketing,
operations and economic forecasting are some of the fields in this domain.
Analyses in this domain are used to make informed business decisions and
obtain deeper market insights.
3. Technology and Engineering Domains – these domains cover areas like
software development, networking, cyber security, technological
manufacturing processes, system optimisation and extended technologies.
4. Social Science Domains – this domain includes subfields like sociology,
social interactions, political science and understanding human trends on
social platforms to further comprehend evolving social trends.
5. Other Domains of Interest – healthcare, environmental and geographical
domains, to name a few.
Challenges in Domain of Analysis
1. Data Silos:
Data silos occur when data is isolated within different departments or systems,
making it difficult to integrate and analyze comprehensively. This leads to
incomplete insights and inefficiencies in decision-making.
2. Domain-Specific Complexity:
Certain industries or fields have highly specialized data that requires deep
knowledge of the domain to interpret and analyze correctly. Misunderstanding
this complexity can lead to inaccurate conclusions.
3. Evolving Domain:
As industries and technologies change, the data and analysis methods must
adapt. Keeping up with these changes can be challenging, as outdated methods
or data may lead to irrelevant insights.
4. Bias and Assumptions:
Preconceived notions or assumptions in the analysis process can lead to biased
outcomes. This can skew results and create misleading insights, especially if the
data or methods are not neutral.

Velocity of Data:
Velocity refers to the rate at which data flows into a system. It highlights the
speed at which data must be captured, processed, and analyzed. In today’s
world, vast amounts of data are generated continuously from various sources
like social media, IoT devices, and sensors, requiring real-time processing.
Data Generation Speed:
Data generation speed is the pace at which new data is created or collected. It
depends on factors like the number of data sources and the frequency of data
creation. High data generation speed can overwhelm systems if not managed
properly, leading to challenges in storage and processing.

1. Data Ingestion:
Data ingestion is the process of collecting and importing data from various
sources into a system for storage and analysis. It must handle high-speed
data efficiently to avoid bottlenecks.
2. Stream Processing:
Stream processing analyzes data in real time as it flows into the system,
rather than waiting for all the data to be collected. This allows for
immediate insights and actions (a short sketch follows this list).
3. Real-Time Analytics (Static/Dynamic):
Real-time analytics involves continuously processing data to provide
instant insights. Static data remains unchanged during analysis, while
dynamic data evolves, requiring constant updates to the analysis.
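A minimal stream-processing sketch in plain Python: each reading is handled as it arrives, and a rolling window replaces waiting for the full dataset. The simulated sensor stream and the 1.5x spike threshold are invented for the example.

```python
import random
import time
from collections import deque

window = deque(maxlen=10)  # keep only the most recent 10 readings

def handle(reading):
    window.append(reading)
    avg = sum(window) / len(window)
    if reading > avg * 1.5:  # naive real-time anomaly flag
        print(f"spike: {reading:.1f} (rolling avg {avg:.1f})")

for _ in range(50):           # stands in for an unbounded event stream
    handle(random.gauss(100, 15))
    time.sleep(0.01)          # simulate the arrival rate
```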
Applications of Real-Time Analytics:
 Fraud Detection: Identifying suspicious activities instantly to prevent
fraud in banking or e-commerce.
 Recommendation Systems: Providing personalized content or product
recommendations based on live user interactions.
 Operational Monitoring: Monitoring systems, networks, or machinery in
real-time to detect issues and ensure smooth operations.
Challenges of Data Velocity:
 Scalability: Handling increasing data volumes while maintaining
performance.
 Latency: Minimizing delays in processing to ensure real-time insights.
 Data Quality: Ensuring high-quality data despite the rapid influx.
 Security: Protecting fast-moving data from breaches or unauthorized
access.
