BDA-Unit 2
Simulated Annealing takes its name from the process of annealing in physics, a technique used to
increase the size of crystals and to reduce their defects. This is done by heating the material and
then letting it cool slowly in a controlled manner.
Simulated Annealing is an optimization technique that helps us find the global optimum value
(global maximum or global minimum) of a given function. It is used to pick the most probable
global optimum when the function has multiple local optima.
For example, if we move upwards using the hill climbing algorithm, our solution can get stuck at
a local peak, because hill climbing never allows downhill moves. In this situation we combine it
with another idea, a pure random walk, which lets the search escape local peaks and reach a
solution that is likely to be the global optimum. The combined algorithm is known as Simulated
Annealing.
Likewise, in the graph above we can see how this algorithm finds the most probable global
maximum. The figure contains several local maxima, e.g. B and D, but the algorithm helps us
find the global optimum value, in this case the global maximum (point A).
Let's look at the algorithm for this technique, and then see how it applies to the given figure.
Algorithm:
1. Generate a random initial solution and compute its cost C_old.
2. Set an initial temperature T.
3. Generate a random neighbouring solution and compute its cost C_new.
4. Compare the two costs: if the new solution is better, move to it; if it is worse, move to it
only with the acceptance probability described below.
5. Reduce the temperature T.
6. Repeat steps 3 to 5 until you reach an acceptable optimized solution of the given problem.
Let's try to understand how this algorithm helps us find the global maximum, i.e. point A, in the
given figure.
First, suppose we generate a random solution and get point B. We then generate a random
neighbouring solution, say point F, and compare the costs of the two. Based on this comparison
(and, for worse moves, the acceptance probability), our temporary solution becomes point F. We
then repeat steps 3 to 5 over and over, and finally arrive at point A, the global maximum of the
given function.
Example Code:
from random import random

def anneal(sol):
    old_cost = cost(sol)
    T = 1.0          # initial temperature
    T_min = 0.00001  # stop once the system has cooled this far
    alpha = 0.9      # cooling factor applied after each temperature stage
    while T > T_min:
        i = 1
        while i <= 100:  # neighbours tried per temperature (an assumed count; the skeleton lost its loops)
            new_sol = neighbor(sol)
            new_cost = cost(new_sol)
            ap = acceptance_probability(old_cost, new_cost, T)
            if ap > random():  # accept the new solution probabilistically
                sol = new_sol
                old_cost = new_cost
            i += 1
        T = T * alpha  # cool down
    return sol
In the skeleton code above, you have to fill in some gaps: cost(), which computes the cost of a
generated solution; neighbor(), which returns a random neighbouring solution; and
acceptance_probability(), which compares the new cost with the old one. If the value returned by
this function is greater than a randomly generated number between 0 and 1, we update our
solution from old to new; otherwise we do not.
a = e^((c_new - c_old) / T)
Here c_new is the new cost, c_old is the old cost, and T is the temperature; T is decreased by a
factor of alpha (= 0.9) in each iteration. When the new solution is worse than the old one, the
acceptance probability a gets smaller as the temperature decreases, and it also gets smaller as the
new solution gets worse relative to the old one.
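As an illustration of how those gaps might be filled, here is a minimal sketch for a simple
one-dimensional maximization problem; the objective function, the step size, and the starting
point below are illustrative assumptions, not part of the original skeleton:

import math
from random import uniform

def cost(sol):
    # illustrative objective with several local maxima; a higher "cost" is
    # better here, since the text searches for a global maximum
    return math.sin(5 * sol) - 0.1 * (sol - 2) ** 2

def neighbor(sol):
    # a random nearby solution; the step size 0.5 is an arbitrary choice
    return sol + uniform(-0.5, 0.5)

def acceptance_probability(old_cost, new_cost, T):
    if new_cost > old_cost:
        return 1.0  # an improvement always passes the ap > random() test
    return math.exp((new_cost - old_cost) / T)  # a = e^((c_new - c_old)/T)

print(anneal(0.0))  # requires the anneal() skeleton above

With these helpers in place, anneal(0.0) wanders across the function, accepting occasional
downhill moves early on and becoming steadily greedier as T falls.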
Applications:
Deployment of mobile wireless base (transceiver) stations (MBTS, mounted on vehicles) is
expensive, so a wireless provider typically offers only the basic BTS coverage needed for a
normal communication data flow. However, during a special festival celebration or a popular
outdoor concert in a big city, that coverage would be insufficient, and the provider increases the
number of MBTS to improve data communication for the public. Choosing where to deploy
these mobile stations is an optimization problem to which simulated annealing can be applied.
Stochastic:
Although the definition of a stochastic process varies, it is typically characterized as a collection
of random variables indexed by some set. Without the index set being clearly described, the
phrases random process and stochastic process are considered synonyms and are used
interchangeably. The phrases “collection” and “family” are used interchangeably, whereas
“parameter set” or “parameter space” are occasionally used instead of “index set.”
Some theoretically defined stochastic processes include random walks, martingales, Markov
processes, Lévy processes, Gaussian processes, random fields, renewal processes, and branching
processes. Probability, calculus, linear algebra, set theory, and topology, as well as real analysis,
measure theory, Fourier analysis, and functional analysis, are all used in the study of stochastic
processes.
The Poisson process is a stochastic process with several definitions and applications. It’s a
counting process, which is a stochastic process in which a random number of points or
occurrences are displayed over time. A time-dependent Poisson random variable is defined as the
number of points in a process that falls between zero and a certain time. Non-negative numbers
make up the index set of this process, but natural numbers make up the state space. Because it
can be conceived of as a counting operation, this procedure is often referred to as the Poisson
counting process.
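As a small illustration, a homogeneous Poisson counting process can be simulated by drawing
exponentially distributed interarrival times. The sketch below is our own (the names rate and
poisson_count are not from the text):

import random

def poisson_count(rate, t):
    # number of points of a rate-`rate` Poisson counting process in [0, t]
    time, count = 0.0, 0
    while True:
        time += random.expovariate(rate)  # interarrival times are exponential
        if time > t:
            return count
        count += 1

# the count over [0, t] is Poisson-distributed with mean rate * t,
# e.g. poisson_count(2.0, 5.0) averages about 10 over many runs
print(poisson_count(2.0, 5.0))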
Bernoulli Process
One of the most basic stochastic processes is the Bernoulli process. It is a sequence of
independent and identically distributed (iid) random variables, where each variable takes the
value one with probability p and the value zero with probability 1-p. This process is like
repeatedly flipping a coin, where the probability of getting a head is p and its value is one, while
the probability of getting a tail is 1-p and its value is zero. A Bernoulli process, in other words, is
a sequence of iid Bernoulli random variables, with each coin flip representing a Bernoulli trial.
Random Walk
The simple random walk is a typical example of a random walk. It is a stochastic process in
discrete time with integers as the state space and is based on a Bernoulli process, with each
Bernoulli variable taking either a positive or a negative value. In other words, the simple random
walk occurs on the integers, and its value increases by one with probability p or decreases by one
with probability 1-p; hence the index set of this random walk is the natural numbers, while its
state space is the integers. If p=0.5, this random walk is called a symmetric random walk.
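The construction is easy to express in code. The following sketch (our own illustration, using
Python's random module) builds a walk of n_steps positions from Bernoulli(p) trials:

import random

def simple_random_walk(n_steps, p=0.5):
    # each Bernoulli(p) trial contributes +1 with probability p, -1 otherwise
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += 1 if random.random() < p else -1
        path.append(position)
    return path

print(simple_random_walk(10))  # a symmetric random walk, since p = 0.5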
Let’s compare stochastic systems to other similar terms that are occasionally used as synonyms
for stochastic to gain a better grasp of it. Stochastic is synonymous with random and
probabilistic, although non-deterministic is distinct from stochastic.
Stochastic Vs Probabilistic
The terms stochastic and probabilistic are frequently interchanged. Probabilistic is most likely
the broader term. Stochastic is dependent on a previous occurrence, such as fluctuations in stock
price based on the previous day’s price, but probabilistic is independent of other observations,
such as winning lottery numbers, which are supposed to be independent of one another.
Stochastic Vs Non-Deterministic
Deterministic refers to a variable or process that can predict the result of an occurrence based on
the current situation. In simple terms, we can state that nothing in a deterministic model is
random. Non-deterministic, on the other hand, is a variable or process in which the same input
might result in different results.
Because the outcome is unpredictable, stochastic is often used interchangeably with non-
deterministic. However, stochasticity differs slightly from non-determinism in that we can
analyze a stochastic process with probability tools such as the expected outcome and the
variance. As a result, describing a variable as stochastic rather than non-deterministic is a
stronger claim.
Stochastic Vs Random
Domains involving uncertainty are known as stochastics. Statistical noise or random errors can
cause uncertainty in a target or objective function. It could also be due to the fact that the data
used to fit a model is a sample of a larger population. Finally, the models adopted are rarely able
to capture all elements of the domain, and must instead generalize to unknown scenarios,
resulting in a loss of fidelity.
Optimization approaches that create and employ random variables are known as stochastic
optimization (SO). Random variables exist in the formulation of the optimization problem itself
for stochastic issues, which incorporates random objective functions or random constraints.
Stochastic optimization approaches also include methods that use random iterates. Simulated
annealing and genetic algorithms, both discussed in this unit, are instances of stochastic
optimization algorithms.
Below are some general and popular applications which involve the stochastic processes:-
Stochastic models are used in financial markets to reflect the seemingly random
behaviour of assets such as stocks, commodities, relative currency values (i.e., the price
of one currency relative to another, such as the price of the US Dollar relative to the price
of the Euro), and interest rates.
Manufacturing procedures are thought to be stochastic. This assumption holds true for
both batch and continuous manufacturing processes. A process control chart depicts a
particular process control parameter across time and is used to record testing and
monitoring of the process.
Stanislaw Ulam and Nicholas Metropolis popularized the Monte Carlo approach, which
is a stochastic method. The use of randomness and the repetitive nature of the procedure
is reminiscent of casino activities. Before the method, simulation and statistical sampling were
typically used to test a previously understood deterministic problem; the Monte Carlo method
inverts this, using random sampling to solve deterministic problems. Though historical examples
of such an "inverted" technique exist, it was not regarded as a generic strategy until the Monte
Carlo method gained popularity.
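To make the idea concrete, here is a sketch of the textbook Monte Carlo illustration, estimating
pi by random sampling; the example is ours, only the method comes from the text:

import random

def estimate_pi(n_samples):
    # fraction of uniform random points in the unit square that land inside
    # the quarter circle of radius 1, multiplied by 4
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n_samples

print(estimate_pi(1_000_000))  # approaches 3.14159... as n_samples grows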
GENETIC ALGORITHM:
A genetic algorithm is a search heuristic that is inspired by Charles Darwin's theory of natural
evolution. This algorithm reflects the process of natural selection where the fittest individuals are
selected for reproduction in order to produce offspring of the next generation.
The process of natural selection starts with the selection of fittest individuals from a population.
They produce offspring which inherit the characteristics of the parents and will be added to the
next generation. If parents have better fitness, their offspring will be better than parents and have
a better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found. This notion can be applied for a search problem. We consider a
set of solutions for a problem and select the set of best ones out of them.
Five phases are considered in a genetic algorithm:
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
Fitness Function
The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an
individual will be selected for reproduction is based on its fitness score.
Selection
The idea of selection phase is to select the fittest individuals and let them pass their genes to the
next generation. Two pairs of individuals (parents) are selected based on their fitness scores.
Individuals with high fitness have more chance to be selected for reproduction.
Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated, a crossover point is chosen at random from within the genes.
Offspring are created by exchanging the genes of the parents among themselves until the
crossover point is reached. The new offspring are added to the population.
Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped.
Mutation occurs to maintain diversity within the population and prevent premature convergence.
Termination
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.
Pseudocode
START
Generate the initial population
Compute fitness
REPEAT
    Selection
    Crossover
    Mutation
    Compute fitness
UNTIL population has converged
STOP
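As a minimal runnable sketch of this pseudocode, the following evolves bit strings whose fitness
is the number of one-bits; the problem choice and all parameter values are illustrative
assumptions, not part of the original text:

import random

POP_SIZE = 20
GENOME_LEN = 16
MUTATION_RATE = 0.05

def fitness(individual):
    return sum(individual)  # count of 1-bits; higher is fitter

def select(population):
    # fitness-proportional (roulette wheel) selection of one parent;
    # the +1 lets zero-fitness individuals still be picked occasionally
    weights = [fitness(ind) + 1 for ind in population]
    return random.choices(population, weights=weights)[0]

def crossover(parent1, parent2):
    point = random.randrange(1, GENOME_LEN)  # random crossover point
    return parent1[:point] + parent2[point:]

def mutate(individual):
    # flip each bit with a small probability to maintain diversity
    return [1 - g if random.random() < MUTATION_RATE else g for g in individual]

# START: generate the initial population
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for generation in range(200):
    # UNTIL converged (here: an all-ones individual has appeared)
    if max(fitness(ind) for ind in population) == GENOME_LEN:
        break
    # Selection, Crossover, Mutation
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

print(generation, max(population, key=fitness))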
Genetic Algorithms are categorized as global search heuristics. A genetic algorithm is a search
technique used in computing to find true or approximate solutions to optimization and search
problems. It uses techniques inspired by biological evolution such as inheritance, mutation,
selection, and crossover.
Evaluate: next, the population is evaluated by assigning a fitness value to each individual in the
population. In this stage we often want to take note of the current fittest solution and the average
fitness of the population. After evaluation, the algorithm decides whether it should terminate the
search, depending on the termination conditions set.
Crossover and mutation: the next stage is to apply crossover and mutation to the selected
individuals. This stage is where new individuals (children) are created for the next generation.
Repeat: at this point the new population goes back to the evaluation step and the process starts
again. We call each cycle of this loop a generation.
A third generation can then be formed, in which the word "Inh" produces the word "Anh" by
randomly mutating the letter I into A. This simple example makes the GA easy to understand.
_letters = [a..z, A..Z]
target = "Anh"
guess = get 3 random letters from _letters
while guess != target:
    mutate one letter of guess, keeping the change if it matches the target better
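A runnable version of that pseudocode might look as follows; the mutate-and-keep-if-no-worse
rule is our reading of the example, not something the text spells out:

import random
import string

_letters = string.ascii_letters  # [a..z, A..Z]
target = "Anh"

def fitness(guess):
    return sum(g == t for g, t in zip(guess, target))  # letters in the right position

guess = [random.choice(_letters) for _ in range(len(target))]
while "".join(guess) != target:
    candidate = list(guess)
    candidate[random.randrange(len(target))] = random.choice(_letters)  # random mutation
    if fitness(candidate) >= fitness(guess):  # keep the mutant only if it is no worse
        guess = candidate

print("".join(guess))  # eventually prints "Anh"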
GENETIC PROGRAMMING:
Genetic programming is a form of artificial intelligence that mimics natural selection in order to
find an optimal result. Genetic programming is iterative, and at each new stage of the algorithm,
it chooses only the fittest of the “offspring” to cross and reproduce in the next generation, which
is sometimes referred to as a fitness function. Just like in biological evolution, evolutionary
algorithms can sometimes have randomly mutating offspring, but since only the offspring that
have the highest fitness measure are reproduced, the fitness will almost always improve over
generations. Genetic programming will generally terminate once it reaches a predefined fitness
measure. Additionally, architecture-altering operations can be introduced to an already running
program in order to allow for new sources of information to be analyzed for a given fitness
function.
Genetic programming has several practical benefits and applications:
Saving time: Genetic algorithms are able to process large amounts of data much more quickly
than humans can. Additionally, these algorithms run free of human biases, and are thereby able
to come up with ideas that might otherwise not have been considered.
Data and text classification: Genetic programming can quickly identify and classify various
forms of data without the need for human oversight. Genetic programming can use data tree
construction in order to optimize these classifications, especially when dealing with big data.
Ensuring network security: Rule evolution approaches have been successfully applied to
identify new attacks on networks. By quickly identifying intrusions, businesses and
organizations can ensure that they can respond to such attacks before they are able to access
confidential information.
Supporting other machine learning methods: Genetic programming can be included in larger
systems of machine learning, such as with neural networks. By having genetic programming
focus on only specific subsets of data, organizations can ensure that this data is quickly
processed for ingestion into larger or different learning methods. This allows organizations to
gain as much useful and actionable information as possible.
DATA VISUALIZATION:
The importance of data visualization is simple: it helps people see, interact with, and better
understand data. Whether simple or complex, the right visualization can bring everyone on the
same page, regardless of their level of expertise.
It's hard to think of a professional industry that doesn't benefit from making data more
understandable. Every STEM field benefits from understanding data—and so do fields in
government, finance, marketing, history, consumer goods, service industries, education, sports,
and so on.
While we'll always wax poetic about data visualization (you're on the Tableau website, after all),
there are practical, real-life applications that are undeniable. And, since visualization is so
prolific, it's also one of the most useful professional skills to develop. The better you can convey
your points visually, whether in a dashboard or a slide deck, the better you can leverage that
information. The concept of the citizen data scientist is on the rise. Skill sets are changing to
accommodate a data-driven world. It is increasingly valuable for professionals to be able to use
data to make decisions and use visuals to tell stories when data informs the who, what, when,
where, and how.
While traditional education typically draws a distinct line between creative storytelling and
technical analysis, the modern professional world also values those who can cross between the
two: data visualization sits right in the middle of analysis and visual storytelling.
Big data is crucial because of its untapped potential, but recent technology such as visual
analytics finally allows businesses to discover critical, even surprising insights that give us a
clearer view into processes and human behaviors.
Structured data
Structured data is the neatly organized data you keep in databases, datasets, and spreadsheets.
It's easy for traditional analytics tools to read this data. Organizing unstructured data into
structured data is time-consuming, but possible with the right solution: it involves data
cataloging, data mapping, and data transformation.
Unstructured data
Unstructured data, or raw data, is increasing at a higher rate compared to structured data.
Platforms like Facebook generate hundreds of terabytes of information per day. Unstructured
data can also include survey data from customers, notes, and emails. Because unstructured data
is growing, big data technologies that can seamlessly analyze this data will be crucial to
businesses. Solutions like Hadoop are very adept at ingesting raw data for analysis.
Semi-structured
Semi-structured data has some organizational structure, but isn't easy to analyze as-is. With some
organizing or cleaning, semi-structured data could be imported into a relational database just like
structured data. Semi-structured data and structured data can be analyzed and visualized with
solutions like Tableau. With a combination of solutions like Hadoop and Tableau, all three of
these types of data can be used for analysis.
It's easy to be overwhelmed by big data, but the good news is that technologies and analytics
platforms are becoming more efficient and comfortable to use. Industries and teams such as
sales, IT, and government agencies have used their big data to discover trends and reduce
analysis time. To effectively use big data, focus on the following best practices:
A big data best practice is to always think about long-term solutions. As discussed, big data is
growing at a fast, steep trajectory; so your data management solution and strategy should scale
with it. Technology will evolve and work together more easily. Don't be afraid to upgrade to new
innovative solutions. Example: Abercrombie & Fitch saw the benefits of upgrading from
spreadsheets and deployed Tableau quickly so they could match shopper insights with inventory.
Secondly, be aware that companies often need more than one solution to manage their big data
from first ingestion to final data visualizations. This isn't a bad thing. The best big data platforms
can talk to each other and form a symbiotic relationship. Example: PepsiCo worked with Tableau
and Trifacta to wrangle disparate data and uncover insights.
Third, keep in mind that data cultures empower their business teams to use and play with data.
With big data, all hands are on deck. That's the only way to keep up with the volume and velocity
of data. Example: Charles Schwab knew this, so they democratized data analysis across hundreds
of branch locations.
Graph: A diagram of points, lines, segments, curves, or areas that represents certain variables in
comparison to each other, usually along two axes at a right angle.
Geospatial: A visualization that shows data in map form using different shapes and colors to
show the relationship between pieces of data and specific locations.
Infographic: A combination of visuals and words that represent data. Usually uses charts or
diagrams.
Dashboards: A collection of visualizations and data displayed in one place to help with
analyzing and presenting data.
Area Map: A form of geospatial visualization, area maps are used to show specific values set
over a map of a country, state, county, or any other geographic location. Two common types of
area maps are choropleths and isopleths.
Bar Chart: Bar charts represent numerical values compared to each other. The length of the bar
represents the value of each variable
Box-and-whisker Plots: These show a selection of ranges (the box) across a set measure (the
bar).
Bullet Graph: A bar marked against a background to show progress or performance against a
goal, denoted by a line on the graph.
Gantt Chart: Typically used in project management, Gantt charts are a bar chart depiction of
timelines and tasks.
Heat Map: A type of geospatial visualization in map form which displays specific data values as
different colors (this doesn't need to be temperatures, but that is a common use).
Highlight Table: A form of table that uses color to categorize similar data, allowing the viewer
to read it more easily and intuitively.
Histogram: A type of bar chart that splits a continuous measure into different bins to help
analyze the distribution.
Pie Chart: A circular chart with triangular segments that shows data as a percentage of a whole.
Treemap: A type of chart that shows different, related values in the form of rectangles nested
together.
Data analysis is the process of collecting, modeling, and analyzing data to extract insights that
support decision-making. There are several methods and techniques to perform analysis
depending on the industry and the aim of the investigation.
Visual analytics is the use of sophisticated tools and processes to analyze datasets using visual
representations of the data. Visualizing the data in graphs, charts, and maps helps users identify
patterns and thereby develop actionable insights. These insights help organizations make better,
data-driven decisions.
1. Collaborate your needs
Before you begin analyzing or drilling down into any techniques, it's crucial to sit down
collaboratively with all key stakeholders within your organization, decide on your primary
campaign or strategic goals, and gain a fundamental understanding of the types of insights that
will best benefit your progress or provide you with the level of vision you need to evolve your
organization.
Once you've outlined your core objectives, you should consider which questions will need
answering to help you achieve your mission. This is one of the most important techniques, as it
will shape the very foundations of your success.
To make your data work for you, you have to ask the right data analysis questions.
3. Data democratization
After giving your data analytics methodology some real direction, and knowing which questions
need answering to extract optimum value from the information available to your organization,
you should continue with democratization.
Data democratization is an action that aims to connect data from various sources efficiently and
quickly so that anyone in your organization can access it at any given moment. You can extract
data in text, images, videos, numbers, or any other format. And then perform cross-database
analysis to achieve more advanced insights to share with the rest of the company interactively.
4. Think of governance
When collecting data in a business or research context you always need to think about security
and privacy. With data breaches becoming a topic of concern for businesses, the need to protect
your clients' or subjects' sensitive information becomes critical.
After harvesting from so many sources you will be left with a vast amount of information that
can be overwhelming to deal with. At the same time, you can be faced with incorrect data that
can be misleading to your analysis. The smartest thing you can do to avoid dealing with this in
the future is to clean the data. This is fundamental before visualizing it, as it will ensure that the
insights you extract from it are correct.
Once you've set your sources, cleaned your data, and established clear-cut questions you want
your insights to answer, you need to set a host of key performance indicators (KPIs) that will
help you track, measure, and shape your progress in a number of key areas.
KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary
methods of data analysis you certainly shouldn't overlook.
Having bestowed your data analysis techniques and methods with true purpose and defined your
mission, you should explore the raw data you've collected from all sources and use your KPIs as
a reference for chopping out any information you deem to be useless.
While, at this point, this particular step is optional (you will have already gained a wealth of
insight and formed a fairly sound strategy by now), creating a data governance roadmap will help
your data analysis methods and techniques become successful on a more sustainable basis. These
roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.
9. Integrate technology
There are many ways to analyze data, but one of the most vital aspects of analytical success in a
business context is integrating the right decision support software and technology.
Robust analysis platforms will not only allow you to pull critical data from your most valuable
sources while working with dynamic KPIs that will offer you actionable insights; it will also
present them in a digestible, visual, interactive format from one central, live dashboard. A data
methodology you can count on.
By integrating the right technology within your data analysis methodology, you'll avoid
fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum
value from your business's most valuable insights.
By considering each of the above efforts, working with the right technology, and fostering a
cohesive internal culture where everyone buys into the different ways to analyze data as well as
the power of digital intelligence, you will swiftly start to answer your most burning business
questions. Arguably, the best way to make your data concepts accessible across the organization
is through data visualization.
Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing
users across the organization to extract meaningful insights that aid business evolution — and it
covers all the different ways to analyze data.
Confirmation bias: This phenomenon describes the tendency to select and interpret only the data
necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's
not done on purpose, confirmation bias can represent a real problem, as excluding relevant
information can lead to false conclusions and, therefore, bad business decisions. To avoid it,
always try to disprove your hypothesis instead of proving it, share your analysis with other team
members, and avoid drawing any conclusions before the entire analytical project is finalized.
Now, we're going to look at how you can bring all of these elements together in a way that will
benefit your business - starting with a little something called data storytelling.
The human brain responds incredibly well to strong stories or narratives. Once you've cleansed,
shaped, and visualized your most invaluable data using various BI dashboard tools, you should
strive to tell a story - one with a clear-cut beginning, middle, and end. By doing so, you will
make your analytical efforts more accessible, digestible, and universal, empowering more people
within your organization to use your discoveries to their actionable advantage.
Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a
significant role in the advancement of understanding how to analyze data more effectively.
Gartner predicts that by the end of this year, 80% of emerging technologies will be developed
with AI foundations. This is a testament to the ever-growing power and value of autonomous
technologies. At the moment, these technologies are revolutionizing the analysis industry. Some
examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment
analysis.
If you work with the right tools and dashboards, you will be able to present your metrics in a
digestible, value-driven format, allowing almost everyone in the organization to connect with
and use relevant data to their advantage.
Modern dashboards consolidate data from various sources, providing access to a wealth of
insights in one centralized location, no matter if you need to monitor recruitment metrics or
generate reports that need to be sent across numerous departments. Moreover, these cutting-edge
tools offer access to dashboards from a multitude of devices, meaning that everyone within the
business can connect with practical insights remotely - and share the load.
Business Intelligence: BI tools allow you to process significant amounts of data from several
sources in any format. Like this, you can not only analyze and monitor your data to extract
relevant insights but also create interactive reports and dashboards to visualize your KPIs and use
them for your company's good. datapine is an amazing online BI software that is focused on
delivering powerful online analysis features that are accessible for beginner and advanced users.
Like this, it offers a full-service solution that includes cutting-edge analysis of data, KPIs
visualization, live dashboards, and reporting, as well as artificial intelligence technologies to
predict trends and minimize risk.
Last is a step that might seem obvious to some people, but it can be easily ignored if you think
you are done. Once you have extracted the needed results, you should always take a
retrospective look at your project and think about what you can improve. As you saw throughout
this long list of techniques, data analysis is a complex process that requires constant refinement.
For this reason, you should always go one step further and keep improving.
DATA VISUALIZATION TECHNIQUES:
The type of data visualization technique you leverage will vary based on the type of data you're
working with, in addition to the story you're telling with your data.
1. Pie Chart
2. Bar Chart
3. Histogram
4. Gantt Chart
5. Heat Map
6. Box and Whisker Plot
7. Waterfall Chart
8. Area Chart
9. Scatter Plot
10. Pictogram Chart
11. Timeline
12. Highlight Table
13. Bullet Graph
14. Choropleth Map
15. Word Cloud
16. Network Diagram
17. Correlation Matrices
1. Pie Chart
Pie charts are one of the most common and basic data visualization techniques, used across a
wide range of applications. Pie charts are ideal for illustrating proportions, or part-to-whole
comparisons.
Because pie charts are relatively simple and easy to read, they're best suited for audiences who
might be unfamiliar with the information or are only interested in the key takeaways. For viewers
who require a more thorough explanation of the data, pie charts fall short in their ability to
display complex information.
2. Bar Chart
The classic bar chart, or bar graph, is another common and easy-to-use method of data
visualization. In this type of visualization, one axis of the chart shows the categories being
compared, and the other, a measured value. The length of the bar indicates how each group
measures according to the value.
One drawback is that labeling and clarity can become problematic when there are too many
categories included. Like pie charts, they can also be too simple for more complex data sets.
3. Histogram
Unlike bar charts, histograms illustrate the distribution of data over a continuous interval or
defined period. These visualizations are helpful in identifying where values are concentrated, as
well as where there are gaps or unusual values.
4. Gantt Chart
Gantt charts are particularly common in project management, as they're useful in illustrating a
project timeline or progression of tasks. In this type of chart, tasks to be performed are listed on
the vertical axis and time intervals on the horizontal axis. Horizontal bars in the body of the chart
represent the duration of each activity.
Utilizing Gantt charts to display timelines can be incredibly helpful, and enable team members to
keep track of every aspect of a project. Even if you're not a project management professional,
familiarizing yourself with Gantt charts can help you stay organized.
5. Heat Map
A heat map is a type of visualization used to show differences in data through variations in color.
These charts use color to communicate values in a way that makes it easy for the viewer to
quickly identify trends. Having a clear legend is necessary in order for a user to successfully read
and interpret a heatmap.
There are many possible applications of heat maps. For example, if you want to analyze which
time of day a retail store makes the most sales, you can use a heat map that shows the day of the
week on the vertical axis and time of day on the horizontal axis. Then, by shading in the matrix
with colors that correspond to the number of sales at each time of day, you can identify trends in
the data that allow you to determine the exact times your store experiences the most sales.
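Assuming the matplotlib library is available, the retail example above could be sketched roughly
like this (the store hours and the randomly generated sales counts are purely hypothetical):

import random
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
hours = list(range(9, 21))  # hypothetical opening hours, 9:00-20:00
# hypothetical sales counts: one row per day, one column per hour
sales = [[random.randint(0, 50) for _ in hours] for _ in days]

plt.imshow(sales)  # colors encode the number of sales in each cell
plt.xticks(range(len(hours)), [f"{h}:00" for h in hours], rotation=45)
plt.yticks(range(len(days)), days)
plt.colorbar(label="sales")  # the legend needed to read the heat map
plt.show()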
6. Box and Whisker Plot
A box and whisker plot, or box plot, provides a visual summary of data through its quartiles.
First, a box is drawn from the first quartile to the third quartile of the data set. A line within the box
represents the median. “Whiskers,” or lines, are then drawn extending from the box to the
minimum (lower extreme) and maximum (upper extreme). Outliers are represented by individual
points that are in-line with the whiskers.
This type of chart is helpful in quickly identifying whether or not the data is symmetrical or
skewed, as well as providing a visual summary of the data set that can be easily interpreted.
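As a sketch of the numbers behind such a plot, the following computes a five-number summary
with Python's statistics module, using the common 1.5 x IQR convention for separating outliers
(a convention the text above does not specify):

import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 15, 40]  # illustrative sample; 40 is an outlier

q1, median, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
whisker_low = min(x for x in data if x >= low_fence)    # whiskers end at the most
whisker_high = max(x for x in data if x <= high_fence)  # extreme non-outlier values
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)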
7. Waterfall Chart
A waterfall chart is a visual representation that illustrates how a value changes as it's influenced
by different factors, such as time. The main goal of this chart is to show the viewer how a value
has grown or declined over a defined period. For example, waterfall charts are popular for
showing spending or earnings over time.
8. Area Chart
An area chart, or area graph, is a variation on a basic line graph in which the area underneath the
line is shaded to represent the total value of each data point. When several data series must be
compared on the same graph, stacked area charts are used.
This method of data visualization is useful for showing changes in one or more quantities over
time, as well as showing how each quantity combines to make up the whole. Stacked area charts
are effective in showing part-to-whole comparisons.
9. Scatter Plot
Another technique commonly used to display data is a scatter plot. A scatter plot displays data
for two variables as represented by points plotted against the horizontal and vertical axis. This
type of data visualization is useful in illustrating the relationships that exist between variables
and can be used to identify trends or correlations in data.
Scatter plots are most effective for fairly large data sets, since it's often easier to identify trends
when there are more data points present. Additionally, the closer the data points are grouped
together, the stronger the correlation or trend tends to be.
10. Pictogram Chart
Pictogram charts, or pictograph charts, are particularly useful for presenting simple data in a
more visual and engaging way. These charts use icons to visualize data, with each icon
representing a different value or category. For example, data about time might be represented by
icons of clocks or watches. Each icon can correspond to either a single unit or a set number of
units (for example, each icon represents 100 units).
In addition to making the data more engaging, pictogram charts are helpful in situations where
language or cultural differences might be a barrier to the audience's understanding of the data.
11. Timeline
Timelines are the most effective way to visualize a sequence of events in chronological order.
They're typically linear, with key events outlined along the axis. Timelines are used to
communicate time-related information and display historical data.
Timelines allow you to highlight the most important events that occurred, or need to occur in the
future, and make it easy for the viewer to identify any patterns appearing within the selected time
period. While timelines are often relatively simple linear visualizations, they can be made more
visually appealing by adding images, colors, fonts, and decorative shapes.
INTERACTION TECHNIQUES:
Interactive data visualization supports exploratory thinking so that decision-makers can actively
investigate intriguing findings. Interactive visualization supports faster decision making, greater
data access and stronger user engagement along with desirable results in several other metrics.
Some of the key findings include:
Visuals are especially helpful when you're trying to find relationships among hundreds or
thousands of variables to determine their relative importance.
Today, 93% of human communication is visual, and the human eye processes images 60,000
times faster than text-based data.
DATA TYPES:
Having a good understanding of the different data types, also called measurement scales, is a
crucial prerequisite for doing exploratory data analysis (EDA), since you can use certain
statistical measurements only for specific data types.
You also need to know which data type you are dealing with to choose the right visualization
method. Think of data types as a way to categorize different types of variables. We will discuss
the main types of variables and look at an example for each. We will sometimes refer to them as
measurement scales.
CATEGORICAL DATA
Categorical data represents characteristics. Therefore it can represent things like a person's
gender, language, etc. Categorical data can also take on numerical values (example: 1 for female
and 0 for male). Note that those numbers don't have mathematical meaning.
NOMINAL DATA
Nominal values represent discrete units and are used to label variables that have no quantitative
value. Just think of them as “labels.” Note that nominal data has no order. Therefore, if you
would change the order of its values, the meaning would not change. You can see two examples
of nominal features below:
ORDINAL DATA
Ordinal values represent discrete and ordered units. It is therefore nearly the same as nominal
data, except that its ordering matters. You can see an example below:
Note that the difference between Elementary and High School is not necessarily the same as the
difference between High School and College. This is the main limitation of ordinal data: the
differences between the values are not really known. Because of that, ordinal scales are usually
used to measure non-numeric features like happiness, customer satisfaction, and so on.
Numerical Data
DISCRETE DATA
We speak of discrete data if its values are distinct and separate. In other words: We speak of
discrete data if the data can only take on certain values. This type of data can't be measured, but
it can be counted. It basically represents information that can be put into categories.
An example is the number of heads in 100 coin flips.
CONTINUOUS DATA
Continuous data represents measurements and therefore their values can't be counted, but they can
be measured. An example would be the height of a person, which you can describe by using
intervals on the real number line.
Interval Data
Interval values represent ordered units that have the same difference. Therefore we speak of
interval data when we have a variable that contains numeric values that are ordered and where
we know the exact differences between the values. An example would be a feature that contains
the temperature of a given place.
Ratio Data
Ratio values are also ordered units that have the same difference. Ratio values are the same as
interval values, with the difference that they do have an absolute zero. Good examples are height,
weight, length, etc.
Some common ways to visualize data are listed below:
Box plots
Histograms
Heat maps
Charts
Tree maps
Box Plots
A boxplot is a standardized way of displaying the distribution of data based on a five-number
summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can
also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your
data is skewed.
List of Methods to Visualize Data
Column Chart: It is also called a vertical bar chart where each category is represented
by a rectangle. The height of the rectangle is proportional to the values that are plotted.
Bar Graph: It has rectangular bars in which the lengths are proportional to the values
which are represented.
Stacked Bar Graph: It is a bar style graph that has various components stacked together
so that apart from the bar, the components can also be compared to each other.
Stacked Column Chart: It is similar to a stacked bar graph; however, the data is stacked
vertically in columns.
Area Chart: It combines the line chart and bar chart to show how the numeric values of
one or more groups change over the progression of a second variable, typically time.
Dual Axis Chart: It combines a column chart and a line chart and then compares the two
variables.
Line Graph: The data points are connected through a straight line; therefore, creating a
representation of the changing trend.
Mekko Chart: It can be called a two-dimensional stacked chart with varying column
widths.
Pie Chart: It is a chart where various components of a data set are presented in the form
of a pie which represents their proportion in the entire data set.
Waterfall Chart: With the help of this chart, the increasing effect of sequentially
introduced positive or negative values can be understood.
Scatter Plot Chart: It is also called a scatter chart or scatter graph. Dots are used to
denote values for two different numeric variables.
Bullet Graph: It is a variation of a bar graph. A bullet graph is often used to replace
dashboard gauges and meters.
Funnel Chart: The chart shows the flow of users through a business or sales process.
Heat Map: It is a technique of data visualization that shows the level of instances as
color in two dimensions.
Charts
Line Chart
The simplest technique, a line plot is used to plot the relationship or dependence of one variable
on another. To plot the relationship between the two variables, we can simply call the plot
function.
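For instance, assuming the matplotlib plotting library (its plot function is presumably what the
sentence above refers to), a minimal line chart sketch with made-up data looks like this:

import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]  # illustrative data
revenue = [1.2, 1.5, 1.1, 1.9, 2.4]

plt.plot(years, revenue, marker="o")  # connect the data points with a line
plt.xlabel("Year")
plt.ylabel("Revenue (millions)")
plt.title("Line chart: revenue over time")
plt.show()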
Bar Charts
Bar charts are used for comparing the quantities of different categories or groups. Values of a
category are represented with the help of bars and they can be configured with vertical or
horizontal bars, with the length or height of each bar representing the value.
Pie Chart
It is a circular statistical graph which is divided into slices to illustrate numerical proportion.
Here the arc length of each slice is proportional to the quantity it represents. As a rule, they are
used to
compare the parts of a whole and are most effective when there are limited components and
when text and percentages are included to describe the content. However, they can be difficult to
interpret because the human eye has a hard time estimating areas and comparing visual angles.
Scatter Charts
As described earlier, a scatter chart uses dots to denote values for two different numeric
variables and is useful for revealing relationships or correlations between them.
Bubble Charts
It is a variation of scatter chart in which the data points are replaced with bubbles, and an
additional dimension of data is represented in the size of the bubbles.
Timeline Charts
Timeline charts illustrate events, in chronological order — for example the progress of a project,
advertising campaign, acquisition process — in whatever unit of time the data was recorded —
for example week, month, year, quarter. It shows the chronological sequence of past or future
events on a timescale.
Tree Maps
A treemap is a visualization that displays hierarchically organized data as a set of nested
rectangles, parent elements being tiled with their child elements. The sizes and colors of
rectangles are proportional to the values of the data points they represent. A leaf node rectangle
has an area proportional to the specified dimension of the data.