It125 Finals
QUALITATIVE VS. QUANTITATIVE

CATEGORIES OF DATA
1. Data are further classified into four categories: Nominal data, Ordinal data, Discrete data, and Continuous data.

ORDINAL DATA
1. Ordinal data have a natural ordering: each value sits in some kind of order determined by its position on the scale.
2. Ordinal data are used for observations like customer satisfaction, happiness, etc., but we can't perform any arithmetic on them.
3. Ordinal data are qualitative data whose values have some kind of relative position.
4. These kinds of data can be considered "in-between" qualitative and quantitative data. Ordinal data only show sequence and cannot be used for statistical analysis that relies on arithmetic.
5. Compared to nominal data, ordinal data have some kind of order that is not present in nominal data.
6. Examples of Ordinal Data:
● Feedback, experience, or satisfaction on a scale of 1 to 10
● Letter grades in the exam (A, B, C, D, etc.)
● Ranking of people in a competition (First, Second, Third, etc.)
● Economic Status (High, Medium, and Low)
● Education Level (Higher, Secondary, Primary)

NOMINAL VS ORDINAL DATA
NOMINAL | ORDINAL
Can't be quantified, and has no intrinsic ordering | Gives some kind of sequential order by position on the scale
Is qualitative or categorical data | Said to be "in-between" qualitative and quantitative data
Does not provide any quantitative value, and no arithmetical operation can be performed | Provides sequence, and numbers can be assigned to ordinal data, but arithmetical operations cannot be performed
Cannot be used to compare items with one another | Can help to compare one item with another by ranking or ordering
Eye color, housing style, gender, hair color, religion, marital status, etc. | Economic status, customer satisfaction, education level, letter grades

DISCRETE DATA
1. Discrete data contain values that fall under integers or whole numbers. The term discrete means distinct or separate.
2. The total number of students in a class is an example of discrete data.
3. These data can't be broken into decimal or fraction values.
4. Discrete data are countable and have finite values; their subdivision is not possible. These data are represented mainly by a bar graph, number line, or frequency table.
5. Examples of Discrete Data:
● Total number of students present in a class
● Cost of a cell phone
● Number of employees in a company
● The total number of players who participated in a competition
● Days in a week

CONTINUOUS DATA
1. Continuous data are in the form of fractional numbers or real numbers. It can be the version of an android phone, the height of a person, the length of an object, etc.
2. Continuous data represent information that can be divided into smaller levels. A continuous variable can take any value within a range.
3. The key difference between discrete and continuous data is that discrete data contain integers or whole numbers, while continuous data contain numeric values with fractional b..
4. Continuous data store fractional numbers to record different types of data such as temperature, height, width, time, speed, etc.
5. Examples of Continuous Data:
● Height of a person
● Speed of a vehicle
● "Time-taken" to finish the work
● Wi-Fi Frequency
● Market share price

DIFFERENCE BETWEEN DISCRETE AND CONTINUOUS
DISCRETE | CONTINUOUS
Are countable and finite; they are whole numbers or integers | Are measurable; they are in the form of fractions or decimals
Represented mainly by bar graphs | Represented in the form of a histogram
Are values that cannot be divided into subdivisions and smaller pieces | Are values that can be divided into subdivisions and smaller pieces
Have spaces between the values | In the form of continuous sequences
Total number of students in class, number of days in a week, size of a shoe, etc. | Temperature of room, the weight of a person, length of an object

3.3 INTRODUCTION TO DATA ANALYTICS
TYPES OF DATA ANALYTICS
Various approaches to data analytics include looking at what happened (descriptive analytics), why something happened (diagnostic analytics), what is going to happen (predictive analytics), or what should be done next (prescriptive analytics).

TYPES OF DATA ANALYTICS | QUESTIONS ANSWERED
Descriptive Analytics | What happened?
Diagnostic Analytics | Why did it happen?
Predictive Analytics | What will happen?
Prescriptive Analytics | What should we do?
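
As a rough illustration of the four questions (what happened, why, what will happen, what should we do), the sketch below runs each one on a tiny made-up data set. All numbers, names, and the profit rule (unit_margin, ad_cost, the three budget options) are assumptions invented for this example, and the straight-line fit stands in for whatever statistical model a real team would use.

```python
from statistics import mean

# Hypothetical six months of unit sales and ad spend (invented numbers).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 150, 160]
ad_spend = [10, 11, 13, 13, 16, 17]


def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Descriptive: what happened? Summarize the past.
total_sales = sum(sales)
average_sales = mean(sales)

# Diagnostic: why did it happen? Check whether sales moved with ad spend.
# (A high correlation is a hint to investigate, not proof of a cause.)
sales_vs_ads = correlation(ad_spend, sales)

# Predictive: what will happen? Extend the sales trend to month 7.
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

# Prescriptive: what should we do? Score hypothetical budget options with an
# assumed profit rule and recommend the best one.
ad_slope, ad_intercept = fit_line(ad_spend, sales)
unit_margin, ad_cost = 2, 10  # assumed profit per unit sold, cost per ad unit
options = {"cut budget": 14, "keep budget": 17, "raise budget": 20}


def expected_profit(spend):
    predicted_sales = ad_slope * spend + ad_intercept
    return unit_margin * predicted_sales - ad_cost * spend


best_action = max(options, key=lambda name: expected_profit(options[name]))

print(total_sales, round(average_sales, 1))
print(round(sales_vs_ads, 3))
print(round(forecast_month_7, 1))
print(best_action)
```

Each step reuses the output of the one before it, which mirrors how the four types build on each other: the prescriptive step cannot score options without a predictive model, and the predictive model is fitted on the same history the descriptive step summarizes.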
DESCRIPTIVE ANALYTICS
1. Descriptive analytics is the examination of data or content to answer the question "What happened?" (or "What is happening?"), characterized by traditional business intelligence (BI) and visualizations such as pie charts, bar charts, line graphs, tables, or generated narratives.
2. Descriptive analytics involves analyzing historical data to understand what happened in the past. It focuses on summarizing and visualizing data to provide insights into trends, patterns, and relationships.
3. Descriptive analytics examines what happened in the past. You're utilizing descriptive analytics when you examine past data sets for patterns and trends. This is the core of most businesses' analytics because it answers important questions like how much you sold and if you hit specific goals. It's easy to understand even for non-data analysts.
4. Descriptive analytics functions by identifying what metrics you want to measure, collecting that data, and analyzing it. It turns the stream of facts your business has collected into information you can act on, plan around, and measure.
5. Once descriptive analytics is done, it's up to your team to ask how or why those trends occurred, brainstorm and develop possible responses or solutions, and choose how to move forward.
6. Example: Generating reports, creating dashboards, and using data visualization techniques like charts and graphs to present historical data. Other examples include annual revenue reports, survey response summaries, and year-over-year sales reports.

DIAGNOSTIC ANALYTICS
1. Diagnostic analytics helps explain why things happened the way they did. It's a more complex version of descriptive analytics, extending beyond what happened to why it happened.
2. Diagnostic analytics involves digging deeper into historical data to understand why certain events occurred. It focuses on identifying the root causes or factors behind specific outcomes or trends observed in descriptive analytics.
3. Diagnostic analytics identifies trends or patterns in the past and then goes a step further to explain why the trends occurred the way they did. It's a logical step after descriptive analytics because it answers questions like why a certain amount was sold or why Q1 targets were hit.
4. Diagnostic analytics is also a useful tool for businesses that want more confidence to duplicate good outcomes and avoid negative ones. Descriptive analytics can tell you what happened, but then it is up to your team to figure out what to do with that data.
5. Diagnostic analytics applies data to figure out why something happened so you can develop better strategies without so much trial and error. The main flaw with diagnostic analytics is that, because it focuses on past occurrences, it is limited in the actionable observations it can provide about the future.
6. Understanding the causal relationships and sequences may be enough for some businesses, but it may not provide sufficient answers for others. For the latter, managing big data will likely require more advanced analytics solutions, and you might have to implement additional tools - venturing into predictive or prescriptive analytics - to find meaningful insights.
7. Diagnostic analytics answers the following questions:
● Why did year-over-year sales go up?
● Why did a certain product perform above expectations?
● Why did we lose customers in Q3?
8. Examples of diagnostic analytics include conducting root cause analysis, performing variance analysis, and using techniques like drill-down and data discovery to investigate data anomalies.

PREDICTIVE ANALYTICS
1. Predictive analytics aims to predict likely outcomes and make educated forecasts using historical data. Simply put, it seeks to answer the question, "What will happen?". Predictive analytics involves using historical data to predict future outcomes or trends. It focuses on forecasting and making informed predictions based on patterns and relationships identified in historical data.
2. Predictive analytics extends trends into the future to see possible outcomes. This is a more complex version of data analytics because it uses probabilities for predictions instead of simply interpreting existing facts.
3. Use predictive analytics by first identifying what you want to predict and then bringing existing data together to project possibilities to a particular date. Statistical modeling or machine learning are commonly used with predictive analytics. This is how you answer planning questions such as how much you might sell or if you're on track to hit your Q4 targets.
4. A business is in a better position to set realistic goals and avoid risks if it uses data to create a list of likely outcomes. Predictive analytics can keep your team or the company as a whole aligned on the same strategic vision.
5. The primary challenge with predictive analytics is that the insights it generates are limited to the data. First, that means that smaller or incomplete data sets will not yield predictions as accurate as larger data sets might.
6. Getting good business intelligence (BI) from predictive analytics requires sufficient data, but what counts as "sufficient" depends on the industry, business, audience, and the use case.
7. Additionally, the challenge of predictive analytics being restricted to the data simply means that even the best algorithms with the biggest data sets can't weigh intangible or distinctly human factors. A sudden economic shift or even a change in the weather can affect spending, but a predictive analytics model can't account for those variables.
8. Examples of predictive analytics include:
● Ecommerce businesses that use a customer's browsing and purchasing history to make product recommendations.
● Financial organizations that need help determining whether a customer is likely to pay their credit card bill on time.
● Marketers who analyze data to determine the likelihood that new customers will respond favorably to a given campaign or product offering.
● Building predictive models, using statistical techniques like regression analysis and machine learning algorithms to forecast future sales, demand, or customer behavior.

PRESCRIPTIVE ANALYTICS
1. Prescriptive analytics is the use of advanced processes and tools to analyze data and content to recommend the optimal course of action or strategy moving forward. Simply put, it seeks to answer the question, "What should we do?". Prescriptive analytics involves recommending actions or decisions based on insights derived from descriptive, diagnostic, and predictive analytics. It focuses on providing actionable recommendations to optimize outcomes or achieve specific objectives.
2. Ultimately, prescriptive analytics helps you make better decisions about what your next course of action should be. This can involve any aspect of your business, such as increasing revenue, reducing customer churn, preventing fraud, and increasing efficiency.
3. Prescriptive analytics uses the data from a variety of sources - including statistics, machine learning, and data mining - to identify possible future outcomes and show the best option.
4. Prescriptive analytics is the most advanced of the four types because it provides actionable insights instead of raw data. This methodology is how you determine what should happen, not just what could happen. Using prescriptive analytics enables you to not only envision future outcomes but to understand why they will happen.
5. Prescriptive analytics also can predict the effect of future decisions, including the ripple effects those decisions can have on different parts of the business. And it does this in whatever order the decisions may occur.
6. Prescriptive analytics is a complex process that involves many variables and tools like algorithms, machine learning, and big data. Proper data infrastructures need to be established or this type of analytics could be a challenge to manage.
7. The most common issue with prescriptive analytics is that it requires a lot of data to produce useful results, but a large amount of data isn't always available. This type of analytics could easily become inaccessible for most.
8. Examples of prescriptive analytics include:
● Calculating client risk in the insurance industry to determine what plans and rates an account should be offered.
● Discovering what features to include in a new product to ensure its success in the market, possibly by analyzing data like customer surveys and market research to identify what features are most desirable for customers and prospects.
● Identifying tactics to optimize patient care in healthcare, like assessing the risk for developing specific health problems in the future and targeting treatment decisions to reduce those risks.
● Implementing decision support systems, using optimization algorithms, and recommending courses of action based on predictive models to improve business processes or strategies.

JOBS RELATED TO DATA ANALYTICS
1. Data Analyst: Data analysts are responsible for collecting, processing, and analyzing data to uncover insights and trends that can inform business decisions. They often work with databases, spreadsheets, statistical software, and data visualization tools to analyze data and present findings to stakeholders.
2. Business Analyst: Business analysts focus on understanding business processes, identifying opportunities for improvement, and making data-driven recommendations to enhance business performance. They use data analysis techniques to assess market trends, customer behavior, and operational efficiency.
3. Data Engineer: Data engineers are responsible for designing, building, and maintaining data pipelines and infrastructure to support data analytics initiatives. They work with big data technologies like Hadoop, Spark, and Kafka to ingest, transform, and store large volumes of data for analysis.
4. Marketing Analyst: Marketing analysts analyze marketing data to measure the effectiveness of marketing campaigns, understand customer behavior, and identify opportunities for targeting and segmentation. They use data analytics techniques to optimize marketing strategies and drive business growth.
5. Quantitative Analyst (Quant): Quants use mathematical and statistical techniques to analyze financial data and develop quantitative models for trading, risk management, and investment strategies. They often work in the finance industry and require strong analytical and programming skills.
6. Data Architect: Data architects design and oversee the structure and organization of data systems and databases to ensure they meet the needs of an organization's data analytics initiatives. They collaborate with data engineers and analysts to design data models, schemas, and architectures that support data analysis and reporting.
7. Data Visualization Specialist: Data visualization specialists design and create visual representations of data, such as charts, graphs, and dashboards, to communicate insights effectively to stakeholders. They use data visualization tools like Tableau, Power BI, and D3.js to create interactive and informative visualizations.
8. Machine Learning Engineer: Machine learning engineers develop and deploy machine learning models and algorithms to solve complex problems and make predictions based on data. They work closely with data scientists and software engineers to build and optimize machine learning pipelines and algorithms.
9. Data Scientist: Data scientists use advanced statistical and machine learning techniques to analyze complex datasets and extract valuable insights. They often work with big data technologies, programming languages like Python and R, and machine learning frameworks to develop predictive models and algorithms.

These are just a few examples of job roles related to data analytics. The field of data analytics is constantly evolving, and new job roles and specialties continue to emerge as organizations increasingly rely on data-driven insights to inform decision-making and drive innovation.

3.4 STRUCTURED AND UNSTRUCTURED DATA
STRUCTURED AND UNSTRUCTURED DATA
Data can come in various variants, like structured and unstructured data. Structured data is highly organized and formatted so that it's easily searchable in relational databases. Unstructured data has no predefined format or organization, making it much more difficult to collect, process, and analyze.

STRUCTURED DATA | UNSTRUCTURED DATA
Organized information | Diverse structure for information
Requires less storage | Requires more storage
Easier to manage and protect with legacy systems and solutions | More difficult to manage and protect with legacy systems and solutions
Can be displayed in rows, columns and relational databases | Cannot be displayed in rows, columns and relational databases
Estimated 20% of enterprise data (Gartner) | Estimated 80% of enterprise data (Gartner)
Numbers, dates, strings | Images, audio, video, word processing files, emails, text files
Examples: ZIP codes, phone numbers, email addresses, ATM activity, inventory control, student fee payment databases, airline reservation and ticketing | Examples: Text files, email, social media, websites, mobile data, satellite imagery, scientific data, digital surveillance, and sensor data

STRUCTURED DATA
1. Structured data is highly organized and formatted so that it's easily searchable in relational databases.
2. Structured data is more finite and sorted into data arrays, while unstructured data is scattered and variable. Structured data adheres to a predefined data model; thus, it is easy to analyze.
3. Structured data relies on the existence of a data model - a specification for how data can be organized, processed, and interpreted - which makes it easy to analyze. Structured data adheres to the table format: the relationship between rows and columns. Excel files and SQL databases are two prominent examples of structured data. Both consist of structured rows and columns which can be easily ordered and categorized.
4. Structured data is commonly stored in data warehouses and unstructured data is stored in data lakes (storage). Both have cloud-use potential, but structured data requires less storage space and unstructured data requires more.
5. Structured data is regarded as the most 'traditional' type of data storage. This is because the oldest implementations of relational DBMS were capable of storing, processing, and accessing structured data. In RDBMS, fields store length-delimited data like phone numbers, Social Security numbers, or ZIP codes, and records even contain text strings of variable length like names, making it a simple matter to search.
6. Structured data consists of clearly defined data types with patterns that make them easily searchable, while unstructured data ("everything else") is composed of data that is usually not as easily searchable, including formats like audio, video, and social media postings.
7. Structured data analytics is a mature process and technology, whereas unstructured data analytics is a developing industry with a lot of new investment in research and development.

BENEFITS OF USING STRUCTURED DATA
1. Easy to Use
● Business users who understand what the subject matter of the data is and how it is related to their infrastructure can easily understand how to structure their data.
● Tools such as Excel or Google Sheets make structured data easy, or more advanced users can lean further into SQL or business intelligence tools.
2. Convenient Storage
● Because structured data is organized, it is commonly stored in data centers for easy access of the data.
● The data warehouses hold their own space for businesses that choose to use it.
3. Instant Usability
● Structured data is organized, making it easy for a company to find exactly what they are looking for.
● With this method, a company can begin using the data instantly.

DISADVANTAGE OF USING STRUCTURED DATA
1. Limitations On Use
● Due to the organization style of structured data, it is more difficult to have flexibility or varied use cases.
● Structured data can only be used for its intended purpose. This limits its flexibility and use cases.
2. Limited Storage
● Structured data is stored in specific spaces of data warehouses.
● While accessing the data is easy, scalability can be difficult.
● Changes within data warehouses can become hard to manage.
● Using cloud data centers helps with the storage problems.
3. High Overhead
● Data centers or other storage for structured data can become expensive and be part of the structured data ordeal.
● Any change in requirements means updating all of that structured data to meet the new needs. This results in massive expenditure of resources.
● Again, cloud data centers are recommended, but the storage can still require significant work to keep the data maintained properly.

UNSTRUCTURED DATA
1. Unstructured data is data stored in its native format and not processed until used, which is known as schema-on-read.
2. Unstructured data comes in a myriad of file formats, including email, social media posts, presentations, chats, IoT sensor data, and satellite imagery.
3. Unstructured data are those data that have no predetermined data model.
4. Usually, unstructured data is text-heavy, but may also include data like dates, numbers, and statistics. This leads to inconsistencies and contradictions that make it hard to comprehend with conventional systems, as opposed to data stored in structured databases.
5. Unstructured data may include audio, video, or NoSQL databases. In recent times, the ability to access and analyze unstructured data has expanded tremendously, with several emerging technologies and software coming onto the market that can store different forms of unstructured data.
6. Unstructured data has an internal structure but is not structured via predefined data models or schema. It may be textual or non-textual, and human-generated or machine-generated.

Human-generated unstructured data includes:
● Text Files: Word processing, presentations, emails, and logs.
● Email: Message field, largely text, but has some internal structure thanks to its metadata (e.g. the visible "to", "from", "date/time", and "subject" entered to send an email) but also mixes in unstructured data via the message body. For this reason, email is also referred to as semi-structured data.
● Social Media: Data from Facebook, Twitter, and LinkedIn.
● Websites: YouTube, Instagram, and photo sharing sites.
● Mobile Data: Text messages and locations.
● Communications: Chat, IM, phone recordings, and collaboration software.
● Media: MP3, digital photos, audio recording and video files.
● Business Applications: Microsoft Office documents, PDFs and productivity applications.

Machine-generated unstructured data includes:
● Satellite Imagery: Weather data, landforms, and military movements.
● Scientific Data: Oil and gas exploration, space exploration, seismic imagery, and atmospheric data.
● Digital Surveillance: Surveillance photos and video, CCTV.
● Sensor Data: Traffic, weather, and oceanographic sensors.

BENEFITS OF UNSTRUCTURED DATA
1. Limitless Use
● Use cases for unstructured data are significantly larger than structured data due to its flexibility.
● From social media posts to scientific data, unstructured data gives companies the flexibility to use the data how they want.
2. Greater Insights
● When a company has more unstructured data than structured data, there is more data to work with.
● Unstructured data may be difficult to analyze, but through processing, a company can benefit from the data.
3. Low Overhead
● Because of the ability to store unstructured data at data lakes, a business can save money with how they choose to store the data.

DISADVANTAGES OF UNSTRUCTURED DATA
1. Hard To Analyze
● If a company uses unstructured data, it is more difficult to take the raw data and analyze it despite its flexibility.
● Users require a proficient background in data science and machine learning to prepare, analyze and integrate it with machine learning algorithms.
2. Data Analytic Tools
● Unstructured data cannot be managed by business tools.
● Its inconsistent nature makes it more difficult than structured data.
● Currently, there aren't many tools that can manipulate unstructured data apart from cloud commodity servers and open-source NoSQL DBMS.
3. Numerous Formats
● Unstructured data comes in many different forms, such as medical records, social media posts, and emails.
● This information may be challenging to analyze.
4. Less Secured

SEMI-STRUCTURED DATA
2. ... that scale data into records and fields in a dataset. Common examples of semi-structured data are JSON and XML.
3. Semi-structured data is more complex than structured data but less complex than unstructured data.
4. Semi-structured data is also relatively easier to store than unstructured data, bridging the gap between the two data types. An XML sitemap contains page information for a website. It embeds URLs, domain scores, do-follow pages, and meta tags.
5. Email is another common example of a semi-structured data type. Although more advanced analysis tools are necessary for thread tracking, near-dedupe, and concept searching, email's native metadata enables classification and keyword searching without any additional tools.

COMPARISON OF DATA VARIANTS
(aspect) | STRUCTURED | SEMI-STRUCTURED | UNSTRUCTURED
What is it? | Structured in a spreadsheet-like manner (e.g. in a table) | Some degree of organizational structure | No predefined organizational form and no specific format
To put it simply | Think of a spreadsheet (Excel) or data in a tabular form | Think of a text file with text that has some structure (header, paragraphs, etc.) | Essentially anything that is not structured or semi-structured data (a lot)
Example Formats | Excel spreadsheets, comma-separated values (.csv) files, relational database tables | HTML files, JavaScript Object Notation (JSON) files, Extensible Markup Language (XML) files | Images (jpg, png), videos (mp4, avi), sound files (mp3, wav), plain text files, Word files/PDF files
Characteristics | • Within the table, entries have the same format and predefined length and follow the same order. • Is easily machine-readable and can therefore be analyzed without major processing of the data. • It is commonly said that around 20% of the world's data is structured. | • Tags or other markers separate elements and enforce hierarchies, but the size of elements can vary and their order is not important. • Needs some preprocessing before it can be analyzed by a computer. • Has gained importance with the emergence of the World Wide Web. | • Data can take any form and thus be stored as any kind of file (formless). • Within that file, there is no structure of content. • Typically needs major pre-processing before it can be analyzed by a computer, but is often easily consumable for humans (e.g. pictures, videos, plain texts). • Most of the data created today is unstructured.
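
To make the three data variants a little more concrete, here is a minimal sketch that handles one small sample of each using only Python's standard library. The sample records, field names, and the word-counting step are all invented for illustration; real unstructured data usually needs far heavier preprocessing than a simple split.

```python
import csv
import io
import json

# Structured: every row has the same fields in the same order (CSV table).
csv_text = "student_id,name,fee_paid\n1,Ana,1500\n2,Ben,1200\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_fees = sum(int(row["fee_paid"]) for row in rows)

# Semi-structured: keys label each value, but records may differ in shape.
json_text = (
    '[{"name": "Ana", "tags": ["honors"]},'
    ' {"name": "Ben", "email": "ben@example.com"}]'
)
records = json.loads(json_text)
# .get() returns None when a record has no "email" key.
emails = [record.get("email") for record in records]

# Unstructured: free text with no fields at all, so we must impose our own
# structure (here, a crude lowercase word list) before we can count anything.
note = "Ana paid her fees early. Ben asked about the fees deadline."
words = [word.strip(".,").lower() for word in note.split()]
fee_mentions = words.count("fees")

print(total_fees)
print(emails)
print(fee_mentions)
```

The amount of work before the data becomes usable grows from one line for the CSV table, to key lookups with missing-value handling for JSON, to hand-written text cleanup for the note, which is the same gradient the comparison describes.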