Glossary of Problem & Approach

The document defines key terms used in data science and analytics. It provides an alphabetized glossary with over 30 terms and their meanings, including analytic approach, clustering, cohort study, CRISP-DM, data cleansing, data science, decision tree, machine learning, mean, median, model, predictors, and text analysis.

Uploaded by

Aashish Kumar

03/03/2024, 10:28 about:blank

Glossary
Module 1 Lesson 1: From Problem to Approach

Welcome! This alphabetized glossary contains many of the terms you'll find within this lesson. These terms
are important for you to recognize when working in the industry, when participating in user groups, and when
participating in other certificate programs.

Term Definition
Analytic Approach: The process of selecting the appropriate method or path to address a specific data science question or problem.
Analytics: The systematic analysis of data using statistical, mathematical, and computational techniques to uncover insights, patterns, and trends.
Business Understanding: The initial phase of the data science methodology, which involves seeking clarification and understanding the goals, objectives, and requirements of a given task or problem.
Clustering Association: An approach used to learn about human behavior and identify patterns and associations in data.
Cohort: A group of individuals who share a common characteristic or experience and are studied or analyzed as a unit.
Cohort study: An observational study where a group of individuals with a specific characteristic or exposure is followed over time to determine the incidence of outcomes or the relationship between exposures and outcomes.
Congestive Heart Failure (CHF): A chronic condition in which the heart cannot pump enough blood to meet the body's needs, resulting in fluid buildup and symptoms such as shortness of breath and fatigue.
CRISP-DM: Cross-Industry Standard Process for Data Mining, a widely used methodology for data mining and analytics projects that encompasses six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Data analysis: The process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
Data cleansing: The process of identifying and correcting or removing errors, inconsistencies, or inaccuracies in a dataset to improve its quality and reliability.
Data science: An interdisciplinary field that combines scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Data science methodology: A structured approach to solving business problems using data analysis and data-driven insights.
Data scientist: A professional using scientific methods, algorithms, and tools to analyze data, extract insights, and develop models or solutions to complex business problems.
Data scientists: Professionals with data science and analytics expertise who apply their skills to solve business problems.
Data-Driven Insights: Insights derived from analyzing and interpreting data to inform decision-making.
Decision tree: A supervised machine learning algorithm that uses a tree-like structure of decisions and their possible consequences to make predictions or classify instances.
Decision Tree Classification Model: A model that uses a tree-like structure to classify data based on conditions and thresholds, providing predicted outcomes and associated probabilities.
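
As a rough illustration of the decision tree entries above, the following Python sketch walks a record through a tiny hand-built tree and returns a predicted outcome with an associated probability. The feature names (age, prior_admissions), the thresholds, and the probabilities are all invented for illustration; they do not come from the course material or from any real dataset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """One node of a toy decision tree (internal node or leaf)."""
    feature: Optional[str] = None        # feature tested at an internal node
    threshold: Optional[float] = None    # go left if value <= threshold, else right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    outcome: Optional[str] = None        # set only on leaves
    probability: Optional[float] = None  # assumed share of cases with that outcome


def classify(node: Node, record: dict) -> Tuple[str, float]:
    """Follow the conditions from the root to a leaf; return (outcome, probability)."""
    while node.outcome is None:
        if record[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.outcome, node.probability


# Hypothetical tree: every name and number below is an assumption.
tree = Node(
    feature="age", threshold=65,
    left=Node(outcome="no readmission", probability=0.9),
    right=Node(
        feature="prior_admissions", threshold=2,
        left=Node(outcome="no readmission", probability=0.7),
        right=Node(outcome="readmission", probability=0.8),
    ),
)

print(classify(tree, {"age": 72, "prior_admissions": 3}))
```

In practice a tree like this would be learned from historical data rather than written by hand; the sketch only shows how conditions and thresholds lead to a leaf.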
Term Definition
Decision Tree Classifier: A classification model that uses a decision tree to determine outcomes based on specific conditions and thresholds.
Decision-Tree Model: A model used to review scenarios and identify relationships in data, such as the reasons for patient readmissions.
Descriptive approach: An approach used to show relationships and identify clusters of similar activities based on events and preferences.
Descriptive modeling: A modeling technique that focuses on describing and summarizing data, often through statistical analysis and visualization, without making predictions or inferences.
Domain knowledge: Expertise and understanding of a specific subject area or field, including its concepts, principles, and relevant data.
Goals and objectives: The sought-after outcomes and specific objectives that support the overall goal of the task or problem.
Iteration: A single cycle or repetition of a process, often involving refining or modifying a solution based on feedback or new information.
Iterative process: A process that involves repeating a series of steps or actions to refine and improve a solution or analysis. Each iteration builds upon the previous one.
Leaf: The final nodes of a decision tree where data is categorized into specific outcomes.
Machine Learning: A field of study that enables computers to learn from data without being explicitly programmed, identifying hidden relationships and trends.
Mean: The average value of a set of numbers, calculated by summing all the values and dividing by the total number of values.
Median: The middle value in a set of numbers arranged in ascending or descending order; it divides the data into two equal halves.
Model (Conceptual model): A simplified representation or abstraction of a real-world system or phenomenon used to understand, analyze, or predict its behavior.
Model building: The process of developing predictive models to gain insights and make informed decisions based on data analysis.
Pairwise comparison (correlation): A statistical technique that measures the strength and direction of the linear relationship between two variables by calculating a correlation coefficient.
Pattern: A recurring or noticeable arrangement or sequence in data that can provide insights or be used for prediction or classification.
Predictive model: A model used to determine probabilities of an action or outcome based on historical data.
Predictors: Variables or features in a model that are used to predict or explain the outcome variable or target variable.
Prioritization: The process of organizing objectives and tasks based on their importance and impact on the overall goal.
Problem solving: The process of addressing challenges and finding solutions to achieve desired outcomes.
Stakeholders: Individuals or groups with a vested interest in the data science model's outcome and its practical application, such as solution owners, marketing, application developers, and IT administration.
Standard deviation: A measure of the dispersion or variability of a set of values from their mean; it provides information about the spread or distribution of the data.
Statistical analysis: Standard deviations are applied to problems that require counts, such as yes/no answers or classification tasks.
Statistics: The collection, analysis, interpretation, presentation, and organization of data to understand patterns, relationships, and variability in the data.
Structured data (data model): Data that is organized and formatted according to a predefined schema or model and is typically stored in databases or spreadsheets.
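
The definitions of mean, median, standard deviation, and pairwise correlation above can be checked on a small example. The sketch below uses Python's standard statistics module for the first three and a hand-written Pearson correlation coefficient for the last. The sample values and the helper name pearson are assumptions made for illustration, and the population form of the standard deviation is one reasonable choice among several; the glossary does not specify which form is meant.

```python
import math
import statistics

# Made-up sample values, chosen so the results are easy to verify by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)      # sum of values (40) divided by count (8)
median = statistics.median(values)  # average of the two middle values, 4 and 5
std = statistics.pstdev(values)     # population standard deviation around the mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


print(mean, median, std)
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```

A coefficient near 1 indicates a strong positive linear relationship, near -1 a strong negative one, and near 0 little or no linear relationship.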
Term Definition
Text analysis data mining: The process of extracting useful information or knowledge from unstructured textual data through techniques such as natural language processing, text mining, and sentiment analysis.
Threshold value: The specific value used to split data into groups or categories in a decision tree.
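
The glossary does not say how a threshold value is chosen. One common option, used here purely as an assumption, is to try each midpoint between neighboring values and keep the one whose split gives the lowest weighted Gini impurity. The function names, ages, and labels below are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # share of positive labels
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the best single split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented data: ages and whether an outcome occurred (1) or not (0).
ages = [30, 40, 50, 60, 70, 80]
outcome = [0, 0, 0, 1, 1, 1]
print(best_threshold(ages, outcome))
```

With this data the split falls halfway between 50 and 60, because every record at or below that value has outcome 0 and every record above it has outcome 1.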

Author(s)
Dr. Pooja
Patsy Kravitz

Changelog
Date Version Changed by Change Description
2023-08-03 0.1 Patsy Kravitz Initial version created

© IBM Corporation 2023. All rights reserved.
