Curso Data Analis
solve problems
Analytical thinking: The process of identifying and defining a problem, then
solving it by using data in an organized, step-by-step manner
Context: The condition in which something exists or happens
Data: A collection of facts
Data analysis: The collection, transformation, and organization of data in order to
draw conclusions, make predictions, and drive informed decision-making
Data analyst: Someone who collects, transforms, and organizes data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analytics: The science of data
Data design: How information is organized
Data-driven decision-making: Using facts to guide business strategy
Data ecosystem: The various elements that interact with one another in order to
produce, manage, store, organize, analyze, and share data
Data science: A field of study that uses raw data to create new ways of modeling
and understanding the unknown
Data strategy: The management of the people, processes, and tools used in data
analysis
Data visualization: The graphical representation of data
Dataset: A collection of data that can be manipulated or analyzed as one unit
Gap analysis: A method for examining and evaluating the current state of a process
in order to identify opportunities for improvement in the future
Root cause: The reason why a problem occurs
Technical mindset: The ability to break things down into smaller steps or pieces
and work with them in an orderly and logical way
Visualization: (Refer to data visualization)
Making predictions
A company that wants to know the best advertising method to bring in new customers
is an example of a problem requiring analysts to make predictions. Analysts with
data on location, type of media, and number of new customers acquired as a result
of past ads can't guarantee future results, but they can help predict the best
placement of advertising to reach the target audience.
Categorizing things
An example of a problem requiring analysts to categorize things is a company's goal
to improve customer satisfaction. Analysts might classify customer service calls
based on certain keywords or scores. This could help identify top-performing
customer service representatives or help correlate certain actions taken with
higher customer satisfaction scores.
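The keyword-based classification described above can be sketched as follows. This is a minimal illustration, not a production classifier: the category names, keyword lists, and sample calls are hypothetical, and simple substring matching is assumed to be good enough.

```python
# Hypothetical keyword-to-category map; real categories would come from the business.
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "shipping": ["delivery", "late", "tracking"],
}


def categorize_call(transcript: str) -> str:
    """Return the first category whose keyword appears in the transcript."""
    text = transcript.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"  # no keyword matched


# Made-up call summaries used only for illustration.
calls = [
    "I was charged twice and want a refund",
    "My delivery is late",
    "How do I reset my password?",
]
labels = [categorize_call(call) for call in calls]
print(labels)
```

Once every call has a label, the labels can be counted per representative or compared against satisfaction scores.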
Spotting something unusual
A company that sells smart watches that help people monitor their health would be
interested in designing their software to spot something unusual. Analysts who have
analyzed aggregated health data can help product developers determine the right
algorithms to spot and set off alarms when certain data doesn't trend normally.
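One simple way to flag data that doesn't trend normally is a z-score check: measure how many standard deviations each reading sits from the mean. The sketch below assumes that approach; the function name, the threshold of 2.0, and the heart-rate readings are illustrative assumptions, not part of any real smart watch algorithm.

```python
from statistics import mean, stdev


def find_unusual(readings, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    avg = mean(readings)
    spread = stdev(readings)
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, value)
        for i, value in enumerate(readings)
        if abs(value - avg) / spread > threshold
    ]


# Made-up resting heart-rate readings with one obvious spike.
heart_rates = [62, 64, 63, 61, 65, 63, 62, 120, 64, 63]
unusual = find_unusual(heart_rates)
print(unusual)
```

A real product would likely use per-user baselines and medically reviewed thresholds rather than one global cutoff.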
Identifying themes
User experience (UX) designers might rely on analysts to analyze user interaction
data. Similar to problems that require analysts to categorize things, usability
improvement projects might require analysts to identify themes to help prioritize
the right product features for improvement. Themes are most often used to help
researchers explore certain aspects of data. In a user study, user beliefs,
practices, and needs are examples of themes.
By now you might be wondering if there is a difference between categorizing things
and identifying themes. The best way to think about it is: categorizing things
involves assigning items to categories; identifying themes takes those categories a
step further by grouping them into broader themes.
Discovering connections
A third-party logistics company working with another company to get shipments
delivered to customers on time is a problem requiring analysts to discover
connections. By analyzing the wait times at shipping hubs, analysts can determine
the appropriate schedule changes to increase the number of on-time deliveries.
Finding patterns
Minimizing downtime caused by machine failure is an example of a problem requiring
analysts to find patterns in data.
Specific: Is the question specific? Does it address the problem? Does it have
context? Will it uncover a lot of the information you need?
Measurable: Will the question give you answers that you can measure?
Action-oriented: Will the answers provide information that helps you devise some
type of plan?
Relevant: Is the question about the particular problem you are trying to solve?
Time-bound: Are the answers relevant to the specific time being studied?
Action-oriented question: A question whose answers lead to change
Cloud: A place to keep data online, rather than a computer hard drive
Data analysis process: The six phases of ask, prepare, process, analyze, share, and
act whose purpose is to gain insights that drive informed decision-making
Data life cycle: The sequence of stages that data experiences, which include plan,
capture, manage, analyze, archive, and destroy
Leading question: A question that steers people toward a certain response
Measurable question: A question whose answers can be quantified and assessed
Problem types: The various problems that data analysts encounter, including
categorizing things, discovering connections, finding patterns, identifying themes,
making predictions, and spotting something unusual
Relevant question: A question that has significance to the problem to be solved
SMART methodology: A tool for determining a question’s effectiveness based on
whether it is specific, measurable, action-oriented, relevant, and time-bound
Specific question: A question that is simple, significant, and focused on a single
topic or a few closely related ideas
Structured thinking: The process of recognizing the current problem or situation,
organizing available information, revealing gaps and opportunities, and identifying
options
Time-bound question: A question that specifies a timeframe to be studied
Unfair question: A question that makes assumptions or is difficult to answer
honestly
Algorithm: A process or set of rules followed for a specific task
Big data: Large, complex datasets typically involving long periods of time, which
enable data analysts to address far-reaching business problems
Dashboard: A tool that monitors live, incoming data
Data-inspired decision-making: The process of exploring different data sources to
find out what they have in common
Metric: A single, quantifiable type of data that is used for measurement
Metric goal: A measurable goal set by a company and evaluated using metrics
Pivot chart: A chart created from the fields in a pivot table
Pivot table: A data summarization tool used to sort, reorganize, group, count,
total, or average data
Qualitative data: A subjective and explanatory measure of a quality or
characteristic
Quantitative data: A specific and objective measure, such as a number, quantity, or
range
Report: A static collection of data periodically given to stakeholders
Return on investment (ROI): A formula that uses the metrics of investment and
profit to evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services
Small data: Small, specific data points typically involving a short period of time,
which are useful for making day-to-day decisions
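The return on investment (ROI) entry above can be made concrete with a short calculation. The sketch assumes the common form of the formula, net profit divided by the cost of the investment; the function name and the figures are made up for illustration.

```python
def roi(net_profit: float, investment_cost: float) -> float:
    """Return ROI as a fraction, assuming ROI = net profit / cost of investment."""
    if investment_cost == 0:
        raise ValueError("investment_cost must be non-zero")
    return net_profit / investment_cost


# Hypothetical campaign: 10,000 spent, 15,000 in revenue attributed to it.
revenue = 15_000
cost = 10_000
net_profit = revenue - cost

roi_value = roi(net_profit, cost)
print(f"ROI: {roi_value:.0%}")
```

Expressing the result as a percentage makes investments of different sizes easier to compare.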
AVERAGE: A spreadsheet function that returns an average of the values from a
selected range
Borders: Lines that can be added around two or more cells on a spreadsheet
Cell reference: A cell or a range of cells in a worksheet typically used in
formulas and functions
COUNT: A spreadsheet function that counts the number of cells in a range that
contain numeric values
Equation: A calculation that involves addition, subtraction, multiplication, or
division (also called a math expression)
Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell
that can be dragged through neighboring cells in order to continue an instruction
Filtering: The process of showing only the data that meets specified criteria
while hiding the rest
Header: The first row in a spreadsheet that labels the type of data in each column
Math expression: A calculation that involves addition, subtraction, multiplication,
or division (also called an equation)
Math function: A function that is used as part of a mathematical formula
MAX: A spreadsheet function that returns the largest numeric value from a range of
cells
MIN: A spreadsheet function that returns the smallest numeric value from a range of
cells
Open data: Data that is available to the public
Operator: A symbol that names the operation or calculation to be performed
Order of operations: Using parentheses to group together spreadsheet values in
order to clarify the order in which operations should be performed
Problem domain: The area of analysis that encompasses every activity affecting or
affected by a problem
Range: A collection of two or more cells in a spreadsheet
Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a
project
Sorting: The process of arranging data into a meaningful order to make it easier to
understand, analyze, and visualize
SUM: A spreadsheet function that adds the values of a selected range of cells
Reframing: Restating a problem or challenge, then redirecting it toward a potential
resolution
Turnover rate: The rate at which employees voluntarily leave a company
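For readers who also work outside spreadsheets, the functions SUM, AVERAGE, COUNT, MAX, and MIN from this glossary have close equivalents in Python. The sketch below is an analogy rather than a spreadsheet implementation: the cell values are made up, and COUNT is taken to count only numeric cells, which is how it behaves in Google Sheets and Excel.

```python
from statistics import mean

# A hypothetical column of cells, including text and an empty cell.
cells = [12, 7, "n/a", 30, None, 5]

# Keep only numeric cells, as the spreadsheet functions would.
numbers = [
    c for c in cells
    if isinstance(c, (int, float)) and not isinstance(c, bool)
]

summary = {
    "SUM": sum(numbers),
    "AVERAGE": mean(numbers),
    "COUNT": len(numbers),
    "MAX": max(numbers),
    "MIN": min(numbers),
}
print(summary)
```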
Agenda: A list of scheduled appointments
Audio file: Digitized audio storage usually in an MP3, AAC, or other compressed
format
Boolean data: A data type with only two possible values, usually true or false
Continuous data: Data that is measured and can have almost any numeric value
Cookie: A small file stored on a computer that contains information about its users
Data element: A piece of information in a dataset
Data model: A tool for organizing data elements and how they relate to one another
Digital photo: An electronic or computer-based image usually in BMP or JPG format
Discrete data: Data that is counted and has a limited number of values
External data: Data that lives, and is generated, outside of an organization
Field: A single piece of information from a row or column of a spreadsheet; in a
data table, typically a column in the table
First-party data: Data collected by an individual or group using their own
resources
Long data: A dataset in which each row is one time point per subject, so each
subject has data in multiple rows
Nominal data: A type of qualitative data that is categorized without a set order
Ordinal data: Qualitative data with a set order or scale
Ownership: The aspect of data ethics that presumes individuals own the raw data
they provide and have primary control over its usage, processing, and sharing
Pixel: In digital imaging, a small area of illumination on a display screen that,
when combined with other adjacent areas, forms a digital image
Population: In data analytics, all possible data values in a dataset
Record: A collection of related data in a data table, usually synonymous with row
Sample: In data analytics, a segment of a population that is representative of the
entire population
Second-party data: Data collected by a group directly from its audience and then
sold
Social media: Websites and applications through which users create and share
content or participate in social networking
String data type: A sequence of characters and punctuation that contains textual
information (Refer to Text data type)
Structured data: Data organized in a certain format such as rows and columns
Text data type: A sequence of characters and punctuation that contains textual
information (also called string data type)
United States Census Bureau: An agency in the U.S. Department of Commerce that
serves as the nation’s leading provider of quality data about its people and
economy
Unstructured data: Data that is not organized in any easily identifiable manner
Video file: A collection of images, audio files, and other data usually encoded in
a compressed format such as MP4, M4V, MOV, AVI, or FLV
Wide data: A dataset in which every data subject has a single row with multiple
columns to hold the values of various attributes of the subject
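The difference between wide data and long data, both defined in this glossary, is easiest to see by converting one into the other. The sketch below uses only the standard library; the helper name wide_to_long, the field names, and the sample values are assumptions made for illustration.

```python
# Wide format: one row per subject, one column per year (made-up values).
wide_rows = [
    {"subject": "A", "2021": 10, "2022": 12},
    {"subject": "B", "2021": 7, "2022": 9},
]


def wide_to_long(rows, id_field, var_name="year", value_name="value"):
    """Turn each non-id column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the id is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "subject")
for row in long_rows:
    print(row)
```

Each subject now appears in several rows, one per time point, which matches the definition of long data.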
Third-party data is collected by an entity that doesn’t have a direct relationship
with the data. Personally identifiable information (PII) is data that is reasonably
likely to identify a person and make information known about them. It is important
to keep this data safe.
Bad data source: A data source that is not reliable, original, comprehensive,
current, and cited (ROCCC)
Bias: A conscious or subconscious preference in favor of or against a person, group
of people, or thing
Confirmation bias: The tendency to search for or interpret information in a way
that confirms pre-existing beliefs
Consent: The aspect of data ethics that presumes an individual’s right to know how
and why their personal data will be used before agreeing to provide it
Currency: The aspect of data ethics that presumes individuals should be aware of
financial transactions resulting from the use of their personal data and the scale
of those transactions
Data anonymization: The process of protecting people's private or sensitive data by
eliminating identifying information
Data bias: When a preference in favor of or against a person, group of people, or
thing systematically skews data analysis results in a certain direction
Data ethics: Well-founded standards of right and wrong that dictate how data is
collected, shared, and used
Data interoperability: A key factor leading to the successful use of open data
among companies and governments
Data privacy: Preserving a data subject’s information any time a data transaction
occurs
Ethics: Well-founded standards of right and wrong that prescribe what humans ought
to do, usually in terms of rights, obligations, benefits to society, fairness, or
specific virtues
Experimenter bias: The tendency for different people to observe things differently
(also called observer bias)
Fairness: A quality of data analysis that does not create or reinforce bias
General Data Protection Regulation of the European Union (GDPR): A regulation of
the European Union created to help protect people and their data
Good data source: A data source that is reliable, original, comprehensive, current,
and cited (ROCCC)
Interpretation bias: The tendency to interpret ambiguous situations in a positive
or negative way
Observer bias: The tendency for different people to observe things differently
(also called experimenter bias)
Openness: The aspect of data ethics that promotes the free access, usage, and
sharing of data
Sampling bias: Overrepresenting or underrepresenting certain members of a
population as a result of working with a sample that is not representative of the
population as a whole
Transaction transparency: The aspect of data ethics that presumes all data-
processing activities and algorithms should be explainable and understood by the
individual who provides the data
Unbiased sampling: When the sample of the population being measured is
representative of the population as a whole
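Data anonymization, as defined in this glossary, eliminates identifying information. A minimal sketch of that idea is to drop identifying fields from each record. Which fields count as identifying is an assumption here (name, email, phone), and removing direct identifiers alone may not be enough in practice, because combinations of the remaining fields can sometimes still identify a person.

```python
# Hypothetical list of fields treated as identifying; adjust per dataset.
IDENTIFYING_FIELDS = {"name", "email", "phone"}


def anonymize(record: dict) -> dict:
    """Return a copy of the record without the identifying fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS
    }


# Made-up record used only for illustration.
patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age_group": "30-39",
    "resting_heart_rate": 62,
}
safe_record = anonymize(patient)
print(safe_record)
```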
File or document type: What type of file or document are you examining?
Date, time, and creator: When was it created? Who created it? When was it last
modified?
Title and description: What is the name of the item you are examining? What type of
content does it contain?
Geolocation: If you’re examining a photo, where was it taken?
Tags and categories: What is the general overview of the item that you have? Is it
indexed or described in a specific way?
Who last modified it and when: Were any changes made to the file? If yes, when were
the most recent modifications made?
Who can access or update it: If you’re examining a dataset, is it public? Are
special permissions needed to customize or modify it?
SELECT is the section of a query that indicates what data you want SQL to return to
you.
FROM is the section of a query that indicates which table the desired data comes
from. You must provide a full path to the table. The path includes the project
name, database name, and table name, each separated by a period.
WHERE is the section of a query that indicates any filters you’d like to apply to
your table.
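The three sections can be seen working together in a small runnable query. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the customers table and its rows are made up. SQLite refers to a table by its bare name, so the full project, database, and table path described above does not apply in this example.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "BR"), (2, "Ben", "US"), (3, "Chloe", "US")],
)

query = """
SELECT name               -- what data to return
FROM customers            -- which table it comes from
WHERE country = 'US'      -- which rows to keep
ORDER BY id
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```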
Administrative metadata: Metadata that indicates the technical source of a digital
asset
CSV (comma-separated values) file: A delimited text file that uses a comma to
separate values
Data governance: A process for ensuring the formal management of a company’s data
assets
Descriptive metadata: Metadata that describes a piece of data and can be used to
identify it at a later point in time
Foreign key: A field within a database table that is a primary key in another table
(Refer to primary key)
FROM: The section of a query that indicates where the selected data comes from
Geolocation: The geographical location of a person or device by means of digital
information
Metadata: Data about data
Metadata repository: A database created to store metadata
Naming conventions: Consistent guidelines that describe the content, creation date,
and version of a file in its name
Normalized database: A database in which only related data is stored in each table
Notebook: An interactive, editable programming environment for creating data
reports and showcasing data skills
Primary key: An identifier in a database that references a column in which each
value is unique (Refer to foreign key)
Redundancy: When the same piece of data is stored in two or more places
Schema: A way of describing how something, such as data, is organized
SELECT: The section of a query that indicates the subset of a dataset
Structural metadata: Metadata that indicates how a piece of data is organized and
whether it is part of one or more than one data collection
WHERE: The section of a query that specifies criteria that the requested data must
meet
World Health Organization: An organization whose primary role is to direct and
coordinate international health within the United Nations system
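The primary key and foreign key entries in this glossary can be illustrated with two related tables. This is a sketch using Python's built-in sqlite3 module; the table names, column names, and rows are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is set, so the example sets it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# customer_id is the primary key of customers: each value is unique.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key: it must match a primary key in customers.
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           total REAL
       )"""
)

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# An order pointing at a customer that does not exist should be rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
print(rejected, order_count)
```

Storing the customer's name only in the customers table and linking orders to it by key also avoids the redundancy described in the glossary.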