Curso Data Analis

The document outlines key concepts related to data analysis, including analytical skills, data-driven decision-making, and the data life cycle. It describes various roles in data analytics, the phases of data analysis, and the importance of data ethics. Additionally, it covers different types of data and methodologies for effective data analysis and problem-solving.

Uploaded by

isaac gomez
Copyright
© All Rights Reserved

Analytical skills: Qualities and characteristics associated with using facts to
solve problems
Analytical thinking: The process of identifying and defining a problem, then
solving it by using data in an organized, step-by-step manner
Context: The condition in which something exists or happens
Data: A collection of facts
Data analysis: The collection, transformation, and organization of data in order to
draw conclusions, make predictions, and drive informed decision-making
Data analyst: Someone who collects, transforms, and organizes data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analytics: The science of data
Data design: How information is organized
Data-driven decision-making: Using facts to guide business strategy
Data ecosystem: The various elements that interact with one another in order to
produce, manage, store, organize, analyze, and share data
Data science: A field of study that uses raw data to create new ways of modeling
and understanding the unknown
Data strategy: The management of the people, processes, and tools used in data
analysis
Data visualization: The graphical representation of data
Dataset: A collection of data that can be manipulated or analyzed as one unit
Gap analysis: A method for examining and evaluating the current state of a process
in order to identify opportunities for improvement in the future
Root cause: The reason why a problem occurs
Technical mindset: The ability to break things down into smaller steps or pieces
and work with them in an orderly and logical way
Visualization: (Refer to data visualization)

Variations of the data life cycle


Plan: Decide what kind of data is needed, how it will be managed, and who will be
responsible for it.
Capture: Collect or bring in data from a variety of different sources.
Manage: Care for and maintain the data. This includes determining how and where it
is stored and the tools used to do so.
Analyze: Use the data to solve problems, make decisions, and support business
goals.
Archive: Keep relevant data stored for long-term and future reference.
Destroy: Remove data from storage and delete any shared copies of the data.

The ask phase


At the start of any successful data analysis, the data analyst:
Takes the time to fully understand stakeholder expectations
Defines the problem to be solved
Decides which questions to answer in order to solve the problem

The prepare phase


In the prepare phase, the emphasis is on identifying and locating data you can use
to answer your questions

The process phase


In this phase, the aim is to refine the data. Data analysts find and eliminate any
errors and inaccuracies that can get in the way of results. This usually means:
Cleaning data
Transforming data into a more useful format
Combining two or more datasets to make information more complete
Removing outliers (data points that could skew the information)
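
The four refinement steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a prescribed method: the sample records, the field names, the `region_managers` lookup, and the 1.5 x IQR outlier rule are all assumptions made for the example.

```python
import statistics

# Hypothetical raw records: inconsistent casing, stray spaces, one missing value.
raw_sales = [
    {"region": " North ", "amount": "120"},
    {"region": "south", "amount": "135"},
    {"region": "North", "amount": ""},       # missing value
    {"region": "East", "amount": "128"},
    {"region": "south", "amount": "131"},
    {"region": "east", "amount": "125"},
    {"region": "North", "amount": "133"},
    {"region": "South ", "amount": "129"},
    {"region": "East", "amount": "5000"},    # suspiciously large value
]

# Hypothetical second dataset to combine with the first.
region_managers = {"north": "Ana", "south": "Luis", "east": "Mei"}


def clean(records):
    """Cleaning: drop rows with a missing amount and normalize region names."""
    cleaned = []
    for row in records:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append({"region": row["region"].strip().lower(), "amount": amount})
    return cleaned


def transform(records):
    """Transforming: convert the amount from text to a number."""
    return [{**row, "amount": float(row["amount"])} for row in records]


def combine(records, managers):
    """Combining: attach the manager from the second dataset to each row."""
    return [{**row, "manager": managers.get(row["region"])} for row in records]


def remove_outliers(records, k=1.5):
    """Removing outliers: keep amounts within k * IQR of the quartiles."""
    amounts = [row["amount"] for row in records]
    q1, _, q3 = statistics.quantiles(amounts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in records if low <= row["amount"] <= high]


processed = remove_outliers(combine(transform(clean(raw_sales)), region_managers))
```

Whether an extreme value should be removed is a judgment call; an analyst would normally confirm that it is an error before dropping it.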

Database: A collection of data stored in a computer system
Formula: A set of instructions used to perform a calculation using the data in a
spreadsheet
Function: A preset command that automatically performs a specified process or task
using the data in a spreadsheet
Query: A request for data or information from a database
Query language: A computer programming language used to communicate with a database
Stakeholders: People who invest time and resources into a project and are
interested in its outcome
Structured Query Language: A computer programming language used to communicate with
a database
Spreadsheet: A digital worksheet
SQL: (Refer to Structured Query Language)

A basic SQL query follows the same structure:


Use SELECT to choose the columns you want to return.
Use FROM to choose the tables where the columns you want are located.
Use WHERE to filter for certain information.
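
The SELECT, FROM, and WHERE clauses described above can be tried out with Python's built-in sqlite3 module and an in-memory database. The `customers` table, its columns, and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, total_spent REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "Lima", 250.0),
        ("Luis", "Quito", 90.0),
        ("Mei", "Lima", 40.0),
    ],
)

query = """
SELECT name, total_spent      -- columns to return
FROM customers                -- table that holds those columns
WHERE city = 'Lima'           -- filter applied to the rows
"""
rows = conn.execute(query).fetchall()
```

Here `rows` holds one `(name, total_spent)` tuple for each customer whose city is Lima.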

Attribute: A characteristic or quality of data used to label a column in a table
Observation: The attributes that describe a piece of data contained in a row of a
table
Oversampling: The process of increasing the sample size of nondominant groups in a
population. This can help you better represent them and address imbalanced datasets

Self-reporting: A data collection technique where participants provide information
about themselves
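
As a sketch of the oversampling entry above, the example below balances groups by randomly duplicating records from the smaller group. Random duplication is only one way to oversample and is an assumption here, as are the `oversample` helper, the `area` field, and the survey records.

```python
import random
from collections import Counter


def oversample(records, group_key, seed=0):
    """Duplicate randomly chosen records so every group matches the largest one."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)

    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical survey responses: six urban, two rural.
survey = [{"id": i, "area": "urban"} for i in range(6)]
survey += [{"id": i, "area": "rural"} for i in range(6, 8)]

balanced_survey = oversample(survey, group_key="area")
group_sizes = Counter(row["area"] for row in balanced_survey)
```

Duplicated records add weight to the smaller group but no new information, so results for that group still rest on the original responses.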

Business analyst: Analyzes data to help businesses improve processes, products, or
services
Data analytics consultant: Analyzes the systems and models for using data
Data engineer: Prepares and integrates data from different sources for analytical
use
Data scientist: Uses expert skills in technology and social science to find trends
through data analysis
Data specialist: Organizes or converts data for use in databases or software systems
Operations analyst: Analyzes data to assess the performance of business operations
and workflows
Marketing analyst: Analyzes market conditions to assess the potential sales of
products and services
HR/payroll analyst: Analyzes payroll data for inefficiencies and errors
Financial analyst: Analyzes financial status by collecting, monitoring, and
reviewing data
Risk analyst: Analyzes financial documents, economic conditions, and client data to
help companies determine the level of risk involved in making a particular business
decision
Healthcare analyst: Analyzes medical data to improve the business aspect of
hospitals and medical facilities
Business task: The question or problem data analysis resolves for a business

Making predictions
A company that wants to know the best advertising method to bring in new customers
is an example of a problem requiring analysts to make predictions. Analysts with
data on location, type of media, and number of new customers acquired as a result
of past ads can't guarantee future results, but they can help predict the best
placement of advertising to reach the target audience.
Categorizing things
An example of a problem requiring analysts to categorize things is a company's goal
to improve customer satisfaction. Analysts might classify customer service calls
based on certain keywords or scores. This could help identify top-performing
customer service representatives or help correlate certain actions taken with
higher customer satisfaction scores.
Spotting something unusual
A company that sells smart watches that help people monitor their health would be
interested in designing their software to spot something unusual. Analysts who have
analyzed aggregated health data can help product developers determine the right
algorithms to spot and set off alarms when certain data doesn't trend normally.
Identifying themes
User experience (UX) designers might rely on analysts to analyze user interaction
data. Similar to problems that require analysts to categorize things, usability
improvement projects might require analysts to identify themes to help prioritize
the right product features for improvement. Themes are most often used to help
researchers explore certain aspects of data. In a user study, user beliefs,
practices, and needs are examples of themes.
By now you might be wondering if there is a difference between categorizing things
and identifying themes. The best way to think about it is: categorizing things
involves assigning items to categories; identifying themes takes those categories a
step further by grouping them into broader themes.
Discovering connections
A third-party logistics company working with another company to get shipments
delivered to customers on time is a problem requiring analysts to discover
connections. By analyzing the wait times at shipping hubs, analysts can determine
the appropriate schedule changes to increase the number of on-time deliveries.
Finding patterns
Minimizing downtime caused by machine failure is an example of a problem requiring
analysts to find patterns in data.

Specific: Is the question specific? Does it address the problem? Does it have
context? Will it uncover a lot of the information you need?
Measurable: Will the question give you answers that you can measure?
Action-oriented: Will the answers provide information that helps you devise some
type of plan?
Relevant: Is the question about the particular problem you are trying to solve?
Time-bound: Are the answers relevant to the specific time being studied?
Action-oriented question: A question whose answers lead to change
Cloud: A place to keep data online, rather than on a computer hard drive
Data analysis process: The six phases of ask, prepare, process, analyze, share, and
act whose purpose is to gain insights that drive informed decision-making
Data life cycle: The sequence of stages that data experiences, which include plan,
capture, manage, analyze, archive, and destroy
Leading question: A question that steers people toward a certain response
Measurable question: A question whose answers can be quantified and assessed
Problem types: The various problems that data analysts encounter, including
categorizing things, discovering connections, finding patterns, identifying themes,
making predictions, and spotting something unusual
Relevant question: A question that has significance to the problem to be solved
SMART methodology: A tool for determining a question’s effectiveness based on
whether it is specific, measurable, action-oriented, relevant, and time-bound
Specific question: A question that is simple, significant, and focused on a single
topic or a few closely related ideas
Structured thinking: The process of recognizing the current problem or situation,
organizing available information, revealing gaps and opportunities, and identifying
options
Time-bound question: A question that specifies a timeframe to be studied
Unfair question: A question that makes assumptions or is difficult to answer
honestly
Algorithm: A process or set of rules followed for a specific task
Big data: Large, complex datasets typically involving long periods of time, which
enable data analysts to address far-reaching business problems
Dashboard: A tool that monitors live, incoming data
Data-inspired decision-making: The process of exploring different data sources to
find out what they have in common
Metric: A single, quantifiable type of data that is used for measurement
Metric goal: A measurable goal set by a company and evaluated using metrics
Pivot chart: A chart created from the fields in a pivot table
Pivot table: A data summarization tool used to sort, reorganize, group, count,
total, or average data
Qualitative data: A subjective and explanatory measure of a quality or
characteristic
Quantitative data: A specific and objective measure, such as a number, quantity, or
range
Report: A static collection of data periodically given to stakeholders
Return on investment (ROI): A formula that uses the metrics of investment and
profit to evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services

Small data: Small, specific data points typically involving a short period of time,
which are useful for making day-to-day decisions
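
The ROI entry above names the two metrics but not the calculation. A commonly used form divides net profit by the cost of the investment; the sketch below assumes that form, and the campaign figures are invented for illustration.

```python
def roi(net_profit, investment_cost):
    """Return on investment as a ratio: net profit divided by investment cost."""
    if investment_cost == 0:
        raise ValueError("investment_cost must be non-zero")
    return net_profit / investment_cost


# Hypothetical campaign: 10,000 invested, 2,500 net profit.
campaign_roi = roi(net_profit=2_500.0, investment_cost=10_000.0)
campaign_roi_percent = campaign_roi * 100
```

For these figures the ratio is 0.25, which is usually reported as 25%.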
AVERAGE: A spreadsheet function that returns an average of the values from a
selected range
Borders: Lines that can be added around two or more cells on a spreadsheet
Cell reference: A cell or a range of cells in a worksheet typically used in
formulas and functions
COUNT: A spreadsheet function that counts the number of cells in a range that
contain numeric values
Equation: A calculation that involves addition, subtraction, multiplication, or
division (also called a math expression)
Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell
that can be dragged through neighboring cells in order to continue an instruction
Filtering: The process of showing only the data that meets specified criteria
while hiding the rest
Header: The first row in a spreadsheet that labels the type of data in each column
Math expression: A calculation that involves addition, subtraction, multiplication,
or division (also called an equation)
Math function: A function that is used as part of a mathematical formula
MAX: A spreadsheet function that returns the largest numeric value from a range of
cells
MIN: A spreadsheet function that returns the smallest numeric value from a range of
cells
Open data: Data that is available to the public
Operator: A symbol that names the operation or calculation to be performed
Order of operations: Using parentheses to group together spreadsheet values in
order to clarify the order in which operations should be performed
Problem domain: The area of analysis that encompasses every activity affecting or
affected by a problem
Range: A collection of two or more cells in a spreadsheet
Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a
project
Sorting: The process of arranging data into a meaningful order to make it easier to
understand, analyze, and visualize
SUM: A spreadsheet function that adds the values of a selected range of cells
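
For readers who know Python better than spreadsheets, the functions in this glossary have rough equivalents in the standard library. The sketch below is an analogy rather than an exact emulation: the `cells` list stands in for a hypothetical range, and COUNT is modeled as counting numeric cells, which is how COUNT behaves in common spreadsheet programs.

```python
import statistics

# A hypothetical range of cells, including text and an empty cell.
cells = [12, 7, "n/a", 30, None, 7.5]

# Spreadsheet math functions skip text and blanks, so keep only the numbers.
numeric = [v for v in cells if isinstance(v, (int, float)) and not isinstance(v, bool)]

total = sum(numeric)                 # SUM
average = statistics.mean(numeric)   # AVERAGE
largest = max(numeric)               # MAX
smallest = min(numeric)              # MIN
count = len(numeric)                 # COUNT (numeric cells only)

ordered = sorted(numeric)                    # Sorting
above_ten = [v for v in numeric if v > 10]   # Filtering
```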

Reframing: Restating a problem or challenge, then redirecting it toward a potential
resolution
Turnover rate: The rate at which employees voluntarily leave a company
Agenda: A list of scheduled appointments
Audio file: Digitized audio storage usually in an MP3, AAC, or other compressed
format
Boolean data: A data type with only two possible values, usually true or false
Continuous data: Data that is measured and can have almost any numeric value
Cookie: A small file stored on a computer that contains information about its users
Data element: A piece of information in a dataset
Data model: A tool for organizing data elements and how they relate to one another
Digital photo: An electronic or computer-based image usually in BMP or JPG format
Discrete data: Data that is counted and has a limited number of values
External data: Data that lives, and is generated, outside of an organization
Field: A single piece of information from a row or column of a spreadsheet; in a
data table, typically a column in the table
First-party data: Data collected by an individual or group using their own
resources
Long data: A dataset in which each row is one time point per subject, so each
subject has data in multiple rows
Nominal data: A type of qualitative data that is categorized without a set order
Ordinal data: Qualitative data with a set order or scale
Ownership: The aspect of data ethics that presumes individuals own the raw data
they provide and have primary control over its usage, processing, and sharing
Pixel: In digital imaging, a small area of illumination on a display screen that,
when combined with other adjacent areas, forms a digital image
Population: In data analytics, all possible data values in a dataset
Record: A collection of related data in a data table, usually synonymous with row
Sample: In data analytics, a segment of a population that is representative of the
entire population
Second-party data: Data collected by a group directly from its audience and then
sold
Social media: Websites and applications through which users create and share
content or participate in social networking
String data type: A sequence of characters and punctuation that contains textual
information (Refer to Text data type)
Structured data: Data organized in a certain format such as rows and columns
Text data type: A sequence of characters and punctuation that contains textual
information (also called string data type)
United States Census Bureau: An agency in the U.S. Department of Commerce that
serves as the nation’s leading provider of quality data about its people and
economy
Unstructured data: Data that is not organized in any easily identifiable manner
Video file: A collection of images, audio files, and other data usually encoded in
a compressed format such as MP4, M4V, MOV, AVI, or FLV
Wide data: A dataset in which every data subject has a single row with multiple
columns to hold the values of various attributes of the subject
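
The long data and wide data entries describe two layouts of the same information. As a small sketch, the hypothetical `long_to_wide` helper below turns long rows (one row per subject per time point) into wide rows (one row per subject); the store names and sales figures are made up.

```python
def long_to_wide(rows, subject_key, time_key, value_key):
    """Collapse long rows into one wide row per subject."""
    wide = {}
    for row in rows:
        subject = row[subject_key]
        record = wide.setdefault(subject, {subject_key: subject})
        # One new column per time point, for example "sales_2022".
        record[f"{value_key}_{row[time_key]}"] = row[value_key]
    return list(wide.values())


# Long data: each store appears in several rows, one per year.
long_rows = [
    {"store": "A", "year": 2022, "sales": 10},
    {"store": "A", "year": 2023, "sales": 12},
    {"store": "B", "year": 2022, "sales": 7},
    {"store": "B", "year": 2023, "sales": 9},
]

# Wide data: each store appears once, with one column per year.
wide_rows = long_to_wide(long_rows, "store", "year", "sales")
```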
Third-party data is collected by an entity that doesn't have a direct relationship
with the data. Personally identifiable information (PII) is data that is reasonably
likely to identify a person and make information known about them. It is important
to keep this data safe.
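
One simple way to keep PII out of an analysis is to strip identifying fields before the data is shared. The sketch below assumes a hypothetical set of field names (`name`, `email`, `phone`) and made-up customer records.

```python
# Assumed list of fields that directly identify a person.
PII_FIELDS = frozenset({"name", "email", "phone"})


def strip_pii(records, pii_fields=PII_FIELDS):
    """Return copies of the records without the identifying fields."""
    return [
        {key: value for key, value in record.items() if key not in pii_fields}
        for record in records
    ]


# Hypothetical customer records.
customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "city": "Lima", "orders": 3},
    {"name": "Luis Paz", "email": "luis@example.com", "city": "Quito", "orders": 1},
]

safe_customers = strip_pii(customers)
```

Removing direct identifiers is only a first step: combinations of the remaining fields can sometimes still point to one person, so the result should be reviewed before release.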
Bad data source: A data source that is not reliable, original, comprehensive,
current, and cited (ROCCC)
Bias: A conscious or subconscious preference in favor of or against a person, group
of people, or thing
Confirmation bias: The tendency to search for or interpret information in a way
that confirms pre-existing beliefs
Consent: The aspect of data ethics that presumes an individual’s right to know how
and why their personal data will be used before agreeing to provide it
Currency: The aspect of data ethics that presumes individuals should be aware of
financial transactions resulting from the use of their personal data and the scale
of those transactions
Data anonymization: The process of protecting people's private or sensitive data by
eliminating identifying information
Data bias: When a preference in favor of or against a person, group of people, or
thing systematically skews data analysis results in a certain direction
Data ethics: Well-founded standards of right and wrong that dictate how data is
collected, shared, and used
Data interoperability: A key factor leading to the successful use of open data
among companies and governments
Data privacy: Preserving a data subject’s information any time a data transaction
occurs
Ethics: Well-founded standards of right and wrong that prescribe what humans ought
to do, usually in terms of rights, obligations, benefits to society, fairness, or
specific virtues
Experimenter bias: The tendency for different people to observe things differently
(also called observer bias)
Fairness: A quality of data analysis that does not create or reinforce bias
General Data Protection Regulation of the European Union (GDPR): A regulation of
the European Union created to help protect people and their data
Good data source: A data source that is reliable, original, comprehensive, current,
and cited (ROCCC)
Interpretation bias: The tendency to interpret ambiguous situations in a positive
or negative way
Observer bias: The tendency for different people to observe things differently
(also called experimenter bias)
Openness: The aspect of data ethics that promotes the free access, usage, and
sharing of data
Sampling bias: Overrepresenting or underrepresenting certain members of a
population as a result of working with a sample that is not representative of the
population as a whole
Transaction transparency: The aspect of data ethics that presumes all data-
processing activities and algorithms should be explainable and understood by the
individual who provides the data
Unbiased sampling: When the sample of the population being measured is
representative of the population as a whole
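
The difference between sampling bias and unbiased sampling can be illustrated with a small simulation. The population below is invented (700 members of group A followed by 300 of group B), and simple random sampling is used as one common way to aim for a representative sample.

```python
import random

# Hypothetical population, stored in group order: 70% group A, 30% group B.
population = ["A"] * 700 + ["B"] * 300


def share_of(group, sample):
    """Fraction of the sample that belongs to the given group."""
    return sample.count(group) / len(sample)


# Biased: taking the first 200 records only ever reaches group A.
convenience_sample = population[:200]

# Closer to unbiased: every member has the same chance of being chosen.
rng = random.Random(42)
random_sample = rng.sample(population, 200)

population_share_b = share_of("B", population)
biased_share_b = share_of("B", convenience_sample)
random_share_b = share_of("B", random_sample)
```

The convenience sample contains no one from group B, while the random sample's share of group B should land near the population's 30%. Random selection makes a representative sample likely; it does not guarantee one in any single draw.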

File or document type: What type of file or document are you examining?
Date, time, and creator: When was it created? Who created it? When was it last
modified?
Title and description: What is the name of the item you are examining? What type of
content does it contain?
Geolocation: If you’re examining a photo, where was it taken?
Tags and categories: What is the general overview of the item that you have? Is it
indexed or described in a specific way?
Who last modified it and when: Were any changes made to the file? If yes, when were
the most recent modifications made?
Who can access or update it: If you’re examining a dataset, is it public? Are
special permissions needed to customize or modify it?
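
Some of the questions above can be answered from the metadata a file system keeps about each file. The sketch below uses Python's standard library to read a few of those fields; the `describe_file` helper and the sample file name are hypothetical. Details such as the creator or a photo's geolocation are not available this way and would have to come from the file's own embedded metadata.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Collect basic file-system metadata for one file."""
    path = Path(path)
    info = path.stat()
    return {
        "name": path.name,            # title
        "file_type": path.suffix,     # file or document type
        "size_bytes": info.st_size,
        # When the file was last modified, as a timezone-aware UTC datetime.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sales_2024.csv"
    sample.write_text("region,amount\nnorth,120\n")
    metadata = describe_file(sample)
```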
SELECT is the section of a query that indicates what data you want SQL to return to
you.
FROM is the section of a query that indicates which table the desired data comes
from. You must provide a full path to the table. The path includes the project
name, database name, and table name, each separated by a period.
WHERE is the section of a query that indicates any filters you'd like to apply to
your table.
Administrative metadata: Metadata that indicates the technical source of a digital
asset
CSV (comma-separated values) file: A delimited text file that uses a comma to
separate values
Data governance: A process for ensuring the formal management of a company’s data
assets
Descriptive metadata: Metadata that describes a piece of data and can be used to
identify it at a later point in time
Foreign key: A field within a database table that is a primary key in another table
(Refer to primary key)
FROM: The section of a query that indicates where the selected data comes from
Geolocation: The geographical location of a person or device by means of digital
information
Metadata: Data about data
Metadata repository: A database created to store metadata
Naming conventions: Consistent guidelines that describe the content, creation date,
and version of a file in its name
Normalized database: A database in which only related data is stored in each table
Notebook: An interactive, editable programming environment for creating data
reports and showcasing data skills
Primary key: An identifier in a database that references a column in which each
value is unique (Refer to foreign key)
Redundancy: When the same piece of data is stored in two or more places
Schema: A way of describing how something, such as data, is organized
SELECT: The section of a query that indicates the subset of a dataset
Structural metadata: Metadata that indicates how a piece of data is organized and
whether it is part of one or more than one data collection
WHERE: The section of a query that specifies criteria that the requested data must
meet
World Health Organization: An organization whose primary role is to direct and
coordinate international health within the United Nations system
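
The primary key and foreign key entries in this glossary can be made concrete with Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are hypothetical, and the example assumes an SQLite build with foreign key support, which has to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# customer_id is the primary key: each value identifies exactly one customer.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")

# orders.customer_id is a foreign key: it must match a primary key in customers.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        amount REAL
    )
    """
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Luis")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0)],
)

# The keys let related rows from the two tables be matched up.
joined = conn.execute(
    """
    SELECT customers.name, orders.amount
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id
    """
).fetchall()

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Storing the customer name once in `customers` and referring to it by key from `orders` also avoids the redundancy described in the glossary.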
