Data Analytics 3
Cookies
Small files stored on computers that contain information about users
Cookies can help inform advertisers about your personal interests and habits
based on your online surfing, without personally identifying you
Second-party data
Data collected by a group directly from its audience and then sold
Third-party data
Data collected from outside sources who did not collect it directly
Sample
A part of a population that is representative of the population
Discrete data
Data that is counted and has a limited number of values
Continuous data
Data that is measured and can have almost any numeric value
Nominal data
A type of qualitative data that is categorized without a set order
Structured data
Data organized in a certain format such as rows and columns
Unstructured data
Data that is not organized in any easily identifiable manner
Unstructured data examples:
Audio files
Video files
Emails
Photos
Social media
Data model
A model that is used for organizing data elements and how they relate to one
another
Data type
A specific kind of data attribute that tells what kind of value the data is
Data types in spreadsheets:
• Number
• Text or string
• Boolean
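To make the three spreadsheet data types concrete, here is a minimal sketch that labels Python values with the matching spreadsheet type. The cell values and the helper name spreadsheet_type are invented for illustration; real spreadsheet programs decide types by their own rules.

```python
# Hypothetical spreadsheet cells, written as Python values.
cells = [42, 3.14, "Chicago", True]


def spreadsheet_type(value):
    """Return the spreadsheet-style data type name for a Python value."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text or string"
    return "Other"


for cell in cells:
    print(repr(cell), "->", spreadsheet_type(cell))
```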
Wide data
Data in which every data subject has a single row with multiple columns to hold
the values of various attributes of the subject
Long data
Data in which each row is one time point per subject, so each subject will have
data in multiple rows
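The difference between wide and long data can be sketched with plain Python dictionaries. The countries, years, and values below are made up, and wide_to_long is a hypothetical helper, not a library function.

```python
# Hypothetical wide data: one row per subject, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_column):
    """Turn every (subject, time point) pair into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each long row instead
            long_rows.append(
                {id_column: row[id_column], "year": column, "value": value}
            )
    return long_rows


# Each subject now appears in multiple rows, one per year.
long_rows = wide_to_long(wide_rows, "country")
for row in long_rows:
    print(row)
```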
Bias
A preference in favor of or against a person, group of people, or thing
Data bias
A type of error that systematically skews results in a certain direction
Sampling bias
A sample that isn't representative of the population as a whole
Unbiased sampling
When a sample is representative of the population being measured
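One common way to aim for an unbiased sample is simple random sampling, where every member of the population has the same chance of being picked. The sketch below uses Python's standard random module; the population of customer IDs, the sample size, and the seed are assumptions chosen only for the demonstration.

```python
import random

# Hypothetical population: customer IDs 1 through 1000.
population = list(range(1, 1001))

# A fixed seed makes the draw repeatable; it is here only for the demo.
rng = random.Random(42)

# Simple random sampling without replacement: each ID is equally likely.
sample = rng.sample(population, k=100)

# For contrast, a biased choice: taking only the first 100 IDs
# leaves out every customer with a higher ID.
biased_sample = population[:100]

print(len(sample), min(sample), max(sample))
print(len(biased_sample), min(biased_sample), max(biased_sample))
```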
Observer bias (experimenter bias/research bias)
The tendency for different people to observe things differently
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative
way
Confirmation bias
The tendency to search for or interpret information in a way that confirms
pre-existing beliefs
Good data (ROCCC):
• Reliable (R)
• Original (O)
• Comprehensive (C)
• Current (C)
• Cited (C)
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to
do, usually in terms of rights, obligations, benefits to society, fairness, or specific
virtues
Data ethics
Well-founded standards of right and wrong that dictate how data is collected,
shared, and used
GDPR
General Data Protection Regulation of the European Union
Aspects of data ethics
• Ownership
• Transaction transparency
• Consent
• Currency
• Privacy
• Openness
Ownership
Individuals own the raw data they provide and they have primary control over its
usage, how it's processed, and how it's shared
Transaction transparency
All data-processing activities and algorithms should be completely explainable and
understood by the individual who provides their data
Consent
An individual's right to know explicit details about how and why their data will be
used before agreeing to provide it
Currency
Individuals should be aware of financial transactions resulting from the use of their
personal data and the scale of these transactions
Privacy
Preserving a data subject's information and activity any time a data transaction
occurs
• Protection from unauthorized access to our private data
• Freedom from inappropriate use of our data
• The right to inspect, update, or correct our data
• Ability to give consent to use our data
• Legal right to access the data
Openness
Free access, usage, and sharing of data
____________________________________________
Data interoperability
The ability of data systems and services to openly connect and share data
Open data is part of data ethics, which has to do with using data ethically.
Openness refers to free access, usage, and sharing of data. But for data to be
considered open, it has to:
• Be available and accessible to the public as a complete dataset
• Be provided under terms that allow it to be reused and redistributed
• Allow universal participation so that anyone can use, reuse, and redistribute
the data
Healthcare and financial data are two of the most sensitive types of data
____________________________
Database
A collection of data stored in a computer system
Relational database
A database that contains a series of related tables that can be connected via their
relationships
Primary key
An identifier that references a column in which each value is unique
Used to ensure data in a specific column is unique
Uniquely identifies a record in a relational database table
Only one primary key is allowed in a table
Foreign key
A field within a table that is a primary key in another table
A column or group of columns in a relational database
table that provides a link between the data in two tables
Refers to the field in a table that's the primary key of another table
More than one foreign key is allowed to exist in a table
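The relationship between a primary key and a foreign key can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the customers and orders tables, their columns, and the inserted rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique per record
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        -- foreign key: points at the primary key of customers
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# A second customer with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An order that refers to a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is what lets the two tables be joined.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchall()
print(duplicate_rejected, orphan_rejected, rows)
```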
Metadata
Data about data
Metadata is used in database management to help data analysts interpret the
contents of the data within the database
Descriptive metadata
Metadata that describes a piece of data and can be used to identify it at a later
point in time
Structural metadata
Metadata that indicates how a piece of data is organized and whether it is part of
one, or more than one, data collection
Administrative metadata
Metadata that indicates the technical source of a digital asset
Examples of metadata
Photos, emails, spreadsheets and documents, websites, digital files, books
Elements of metadata
Title and description
Tags and categories
Who created it and when
Who last modified it and when
Who can access or update it
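Some of these elements can be read straight from a file. The sketch below creates a temporary file and collects a few metadata fields with Python's standard library. The file name and the dictionary keys are assumptions for illustration, and fields such as who created the file are not available this way on every system, so they are left out.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "sales_report.csv"  # hypothetical file name
    path.write_text("region,revenue\nNorth,100\n")

    info = path.stat()
    metadata = {
        "title": path.name,
        "size_bytes": info.st_size,
        # When the file was last modified, as a UTC timestamp.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Whether the owner permission bits allow writing (updating) the file.
        "owner_can_write": bool(info.st_mode & 0o200),
    }

print(metadata)
```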
Metadata creates a single source of truth by keeping things consistent and uniform
Metadata also makes data more reliable by making sure it's accurate, precise,
relevant, and timely
Metadata repository
A database specifically created to store metadata
Metadata repositories make it easier and faster to bring together multiple sources
for data analysis
• Describe the state and location of the metadata
• Describe the structures of the tables inside
• Describe how the data flows through the repository
Data governance
A process to ensure the formal management of a company's data assets
Internal data
Data that lives within a company's own systems
External data
Data that lives and is generated outside an organization
Sorting data
Arranging data into a meaningful order to make it easier to understand, analyze,
and visualize
Filtering
Showing only the data that meets specific criteria while hiding the rest
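Sorting and filtering can be sketched in a few lines of Python. The sales records and the revenue threshold of 1000 are made-up values for illustration.

```python
# Hypothetical records, one dictionary per row.
sales = [
    {"region": "North", "revenue": 1200},
    {"region": "South", "revenue": 800},
    {"region": "East", "revenue": 1500},
    {"region": "West", "revenue": 400},
]

# Sorting: arrange the rows from highest to lowest revenue.
sorted_sales = sorted(sales, key=lambda row: row["revenue"], reverse=True)

# Filtering: keep only the rows that meet the criterion and leave out the rest.
high_revenue = [row for row in sales if row["revenue"] >= 1000]

print([row["region"] for row in sorted_sales])
print([row["region"] for row in high_revenue])
```

Neither step changes the original list, which mirrors how a spreadsheet filter hides rows without deleting them.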
Naming conventions
Consistent guidelines that describe the content, date, or version of a file in its
name
Use logical and descriptive names for your files to make them easier to find and
use
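One possible naming convention combines a project name, a short description, a date, and a version number. The pattern below (Project_Description_YYYYMMDD_vNN.ext) and the helper build_file_name are assumptions for illustration; a team may agree on a different pattern.

```python
from datetime import date


def build_file_name(project, description, file_date, version, extension):
    """Compose a name such as SalesReport_Q3Summary_20240115_v02.csv."""
    # The date is written as YYYYMMDD so names sort in chronological order,
    # and the version is zero-padded so v02 sorts before v10.
    return f"{project}_{description}_{file_date:%Y%m%d}_v{version:02d}.{extension}"


name = build_file_name("SalesReport", "Q3Summary", date(2024, 1, 15), 2, "csv")
print(name)
```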
Tokenization replaces the data elements you want to protect with randomly
generated data referred to as a “token.” The original data is stored in a separate
location and mapped to the tokens. To access the complete original data, the
user or application needs to have permission to use the tokenized data and the
token mapping. This means that even if the tokenized data is hacked, the
original data is still safe and secure in a separate location.
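The idea can be sketched with a toy token vault. This is an illustration only, not a secure implementation: the TokenVault class, the authorized flag, and the sample record are hypothetical, and a real system would keep the mapping in a separate, protected store.

```python
import secrets


class TokenVault:
    """Toy in-memory mapping from tokens back to the original values."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the original value.
        token = secrets.token_hex(16)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token, authorized):
        # Reading the original needs both the token and permission
        # to use the token mapping.
        if not authorized:
            raise PermissionError("not allowed to read the token mapping")
        return self._token_to_value[token]


vault = TokenVault()

# Hypothetical record; the card number is a made-up test value.
record = {"name": "Ada", "card_number": "4111111111111111"}

# The tokenized copy is what gets stored or shared; the original stays in the vault.
tokenized_record = {**record, "card_number": vault.tokenize(record["card_number"])}

print(tokenized_record)
print(vault.detokenize(tokenized_record["card_number"], authorized=True))
```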
___________________________________________
Partially Derivative
O'Reilly Data Show
O'Reilly
Kaggle
KDnuggets
GitHub
Medium
Mentor
A professional who shares their knowledge, skills, and experience to help you
develop and grow
Sponsor
A professional advocate who's committed to moving a sponsee's career forward
within an organization