
Data Analytics 3

The document discusses various methods of data collection including interviews, observations, surveys, forms, questionnaires and cookies. It also discusses different types of data like first-party data collected by an individual, second-party data collected and sold by a group, and third-party data collected from outside sources. The document also covers data considerations when collecting data.

Uploaded by

sara sy
Copyright
© All Rights Reserved

How data is collected

• Interviews
• Observations
• Surveys
• Forms
• Questionnaires
• Cookies

Cookies: small files stored on computers that contain information about users.
Cookies can help inform advertisers about your personal interests and habits
based on your online surfing, without personally identifying you.

First-party data: data collected by an individual or group using their own
resources

Second-party data: data collected by a group directly from its audience and
then sold

Third-party data: data collected from outside sources that did not collect it
directly

Data collection considerations:

• How the data will be collected
• Choose data sources
• Decide what data to use
• How much data to collect
• Select the right data type
• Determine the time frame
Population
All possible data values in a certain dataset

Sample
A part of a population that is representative of the population
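A minimal sketch of drawing a sample from a population. The ages are made-up data, and simple random sampling is only one way to aim for a representative sample.

```python
import random

# Hypothetical population: ages of 1,000 survey respondents (made-up data).
random.seed(42)  # fixed seed so the sketch is repeatable
population = [random.randint(18, 80) for _ in range(1000)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Simple random sample: every member has the same chance of being picked.
sample = random.sample(population, 100)

print(f"population mean: {mean(population):.1f}")
print(f"sample mean:     {mean(sample):.1f}")
```

If the sample is representative, its mean should land close to the population mean.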

Discrete data
Data that is counted and has a limited number of values

Continuous data
Data that is measured and can have almost any numeric value

Nominal data
A type of qualitative data that is categorized without a set order

Structured data
Data organized in a certain format such as rows and columns

Unstructured data
Data that is not organized in any easily identifiable manner
Unstructured data examples:
• Audio files
• Video files
• Emails
• Photos
• Social media
Data model
A model that is used for organizing data elements and how they relate to one
another

Data type
A specific kind of data attribute that tells what kind of value the data is
Data types in spreadsheets:
• Number
• Text or string
• Boolean
Wide data
Data in which every data subject has a single row with multiple columns to hold
the values of various attributes of the subject
Long data
Data in which each row is one time point per subject, so each subject will have
data in multiple rows
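The difference between wide and long data can be sketched by reshaping one into the other. The countries, years, values, and the helper name wide_to_long are hypothetical, chosen only for illustration.

```python
# Wide data: one row per subject, one column per year (hypothetical values).
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field):
    """Turn each non-id column into its own row: one time point per row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], "year": column, "value": value}
            )
    return long_rows


long_rows = wide_to_long(wide, "country")
for row in long_rows:
    print(row)
```

Each subject now appears in two rows, one per year, instead of one row with two year columns.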

Bias
A preference in favor of or against a person, group of people, or thing

Data bias
A type of error that systematically skews results in a certain direction

Types of data bias:

Sampling bias
A sample that isn't representative of the population as a whole
Unbiased sampling
When a sample is representative of the population being measured
Observer bias
(also called experimenter bias or research bias)
The tendency for different people to observe things differently
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative
way
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-
existing beliefs

Good data (ROCCC):
• Reliable (R)
• Original (O)
• Comprehensive (C)
• Current (C)
• Cited (C)

Ethics
Well-founded standards of right and wrong that prescribe what humans ought to
do, usually in terms of rights, obligations, benefits to society, fairness, or specific
virtues

Data ethics
Well-founded standards of right and wrong that dictate how data is collected,
shared, and used
GDPR
General Data Protection Regulation of the European Union
Aspects of data ethics
• Ownership
• Transaction transparency
• Consent
• Currency
• Privacy
• Openness
Ownership
Individuals own the raw data they provide and they have primary control over its
usage, how it's processed, and how it's shared
Transaction transparency
All data-processing activities and algorithms should be completely explainable and
understood by the individual who provides their data

Consent
An individual's right to know explicit details about how and why their data will be
used before agreeing to provide it

Currency
Individuals should be aware of financial transactions resulting from the use of their
personal data and the scale of these transactions

Privacy
Preserving a data subject's information and activity any time a data transaction
occurs
• Protection from unauthorized access to our private data
• Freedom from inappropriate use of our data
• The right to inspect, update, or correct our data
• Ability to give consent to use our data
• Legal right to access the data
Openness
Free access, usage, and sharing of data
____________________________________________
Data interoperability
The ability of data systems and services to openly connect and share data
Open data is part of data ethics, which has to do with using data ethically.
Openness refers to free access, usage, and sharing of data. For data to be
considered open, it has to:
• Be available and accessible to the public as a complete dataset
• Be provided under terms that allow it to be reused and redistributed
• Allow universal participation so that anyone can use, reuse, and redistribute
the data

Personally identifiable information, or PII, is information that can be used by itself
or with other data to track down a person's identity.

Data anonymization is the process of protecting people's private or sensitive
data by eliminating that kind of information. Typically, data anonymization
involves blanking, hashing, or masking personal information, often by using
fixed-length codes to represent data columns or hiding data with altered values.
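A rough sketch of blanking, hashing, and masking, using Python's standard hashlib module. The record, the salt value, and the helper names are hypothetical. A salted hash on its own may not be enough to anonymize data in a real setting.

```python
import hashlib


def hash_value(value, salt):
    """Replace a value with a fixed-length SHA-256 code (64 hex characters)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_phone(phone, visible=4):
    """Hide all but the last few characters of a phone number."""
    return "*" * (len(phone) - visible) + phone[-visible:]


# Hypothetical record containing personal information.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "5551234567",
    "purchase_total": 42.50,
}

anonymized = {
    "name": "",                                              # blanking
    "email": hash_value(record["email"], salt="demo-salt"),  # hashing
    "phone": mask_phone(record["phone"]),                    # masking
    "purchase_total": record["purchase_total"],              # not personal, kept
}

print(anonymized)
```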

Healthcare and financial data are two of the most sensitive types of data
____________________________
Database
A collection of data stored in a computer system

Relational database
A database that contains a series of related tables that can be connected via their
relationships

Normalization is a process of organizing data in a relational database, for
example by creating tables and establishing relationships between those tables.
It is applied to eliminate data redundancy, increase data integrity, and reduce
complexity in a database.

Primary key
An identifier that references a column in which each value is unique
• Used to ensure data in a specific column is unique
• Uniquely identifies a record in a relational database table
• Only one primary key is allowed in a table

Foreign key
A field within a table that is a primary key in another table
• A column or group of columns in a relational database table that provides a
link between the data in two tables
• Refers to the field in a table that's the primary key of another table
• More than one foreign key is allowed to exist in a table
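Primary and foreign keys can be sketched with Python's built-in sqlite3 module and an in-memory database. The customers and orders tables and their columns are hypothetical examples, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: each value uniquely identifies one record.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key: it refers to the primary key of customers.
conn.execute(
    """CREATE TABLE orders (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           total       REAL
       )"""
)

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# The key relationship is what lets the two tables be connected.
rows = conn.execute(
    """SELECT o.order_id, c.name, o.total
       FROM orders AS o
       JOIN customers AS c ON c.customer_id = o.customer_id"""
).fetchall()
print(rows)

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```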
Metadata
Data about data
Metadata is used in database management to help data analysts interpret the
contents of the data within the database

Descriptive metadata
Metadata that describes a piece of data and can be used to identify it at a later
point in time

Structural metadata
Metadata that indicates how a piece of data is organized and whether it is part of
one, or more than one, data collection

Administrative metadata
Metadata that indicates the technical source of a digital asset

Examples of metadata
Photos, emails, spreadsheets and documents, websites, digital files, books

Elements of metadata
Title and description
Tags and categories
Who created it and when
Who last modified it and when
Who can access or update it
Metadata creates a single source of truth by keeping things consistent and uniform
Metadata also makes data more reliable by making sure it's accurate, precise,
relevant, and timely

Metadata repository
A database specifically created to store metadata. Metadata repositories make it
easier and faster to bring together multiple sources for data analysis
• Describe the state and location of the metadata
• Describe the structures of the tables inside
• Describe how the data flows through the repository

Metadata is stored in a single, central location, and gives the company
standardized information about all of its data

Data governance
A process to ensure the formal management of a company's data assets

Internal data
Data that lives within a company's own systems

External data
Data that lives and is generated outside an organization

CSV = Comma-separated values
A CSV file saves data in a table format
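A small sketch of how comma-separated text maps to a table, using Python's standard csv module. The product rows are made-up sample data, read from a string here instead of a file.

```python
import csv
import io

# Hypothetical CSV content: the first line holds the column names.
csv_text = "product,units,price\nnotebook,3,2.50\npen,10,0.80\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)  # one dictionary per table row

print(reader.fieldnames)
for row in rows:
    print(row["product"], row["units"], row["price"])
```

Every value comes back as text, so numeric columns need converting before doing calculations.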
Open data helps create a lot of public datasets that you can access to make data-
driven decisions.

Sorting data
Arranging data into a meaningful order to make it easier to understand, analyze,
and visualize

Filtering
Showing only the data that meets specific criteria while hiding the rest
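Sorting and filtering can be sketched on a small list of records. The sales figures and region names are made up for illustration.

```python
# Hypothetical sales records.
sales = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 300},
    {"region": "North", "amount": 75},
    {"region": "East", "amount": 210},
]

# Sorting: arrange every row by amount, largest first.
sorted_sales = sorted(sales, key=lambda row: row["amount"], reverse=True)

# Filtering: show only the rows that meet a criterion; the source list is unchanged.
north_only = [row for row in sales if row["region"] == "North"]

print([row["amount"] for row in sorted_sales])
print(north_only)
```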

Types of BigQuery accounts:

Sandbox:
• 12 projects at a time
• Cannot insert new records to a database
• Cannot update field values of existing records
Free trial:
• $300 in credit during the first 90 days
• Select to upgrade to a paid account
• You will never be automatically charged
_________________________________________________
Benefits of organizing data
Makes it easier to find and use
Helps you avoid making mistakes during your analysis
Helps to protect your data

Best practices when organizing data:

• Naming conventions
• Foldering
• Archiving older files
• Align your naming and storage practices with your team
• Develop metadata practices

Naming conventions
Consistent guidelines that describe the content, date, or version of a file in its
name. Use logical and descriptive names for your files to make them easier to
find and use

File naming DO'S:

• Work out your conventions early
• Align file naming with your team
• Make sure file names are meaningful
• Keep file names short and sweet
• Format dates as yyyymmdd: SalesReport20201125
• Lead revision numbers with 0: SalesReport20201125v02
• Use hyphens, underscores, or capitalized letters: SalesReport_2020_11_25_v02
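The naming rules above can be sketched as a small helper. The function name build_file_name and the choice of underscores as separators are assumptions for illustration, not a required convention.

```python
from datetime import date


def build_file_name(base, file_date, version):
    """Build a name with a yyyymmdd date and a zero-padded revision number."""
    return f"{base}_{file_date:%Y%m%d}_v{version:02d}"


name = build_file_name("SalesReport", date(2020, 11, 25), 2)
print(name)  # SalesReport_20201125_v02
```

Generating names from one function keeps the date format and revision padding consistent across a team.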
Data security
Protecting data from unauthorized access or corruption by adopting safety
measures

Encryption uses a unique algorithm to alter data and make it unusable by
users and applications that don't know the algorithm. This algorithm is saved as
a "key" which can be used to reverse the encryption; so if you have the key, you
can still use the data in its original form.

Tokenization replaces the data elements you want to protect with randomly
generated data referred to as a “token.” The original data is stored in a separate
location and mapped to the tokens. To access the complete original data, the
user or application needs to have permission to use the tokenized data and the
token mapping. This means that even if the tokenized data is hacked, the
original data is still safe and secure in a separate location.
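A toy sketch of tokenization, using Python's standard secrets module to generate random tokens. The TokenVault class is hypothetical, and the card number is a made-up test value. A real system would keep the mapping in a separately secured store with access controls.

```python
import secrets


class TokenVault:
    """Hypothetical store that keeps original values apart from their tokens."""

    def __init__(self):
        self._token_to_value = {}  # the token mapping, kept in a separate place

    def tokenize(self, value):
        """Swap a sensitive value for a randomly generated token."""
        token = secrets.token_hex(8)  # 16 random hex characters
        self._token_to_value[token] = value
        return token

    def detokenize(self, token):
        """Recover the original value; requires access to the mapping."""
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111111111111111"  # made-up example value
token = vault.tokenize(card_number)

print("stored in the dataset:", token)
print("recovered via the vault:", vault.detokenize(token))
```

The token is random, so someone who steals only the tokenized dataset cannot work back to the original value from it.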

___________________________________________

A professional online presence can:

• Help potential employers find you
• Make connections with other analysts
• Learn and share data findings
• Participate in community events
Networking
Professional relationship building
Search for public meetups in your area.

Partially Derivative
O'Reilly Data Show

O'Reilly
Kaggle
KDnuggets
GitHub
Medium

Mentor
A professional who shares their knowledge, skills, and experience to help you
develop and grow

Sponsor
A professional advocate who's committed to moving a sponsee's career forward
within an organization

A mentor helps you skill up


A sponsor helps you move up
