Course 3

Uploaded by

haminjohn15

Selecting the right data

Following are some data-collection considerations to keep in mind for your analysis:

How the data will be collected

Decide if you will collect the data using your own resources or receive it (and possibly purchase it)
from another party. Data that you collect yourself is called first-party data.

Data sources
If you don’t collect the data using your own resources, you might get data from second-party or third-
party data providers. Second-party data is collected directly by another group and then sold. Third-
party data is sold by a provider that didn’t collect the data themselves. Third-party data might come
from a number of different sources.

Solving your business problem

Datasets can show a lot of interesting information. But be sure to choose data that can actually help
answer your business question. For example, if you are analyzing trends over time, make sure you use
time series data, that is, data that includes dates.

How much data to collect

If you are collecting your own data, make reasonable decisions about sample size. A random
sample from existing data might be fine for some projects. Other projects might need more strategic
data collection to focus on certain criteria. Each project has its own needs.
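
As a minimal sketch of drawing a random sample from existing data, the snippet below uses Python's standard `random` module. The population of customer IDs, the sample size of 50, and the seed are hypothetical values chosen for illustration; the right sample size depends on the project.

```python
import random

# Hypothetical existing data: 1,000 customer IDs (made up for this sketch).
population = list(range(1, 1001))

# A fixed seed makes the sample reproducible from run to run.
rng = random.Random(42)

# Draw 50 distinct records without replacement.
sample = rng.sample(population, k=50)

print(len(sample))       # 50
print(len(set(sample)))  # 50 (no record is picked twice)
```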

Time frame
If you are collecting your own data, decide how long you will need to collect it, especially if you are
tracking trends over a long period of time. If you need an immediate answer, you might not have
time to collect new data. In this case, you would need to use historical data that already exists.

Use the flowchart below if data collection relies heavily on how much time you have:

Data formats in practice
When you think about the word "format," a lot of things might come to mind. Think of an
advertisement for your favorite store. You might find it in the form of a print ad, a billboard, or even a
commercial. The information is presented in the format that works best for you to take it in. The
format of a dataset is a lot like that, and choosing the right format will help you manage and use your
data in the best way possible.

Data format examples
As with most things, it is easier for definitions to click when we can pair them with real life examples.
Review each definition first and then use the examples to lock in your understanding of each data
format.

The following table highlights the differences between primary and secondary data and examples of each:

Data format classification: Primary data
Definition: Collected by a researcher from first-hand sources
Examples:
- Data from an interview you conducted
- Data from a survey returned from 20 participants
- Data from questionnaires you got back from a group of workers

Data format classification: Secondary data
Definition: Gathered by other people or from other research
Examples:
- Data you bought from a local data analytics firm’s customer profiles
- Demographic data collected by a university
- Census data gathered by the federal government

The following table highlights the differences between internal and external data and examples of each:

Data format classification: Internal data
Definition: Data that lives inside a company’s own systems
Examples:
- Wages of employees across different business units tracked by HR
- Sales data by store location
- Product inventory levels across distribution centers

Data format classification: External data
Definition: Data that lives outside of a company or organization
Examples:
- National average wages for the various positions throughout your organization
- Credit reports for customers of an auto dealership

The following table highlights the differences between continuous and discrete data and examples of each:

Data format classification: Continuous data
Definition: Data that is measured and can have almost any numeric value
Examples:
- Height of kids in third grade classes (52.5 inches, 65.7 inches)
- Runtime markers in a video
- Temperature

Data format classification: Discrete data
Definition: Data that is counted and has a limited number of values
Examples:
- Number of people who visit a hospital on a daily basis (1[…] 200)
- Room’s maximum capacity allowed
- Tickets sold in the current month

The following table highlights the differences between qualitative and quantitative data and examples of each:

Data format classification: Qualitative
Definition: Subjective and explanatory measures of qualities and characteristics
Examples:
- Exercise activity most enjoyed
- Favorite brands of […] loyal customers
- Fashion preferences of young adults

Data format classification: Quantitative
Definition: Specific and objective measures of numerical facts
Examples:
- Percentage of board certified doctors who are women
- Population of elephants in Africa
- Distance from Earth to Mars

The following table highlights the differences between nominal and ordinal data and examples of each:

Data format classification: Nominal
Definition: A type of qualitative data that isn’t categorized with a set order
Examples:
- First time customer, returning customer, regular customer
- […] applicant, existing applicant, internal applicant
- New listing, […] price listing, foreclosure

Data format classification: Ordinal
Definition: A type of qualitative data with a set order or scale
Examples:
- Movie ratings (number of stars: 1 star, 2 stars, 3 stars)
- Ranked-choice voting selections (1st, 2nd, 3rd)
- Income level (low income, middle income, high income)
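
To make the nominal versus ordinal distinction concrete, here is a small Python sketch. The response lists are made-up illustrative values, and the `rank` mapping is an assumption about how the income categories are ordered.

```python
from collections import Counter

# Hypothetical survey responses (illustrative values only).
income_levels = ["high income", "low income", "middle income", "low income"]

# Ordinal data has a set order, so each category can be mapped to a rank
# and sorted in a meaningful way.
rank = {"low income": 0, "middle income": 1, "high income": 2}
ordered = sorted(income_levels, key=lambda level: rank[level])
print(ordered)

# Nominal data has no inherent order; counting categories is still meaningful.
customer_types = ["returning customer", "first time customer", "returning customer"]
most_common = Counter(customer_types).most_common(1)
print(most_common)
```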

The following table highlights the differences between structured and unstructured data and examples of each:

Data format classification: Structured data
Definition: Data organized in a certain format, like rows and columns
Examples:
- Expense reports
- Tax returns
- Store inventory

Data format classification: Unstructured data
Definition: Data that isn’t organized in any easily identifiable manner
Examples:
- Social media posts
- Emails
- Videos

The structure of data

Data is everywhere and it can be stored in lots of ways. Two general categories of data are:

• Structured data: Organized in a certain format, such as rows and columns.

• Unstructured data: Not organized in any easy-to-identify way.

For example, when you rate your favorite restaurant online, you're creating structured data. But
when you use Google Earth to check out a satellite image of a restaurant location, you're using
unstructured data.

Here's a refresher on the characteristics of structured and unstructured data:

Structured data:
- Defined data types
- Most often quantitative data
- Easy to organize
- Easy to search
- Easy to analyze
- Stored in relational databases
- Contained in rows and columns
- Examples: Excel, Google Sheets, SQL, customer data, phone records, transaction history

Unstructured data:
- Varied data types
- Most often qualitative data
- Difficult to search
- Provides more freedom for analysis
- Stored in data lakes and NoSQL databases
- Can't be put in rows and columns
- Examples: Text messages, social media comments, phone call transcriptions, various log files, images, audio, video

Structured data
As we described earlier, structured data is organized in a certain format. This makes it easier to
store and query for business needs. If the data is exported, the structure goes along with the data.

Unstructured data
Unstructured data can’t be organized in any easily identifiable manner. And there is much more
unstructured than structured data in the world. Video and audio files, text files, social media content,
satellite imagery, presentations, PDF files, open-ended survey responses, and websites all qualify
as types of unstructured data.

The fairness issue

The lack of structure makes unstructured data difficult to search, manage, and analyze. But recent
advancements in artificial intelligence and machine learning algorithms are beginning to change that.
Now, the new challenge facing data scientists is making sure these tools are inclusive and unbiased.
Otherwise, certain elements of a dataset will be more heavily weighted and/or represented than
others. And as you're learning, an unfair dataset does not accurately represent the population,
causing skewed outcomes, low accuracy levels, and unreliable analysis.

Data modeling levels and techniques

This reading introduces you to data modeling and different types of data models. Data models help
keep data consistent and enable people to map out how data is organized. A basic understanding
makes it easier for analysts and other stakeholders to make sense of their data and use it in the right
ways.

Important note: As a junior data analyst, you won't be asked to design a data model. But you might
come across existing data models your organization already has in place.

What is data modeling?

Data modeling is the process of creating diagrams that visually represent how data is organized and
structured. These visual representations are called data models. You can think of data modeling as a
blueprint of a house. At any point, there might be electricians, carpenters, and plumbers using that
blueprint. Each one of these builders has a different relationship to the blueprint, but they all need it
to understand the overall structure of the house. Data models are similar; different users might have
different data needs, but the data model gives them an understanding of the structure as a whole.

Levels of data modeling

Each level of data modeling has a different level of detail.
1. Conceptual data modeling gives a high-level view of the data structure, such as how data interacts
across an organization. For example, a conceptual data model may be used to define the business
requirements for a new database. A conceptual data model doesn't contain technical details.
2. Logical data modeling focuses on the technical details of a database such as relationships,
attributes, and entities. For example, a logical data model defines how individual records are
uniquely identified in a database. But it doesn't spell out actual names of database tables. That's the
job of a physical data model.
3. Physical data modeling depicts how a database operates. A physical data model defines all entities
and attributes used; for example, it includes table names, column names, and data types for the
database.
More information can be found in this comparison of data models.
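
The difference between the logical and physical levels can be sketched in a few lines of Python. Everything named here (the `Customer` and `Order` entities, the `customers` and `orders` tables, and their columns) is hypothetical and only illustrates the idea; it is not a model from the course.

```python
import sqlite3

# Logical level: entities, attributes, and how records are identified,
# with no table names or data types yet. (Hypothetical example entities.)
logical_model = {
    "Customer": {"attributes": ["id", "name"], "identified_by": "id"},
    "Order": {"attributes": ["id", "customer", "total"], "identified_by": "id"},
}

# Physical level: concrete table names, column names, and data types.
physical_ddl = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(physical_ddl)

# List the tables that the physical model created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customers', 'orders']
```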

Data-modeling techniques
There are a lot of approaches when it comes to developing data models, but two common methods
are the Entity Relationship Diagram (ERD) and the Unified Modeling Language (UML) diagram.
ERDs are a visual way to understand the relationship between entities in the data model. UML
diagrams are very detailed diagrams that describe the structure of a system by showing the system's
entities, attributes, operations, and their relationships. As a junior data analyst, you will need to
understand that there are different data modeling techniques, but in practice, you will probably be
using your organization’s existing technique.

You can read more about ERD, UML, and data dictionaries in this data modeling techniques article.

Data analysis and data modeling
Data modeling can help you explore the high-level details of your data and how it is related across
the organization’s information systems. Data modeling sometimes requires data analysis to
understand how the data is put together; that way, you know how to map the data. And finally, data
models make it easier for everyone in your organization to understand and collaborate with you on
your data. This is important for you and everyone on your team!

Understanding Boolean logic

In this reading, you will explore the basics of Boolean logic and learn how to use multiple conditions
in a Boolean statement. These conditions are created with Boolean operators, including AND, OR,
and NOT. These operators are similar to mathematical operators and can be used to create logical
statements that filter your results. Data analysts use Boolean statements to do a wide range of data
analysis tasks, such as creating queries for searches and checking for conditions when writing
programming code.

Boolean logic example

Imagine you are shopping for shoes, and are considering certain preferences:

• You will buy the shoes only if they are pink and grey
• You will buy the shoes if they are entirely pink or entirely grey, or if they are pink and grey
• You will buy the shoes if they are grey, but not if they have any pink

Below are Venn diagrams that illustrate these preferences. AND is the center of the Venn diagram,
where two conditions overlap. OR includes either condition. NOT includes only the part of the Venn
diagram that doesn't contain the exception.

The AND operator
Your condition is “If the shoes contain both grey and pink, you will buy them.”
The Boolean statement would break down the logic of that statement to filter your results by both
colors. It would say “IF (Color="Grey") AND (Color="Pink") then buy them.” The AND operator lets
you stack multiple conditions.

Below is a simple truth table that outlines the Boolean logic at work in this statement. In the Color is
Grey column, there are two pairs of shoes that meet the color condition. And in the Color is Pink
column, there are two pairs that meet that condition. But in the If Grey AND Pink column, there is
only one pair of shoes that meets both conditions. So, according to the Boolean logic of the
statement, there is only one pair marked true. In other words, there is one pair of shoes that you can
buy.

Color is Grey | Color is Pink | If Grey AND Pink, then Buy | Boolean Logic
Grey/True | Pink/True | True/Buy | True AND True = True
Grey/True | Black/False | False/Don't buy | True AND False = False
Red/False | Pink/True | False/Don't buy | False AND True = False
Red/False | Green/False | False/Don't buy | False AND False = False
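
The AND truth table above can be sketched in Python. Because a single-valued color field cannot equal both "Grey" and "Pink" at once, this sketch assumes each pair of shoes is described by a set of colors; the `shoes` list and the pair names are hypothetical and mirror the four rows of the table.

```python
# Hypothetical inventory; each pair lists the colors it contains.
shoes = [
    {"name": "pair_1", "colors": {"grey", "pink"}},
    {"name": "pair_2", "colors": {"grey", "black"}},
    {"name": "pair_3", "colors": {"red", "pink"}},
    {"name": "pair_4", "colors": {"red", "green"}},
]

def buy_and(shoe):
    # True only when both conditions hold.
    return "grey" in shoe["colors"] and "pink" in shoe["colors"]

and_matches = [s["name"] for s in shoes if buy_and(s)]
print(and_matches)  # ['pair_1']
```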

The OR operator
The OR operator lets you move forward if either one of your two conditions is met. Your condition is
“If the shoes are grey or pink, you will buy them.” The Boolean statement would be “IF
(Color="Grey") OR (Color="Pink") then buy them.” Notice that any shoe that meets either the Color is
Grey or the Color is Pink condition is marked as true by the Boolean logic. According to the truth
table below, there are three pairs of shoes that you can buy.

Color is Grey | Color is Pink | If Grey OR Pink, then Buy | Boolean Logic
Red/False | Black/False | False/Don't buy | False OR False = False
Black/False | Pink/True | True/Buy | False OR True = True
Grey/True | Green/False | True/Buy | True OR False = True
Grey/True | Pink/True | True/Buy | True OR True = True
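
The same idea works for OR. This sketch again assumes each pair is described by a set of colors; the hypothetical `shoes` list mirrors the four rows of the OR truth table.

```python
# Hypothetical inventory matching the rows of the OR truth table.
shoes = [
    {"name": "pair_1", "colors": {"red", "black"}},
    {"name": "pair_2", "colors": {"black", "pink"}},
    {"name": "pair_3", "colors": {"grey", "green"}},
    {"name": "pair_4", "colors": {"grey", "pink"}},
]

# True when at least one of the two conditions holds.
or_matches = [
    s["name"] for s in shoes
    if "grey" in s["colors"] or "pink" in s["colors"]
]
print(or_matches)  # ['pair_2', 'pair_3', 'pair_4']
```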

The NOT operator
Finally, the NOT operator lets you filter by subtracting specific conditions from the results. Your
condition is "You will buy any grey shoe except for those with any traces of pink in them." Your
Boolean statement would be “IF (Color="Grey") AND NOT (Color="Pink") then buy them.” Now, all of
the grey shoes that aren't pink are marked true by the Boolean logic for the NOT Pink condition. The
pink shoes are marked false by the Boolean logic for the NOT Pink condition. Only one pair of shoes
is excluded in the truth table below.

Color is Grey | Color is Pink | Boolean Logic for NOT Pink | If Grey AND (NOT Pink), then Buy | Boolean Logic
Grey/True | Red/False | Not False = True | True/Buy | True AND True = True
Grey/True | Black/False | Not False = True | True/Buy | True AND True = True
Grey/True | Green/False | Not False = True | True/Buy | True AND True = True
Grey/True | Pink/True | Not True = False | False/Don't buy | True AND False = False
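
A sketch of the NOT condition in Python follows. It assumes each pair is described by a set of colors, and the hypothetical `shoes` list mirrors the four rows of the NOT truth table.

```python
# Hypothetical inventory matching the rows of the NOT truth table.
shoes = [
    {"name": "pair_1", "colors": {"grey", "red"}},
    {"name": "pair_2", "colors": {"grey", "black"}},
    {"name": "pair_3", "colors": {"grey", "green"}},
    {"name": "pair_4", "colors": {"grey", "pink"}},
]

# Grey AND (NOT pink): the NOT removes any pair that contains pink.
not_matches = [
    s["name"] for s in shoes
    if "grey" in s["colors"] and not ("pink" in s["colors"])
]
print(not_matches)  # ['pair_1', 'pair_2', 'pair_3']
```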

The power of multiple conditions

For data analysts, the real power of Boolean logic comes from being able to combine multiple
conditions in a single statement. For example, if you wanted to filter for shoes that were grey or pink,
and waterproof, you could construct a Boolean statement such as: “IF ((Color = "Grey") OR (Color =
"Pink")) AND (Waterproof = "True").” Notice that you can use parentheses to group your conditions
together.
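
The grouping matters. In Python, as in many languages, `and` binds more tightly than `or`, so dropping the parentheses changes the result. The `shoes` records below are hypothetical values chosen to show the difference.

```python
# Hypothetical inventory with a single color and a waterproof flag per pair.
shoes = [
    {"name": "pair_1", "color": "grey", "waterproof": True},
    {"name": "pair_2", "color": "grey", "waterproof": False},
    {"name": "pair_3", "color": "pink", "waterproof": True},
    {"name": "pair_4", "color": "black", "waterproof": True},
]

# (grey OR pink) AND waterproof
grouped = [
    s["name"] for s in shoes
    if (s["color"] == "grey" or s["color"] == "pink") and s["waterproof"]
]

# Without parentheses this reads as: grey OR (pink AND waterproof)
ungrouped = [
    s["name"] for s in shoes
    if s["color"] == "grey" or s["color"] == "pink" and s["waterproof"]
]

print(grouped)    # ['pair_1', 'pair_3']
print(ungrouped)  # ['pair_1', 'pair_2', 'pair_3']
```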

Whether you are doing a search for new shoes or applying this logic to your database queries,
Boolean logic lets you create multiple conditions to filter your results. And now that you know a little
more about how Boolean logic is used, you can start using it!

Transforming data
What is data transformation?

In this reading, you will explore how data is transformed and the differences between wide and long
data. Data transformation is the process of changing the data’s format, structure, or values. As a
data analyst, there is a good chance you will need to transform data at some point to make it easier
for you to analyze it.

Data transformation usually involves:

• Adding, copying, or replicating data
• Deleting fields or records
• Standardizing the names of variables
• Renaming, moving, or combining columns in a database
• Joining one set of data with another
• Saving a file in a different format. For example, saving a spreadsheet as a comma separated values
(CSV) file.
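
Two of the operations listed above, standardizing variable names and saving data in a different format, can be sketched with Python's standard `csv` module. The records and column names are hypothetical, and the CSV is written to an in-memory buffer rather than a real file to keep the sketch self-contained.

```python
import csv
import io

# Hypothetical records with inconsistent column names.
rows = [
    {"Cust Name": "Ana", "ZIP": "94105"},
    {"Cust Name": "Ben", "ZIP": "10001"},
]

# Standardize the names of variables.
rename = {"Cust Name": "customer_name", "ZIP": "zip_code"}
standardized = [{rename[k]: v for k, v in row.items()} for row in rows]

# Save in a different format: write the records as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["customer_name", "zip_code"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(standardized)
csv_text = buffer.getvalue()
print(csv_text)
```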

Why transform data?

Goals for data transformation might be:

• Data organization: better organized data is easier to use
• Data compatibility: different applications or systems can then use the same data
• Data migration: data with matching formats can be moved from one system to another
• Data merging: data with the same organization can be merged together
• Data enhancement: data can be displayed with more detailed fields
• Data comparison: apples-to-apples comparisons of the data can then be made

Data transformation example: data merging

Mario is a plumber who owns a plumbing company. After years in the business, he buys another
plumbing company. Mario wants to merge the customer information from his newly acquired
company with his own, but the other company uses a different database. So, Mario needs to make
the data compatible. To do this, he has to transform the format of the acquired company’s data.
Then, he must remove duplicate rows for customers they had in common. When the data is
compatible and together, Mario’s plumbing company will have a complete and merged customer
database.
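
Mario's two steps (make the formats compatible, then remove duplicates) might look like the following Python sketch. The customer records, the field names in both systems, and the choice to treat name plus phone number as the duplicate check are all assumptions made for illustration.

```python
# Hypothetical customer records from the two companies.
mario_customers = [
    {"customer_name": "Ana Ruiz", "phone": "555-0101"},
    {"customer_name": "Ben Cho", "phone": "555-0102"},
]
acquired_customers = [
    {"FullName": "Ben Cho", "Tel": "555-0102"},
    {"FullName": "Dee Park", "Tel": "555-0103"},
]

# Step 1: transform the acquired data into Mario's format.
converted = [
    {"customer_name": c["FullName"], "phone": c["Tel"]}
    for c in acquired_customers
]

# Step 2: merge, keeping the first occurrence of each customer.
merged = []
seen = set()
for customer in mario_customers + converted:
    key = (customer["customer_name"], customer["phone"])
    if key not in seen:
        seen.add(key)
        merged.append(customer)

merged_names = [c["customer_name"] for c in merged]
print(merged_names)  # ['Ana Ruiz', 'Ben Cho', 'Dee Park']
```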

Data transformation example: data organization (long to wide)
To make it easier to create charts, you may also need to transform long data to wide data. Consider
the following example of transforming stock prices (collected as long data) to wide data.

Long data is data where each row contains a single data point for a particular item. In the long data
example below, individual stock prices (data points) have been collected for Apple (AAPL), Amazon
(AMZN), and Google (GOOGL) (particular items) on the given dates.

Long data example: Stock prices

Wide data is data where each row contains multiple data points for the particular items identified in the
columns.

Wide data example: Stock prices

With data transformed to wide data, you can create a chart comparing how each company's stock
changed over the same period of time.
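
A long-to-wide transformation can be sketched in plain Python. The dates and prices below are made-up placeholder values, not real stock data; only the ticker symbols come from the example above.

```python
# Placeholder prices in long format: one row per (date, symbol) data point.
long_rows = [
    ("2020-01-02", "AAPL", 100.0),
    ("2020-01-02", "AMZN", 200.0),
    ("2020-01-02", "GOOGL", 300.0),
    ("2020-01-03", "AAPL", 101.0),
    ("2020-01-03", "AMZN", 198.0),
    ("2020-01-03", "GOOGL", 305.0),
]

# Pivot to wide format: one row per date, one column per symbol.
wide = {}
for date, symbol, price in long_rows:
    wide.setdefault(date, {})[symbol] = price

symbols = sorted({symbol for _, symbol, _ in long_rows})
print("date", *symbols)
for date in sorted(wide):
    print(date, *(wide[date][s] for s in symbols))
```

Each date now occupies a single row, which is the shape most charting tools expect when comparing several series over the same period.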

You might notice that all the data included in the long format is also in the wide format. But wide data
is easier to read and understand. That is why data analysts typically transform long data to wide data
more often than they transform wide data to long data. The following table summarizes when each
format is preferred:

Wide data is preferred when | Long data is preferred when
Creating tables and charts with a few variables about each subject | Storing a lot of variables about each subject. For example, 60 years worth of interest rates for each bank
Comparing straightforward line graphs | Performing advanced statistical analysis or graphing

Data anonymization
What is data anonymization?
You have been learning about the importance of privacy in data analytics. Now, it is time to talk
about data anonymization and what types of data should be anonymized. Personally identifiable
information, or PII, is information that can be used by itself or with other data to track down a
person's identity.

Data anonymization is the process of protecting people's private or sensitive data by eliminating that
kind of information. Typically, data anonymization involves blanking, hashing, or masking personal
information, often by using fixed-length codes to represent data columns, or hiding data with altered
values.
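
The three techniques named above (blanking, hashing, and masking) can be illustrated with a short Python sketch. The record is made up, the helper names `hash_value` and `mask_value` are hypothetical, and this is an illustration of the ideas rather than production-grade privacy tooling; for example, unsalted hashes of short values can be guessed by trying likely inputs.

```python
import hashlib

# Hypothetical record (made-up values).
record = {"name": "Ana Ruiz", "phone": "555-010-1234", "email": "ana@example.com"}

def hash_value(value: str) -> str:
    # Hashing: replace the value with a fixed-length code
    # (a SHA-256 hex digest is always 64 characters).
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def mask_value(value: str, visible: int = 4) -> str:
    # Masking: hide everything except the last `visible` characters.
    return "*" * (len(value) - visible) + value[-visible:]

anonymized = {
    "name": hash_value(record["name"]),
    "phone": mask_value(record["phone"]),
    "email": "",  # Blanking: remove the value entirely.
}

print(len(anonymized["name"]))  # 64
print(anonymized["phone"])      # ********1234
```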

Your role in data anonymization

Organizations have a responsibility to protect their data and the personal information that data might
contain. As a data analyst, you might be expected to understand what data needs to be anonymized,
but you generally wouldn't be responsible for the data anonymization itself. A rare exception might
be if you work with a copy of the data for testing or development purposes. In this case, you could be
required to anonymize the data before you work with it.

What types of data should be anonymized?

Healthcare and financial data are two of the most sensitive types of data. These industries rely a lot
on data anonymization techniques. After all, the stakes are very high. That’s why data in these two
industries usually goes through de-identification, which is a process used to wipe data clean of all
personally identifying information.
Data anonymization is used in just about every industry. That is why it is so important for data
analysts to understand the basics. Here is a list of data that is often anonymized:

• Telephone numbers
• Names
• License plates and license numbers
• Social security numbers
• IP addresses
• Medical records
• Email addresses
• Photographs
• Account numbers

For some people, it just makes sense that this type of data should be anonymized. For others, we
have to be very specific about what needs to be anonymized. Imagine a world where we all had
access to each other’s addresses, account numbers, and other identifiable information. That would
invade a lot of people’s privacy and make the world less safe. Data anonymization is one of the
ways we can keep data private and secure!

The open-data debate

Just like data privacy, open data is a widely debated topic in today’s world. Data analysts think a lot
about open data, and as a future data analyst, you need to understand the basics to be successful in
your new role.

What is open data?
In data analytics, open data is part of data ethics, which has to do with using data ethically. Openness
refers to free access, usage, and sharing of data. But for data to be considered open, it has to:

• Be available and accessible to the public as a complete dataset
• Be provided under terms that allow it to be reused and redistributed
• Allow universal participation so that anyone can use, reuse, and redistribute the data

Data can only be considered open when it meets all three of these standards.

The open data debate: What data should be publicly available?
One of the biggest benefits of open data is that credible databases can be used more widely.
Basically, this means that all of that good data can be leveraged, shared, and combined with other
data. This could have a huge impact on scientific collaboration, research advances, analytical
capacity, and decision-making. But it is important to think about the individuals being represented by
the public, open data, too.

Third-party data is collected by an entity that doesn’t have a direct relationship with the data. You
might remember learning about this type of data earlier. For example, third parties might collect
information about visitors to a certain website. Doing this lets these third parties create audience
profiles, which helps them better understand user behavior and target them with more effective
advertising.

Personally identifiable information (PII) is data that is reasonably likely to identify a person and make
information known about them. It is important to keep this data safe. PII can include a person’s
address, credit card information, social security number, medical records, and more.

Everyone wants to keep personal information about themselves private. Because third-party data is
readily available, it is important to balance the openness of data with the privacy of individuals.

Databases in data analytics
Databases enable analysts to manipulate, store, and process data. This helps them search through
data a lot more efficiently to get the best insights.

Relational databases
A relational database is a database that contains a series of tables that can be connected to show
relationships. Basically, they allow data analysts to organize and link data based on what the data
has in common.

In a non-relational table, you will find all of the possible variables you might be interested in
analyzing all grouped together. This can make it really hard to sort through. This is one reason
why relational databases are so common in data analysis: they simplify a lot of analysis
processes and make data easier to find and use across an entire database.

Database Normalization
Normalization is a process of organizing data in a relational database, for example, by creating
tables and establishing relationships between those tables. It is applied to eliminate data
redundancy, increase data integrity, and reduce complexity in a database.

The key to relational databases

Tables in a relational database are connected by the fields they have in common. You might
remember learning about primary and foreign keys before. As a quick refresher, a primary key
is an identifier that references a column in which each value is unique. In other words, it's a
column of a table that is used to uniquely identify each record within that table. The value
assigned to the primary key in a particular row must be unique within the entire table. For
example, if customer_id is the primary key for the customer table, no two customers will ever
have the same customer_id.

By contrast, a foreign key is a field within a table that is a primary key in another table. A table
can have only one primary key, but it can have multiple foreign keys. These keys are what create
the relationships between tables in a relational database, which helps organize and connect data
across multiple tables in the database.
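
Here is a minimal sketch of primary and foreign keys using Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their columns are hypothetical. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is turned on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 25.0)")

rejections = []
try:
    # Same customer_id as an existing row: violates the primary key.
    conn.execute("INSERT INTO customer VALUES (1, 'Ben')")
except sqlite3.IntegrityError:
    rejections.append("duplicate primary key rejected")
try:
    # customer_id 99 does not exist in customer: violates the foreign key.
    conn.execute("INSERT INTO purchase VALUES (11, 99, 5.0)")
except sqlite3.IntegrityError:
    rejections.append("unknown foreign key rejected")

print(rejections)
```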

Some tables don't require a primary key. For example, a revenue table can have multiple foreign
keys and not have a primary key. A primary key may also be constructed using multiple columns
of a table. This type of primary key is called a composite key. For example, if customer_id and
location_id are the two columns of a composite key for a customer table, the combination of values
assigned to those fields in any given row must be unique within the entire table.
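
With a composite key, each column may repeat on its own as long as the pair stays unique. The sketch below uses `sqlite3` and the customer_id and location_id columns from the example; the `name` column and the inserted rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id INTEGER,
    location_id INTEGER,
    name        TEXT,
    PRIMARY KEY (customer_id, location_id)
)
""")

conn.execute("INSERT INTO customer VALUES (1, 100, 'Ana')")
# Same customer_id with a different location_id: the pair is still unique.
conn.execute("INSERT INTO customer VALUES (1, 200, 'Ana')")

try:
    # Repeats the pair (1, 100), so the composite key rejects it.
    conn.execute("INSERT INTO customer VALUES (1, 100, 'Ana again')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_allowed, row_count)  # False 2
```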

SQL? You’re speaking my language

Databases use a special language to communicate called a query language. Structured Query
Language (SQL) is a type of query language that lets data analysts communicate with a
database. So, a data analyst will use SQL to create a query to view the specific data that they
want from within the larger set. In a relational database, data analysts can write queries to get
data from the related tables. SQL is a powerful tool for working with databases — which is why
you are going to learn more about it coming up!
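
As a preview, the sketch below runs one SQL query that pulls data from two related tables. It uses Python's `sqlite3` module only as a convenient way to execute the SQL; the tables, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                       customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO purchase VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Join the tables on their shared key and total the purchases per customer.
query = """
SELECT c.name, SUM(p.amount) AS total
FROM customer AS c
JOIN purchase AS p ON p.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""
results = conn.execute(query).fetchall()
print(results)  # [('Ana', 40.0), ('Ben', 40.0)]
```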

Metadata is as important as the data itself
Data analytics, by design, is a field that thrives on collecting and organizing data. In this reading, you
are going to learn about how to analyze and thoroughly understand every aspect of your data.

Take a look at any data you find. What is it? Where did it come from? Is it useful? How do you
know? This is where metadata comes in to provide a deeper understanding of the data. To put it
simply, metadata is data about data. In database management, it provides information about other
data and helps data analysts interpret the contents of the data within a database.

Regardless of whether you are working with a large or small quantity of data, metadata is the mark
of a knowledgeable analytics team, helping to communicate about data across the business and
making it easier to reuse data. In essence, metadata tells the who, what, when, where, which, how,
and why of data.

Elements of metadata
Before looking at metadata examples, it is important to understand what type of information
metadata typically provides.

Title and description

What is the name of the file or website you are examining? What type of content does it contain?

Tags and categories

What is the general overview of the data that you have? Is the data indexed or described in a
specific way?

Who created it and when

Where did the data come from, and when was it created? Is it recent, or has it existed for a long
time?

Who last modified it and when

Were any changes made to the data? If yes, were the modifications recent?

Who can access or update it

Is this dataset public? Are special permissions needed to customize or modify the dataset?

Examples of metadata
In today’s digital world, metadata is everywhere, and it is becoming a more common practice to
provide metadata on a lot of media and information you interact with. Here are some real-world
examples of where to find metadata:

Photos
Whenever a photo is captured with a camera, metadata such as camera filename, date, time, and
geolocation are gathered and saved with it.

Emails
When an email is sent or received, there is lots of visible metadata such as subject line, the sender,
the recipient and date and time sent. There is also hidden metadata that includes server names, IP
addresses, HTML format, and software details.

Spreadsheets and documents

Spreadsheets and documents are already filled with a considerable amount of data so it is no
surprise that metadata would also accompany them. Titles, author, creation date, number of pages,
user comments as well as names of tabs, tables, and columns are all metadata that one can find in
spreadsheets and documents.

Websites
Every web page has a number of standard metadata fields, such as tags and categories, site
creator’s name, web page title and description, time of creation and any iconography.

Digital files
Usually, if you right click on any computer file, you will see its metadata. This could consist of file
name, file size, date of creation and modification, and type of file.
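
The same file metadata can be read programmatically. This Python sketch creates a small temporary file (a stand-in for any file on your computer) and reads its metadata with `os.stat`; the file contents and the keys of the `metadata` dictionary are arbitrary choices for illustration.

```python
import os
import tempfile
from datetime import datetime

# Create a small temporary CSV file to inspect.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,name\n1,Ana\n")
    path = tmp.name

info = os.stat(path)
metadata = {
    "file_name": os.path.basename(path),
    "file_size_bytes": info.st_size,
    "modified": datetime.fromtimestamp(info.st_mtime).isoformat(),
    "file_type": os.path.splitext(path)[1],
}
print(metadata)

os.remove(path)  # clean up the temporary file
```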

Books
Metadata is not only digital. Every book has a number of standard metadata elements on the covers and
inside that will inform you of its title, author’s name, a table of contents, publisher information,
copyright description, index, and a brief description of the book’s contents.

Data as you know it

Knowing the content and context of your data, as well as how it is structured, is very valuable in your
career as a data analyst. When analyzing data, it is important to always understand the full picture. It
is not just about the data you are viewing, but how that data comes together. Metadata ensures that
you are able to find, use, preserve, and reuse data in the future. Remember, it will be your
responsibility to manage and make use of data in its entirety; metadata is as important as the data
itself.

Organization guidelines
This reading summarizes best practices for file naming, organization, and storage.

Best practices for file naming conventions

Review the following file naming recommendations:

• Work out and agree on file naming conventions early on in a project to avoid renaming files
again and again.
• Align your file naming with your team's or company's existing file-naming conventions.
• Ensure that your file names are meaningful; consider including information like project name
and anything else that will help you quickly identify (and use) the file for the right purpose.
• Include the date and version number in file names; common formats are YYYYMMDD for dates
and v## for versions (or revisions).
• Create a sample text file whose content describes (breaks down) the file naming
convention and whose file name applies it.
• Avoid spaces and special characters in file names. Instead, use dashes, underscores, or capital
letters. Spaces and special characters can cause errors in some applications.
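
The recommendations above can be turned into a small helper so that every file name follows the same pattern. This is only a sketch of one possible convention (project, description, YYYYMMDD date, v## version); the build_file_name function and the example values are hypothetical, and your team's own convention takes priority.

```python
import re
from datetime import date


def clean(text):
    """Drop spaces and special characters, using capital letters to separate words."""
    words = re.split(r"[^A-Za-z0-9]+", text)
    return "".join(word[:1].upper() + word[1:] for word in words if word)


def build_file_name(project, description, on_date, version, extension):
    """Combine the parts as Project_Description_YYYYMMDD_v##.extension."""
    return (
        f"{clean(project)}_{clean(description)}_"
        f"{on_date:%Y%m%d}_v{version:02d}.{extension}"
    )


file_name = build_file_name(
    project="sales report",
    description="Q3 summary",
    on_date=date(2024, 3, 5),
    version=2,
    extension="csv",
)
print(file_name)  # SalesReport_Q3Summary_20240305_v02.csv
```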

Best practices for keeping files organized

Remember these tips for staying organized as you work with files:

• Create folders and subfolders in a logical hierarchy so related files are stored together.
• Separate ongoing from completed work so your current project files are easier to find. Archive
older files in a separate folder, or in an external storage location.
• If your files aren't automatically backed up, manually back them up often to avoid losing
important work.
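
If you back up files manually, a short script can make the habit easier to keep. The sketch below copies a project folder into a dated backup folder using Python's shutil module. The folder names and the back_up_folder helper are assumptions made for this example, and it runs inside a temporary directory so nothing on your computer is changed.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_folder(source, backup_root, label):
    """Copy everything under `source` into backup_root/label, keeping subfolders."""
    destination = Path(backup_root) / label
    # copytree recreates the folder hierarchy and fails if the destination
    # already exists, so an earlier backup is never silently overwritten.
    shutil.copytree(source, destination)
    return destination


with tempfile.TemporaryDirectory() as workspace:
    # Hypothetical layout: ongoing work lives under "current".
    project = Path(workspace) / "current" / "sales_analysis"
    project.mkdir(parents=True)
    (project / "SalesReport_20240305_v01.csv").write_text(
        "region,total\n", encoding="utf-8"
    )

    backup = back_up_folder(
        project, Path(workspace) / "backups", "sales_analysis_20240305"
    )
    backed_up_files = sorted(path.name for path in backup.iterdir())

print(backed_up_files)
```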

Balancing security and analytics

The battle between security and data analytics
Data security means protecting data from unauthorized access or corruption by putting safety
measures in place. Usually the purpose of data security is to keep unauthorized users from
accessing or viewing sensitive data. Data analysts have to find a way to balance data security with
their actual analysis needs. This can be tricky: we want to keep our data safe and secure, but we
also want to use it as soon as possible so that we can make meaningful and timely observations.

In order to do this, companies need to find ways to balance their data security measures with their
data access needs.
Luckily, there are a few security measures that can help companies do just that. The two we will talk
about here are encryption and tokenization.

Encryption uses an algorithm to alter data and make it unusable by users and applications that
don’t have the matching “key.” The key is what reverses the
encryption; so if you have the key, you can still use the data in its original form.
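
To make the role of the key concrete, here is a toy sketch in Python. It scrambles each byte of a record with a random key of the same length and then uses the same key to restore it. This is an illustration only, not how a company's security team would protect data: real systems rely on vetted encryption libraries, and the record and function names here are made up.

```python
import secrets


def encrypt(plaintext, key):
    """XOR each byte of the data with the matching byte of the key."""
    data = plaintext.encode("utf-8")
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def decrypt(ciphertext, key):
    """Applying the same XOR with the same key restores the original bytes."""
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


record = "account=12345678"  # hypothetical sensitive value
key = secrets.token_bytes(len(record.encode("utf-8")))  # random key, used once

scrambled = encrypt(record, key)
restored = decrypt(scrambled, key)

print(scrambled)           # unreadable without the key
print(restored == record)  # the key recovers the original data
```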

Tokenization replaces the data elements you want to protect with randomly generated data referred
to as a “token.” The original data is stored in a separate location and mapped to the tokens. To
access the complete original data, the user or application needs to have permission to use the
tokenized data and the token mapping. This means that even if the tokenized data is hacked, the
original data is still safe and secure in a separate location.
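
The tokenization process described above can be sketched in a few lines of Python. The TokenVault class below is a hypothetical, in-memory stand-in for the separate location where the original data and the token mapping are kept; a real system would store that mapping in a separately secured service.

```python
import secrets


class TokenVault:
    """Hypothetical stand-in for the separately stored token mapping."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the original value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token, has_permission):
        # Only users with permission to use the mapping get the original back.
        if not has_permission:
            raise PermissionError("not authorized to read the token mapping")
        return self._token_to_value[token]


vault = TokenVault()
customers = [{"name": "A. Rivera", "account": "12345678"}]  # made-up record

# The dataset shared with analysts holds tokens instead of account numbers.
tokenized = [{**row, "account": vault.tokenize(row["account"])} for row in customers]

print(tokenized[0]["account"].startswith("tok_"))
print(vault.detokenize(tokenized[0]["account"], has_permission=True))
```

Even if the tokenized list were exposed, the account numbers would not be, because they exist only inside the vault.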

Encryption and tokenization are just some of the data security options out there. There are a lot of
others, like using authentication devices for AI technology.

As a junior data analyst, you probably won’t be responsible for building out these systems. A lot of
companies have entire teams dedicated to data security or hire third-party companies that specialize
in data security to create these systems. But it is important to know that all companies have a
responsibility to keep their data secure, and to understand some of the potential systems your future
employer might use.

