Data Analytics Course 3
Data Analytics Course 3
Every data analyst’s goal is to conduct efficient data analysis. One way to increase the efficiency of your
analyses is to streamline processes that help save time and energy in the long run. Meaningful, logical, and
consistent file names help data analysts organize their data and automate their analysis process. When you use
consistent guidelines to describe the content, date, or version of a file and its name, you’re using file naming
conventions.
In this reading, you’ll learn more about best practices for file naming conventions and file organization.
It's also critical to ensure that file names are meaningful, consistent, and easy-to-read. File names should
include:
In the following sections, you’ll explore each part of a sales report file name that follows an established naming
convention, SalesReport_20231125_v02. This example will help you understand the key parts of a strong file
name and why they’re important.
Name
Giving a file a meaningful name to describe its contents makes searching for it straightforward. It also makes it
easy to understand the type of data the file contains.
In the example, the file name includes the text SalesReport, a succinct description of what the file contains: a
sales report.
Creation date
Knowing when a file was created can help you understand if it is relevant to your current analysis. For example,
you might want to analyze only data from 2023.
In the example, the year is described as 20231125. This reads as the sales report from November 25, 2023
following the year, month, and day (YYYYMMDD) format of the international date standard. Keep in mind that
different countries follow different date conventions, so make sure you know the date standard your company
follows.
Revision version
Including a revision version helps ensure you’re working with the correct file. You wouldn’t want to make edits to
an old version of a file without realizing it! When you include revision numbers in a file name, lead with a zero.
This way, if your team reaches more than nine rounds of revisions, double digits are already built into your
convention.
In the example, the version is described as v02. The v is short for the version of the file, and the number
following the v indicates which round of revisions the file is currently in.
When you use spaces and special characters in a file name, software may not be able to recognize them, which
causes problems and errors in some applications. An alternative is to use hyphens, underscores, and capital
letters. The example includes underscores between each piece of information, but your team could choose to
use hyphens between year, month, and date, too: SalesReport_2023_11_25_v02.
File organization
To keep your files organized, create folders and subfolders—in a logical hierarchy—to ensure related files are
stored together and can be found easily later. A hierarchy is a way of organizing files and folders. Broader-topic
folders are located at the top of the hierarchy, and more specific subfolders and files are contained within those
folders. Each folder can contain other folders and files. This allows you to group related files together and makes
it easier to find the files you need. In addition, it’s a best practice to store completed files separately from in-
progress files so the files you need are easy to find. Archive older files in a separate folder or in an external
storage location.
Key takeaways
Use consistent, meaningful file-naming conventions throughout your project to save you and your team time by
making data easy to find and use. File-naming conventions should be agreed upon by all team members before
starting a project and should describe the project by including its name, the date, and the revision version.
Document this information in a location that team members can access.
In order to do this, companies need to find ways to balance their data security measures with their data access
needs.
Luckily, there are a few security measures that can help companies do just that. The two we will talk about here
are encryption and tokenization.
Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t know
the algorithm. This algorithm is saved as a “key” which can be used to reverse the encryption; so if you have the
key, you can still use the data in its original form.
Tokenization replaces the data elements you want to protect with randomly generated data referred to as a
“token.” The original data is stored in a separate location and mapped to the tokens. To access the complete
original data, the user or application needs to have permission to use the tokenized data and the token mapping.
This means that even if the tokenized data is hacked, the original data is still safe and secure in a separate
location.
Encryption and tokenization are just some of the data security options out there. There are a lot of others, like
using authentication devices for AI technology.
As a junior data analyst, you probably won’t be responsible for building out these systems. A lot of companies
have entire teams dedicated to data security or hire third party companies that specialize in data security to
create these systems. But it is important to know that all companies have a responsibility to keep their data
secure, and to understand some of the potential systems your future employer might use.
However, one thing you absolutely can do to help strike the right balance is to use version control best practices.
Version control enables all collaborators within a file to track changes over time. You can understand who made
what changes to a file, when they were made, and why.
Here's a simple example: Perhaps you're working on a project with a team of other people. You are all
collaborating within the same set of files, but each person is responsible for a different part of the project.
Without version control, it would be very difficult to keep track of who made what changes to the files and when.
This would lead to confusion and, even worse, people accidentally overwriting each other's work! Version control
is essential for data analytics professionals because it allows users to effectively collaborate with others and
experiment with new ideas without fear of losing their work.
Data security: Protecting data from unauthorized access or corruption by adopting safety measures