Sharing-creation-deletion class notes
Data Preparation
Data preparation is the process of transforming raw data into a form that can be
used for analysis and machine learning. It involves several steps, including:
Gathering data: Finding the right data to use, either from an existing data catalog
or by adding new sources
Assessing data: Getting to know the data and understanding what needs to be done to
make it useful
Cleaning and validating data: Removing faulty data, filling in gaps, and fixing
mistakes
Transforming and enriching data: Updating the format or value entries, or adding
related information
Storing data: Saving the prepared data or sending it to a third-party application
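The five steps above can be sketched as a small pipeline. This is only an illustration using the Python standard library: the sample data, the column names (name, age, city), the cleaning rules, and the function names are all assumptions made for the example, not part of the notes.

```python
import csv
import io
import json

# Hypothetical raw input; in practice this would come from a file or a data catalog.
RAW_CSV = """name,age,city
Alice,30,Paris
Bob,,London
,25,Berlin
Carol,abc,Madrid
"""

def gather(text):
    """Gathering: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def assess(rows):
    """Assessing: count missing values per column."""
    missing = {}
    for row in rows:
        for key, value in row.items():
            if not value:
                missing[key] = missing.get(key, 0) + 1
    return missing

def clean(rows):
    """Cleaning and validating: drop rows with no name or a non-numeric age."""
    cleaned = []
    for row in rows:
        if not row["name"]:
            continue
        age = row["age"]
        if age and not age.isdigit():
            continue
        cleaned.append({
            "name": row["name"],
            "age": int(age) if age else None,  # keep the gap explicit
            "city": row["city"],
        })
    return cleaned

def transform(rows):
    """Transforming and enriching: normalize the city and add a derived field."""
    return [
        {
            **row,
            "city": row["city"].upper(),
            "is_adult": row["age"] is not None and row["age"] >= 18,
        }
        for row in rows
    ]

def store(rows):
    """Storing: serialize the prepared rows as JSON."""
    return json.dumps(rows, indent=2)

rows = gather(RAW_CSV)
missing_report = assess(rows)
prepared = transform(clean(rows))
output = store(prepared)
print(missing_report)
print(output)
```

Keeping each step in its own function makes it easier to test the steps separately and to swap one out, for example storing to a database instead of JSON.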
Data preparation can be a lengthy process, but it's essential to ensure that data
is accurate and relevant before it's used for analysis. Some key practices to keep
in mind include:
Using a common format for storing and organizing data, such as CSV, JSON, or XML
Centralizing data storage in a data warehouse, data lake, or cloud storage
Defining clear objectives and key metrics to help prioritize efforts
Using validation techniques, such as checksums, rules, and tests, to ensure data is
correct
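As a rough illustration of two of the validation techniques mentioned above, the sketch below computes a SHA-256 checksum to detect changed data and applies simple rules to a record. The field names (id, age) and the allowed age range are assumptions chosen for the example.

```python
import hashlib

def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def validate_record(record: dict) -> list:
    """Apply simple rules and return a list of error messages (empty if valid)."""
    errors = []
    if not record.get("id"):
        errors.append("id is required")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer between 0 and 120")
    return errors

# Checksum: record the digest when the data is produced, compare it later.
payload = b"id,age\n1,34\n"
expected = sha256_checksum(payload)
unchanged = sha256_checksum(payload) == expected
tampered = sha256_checksum(payload + b"2,999\n") != expected

# Rules: a valid record yields no errors, an invalid one lists every problem.
good_errors = validate_record({"id": "a1", "age": 34})
bad_errors = validate_record({"id": "", "age": 200})
print(unchanged, tampered, good_errors, bad_errors)
```

A checksum only shows that data has changed, not whether its values make sense, so the two techniques are usually combined.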
Delete Data
Data should be deleted when it is no longer required for authorized purposes. Here
are some methods for deleting data:
Overwriting: Use software to overwrite data one or more times. This is a simple and
inexpensive method, but it may be ineffective on some media, like write-once CDs.
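ATA Secure Erase: A command built into the drive's firmware that overwrites all
data on the hard drive in a single pass.
Total destruction: Destroy the storage media using degaussers, shredders, or
drills. Degaussing works only on magnetic media.
Digital shredding or wiping: Overwrite the data in software without altering the
physical asset, so the device can be reused.
Built-in sanitization commands: Use the sanitization commands provided by the
device itself, such as ATA Secure Erase.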
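The overwriting method can be sketched for a single file as follows. This is a minimal example, not a certified sanitization tool: the function name overwrite_and_delete and its passes parameter are made up for illustration. Overwriting a file in place may also leave copies behind on SSDs, journaling or copy-on-write file systems, and backups.

```python
import os
import tempfile

def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite a file with random bytes `passes` times, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1024 * 1024)  # write at most 1 MiB at a time
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the writes to the device
    os.remove(path)

# Usage: create a throwaway file, then overwrite and delete it.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"sensitive data" * 100)
os.close(fd)
overwrite_and_delete(demo_path, passes=2)
print(os.path.exists(demo_path))
```

For whole drives, the built-in sanitization commands or physical destruction listed above are generally more dependable than file-level overwriting.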
The data deletion process should also include:
Documenting the decision and keeping evidence of the action
Putting data "beyond use" if it cannot be immediately overwritten
You can also develop and document data retention policies and schedules to
determine when data should be deleted.
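A retention schedule can be expressed as data and checked automatically. The sketch below is only an example: the categories, retention periods, and record layout are hypothetical and would come from your own documented policy.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: category -> number of days to keep the data.
RETENTION_DAYS = {
    "access_logs": 90,
    "invoices": 365 * 7,
}

def is_due_for_deletion(category: str, created_on: date, today: date) -> bool:
    """Return True once a record has outlived its retention period."""
    days = RETENTION_DAYS.get(category)
    if days is None:
        return False  # no policy defined: keep the record for manual review
    return today >= created_on + timedelta(days=days)

records = [
    {"id": 1, "category": "access_logs", "created_on": date(2024, 1, 1)},
    {"id": 2, "category": "invoices", "created_on": date(2020, 1, 1)},
    {"id": 3, "category": "unknown", "created_on": date(2000, 1, 1)},
]

today = date(2024, 6, 1)
due_ids = [
    r["id"]
    for r in records
    if is_due_for_deletion(r["category"], r["created_on"], today)
]
print(due_ids)
```

Records without a defined policy are deliberately kept here, so that nothing is deleted without a documented decision.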