Data Cleaning Steps
- **Inspect the Data**: Use queries like `SELECT * FROM table LIMIT 10;` to review a sample of rows. (Throughout, `table` and `column` are placeholders for your own table and column names.)
- **Understand Schema**: Analyze the table structure using `DESCRIBE table;` or `SHOW CREATE TABLE table;` (MySQL syntax).
- **Define Cleaning Objectives**: Identify what needs cleaning (e.g., duplicates, missing values, or inconsistent formats).
- **Identify Missing Data**: Query for NULL or empty values using `SELECT * FROM table WHERE column IS NULL OR column = '';` (the empty-string check applies to text columns).
- **Impute or Remove**:
  - Fill missing values using `UPDATE` (e.g., `UPDATE table SET column = 'default_value' WHERE column IS NULL;`).
  - Remove rows with missing values using `DELETE` (e.g., `DELETE FROM table WHERE column IS NULL;`).
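The inspection and missing-value steps above can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical `customers` table in an in-memory SQLite database; SQLite has no `DESCRIBE`, so `PRAGMA table_info` stands in for it, and the default value `'unknown'` is illustrative.

```python
import sqlite3

# In-memory SQLite database; "customers" and its columns are hypothetical
# stand-ins for the placeholder table/column names in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customers (name, city) VALUES
        ('Ana', 'Lisbon'),
        ('Ben', NULL),
        ('Cy', ''),
        (NULL, 'Oslo');
""")

# Inspect a sample of rows and the column names
# (PRAGMA table_info is SQLite's rough counterpart to MySQL's DESCRIBE).
sample = conn.execute("SELECT * FROM customers LIMIT 10").fetchall()
schema = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

# Count NULL or empty-string values in the city column.
missing_city = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE city IS NULL OR city = ''"
).fetchone()[0]

# Impute: replace missing cities with a default value.
conn.execute("UPDATE customers SET city = 'unknown' WHERE city IS NULL OR city = ''")

# Remove: drop rows whose required name is missing.
conn.execute("DELETE FROM customers WHERE name IS NULL")
conn.commit()

remaining = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
print(missing_city)  # 2
print(remaining)     # [('Ana', 'Lisbon'), ('Ben', 'unknown'), ('Cy', 'unknown')]
```

Imputing before deleting keeps rows whose only problem is an optional field, while the `DELETE` is reserved for rows missing a value you treat as required.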
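- **Remove Duplicates**: Find repeated values with `GROUP BY ... HAVING COUNT(*) > 1`, then delete the extra copies.
```sql
SELECT column, COUNT(*) AS n
FROM table
GROUP BY column
HAVING COUNT(*) > 1;
```
```sql
-- MySQL multi-table DELETE; keeps the row with the smallest id (id is an assumed unique key)
DELETE t1
FROM table t1
JOIN table t2 ON t1.column = t2.column AND t1.id > t2.id;
```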
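SQLite does not accept the MySQL-style `DELETE t1 FROM table t1 ...` statement, so this sketch shows an equivalent approach there: list the duplicates, then keep the lowest `id` per value with a subquery. The `signups` table and its data are hypothetical.

```python
import sqlite3

# Hypothetical "signups" table where the same email was recorded more than once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO signups (email) VALUES
        ('a@example.com'), ('b@example.com'),
        ('a@example.com'), ('a@example.com'), ('c@example.com');
""")

# Step 1: list the duplicated values and how often each occurs.
dupes = conn.execute("""
    SELECT email, COUNT(*) AS n
    FROM signups
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep the lowest id per email and delete every other copy,
# using a subquery because SQLite has no multi-table DELETE.
conn.execute("""
    DELETE FROM signups
    WHERE id NOT IN (SELECT MIN(id) FROM signups GROUP BY email)
""")
conn.commit()

kept = conn.execute("SELECT id, email FROM signups ORDER BY id").fetchall()
print(dupes)  # [('a@example.com', 3)]
print(kept)   # [(1, 'a@example.com'), (2, 'b@example.com'), (5, 'c@example.com')]
```

MySQL rejects this subquery form when the subquery reads from the table being deleted from, which is one reason the self-join form is common there.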
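- **Case Formatting**: Ensure uniform case using functions like `LOWER()`, `UPPER()`, or `INITCAP()` (`INITCAP()` is available in PostgreSQL and Oracle, but not in MySQL or SQL Server).
```sql
UPDATE table SET column = LOWER(column);
```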
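SQLite, like MySQL, has `LOWER()` and `UPPER()` but no `INITCAP()`. This sketch registers a rough stand-in through `sqlite3`'s `create_function`, built on Python's `str.title()`; the name `initcap`, the `contacts` table, and the extra `TRIM()` step are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def initcap(value):
    """Approximate INITCAP with str.title(); passes NULL (None) through."""
    return value.title() if value is not None else None

# Register the hypothetical initcap() so SQL statements can call it.
conn.create_function("initcap", 1, initcap)

conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO contacts (name, email) VALUES
        ('  aDA lovelace ', 'Ada@Example.COM'),
        ('ALAN TURING', ' alan@example.com');
""")

# Title-case names and lower-case emails, trimming stray whitespace first.
conn.execute("""
    UPDATE contacts
    SET name = initcap(TRIM(name)),
        email = LOWER(TRIM(email))
""")
conn.commit()

rows = conn.execute("SELECT name, email FROM contacts ORDER BY id").fetchall()
print(rows)  # [('Ada Lovelace', 'ada@example.com'), ('Alan Turing', 'alan@example.com')]
```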
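- **Normalize Values**: Replace inconsistent entries with standard ones using `UPDATE`.
```sql
-- Illustrative values: map spelling variants to one canonical form
UPDATE table SET column = 'standard_value' WHERE column IN ('variant_1', 'variant_2');
```
- **Check Data Types**: Review each column's declared type through `INFORMATION_SCHEMA.COLUMNS`.
```sql
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table';
```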
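When there are many variants, a lookup table scales better than a hand-written `IN (...)` list. This sketch assumes a hypothetical `country_map` table that maps each known variant to its canonical value. Because SQLite has no `INFORMATION_SCHEMA.COLUMNS`, it checks declared column types with `PRAGMA table_info` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount TEXT);
    INSERT INTO orders (country, amount) VALUES
        ('USA', '10.50'), ('U.S.', '3'), ('United States', '7.25'), ('Canada', '4');

    -- Hypothetical lookup table: each known variant maps to one canonical value.
    CREATE TABLE country_map (variant TEXT PRIMARY KEY, canonical TEXT NOT NULL);
    INSERT INTO country_map VALUES ('USA', 'United States'), ('U.S.', 'United States');
""")

# Normalize: rewrite only rows whose value appears in the lookup table.
conn.execute("""
    UPDATE orders
    SET country = (SELECT canonical FROM country_map WHERE variant = orders.country)
    WHERE country IN (SELECT variant FROM country_map)
""")
conn.commit()

countries = [r[0] for r in conn.execute("SELECT DISTINCT country FROM orders ORDER BY country")]

# Check declared types: row[1] is the column name, row[2] its declared type.
types = {r[1]: r[2] for r in conn.execute("PRAGMA table_info(orders)")}
print(countries)  # ['Canada', 'United States']
print(types)      # {'id': 'INTEGER', 'country': 'TEXT', 'amount': 'TEXT'}
```

The type check surfaces `amount` stored as `TEXT`, the kind of mismatch you might then fix with a cast or a schema change.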
#### 6. **Validate Data Integrity**
- Use constraints like `NOT NULL`, `UNIQUE`, or `FOREIGN KEY` to enforce data integrity.
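```sql
-- uq_column is a hypothetical constraint name
ALTER TABLE table ADD CONSTRAINT uq_column UNIQUE (column);
```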
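To see the constraints do their job, this sketch declares `NOT NULL`, `UNIQUE`, and `FOREIGN KEY` on hypothetical `customers` and `orders` tables and attempts inserts that violate each one. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is set on the connection; violations surface as `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id)
    );
    INSERT INTO customers (email) VALUES ('a@example.com');
""")

def try_insert(sql, params=()):
    """Run one INSERT in its own transaction; return None or the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "null_email": try_insert("INSERT INTO customers (email) VALUES (NULL)"),
    "dup_email": try_insert("INSERT INTO customers (email) VALUES (?)", ("a@example.com",)),
    "bad_fk": try_insert("INSERT INTO orders (customer_id) VALUES (?)", (999,)),
    "ok": try_insert("INSERT INTO orders (customer_id) VALUES (?)", (1,)),
}
for name, outcome in results.items():
    print(name, "->", outcome or "accepted")
```

Enforcing rules in the schema means bad rows are rejected at write time, rather than found later by cleanup queries.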
---