Data Cleaning Steps

Uploaded by Harsh Patel

Copyright © All Rights Reserved

### Steps for Data Cleaning in MySQL

#### 1. **Understand the Data**

- **Inspect the Data**: Run a query such as `SELECT * FROM table LIMIT 10;` to preview a few rows. Here and below, `table` and `column` are placeholders for your own table and column names.

- **Understand Schema**: Analyze the table structure using `DESCRIBE table;` or `SHOW CREATE TABLE table;`.

- **Define Cleaning Objectives**: Identify what needs cleaning (e.g., duplicates, missing values, or inconsistent formats).
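
A quick profile of the table can help turn these objectives into concrete numbers. The sketch below assumes a hypothetical `customers` table with `id`, `email`, and `country` columns; none of these names come from the original steps.

```sql
-- Hypothetical table: customers(id, email, country)
SELECT
    COUNT(*)                AS total_rows,
    COUNT(email)            AS non_null_emails,    -- COUNT(col) skips NULLs
    COUNT(*) - COUNT(email) AS null_emails,
    COUNT(DISTINCT country) AS distinct_countries
FROM customers;

-- Frequency of each country value, most common first;
-- spelling and case variants tend to show up near the bottom
SELECT country, COUNT(*) AS row_count
FROM customers
GROUP BY country
ORDER BY row_count DESC;
```

The first query gives a rough size for the missing-value problem, and the second lists the distinct spellings that later standardization would need to merge.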

#### 2. **Handle Missing Values**

- **Identify Missing Data**: Query for NULL values using `SELECT * FROM table WHERE column IS NULL;`. For text columns, add `OR column = ''` to catch empty strings as well, since `IS NULL` does not match them.

- **Impute or Remove**:

  - Fill missing values using `UPDATE` (e.g., `UPDATE table SET column = 'default_value' WHERE column IS NULL;`).

  - Remove rows with missing values using `DELETE` (e.g., `DELETE FROM table WHERE column IS NULL;`).
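
The missing-value steps above can be sketched end to end as follows. The `orders` table and its `customer_note` and `amount` columns are hypothetical, and filling with the column average is only one possible imputation choice.

```sql
-- Hypothetical table: orders(id, customer_note, amount)

-- Count NULLs and empty strings separately
-- (a comparison evaluates to 1 or 0, so SUM counts the matches)
SELECT
    SUM(customer_note IS NULL) AS null_notes,
    SUM(customer_note = '')    AS empty_notes
FROM orders;

-- Treat empty or whitespace-only strings as missing by converting them to NULL
UPDATE orders
SET customer_note = NULLIF(TRIM(customer_note), '');

-- Fill missing amounts with the column average.
-- MySQL rejects a direct subquery on the table being updated,
-- so the average is computed in a derived table and joined in.
UPDATE orders
JOIN (SELECT AVG(amount) AS avg_amount FROM orders) AS stats
SET orders.amount = stats.avg_amount
WHERE orders.amount IS NULL;
```

Running the counting query again after each `UPDATE` is a simple way to confirm that the step did what was intended.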

#### 3. **Remove Duplicates**

- Identify duplicates using:

```sql
SELECT column1, column2, COUNT(*)
FROM table
GROUP BY column1, column2
HAVING COUNT(*) > 1;
```

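- Delete duplicates while retaining one instance (this assumes a unique `id` column; the row with the lowest `id` in each group is kept):

```sql
-- Compare the same columns that define a duplicate in the check above
DELETE t1
FROM table t1
INNER JOIN table t2
    ON t1.id > t2.id
   AND t1.column1 = t2.column1
   AND t1.column2 = t2.column2;
```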
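
On MySQL 8.0 or later, window functions offer another way to do the same thing and make it easy to preview the rows before deleting them. This is a sketch under assumptions: the `contacts` table and its `id`, `email`, and `phone` columns are hypothetical.

```sql
-- Hypothetical table: contacts(id, email, phone); needs MySQL 8.0+ for ROW_NUMBER()

-- Preview: every row after the first in each (email, phone) group
SELECT id, email, phone
FROM (
    SELECT id, email, phone,
           ROW_NUMBER() OVER (PARTITION BY email, phone ORDER BY id) AS rn
    FROM contacts
) AS ranked
WHERE rn > 1;

-- Delete those rows, keeping the lowest id in each group
DELETE c
FROM contacts AS c
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email, phone ORDER BY id) AS rn
    FROM contacts
) AS ranked ON ranked.id = c.id
WHERE ranked.rn > 1;
```

One difference from the self-join approach: `PARTITION BY` places NULL values in the same group, whereas an `=` comparison never matches NULL to NULL, so the two methods can disagree on rows with missing values.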

#### 4. **Standardize Data**

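- **Case Formatting**: Ensure uniform case using `LOWER()` or `UPPER()`. MySQL has no built-in `INITCAP()`; to capitalize only the first letter, combine functions, e.g. `CONCAT(UPPER(LEFT(column, 1)), LOWER(SUBSTRING(column, 2)))`.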

- **Trim Extra Spaces**:

```sql
UPDATE table SET column = TRIM(column);
```

- **Normalize Values**: Replace inconsistent entries with standard ones using `UPDATE`.
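
As an illustration of normalizing values, a `CASE` expression can map known variants to one standard spelling. The `customers` table and the country variants listed here are hypothetical; a real mapping would come from inspecting the data.

```sql
-- Hypothetical table: customers(id, name, country)

-- Preview the cleaned values before changing anything
SELECT
    country,
    CASE
        WHEN UPPER(TRIM(country)) IN ('US', 'USA', 'U.S.', 'UNITED STATES') THEN 'USA'
        WHEN UPPER(TRIM(country)) IN ('UK', 'U.K.', 'UNITED KINGDOM')       THEN 'UK'
        ELSE TRIM(country)
    END AS country_clean
FROM customers
LIMIT 20;

-- Apply the same mapping once the preview looks right
UPDATE customers
SET country = CASE
        WHEN UPPER(TRIM(country)) IN ('US', 'USA', 'U.S.', 'UNITED STATES') THEN 'USA'
        WHEN UPPER(TRIM(country)) IN ('UK', 'U.K.', 'UNITED KINGDOM')       THEN 'UK'
        ELSE TRIM(country)
    END;
```

Previewing with `SELECT` first keeps the mapping reviewable, and unmatched values pass through with only their surrounding spaces removed.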

#### 5. **Correct Data Types**

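- Review each column's declared data type to spot mismatches (e.g., dates or numbers stored as text):

```sql
-- List column names and types for one table in the current database
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table';
```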

- Alter columns to correct data types if necessary:

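```sql
-- MODIFY replaces the whole column definition, so restate
-- attributes such as NOT NULL or DEFAULT along with the new type
ALTER TABLE table MODIFY column datatype;
```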
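
Before altering a text column to a numeric or date type, it can help to check that every stored value converts cleanly. The sketch below assumes a hypothetical `sales` table whose `amount` and `sale_date` columns are currently `VARCHAR`, and that dates are expected in `YYYY-MM-DD` form; the target types are also assumptions.

```sql
-- Hypothetical table: sales(id, amount VARCHAR, sale_date VARCHAR)

-- Rows whose amount is not a plain decimal number
SELECT id, amount
FROM sales
WHERE amount IS NOT NULL
  AND TRIM(amount) NOT REGEXP '^-?[0-9]+(\\.[0-9]+)?$';

-- Rows whose date does not parse as YYYY-MM-DD
-- (STR_TO_DATE returns NULL, with a warning, when parsing fails;
--  this is a rough check and may accept strings with trailing characters)
SELECT id, sale_date
FROM sales
WHERE sale_date IS NOT NULL
  AND STR_TO_DATE(sale_date, '%Y-%m-%d') IS NULL;

-- Once both queries return no rows, change the types
ALTER TABLE sales
    MODIFY amount    DECIMAL(10, 2),
    MODIFY sale_date DATE;
```

Fixing or nulling the offending rows first avoids conversion errors or silently truncated values during the `ALTER TABLE`.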

#### 6. **Validate Data Integrity**

- Use constraints like `NOT NULL`, `UNIQUE`, or `FOREIGN KEY` to enforce data integrity.

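- Write queries to find rows that violate these rules before adding the constraints; adding a `UNIQUE` or `FOREIGN KEY` constraint fails if existing rows break it.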
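
The checks described above might look like the following. The `customers` and `orders` tables, their columns, and the constraint names are hypothetical, and the foreign key assumes InnoDB tables with `customers.id` as the primary key.

```sql
-- Hypothetical tables: customers(id, email), orders(id, customer_id)

-- NOT NULL check: rows that would violate the constraint
SELECT COUNT(*) AS null_emails
FROM customers
WHERE email IS NULL;

-- UNIQUE check: values that occur more than once
-- (NULLs are excluded because a UNIQUE index allows multiple NULLs)
SELECT email, COUNT(*) AS occurrences
FROM customers
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1;

-- FOREIGN KEY check: orders that point to a missing customer
SELECT o.id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.id IS NULL;

-- Once all three checks come back clean, add the constraints
ALTER TABLE customers
    MODIFY email VARCHAR(255) NOT NULL,
    ADD CONSTRAINT uq_customers_email UNIQUE (email);

ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (id);
```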

#### 7. **Handle Outliers**

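- Use aggregate functions (`AVG()`, `STDDEV()`) to detect unusual values, for example values that lie several standard deviations from the mean.

- Filter or adjust outliers:

```sql
-- threshold is a placeholder for a cutoff chosen from the data
DELETE FROM table WHERE column > threshold;
```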
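
One common way to pick the threshold is the three-standard-deviation rule, sketched below. The `orders` table and `amount` column are hypothetical, and the factor of 3 is a convention, not something the steps above prescribe. Note that MySQL's `STDDEV()` returns the population standard deviation.

```sql
-- Hypothetical table: orders(id, amount)

-- List rows more than 3 standard deviations from the mean
SELECT o.id, o.amount
FROM orders AS o
CROSS JOIN (
    SELECT AVG(amount) AS mean_amount, STDDEV(amount) AS sd_amount
    FROM orders
) AS stats
WHERE ABS(o.amount - stats.mean_amount) > 3 * stats.sd_amount
ORDER BY o.amount DESC;

-- Adjust instead of delete: cap high outliers at the upper bound
UPDATE orders AS o
JOIN (
    SELECT AVG(amount) AS mean_amount, STDDEV(amount) AS sd_amount
    FROM orders
) AS stats
SET o.amount = stats.mean_amount + 3 * stats.sd_amount
WHERE o.amount > stats.mean_amount + 3 * stats.sd_amount;
```

Reviewing the listed rows before capping or deleting helps separate genuine extreme values from data-entry errors.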

#### 8. **Document the Process**

- Record the transformations you applied.

- Maintain a backup of the original dataset for reference.
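
A minimal sketch of both points, assuming a hypothetical `customers` table stored in InnoDB; the `customers_backup` and `cleaning_log` table names are illustrative only.

```sql
-- Copy the structure (columns and indexes), then the data, into a backup table
CREATE TABLE customers_backup LIKE customers;
INSERT INTO customers_backup SELECT * FROM customers;

-- A simple log table for recording each transformation.
-- Created before the transaction because DDL causes an implicit commit.
CREATE TABLE IF NOT EXISTS cleaning_log (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    applied_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    description VARCHAR(255) NOT NULL
);

-- Run a cleaning step inside a transaction so it can be undone
START TRANSACTION;

UPDATE customers SET email = TRIM(email);
SELECT ROW_COUNT() AS rows_changed;   -- rows affected by the UPDATE above

INSERT INTO cleaning_log (description)
VALUES ('Trimmed whitespace in customers.email');

COMMIT;   -- or ROLLBACK; to discard both the change and its log entry
```

Keeping the log entry in the same transaction as the change means the record and the data cannot drift apart.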

---

### Practice Exercises

- Start with simple datasets and practice cleaning common issues.

- Explore online datasets or your own to apply these techniques.

- Build queries incrementally, testing each step.
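
For a simple starting dataset, a small table with deliberately messy rows is enough to practice each step. Everything in this sketch (table name, columns, and values) is made up for illustration.

```sql
-- Hypothetical practice table with deliberately messy rows
CREATE TABLE practice_customers (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(100),
    email   VARCHAR(255),
    country VARCHAR(50),
    age     INT
);

INSERT INTO practice_customers (name, email, country, age) VALUES
    ('  alice smith ', 'alice@example.com', 'usa',           34),    -- stray spaces, lowercase
    ('Alice Smith',    'alice@example.com', 'USA',           34),    -- duplicate email
    ('BOB JONES',      NULL,                'United States', 41),    -- missing email
    ('Carol Lee',      '',                  'UK',            NULL),  -- empty string, missing age
    ('Dan Brown',      'dan@example.com',   'U.K.',          430);   -- implausible age
```

Each row exercises a different step: trimming and case formatting, duplicate removal, missing values, value normalization, and outlier handling.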
