EDA_SQL_Document
- Data Overview:
Use SQL queries like SELECT TOP (5) * FROM table_name; or a column-metadata query beginning SELECT COLUMN_NAME, ...
to quickly understand the structure of your dataset. Identify whether the data types are appropriate.
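The two overview queries could look roughly like the sketch below. It assumes SQL Server (T-SQL), since the text uses TOP, and it assumes the metadata query targets the standard INFORMATION_SCHEMA.COLUMNS view; table_name is a placeholder.

```sql
-- Preview the first five rows of the table.
SELECT TOP (5) *
FROM table_name;

-- List every column with its declared type, length, and nullability.
-- INFORMATION_SCHEMA.COLUMNS is assumed here as the metadata source.
SELECT COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH,
       IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table_name'
ORDER BY ORDINAL_POSITION;
```

Comparing DATA_TYPE against what each column is meant to hold (for example, dates stored as text) is a quick way to spot columns that may need a type correction later.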
2. Data Cleaning
COUNT(*) > 1;
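The COUNT(*) > 1 condition above is the usual shape of a duplicate check. A minimal sketch of such a query follows; column_a and column_b are hypothetical names for the columns that together should identify a row.

```sql
-- Find combinations of key columns that occur more than once.
-- column_a and column_b are hypothetical; use the columns that define a duplicate.
SELECT column_a,
       column_b,
       COUNT(*) AS occurrences
FROM table_name
GROUP BY column_a, column_b
HAVING COUNT(*) > 1;
```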
- Outlier Detection:
STDEV(column_name));
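One common way to use STDEV for outlier detection is to flag values far from the mean. The sketch below assumes a cutoff of three standard deviations, which is a convention and not something the text specifies; column_name is a placeholder for a numeric column.

```sql
-- Compute the mean and sample standard deviation once.
WITH stats AS (
    SELECT AVG(CAST(column_name AS FLOAT)) AS mean_value,
           STDEV(column_name)              AS std_value
    FROM table_name
)
-- Return rows more than 3 standard deviations from the mean (assumed threshold).
SELECT t.*
FROM table_name AS t
CROSS JOIN stats AS s
WHERE ABS(t.column_name - s.mean_value) > 3 * s.std_value;
```

The CAST avoids integer truncation in AVG when the column is an integer type.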
- Data Type Corrections:
Use ALTER TABLE statements to ensure columns have the correct data type.
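A type correction might be sketched as follows, assuming SQL Server and a text column that should hold decimal numbers. The target type DECIMAL(10, 2) is an illustrative choice, not taken from the text.

```sql
-- First, list values that would fail the conversion (TRY_CONVERT returns NULL on failure).
SELECT column_name
FROM table_name
WHERE column_name IS NOT NULL
  AND TRY_CONVERT(DECIMAL(10, 2), column_name) IS NULL;

-- Once no rows are returned above, change the column type.
-- Nullability is stated explicitly; use NOT NULL instead if that fits the data.
ALTER TABLE table_name
ALTER COLUMN column_name DECIMAL(10, 2) NULL;
```

Running the check first is a design choice: ALTER COLUMN fails outright if any existing value cannot be converted.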
3. Descriptive Statistics
- Summary Statistics:
table_name;
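A summary-statistics query of this kind could be sketched as below, assuming a numeric column_name and the T-SQL aggregate functions.

```sql
-- Basic descriptive statistics for one numeric column.
SELECT COUNT(*)                        AS row_count,
       COUNT(column_name)              AS non_null_count,  -- ignores NULLs
       MIN(column_name)                AS min_value,
       MAX(column_name)                AS max_value,
       AVG(CAST(column_name AS FLOAT)) AS mean_value,      -- CAST avoids integer averaging
       STDEV(column_name)              AS std_value        -- sample standard deviation
FROM table_name;
```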
4. Data Relationships
- Correlation Analysis:
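T-SQL has no built-in correlation aggregate, so one option is to compute the Pearson coefficient from its definition: the population covariance divided by the product of the population standard deviations. The sketch below assumes two numeric columns with the hypothetical names column_x and column_y.

```sql
-- Keep only rows where both values are present, as floats.
WITH pairs AS (
    SELECT CAST(column_x AS FLOAT) AS x,
           CAST(column_y AS FLOAT) AS y
    FROM table_name
    WHERE column_x IS NOT NULL
      AND column_y IS NOT NULL
)
-- Pearson r = cov(x, y) / (sd(x) * sd(y)), using population statistics throughout.
-- NULLIF returns NULL instead of a divide-by-zero error when a column is constant.
SELECT (AVG(x * y) - AVG(x) * AVG(y))
       / NULLIF(STDEVP(x) * STDEVP(y), 0) AS pearson_r
FROM pairs;
```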
- Scatter Plots:
SQL cannot directly create plots, but you can retrieve the necessary data for visualization.
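Retrieving the data for a scatter plot can be as simple as selecting the two columns. The sketch below also caps the result with a random sample so large tables stay manageable in the plotting tool; the 1000-row cap and the names column_x and column_y are assumptions.

```sql
-- Return up to 1000 randomly chosen (x, y) pairs for plotting elsewhere.
-- ORDER BY NEWID() shuffles rows; it scans the whole table, so it can be slow on very large tables.
SELECT TOP (1000)
       column_x,
       column_y
FROM table_name
WHERE column_x IS NOT NULL
  AND column_y IS NOT NULL
ORDER BY NEWID();
```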
5. Data Visualization
SQL doesn't produce charts directly. Export results for visualization using external tools.
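It often helps to aggregate in SQL before exporting, so the external tool only has to draw the result. As one sketch, the query below prepares histogram data for a numeric column, assuming a bin width of 10 (an illustrative value).

```sql
-- Count rows per bin of width 10; bin_start is the lower edge of each bin.
SELECT FLOOR(column_name / 10.0) * 10 AS bin_start,
       COUNT(*)                       AS row_count
FROM table_name
WHERE column_name IS NOT NULL
GROUP BY FLOOR(column_name / 10.0) * 10
ORDER BY bin_start;
```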
- Encoding:
Use CASE statements to manually assign numerical values to categories.
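A CASE-based encoding might look like the sketch below. The column name category_column and the category values 'low', 'medium', and 'high' are hypothetical.

```sql
-- Map each category label to a number; unlisted labels become NULL.
SELECT category_column,
       CASE category_column
           WHEN 'low'    THEN 1
           WHEN 'medium' THEN 2
           WHEN 'high'   THEN 3
           ELSE NULL
       END AS category_code
FROM table_name;
```

Leaving unmatched labels as NULL makes unexpected categories easy to find with a follow-up WHERE category_code IS NULL check.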
- Frequency Analysis:
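Frequency analysis of a categorical column usually comes down to a GROUP BY with a count. A minimal sketch, assuming a hypothetical column named category_column:

```sql
-- Count rows per category and express each count as a share of all rows.
SELECT category_column,
       COUNT(*)                                 AS frequency,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_rows
FROM table_name
GROUP BY category_column
ORDER BY frequency DESC;
```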
7. Feature Engineering
8. Outlier Treatment
9. Dimensionality Reduction