Project 2
Project 2
This project involves building a Python-based data analysis tool that leverages
Large Language Models (LLMs) to analyze datasets, generate visualizations, and
create a narrative report. The primary goal is to make the process adaptable to any
structured CSV dataset without assumptions about its structure or contents.
Key Objectives:
Generic Analysis: Perform universal data explorations like summary statistics,
correlation matrices, outlier detection, clustering, and anomaly identification.
Dynamic LLM Integration: Use an LLM (GPT-4o-Mini) to:
Interpret dataset features and types.
Generate Python code for specific analyses or summaries.
Suggest further function calls or strategies for deeper insights.
Data Visualization: Create visual representations of analysis results using
libraries like Seaborn or Matplotlib. Save these visualizations as PNG files.
Story Narration: Use the LLM to craft a Markdown report (README.md) summarizing the
analysis, visualizations, and insights in a narrative format.
Prerequisites for This Project:
Python Basics:
Access and experience with LLMs like GPT-4o-Mini via the AI Proxy.
Setting and using environment variables like AIPROXY_TOKEN.
Command-line Interfaces:
Skills in writing and structuring Markdown documents for clear and visually
appealing reports.
Example Workflow:
Input:
A CSV dataset provided as a command-line argument.
Analysis Process:
markdown
Copy code
/goodreads/
- README.md
- chart1.png
/happiness/
- README.md
- chart2.png
/media/
- README.md
- chart3.png
autolysis.py
This project is a blend of data science, storytelling, and programming and prepares
you for real-world scenarios where automated tools generate insights and
communicate them effectively.