DBMS Explanation

The document outlines the setup for a PostgreSQL project structure focused on HR analytics, including database configuration, initialization, and operations. It details the necessary files, environment setup, and steps for creating a PostgreSQL database, as well as how to execute queries and visualize data using Jupyter notebooks. Key points emphasize the importance of version control for schema and queries, and the use of VS Code for direct query execution.

Ah, let's modify the approach for PostgreSQL specifically.

Here's how to set it up:

1. Project Structure:
```
hr_analytics/
├── src/
│   └── database/
│       ├── config.py          # Database configuration
│       ├── init_db.py         # Create tables and import data
│       └── db_operations.py   # CRUD operations
├── sql/
│   ├── schema/
│   │   └── create_tables.sql  # Table definitions
│   └── queries/
│       ├── q1_turnover.sql
│       └── ...
├── notebooks/
│   └── analysis.ipynb
├── .env                       # Database credentials
└── requirements.txt
```

2. Database Configuration (src/database/config.py):


```python
from dotenv import load_dotenv
import os

# Pull credentials from the .env file into the process environment
load_dotenv()

DB_CONFIG = {
    'host': 'localhost',
    'database': 'hr_analytics',
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'port': 5432
}
```
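Before building on top of this, it's worth a one-off connection check. A minimal sketch, assuming the hr_analytics database from step 4 already exists, src/ and src/database/ contain (possibly empty) __init__.py files, and you run from the project root:

```python
import psycopg2
from src.database.config import DB_CONFIG

# Open a connection with the loaded credentials and ask the server
# for its version string; any misconfiguration fails loudly here.
conn = psycopg2.connect(**DB_CONFIG)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```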

3. Initialize Database (src/database/init_db.py):


```python
import pandas as pd
from sqlalchemy import create_engine
from .config import DB_CONFIG

def get_connection_string():
    return (
        f"postgresql://{DB_CONFIG['user']}:{DB_CONFIG['password']}"
        f"@{DB_CONFIG['host']}:{DB_CONFIG['port']}/{DB_CONFIG['database']}"
    )

def create_database():
    """
    Import the CSV into an employees table.
    The hr_analytics database itself must already exist (see step 4).
    """
    engine = create_engine(get_connection_string())

    # Read CSV
    df = pd.read_csv('data/HRDataset_v14.csv')

    # Create (or replace) the employees table from the DataFrame
    df.to_sql('employees', engine, if_exists='replace', index=False)

    print("Employees table created successfully!")

if __name__ == "__main__":
    create_database()
```
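One wrinkle: init_db.py uses a relative import (from .config import DB_CONFIG), so running it as a plain script raises an ImportError. Assuming src/ and src/database/ each contain an (empty) __init__.py, run it as a module from the project root:

```bash
python -m src.database.init_db
```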

4. Setup Steps:

First, create the PostgreSQL database:


```sql
-- Run in psql or pgAdmin
CREATE DATABASE hr_analytics;
```

Then set up your environment:


```bash
# .env file
DB_USER=your_username
DB_PASSWORD=your_password
```

Install requirements:
```bash
# requirements.txt
pandas
psycopg2-binary
sqlalchemy
python-dotenv
jupyter
matplotlib
seaborn
```
Then run `pip install -r requirements.txt`.

5. VS Code Setup:
- Install PostgreSQL extension
- Connect to database:
```json
// VS Code PostgreSQL connection
{
    "name": "HR Analytics",
    "server": "localhost",
    "port": 5432,
    "database": "hr_analytics",
    "username": "your_username"
}
```

6. Database Operations (src/database/db_operations.py):


```python
import psycopg2
import pandas as pd
from sqlalchemy import create_engine
from .config import DB_CONFIG

class DBOperations:
    def __init__(self):
        self.conn_string = (
            f"postgresql://{DB_CONFIG['user']}:{DB_CONFIG['password']}"
            f"@{DB_CONFIG['host']}:{DB_CONFIG['port']}/{DB_CONFIG['database']}"
        )
        self.engine = create_engine(self.conn_string)

    def execute_query(self, query):
        """Execute SELECT queries and return the result as a DataFrame"""
        return pd.read_sql_query(query, self.engine)

    def execute_update(self, query):
        """Execute INSERT/UPDATE/DELETE queries"""
        with psycopg2.connect(**DB_CONFIG) as conn:
            with conn.cursor() as cur:
                cur.execute(query)
            conn.commit()
```
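A quick usage sketch. One caveat: pandas' to_sql keeps the CSV's exact header names, so mixed-case headers (e.g. Department) would need double quotes in Postgres; a lowercase department column is assumed here, as in the rest of this walkthrough:

```python
from src.database.db_operations import DBOperations

db = DBOperations()

# Read: execute_query returns a pandas DataFrame
counts = db.execute_query(
    "SELECT department, COUNT(*) AS n FROM employees GROUP BY department"
)
print(counts.head())
```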

7. Using in VS Code:
```sql
-- sql/queries/q1_turnover.sql
SELECT department,
       COUNT(*) AS employee_count,
       ROUND(AVG(salary)::numeric, 2) AS avg_salary
FROM employees
GROUP BY department;

-- Run this directly in VS Code with the PostgreSQL extension
```
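Despite the q1_turnover.sql name, the sample above aggregates salaries. A hedged sketch of an actual turnover query, assuming the CSV provides a 0/1 termination flag (called termd here; the actual column name is not confirmed by the original):

```sql
-- Hypothetical: termd is assumed to be a 0/1 termination flag
SELECT department,
       ROUND(AVG(termd)::numeric, 3) AS turnover_rate
FROM employees
GROUP BY department
ORDER BY turnover_rate DESC;
```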

8. Jupyter Notebook Integration (notebooks/analysis.ipynb):


```python
import sys
sys.path.append('../src')

from database.db_operations import DBOperations

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Initialize database connection
db = DBOperations()

# Execute query
with open('../sql/queries/q1_turnover.sql', 'r') as file:
    query = file.read()

results = db.execute_query(query)

# Create visualization
plt.figure(figsize=(12, 6))
sns.barplot(data=results, x='department', y='avg_salary')
plt.xticks(rotation=45)
plt.title('Average Salary by Department')
plt.show()
```

9. Multi-device Workflow:

On each device:
```bash
# First time setup
git clone <your-repo>
createdb hr_analytics             # Create PostgreSQL database
python -m src.database.init_db    # Import data (run from the project root)

# Daily workflow
git pull                          # Get latest queries
# Work on queries/analysis
git add sql/queries/*.sql notebooks/*.ipynb
git commit -m "Updated analysis"
git push
```
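One assumption in the workflow above: Git must ignore the .env file so credentials never reach the remote. A minimal .gitignore (these contents are a suggestion, not part of the original setup):

```
.env
__pycache__/
.ipynb_checkpoints/
```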

10. Example CRUD Operations:


```sql
-- Create
INSERT INTO employees (employee_name, department, salary)
VALUES ('John Doe', 'IT', 75000);

-- Read
SELECT * FROM employees WHERE department = 'IT';

-- Update
UPDATE employees
SET salary = 80000
WHERE employee_name = 'John Doe';

-- Delete
DELETE FROM employees
WHERE employee_name = 'John Doe';
```
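The same statements can be issued from Python. For values that come from user input, psycopg2's %s placeholders are the safe route; since execute_update above takes only a query string, this sketch goes through psycopg2 directly:

```python
import psycopg2
from src.database.config import DB_CONFIG

# Parameter binding: psycopg2 escapes the values itself, so nothing
# is string-formatted into the SQL by hand (no injection risk).
with psycopg2.connect(**DB_CONFIG) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO employees (employee_name, department, salary) "
            "VALUES (%s, %s, %s)",
            ("John Doe", "IT", 75000),
        )
    conn.commit()
```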

Key Points for PostgreSQL:

1. Each device needs PostgreSQL installed
2. Database credentials are stored in .env (not in Git; see the .gitignore sketch above)
3. Schema and queries are version controlled
4. Data can be recreated from the CSV
5. The VS Code PostgreSQL extension provides direct query execution
6. Jupyter notebooks can access the same database

Would you like me to provide more details about any specific part or show how to
handle any particular analysis task?
