etl_and_stats_code

This document outlines an ETL (Extract, Transform, Load) process in Python using pandas and numpy. It extracts data from a CSV-like structure, transforms it by adding a bonus column (10% of salary) and min-max normalizing the age column, and loads the result into a CSV file. It then performs a basic statistical analysis of the salary data (mean, median, and a normal-distribution density via scipy) and fits a simple scikit-learn linear regression model to predict salary from age.


import pandas as pd
import numpy as np
from scipy.stats import norm

# Step 1: Extract
def extract_data():
    # Example data as a CSV-like structure
    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 30, 35, 40, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000]
    }
    df = pd.DataFrame(data)
    print("Data Extracted:")
    print(df)
    return df
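
# Illustrative sketch (not called from main): in a real pipeline the Extract step
# would usually read from an actual file rather than an in-memory dict.
# "source_data.csv" is a hypothetical path used only for this sketch.
def extract_from_csv_example(path="source_data.csv"):
    # pd.read_csv returns a DataFrame with the same shape of columns as above,
    # assuming the file has Name, Age and Salary columns.
    return pd.read_csv(path)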

# Step 2: Transform
def transform_data(df):
    # Adding a column for Bonus (10% of Salary)
    df['Bonus'] = df['Salary'] * 0.1

    # Normalizing Age column (min-max scaling)
    df['Age_Normalized'] = (df['Age'] - df['Age'].min()) / (df['Age'].max() - df['Age'].min())

    print("\nData Transformed:")
    print(df)
    return df
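
# Illustrative sketch (not called from main): the min-max scaling above, written
# out for the example ages. For Age = 30 it gives (30 - 25) / (40 - 25) = 1/3.
def minmax_scale_example():
    ages = np.array([25, 30, 35, 40, 29])
    scaled = (ages - ages.min()) / (ages.max() - ages.min())
    print("Min-max scaled ages:", scaled)
    return scaled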

# Step 3: Load
def load_data(df):
    # Save transformed data to a CSV file
    output_file = "transformed_data.csv"
    df.to_csv(output_file, index=False)
    print(f"\nData Loaded to {output_file}")

# Statistical Functions
def statistical_functions(df):
    # Mean and Median
    mean_salary = np.mean(df['Salary'])
    median_salary = np.median(df['Salary'])

    # Normal Distribution Example
    mu, sigma = mean_salary, np.std(df['Salary'])
    normal_dist = norm.pdf(df['Salary'], mu, sigma)
    df['Normal_Distribution'] = normal_dist

    print("\nStatistical Analysis:")
    print(f"Mean Salary: {mean_salary}")
    print(f"Median Salary: {median_salary}")
    print("\nNormal Distribution (Probability Density Function):")
    print(df[['Salary', 'Normal_Distribution']])
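
# Illustrative sketch (not called from main): norm.pdf(x, mu, sigma) above is the
# Gaussian density f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi)).
# The same values can be computed directly with numpy, using the mu and sigma
# chosen in statistical_functions.
def normal_pdf_example(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))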

# Modeling (Linear Regression Example)
def simple_model(df):
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    # Independent variable: Age, Dependent variable: Salary
    X = df[['Age']]
    y = df['Salary']

    # Splitting data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Model Training
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Prediction and Evaluation
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    print("\nSimple Linear Regression Model:")
    print(f"Coefficient: {model.coef_[0]}")
    print(f"Intercept: {model.intercept_}")
    print(f"Mean Squared Error: {mse}")

# Main Function to Execute the Steps
def main():
    # ETL Process
    df = extract_data()
    df = transform_data(df)
    load_data(df)

    # Statistical Analysis
    statistical_functions(df)

    # Simple Modeling
    simple_model(df)

# Run the main function
if __name__ == "__main__":
    main()
