Data-Science Project Life Cycle

Uploaded by

Ashish Sinha
Copyright
© All Rights Reserved

Analytics Maturity in Organizations


Data Science Project Life Cycle

Tools named on the slide: Python, Pandas, Numpy, Sci-Kit Learn, Matplotlib, Seaborn, SQL, Kafka, NiFi, Spark
Activities labelled on the slide: Data Analysis, Data Visualization, Train/Test Split, Model Fitting, Model Prediction, Model Metrics, Model Visualization

1. Data Discovery
2. Data Ingestion
3. Data Wrangling (Pre-Processing/Preparation)
4. Data Exploration
5. Model Selection & Model Building
6. Model Evaluation
7. Model Comparison
8. Model Boosting
9. Explainable Machine Learning

Data Discovery
• Define Problem statement / Requirements
• Assess Data Sources
• Identify Key Business fields for Sampling

Data Ingestion
• Assess Data Sources
• Batch Extract thru ETL (Extraction, Transformation, Load)
• Real Time Extract thru Kafka/NiFi/Spark

Data Wrangling
Data Cleansing
• Remove Missing Values
• Remove Outliers
• Data Imputation
Data Manipulation
• Rename Columns
• Data Summarization
• Data Filtering
• Sorting/Grouping
• Merge/Join/Concat

Data Exploration
Univariate Analysis
• .info()
• .describe()
• Bar plots, histograms
• Count plots
Bivariate Analysis
• Scatter Plot
• .corr()
• Correlation Plot
• Regression Plot

Model Selection/Fitting - Fit model on Training Data
• Derive Correlation / Derive Independent & Dependent variable
• Prepare Train and Test Data
• Decide on Supervised/Unsupervised learning
• Decide on Classification/Regression for supervised
• Decide on Clustering or dimension reduction (PCA) for unsupervised
• Fit the data for Multiple model techniques
• If classification: Logistic Regression/Random Forest/Decision Tree/KNN/NB/SVM etc.
• If regression: Linear Regression/Random Forest/Decision Tree/KNN etc.
Model Prediction - Predict on Test Data
Model Evaluation - Measures performance of Model
Model Comparison - Compares performance of different model techniques
Model Boosting - Boost the performance of chosen model
Move to Prod - Move the model to production as a .pkl file

Explainable Machine Learning
Visualization
• Actual Vs Predicted
• Write up on corrective and preventive actions
• Dashboards/Stories on current performance
• Evidence on data anomalies and way to correct

20-04-2019
By Anand Sivaraman Subramaniam
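The Data Wrangling and Data Exploration items listed on the slide could look roughly like the following in Pandas. This is only a sketch: the toy data, the column names (`customer_id`, `age`, `spend`, `region`), the choice of median imputation and the 1.5 x IQR outlier rule are all assumptions made for illustration, not something the slide specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; all names and values are illustrative only.
raw = pd.DataFrame({
    "cust_id": [1, 2, 3, 4, 5, 6],
    "age": [25, 32, np.nan, 41, 29, 38],
    "spend": [120.0, 150.0, 130.0, 5000.0, 140.0, np.nan],
})
regions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "region": ["N", "S", "N", "S", "N", "S"],
})

# Data manipulation: rename columns.
df = raw.rename(columns={"cust_id": "customer_id"})

# Data cleansing: drop rows with a missing spend, impute missing age.
df = df.dropna(subset=["spend"])
df["age"] = df["age"].fillna(df["age"].median())

# Data cleansing: remove outliers with the (assumed) 1.5 x IQR rule.
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["spend"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Data manipulation: merge, filter, sort, group and summarize.
df = df.merge(regions, on="customer_id", how="left")
high_spend = df[df["spend"] >= 130].sort_values("spend", ascending=False)
summary = df.groupby("region")["spend"].mean()

# Data exploration: univariate summaries and a bivariate correlation.
df.info()
stats = df.describe()
corr = df[["age", "spend"]].corr()
```

The bar plots, histograms, scatter plots and regression plots on the slide would typically be drawn from `df` with Matplotlib or Seaborn; they are left out here to keep the sketch short.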

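The modelling steps on the slide (train/test split, fitting several techniques, prediction, evaluation, comparison, and moving the model to production as a .pkl file) can be sketched with Sci-Kit Learn as below. The synthetic dataset, the two candidate classifiers, accuracy as the metric and the file name `model.pkl` are assumptions chosen for illustration; the slide does not prescribe any of them, and the boosting step is not shown.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification data stands in for a real, prepared dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Prepare train and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the data with multiple model techniques (two assumed candidates).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)                     # model fitting on training data
    y_pred = model.predict(X_test)                  # model prediction on test data
    scores[name] = accuracy_score(y_test, y_pred)   # model evaluation

# Model comparison: keep the technique with the best test accuracy.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]

# Move to prod: persist the chosen model as a .pkl file and reload it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(best_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

same_predictions = bool((restored.predict(X_test) == best_model.predict(X_test)).all())
```

For a regression problem the same structure applies with regressors such as `LinearRegression` and a metric such as mean squared error in place of accuracy.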