Context: Description

Java Exercise for labs.

Uploaded by

Danish Ali Khan
Copyright
© All Rights Reserved

Description

Context
Renewable energy sources play an increasingly important role in the global energy mix as efforts to reduce the environmental impact of energy production intensify.

Of all the renewable energy alternatives, wind energy is one of the most developed technologies worldwide. The U.S. Department of Energy has put together a guide to achieving operational efficiency using predictive maintenance practices.

Predictive maintenance uses sensor information and analysis methods to measure and predict degradation and future component capability. The idea behind predictive maintenance is that failure patterns are predictable: if component failure can be predicted accurately and the component is replaced before it fails, the costs of operation and maintenance will be much lower.

The sensors fitted across the different machines involved in energy generation collect data on various environmental factors (temperature, humidity, wind speed, etc.) and additional features related to various parts of the wind turbine (gearbox, tower, blades, brake, etc.).

Objective
“ReneWind” is a company that uses machine learning to improve the machinery and processes involved in the production of wind energy. It has collected sensor data on generator failures in wind turbines and has shared a ciphered version of that data, because data collected through sensors is confidential (the type of data collected varies between companies). The data has 40 predictors, with 20,000 observations in the training set and 5,000 in the test set.

The objective is to build various classification models, tune them, and find the best one for identifying failures, so that generators can be repaired before they fail or break and the overall maintenance cost is reduced.

The predictions made by the classification model translate into costs as follows:

• True positives (TP) are failures correctly predicted by the model. These will result in repair costs.

• False negatives (FN) are real failures that the model does not detect. These will result in replacement costs.

• False positives (FP) are detections where there is no failure. These will result in inspection costs.

It is given that the cost of repairing a generator is much less than the cost of replacing it, and the cost of inspection is less than the cost of repair.

In the target variable, “1” represents “failure” and “0” represents “no failure”.
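The cost structure described above can be collapsed into a single number for comparing models. The following is a minimal sketch, not part of the project material: the unit costs (`REPLACEMENT_COST`, `REPAIR_COST`, `INSPECTION_COST`) are hypothetical placeholders that only respect the stated ordering (replacement > repair > inspection), and the function names are illustrative.

```python
# Hypothetical unit costs. Only their ordering (replacement > repair > inspection)
# comes from the problem statement; the actual values are assumptions.
REPLACEMENT_COST = 40
REPAIR_COST = 15
INSPECTION_COST = 5


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN), treating 1 as "failure" and 0 as "no failure"."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # failure caught in time: repair
        elif actual == 1 and predicted == 0:
            fn += 1  # failure missed: replacement
        elif actual == 0 and predicted == 1:
            fp += 1  # false alarm: inspection
        else:
            tn += 1  # no failure, no alarm: no cost
    return tp, fn, fp, tn


def maintenance_cost(y_true, y_pred):
    """Total cost implied by a set of predictions under the assumed unit costs."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred)
    return tp * REPAIR_COST + fn * REPLACEMENT_COST + fp * INSPECTION_COST


def recall(y_true, y_pred):
    """Share of real failures the model detects."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred), maintenance_cost(y_true, y_pred))
```

Because a missed failure is the most expensive outcome, recall on the failure class is one reasonable metric to watch alongside the total cost; which metric to optimize remains a modeling choice.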

Data Description

• The data provided is a transformed version of the original data, which was collected using sensors.

• Train.csv - To be used for training and tuning of models.

• Test.csv - To be used only for testing the performance of the final best model.

• Both datasets consist of 40 predictor variables and 1 target variable.
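As a rough illustration of the layout described above, the sketch below separates predictors from the target and checks the expected column count. It uses only the standard library; the target column name `Target` and the predictor names `V1` to `V40` are assumptions made for the demonstration, not details given in this description.

```python
import csv
import io

N_PREDICTORS = 40  # stated in the data description


def split_features_target(csv_file, target_column="Target"):
    """Read an open CSV file and return (X, y).

    The target column name is an assumption; adjust it to the real header.
    """
    reader = csv.DictReader(csv_file)
    feature_names = [n for n in (reader.fieldnames or []) if n != target_column]
    if len(feature_names) != N_PREDICTORS:
        raise ValueError(
            f"expected {N_PREDICTORS} predictors, found {len(feature_names)}"
        )
    X, y = [], []
    for row in reader:
        # Empty cells become None so missing values can be treated later.
        X.append([float(row[n]) if row[n] != "" else None for n in feature_names])
        y.append(int(float(row[target_column])))
    return X, y


# Tiny in-memory stand-in for Train.csv (hypothetical column names and values).
header = ",".join([f"V{i}" for i in range(1, 41)] + ["Target"])
row_complete = ",".join(["0.5"] * 40 + ["0"])
row_missing = ",".join([""] + ["1.5"] * 39 + ["1"])
sample = io.StringIO("\n".join([header, row_complete, row_missing]))

X, y = split_features_target(sample)
print(len(X), len(X[0]), y)
```

With the real files, the same function could be called on `open("Train.csv", newline="")`. Test.csv would be loaded only once, at the end, to score the final model.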


Submission Guidelines

1. There are two ways to work on this project:

i. Full-code way: The full-code way is to write the solution code from scratch and only submit a final Jupyter notebook with all the insights and observations.

Please follow the steps below to complete the assessment. Kindly note that if you submit a presentation, ONLY the presentation will be evaluated. Please make sure that all the sections mentioned in the rubric have been covered in your submission.

i. Full-code version

• Download the full-code version of the learner notebook.

• Follow the instructions provided in the notebook to complete the project.

• Clearly write down insights and recommendations for the business problems in the comments.

• Submit only the solution notebook prepared from the learner notebook [format: .html].

Best Practices for Full-code submissions

• The final notebook should be well-documented, with inline comments explaining the functionality of code and markdown cells containing comments on the observations and insights.

• The notebook should be run from start to finish in a sequential manner before submission.

• It is important to remove all warnings and errors before submission.

• The notebook should be submitted as an HTML file (.html) and NOT as a notebook file (.ipynb).

• Please refer to the FAQ page for common project-related queries.

Scoring guide (Rubric) - ReneWind

Criteria (Points)

Exploratory Data Analysis and Insights (4)
- Overview of the data
- Univariate analysis

Data pre-processing (4)
- Prepare the data for analysis
- Missing value treatment
- Ensure no data leakage

Model building - Original data (6)
- Build at least 6 classification models (using logistic regression, decision trees, random forest, bagging classifier and boosting methods)
- You can choose not to build XGBoost if you are facing issues with installation

Model building - Oversampled data (7)
- Build at least 6 classification models using oversampled train data (using logistic regression, decision trees, random forest, bagging classifier and boosting methods)
- You can choose not to build XGBoost if you are facing issues with the installation

Model building - Undersampled data (7)
- Build at least 6 classification models using undersampled train data (using logistic regression, decision trees, random forest, bagging classifier and boosting methods)
- You can choose not to build XGBoost if you are facing issues with the installation

Hyperparameter tuning (12)
- Choose at least 3 best performing models among all the models built previously (mention the reason for the choices made)
- Tune the chosen models
- Check the performance of the tuned models

Model Performances (5)
- Compare performances of the tuned models and choose a final model
- Check the performance of the final model on test data

Productionize the model (3)
- Productionize the final model using pipelines

Business Insights & Conclusions (4)
- Business insights and conclusions

Presentation/Notebook - Overall quality (8)
- Structure and flow, crispness, visual appeal, conclusion and business recommendations
OR
- Structure and flow, well-commented code, conclusion and business recommendations

Total Points: 60
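The rubric asks for models on oversampled and undersampled training data without naming a resampling technique. The sketch below uses plain random resampling as a stand-in, written with the standard library only; the function names are illustrative, and dedicated libraries such as imbalanced-learn offer maintained alternatives. To stay consistent with the "no data leakage" criterion, resampling would be applied to the training portion only, never to validation or test data.

```python
import random


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target_size = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target_size - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target_size)
    return X_out, y_out


def random_undersample(X, y, seed=1):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target_size = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target_size))
        y_out.extend([label] * target_size)
    return X_out, y_out


# Toy training split: 8 "no failure" rows and 2 "failure" rows.
X_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X_train, y_train)
X_under, y_under = random_undersample(X_train, y_train)
print(len(y_over), len(y_under))
```

Oversampling keeps every original row at the cost of duplicates, while undersampling discards majority-class rows; comparing both against the original data is what the three model-building criteria ask for.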
