Generative AI For Models Development
In this lab, we will use generative AI to create Python scripts to develop and evaluate different predictive models for a given data set.
Learning objectives
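In this lab, you will learn how to use generative AI to create Python code that can build linear regression, polynomial regression, pipeline-based, and grid-searched ridge regression models and evaluate them with MSE and R^2.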
The data set for this lab is available at the following URL.
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing
The dataset is a filtered and modified version of the Laptop Price Prediction using specifications dataset, available under the Database Contents License (DbCL) v1.0
on the Kaggle website.
Complete the setup in the Jupyter Notebook and then proceed further.
Running this prompt using the GPT 3.5 engine in the GenAI classroom should result in a response that contains the code, as shown below.
import pandas as pd

# Define the file path
file_path = "path/to/your/csv/file.csv"

# Read the CSV file into a pandas data frame.
# header=0 uses the first row of the file as the column headers.
# If the file doesn't have headers, pass header=None instead.
data_frame = pd.read_csv(file_path, header=0)

# Additional details:
# - The 'pd.read_csv()' function is used to read a CSV file into a pandas data frame.
# - The 'header' parameter in the 'pd.read_csv()' function specifies which row to use as the headers.
#   By default, it is set to 'infer', which means pandas will try to infer the headers from the file.
#   If the headers are in the first row, you can set the 'header' parameter to 0.
You may copy the code from the response and try it in the Jupyter Notebook, after changing the file path to point to the data set.
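To see what the header parameter changes, the following minimal sketch reads the same text twice. The two-row CSV text is made up for illustration and is not the lab data set.

```python
import io

import pandas as pd

# Made-up two-row CSV text, used only to demonstrate the 'header' parameter.
csv_text = "Price,RAM_GB\n1000,8\n1500,16\n"

# header=0: the first row becomes the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: the first row is treated as data and columns are numbered.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns), with_header.shape)
print(list(no_header.columns), no_header.shape)
```

With header=None the column names end up as an extra data row, which is usually a sign that the wrong setting was used for a file that does have headers.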
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assume you have a pandas data frame called 'data_frame' with two columns: 'source_variable' and 'target_variable'

# Extract the source variable and target variable from the data frame
X = data_frame[['source_variable']]
y = data_frame['target_variable']

# Initialize a linear regression model
model = LinearRegression()

# Train the model using the source and target variables
model.fit(X, y)

# Make predictions using the trained model
y_pred = model.predict(X)

# Calculate the mean squared error (MSE)
mse = mean_squared_error(y, y_pred)

# Calculate the coefficient of determination (R^2)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

# Additional details:
# - The 'LinearRegression' class from the 'sklearn.linear_model' module is used to create a linear regression model.
# - The 'fit()' method is used to train the model using the source and target variables.
# - The 'predict()' method is used to make predictions using the trained model.
# - The 'mean_squared_error()' function from the 'sklearn.metrics' module is used to calculate the MSE.
# - The 'r2_score()' function from the 'sklearn.metrics' module is used to calculate the R^2 value.
You can use this code to develop a linear regression model with the target variable as Price and the source variable as CPU_frequency. Try this out in the Test
environment.
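As a rough sketch of that adaptation, the generic column names can be swapped for CPU_frequency and Price. The helper name fit_single_feature and the synthetic demo frame are assumptions made for illustration; in the lab you would pass the data frame loaded from the CSV file instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_single_feature(df, source="CPU_frequency", target="Price"):
    """Fit a one-feature linear regression and return (model, mse, r2)."""
    X = df[[source]]  # double brackets keep X two-dimensional
    y = df[target]
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)
    return model, mean_squared_error(y, y_pred), r2_score(y, y_pred)


# Synthetic stand-in data (NOT the lab data set), used only to show the call.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"CPU_frequency": rng.uniform(0.4, 1.0, 50)})
demo["Price"] = 2000 * demo["CPU_frequency"] + rng.normal(0, 50, 50)

model, mse, r2 = fit_single_feature(demo)
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)
```

Note that both metrics are computed on the same rows used for training, as in the generated code, so they describe fit quality rather than performance on unseen data.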
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assume you have a pandas data frame called 'data_frame' with multiple columns: 'source_variable_1', 'source_variable_2', ..., 'tar

# Extract the source variables and target variable from the data frame
X = data_frame[['source_variable_1', 'source_variable_2', ...]]
y = data_frame['target_variable']

# Initialize a linear regression model
model = LinearRegression()

# Train the model using the source and target variables
model.fit(X, y)

# Make predictions using the trained model
y_pred = model.predict(X)

# Calculate the mean squared error (MSE)
mse = mean_squared_error(y, y_pred)

# Calculate the coefficient of determination (R^2)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

# Additional details:
# - The 'LinearRegression' class from the 'sklearn.linear_model' module is used to create a linear regression model.
# - The 'fit()' method is used to train the model using the source and target variables.
# - The 'predict()' method is used to make predictions using the trained model.
# - The 'mean_squared_error()' function from the 'sklearn.metrics' module is used to calculate the MSE.
# - The 'r2_score()' function from the 'sklearn.metrics' module is used to calculate the R^2 value.
You can use the generated code to build a linear regression model for the data set that uses CPU_frequency, RAM_GB, Storage_GB_SSD, CPU_core, OS, GPU and Category as source variables and Price as the target variable. You can review the generated code in your web browser.
You may compare the performance of the two models by comparing their MSE and R^2 values.
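One way to make that comparison is to wrap the fit-and-score steps in a small helper and call it once per feature set. This is a sketch only: the evaluate helper and the two-column synthetic frame are assumptions for illustration, not part of the lab data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def evaluate(df, sources, target="Price"):
    """Fit a linear regression on the given columns and return (mse, r2)."""
    X, y = df[sources], df[target]
    y_pred = LinearRegression().fit(X, y).predict(X)
    return mean_squared_error(y, y_pred), r2_score(y, y_pred)


# Synthetic stand-in data (NOT the lab data set).
rng = np.random.default_rng(0)
n = 60
demo = pd.DataFrame({
    "CPU_frequency": rng.uniform(0.4, 1.0, n),
    "RAM_GB": rng.choice([4, 8, 16], n),
})
demo["Price"] = (1000 * demo["CPU_frequency"] + 50 * demo["RAM_GB"]
                 + rng.normal(0, 40, n))

feature_sets = {
    "single": ["CPU_frequency"],
    "multiple": ["CPU_frequency", "RAM_GB"],
}
results = {name: evaluate(demo, cols) for name, cols in feature_sets.items()}

for name, (mse, r2) in results.items():
    print(f"{name}: MSE={mse:.2f}, R^2={r2:.3f}")
```

Because the metrics are computed on the training rows, adding source variables to an ordinary least-squares model cannot increase the MSE, so a lower value for the multi-variable model is expected and does not by itself show that it generalizes better.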
Polynomial regression
You can explore creating a model that uses higher-order features derived from the original attributes. Higher-order terms of the same feature let the model capture nonlinear relationships with the target variable, although too high an order can overfit the training data. This is called polynomial regression, and you can use generative AI to create the code for it.
Assume you are given a single attribute as the source variable and one as a target variable. You must create a model using polynomial regression for a given order.
You can also make the model for different order values and compare their performance based on MSE and R^2 scores.
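To make "higher-order features" concrete, this small sketch shows what scikit-learn's PolynomialFeatures produces for a single attribute. The two input values are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One attribute, two samples (arbitrary illustrative values).
x = np.array([[2.0], [3.0]])

# degree=3 expands each value x into the columns [1, x, x^2, x^3].
expanded = PolynomialFeatures(degree=3).fit_transform(x)
print(expanded)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

A linear regression fitted on these expanded columns is what the lab calls a polynomial regression of order 3.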
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score

# Assume you have a pandas data frame called 'data_frame' with two columns: 'source_variable' and 'target_variable'

# Extract the source variable and target variable from the data frame
X = data_frame[['source_variable']]
y = data_frame['target_variable']

# Polynomial orders to compare
orders = [2, 3, 5]

# Initialize lists to store the MSE and R^2 values for each model
mse_values = []
r2_values = []

# Loop through the polynomial orders
for order in orders:
    # Create polynomial features
    polynomial_features = PolynomialFeatures(degree=order)
    X_poly = polynomial_features.fit_transform(X)

    # Initialize a linear regression model
    model = LinearRegression()

    # Train the model using the polynomial features and target variable
    model.fit(X_poly, y)

    # Make predictions using the trained model
    y_pred = model.predict(X_poly)

    # Calculate the mean squared error (MSE)
    mse = mean_squared_error(y, y_pred)

    # Calculate the coefficient of determination (R^2)
    r2 = r2_score(y, y_pred)

    # Append the MSE and R^2 values to the lists
    mse_values.append(mse)
    r2_values.append(r2)

    # Display the MSE and R^2 values for the current model
    print(f"Polynomial Order {order}:")
    print("Mean Squared Error (MSE):", mse)
    print("Coefficient of Determination (R^2):", r2)
    print()

# Compare the performance of the models: the lowest MSE is best, the highest MSE is worst.
# argmin/argmax return list positions, so map them back to the actual orders.
best_order = orders[np.argmin(mse_values)]
worst_order = orders[np.argmax(mse_values)]

print("Model Comparison:")
print(f"Best Polynomial Order: {best_order}")
print(f"Worst Polynomial Order: {worst_order}")

# Additional details:
# - The 'PolynomialFeatures' class from the 'sklearn.preprocessing' module is used to create polynomial features.
# - The 'fit_transform()' method is used to transform the source variable into polynomial features.
# - The 'degree' parameter in the 'PolynomialFeatures' class specifies the maximum degree of the polynomial features.
# - The 'argmin()' and 'argmax()' functions from the 'numpy' module are used to find the index of the minimum and maximum values in
You can use the relevant part of the code in your script.
As you can see, the generative AI model can produce fairly sophisticated code that creates and trains models of different orders and evaluates the performance of each.
Try to run the generated code on the testing interface with the source variable as CPU_frequency and the target variable as Price.
Creating a Pipeline
Pipelines are processes containing a sequence of steps that lead to creating a trained model.
You will now use the Generative AI model to create a pipeline for performing feature scaling, creating polynomial features for multiple attributes, and performing
linear regression using these variables.
import pandas as pd
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assume you have a pandas data frame called 'data_frame' with multiple columns: 'source_variable_1', 'source_variable_2', ..., 'tar

# Extract the source variables and target variable from the data frame
X = data_frame[['source_variable_1', 'source_variable_2', ...]]
y = data_frame['target_variable']

# Create a pipeline that performs parameter scaling, polynomial feature generation, and linear regression
pipeline = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2),
    LinearRegression()
)

# Train the model using the source and target variables
pipeline.fit(X, y)

# Make predictions using the trained model
y_pred = pipeline.predict(X)

# Calculate the mean squared error (MSE)
mse = mean_squared_error(y, y_pred)

# Calculate the coefficient of determination (R^2)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

# Additional details:
# - The 'make_pipeline()' function from the 'sklearn.pipeline' module is used to create a pipeline.
# - The 'StandardScaler' class from the 'sklearn.preprocessing' module is used to perform parameter scaling.
# - The 'PolynomialFeatures' class from the 'sklearn.preprocessing' module is used to create polynomial features.
# - The 'LinearRegression' class from the 'sklearn.linear_model' module is used for linear regression.
# - The pipeline automatically applies the transformations in the specified order.
Change the attribute names in the code to the ones you will use. Consider the same set of attributes as for the multiple-feature linear regression, namely CPU_frequency, RAM_GB, Storage_GB_SSD, CPU_core, OS, GPU and Category as source variables, and Price as the target variable.
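A minimal sketch of that substitution is shown here. It assumes every source column is already numeric (for example, that OS, GPU and Category are stored as integer codes); the fit_pipeline helper and the random demo frame are hypothetical and stand in for the lab data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

SOURCES = ["CPU_frequency", "RAM_GB", "Storage_GB_SSD",
           "CPU_core", "OS", "GPU", "Category"]


def fit_pipeline(df, sources=SOURCES, target="Price", degree=2):
    """Scale, expand to polynomial features, then fit a linear regression."""
    pipeline = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree),
        LinearRegression(),
    )
    pipeline.fit(df[sources], df[target])
    return pipeline


# Random stand-in data (NOT the lab data set); assumes numeric columns.
rng = np.random.default_rng(0)
n = 100
demo = pd.DataFrame({
    "CPU_frequency": rng.uniform(0.4, 1.0, n),
    "RAM_GB": rng.choice([4, 8, 16], n),
    "Storage_GB_SSD": rng.choice([128, 256], n),
    "CPU_core": rng.choice([3, 5, 7], n),
    "OS": rng.choice([1, 2], n),
    "GPU": rng.choice([1, 2, 3], n),
    "Category": rng.choice([1, 2, 3, 4, 5], n),
})
demo["Price"] = (1000 * demo["CPU_frequency"] + 40 * demo["RAM_GB"]
                 + rng.normal(0, 50, n))

pipeline = fit_pipeline(demo)
y_pred = pipeline.predict(demo[SOURCES])
mse = mean_squared_error(demo["Price"], y_pred)
r2 = r2_score(demo["Price"], y_pred)

print("Steps:", list(pipeline.named_steps))
print("MSE:", mse, "R^2:", r2)
```

The step names printed at the end are generated by make_pipeline from the lowercased class names, which is useful later when a grid search needs to address a parameter of one particular step.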
You can use generative AI to create the Python code to perform a grid search for the optimum ridge regression model, which uses polynomial features generated from
multiple parameters.
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

# Assume you have a pandas data frame called 'data_frame' with multiple columns: 'source_variable_1', 'source_variable_2', ..., 'tar

# Extract the source variables and target variable from the data frame
X = data_frame[['source_variable_1', 'source_variable_2', ...]]
y = data_frame['target_variable']

# Chain polynomial feature generation and ridge regression in one estimator,
# so that the grid search can tune the parameters of both steps
pipeline = make_pipeline(PolynomialFeatures(), Ridge())

# Define the hyperparameter values for the grid search.
# 'degree' belongs to PolynomialFeatures, not Ridge, so each key is prefixed with its step name.
param_grid = {
    'polynomialfeatures__degree': [2, 3, 4],
    'ridge__alpha': [0.1, 1.0, 10.0],
}

# Perform grid search with cross-validation
grid_search = GridSearchCV(pipeline, param_grid, cv=5)

# Train the model using the source and target variables
grid_search.fit(X, y)

# Make predictions using the best model found
y_pred = grid_search.predict(X)

# Calculate the mean squared error (MSE)
mse = mean_squared_error(y, y_pred)

# Calculate the coefficient of determination (R^2)
r2 = r2_score(y, y_pred)

# Display the MSE and R^2 values
print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

# Additional details:
# - The 'PolynomialFeatures' class from the 'sklearn.preprocessing' module is used to create polynomial features.
# - The 'GridSearchCV' class from the 'sklearn.model_selection' module is used to perform grid search with cross-validation.
# - The 'Ridge' class from the 'sklearn.linear_model' module is used for ridge regression.
# - The 'make_pipeline()' function chains the polynomial transformation and the ridge model into a single estimator.
# - The 'param_grid' parameter in the 'GridSearchCV' class specifies the hyperparameter values to search over.
# - The 'cv' parameter in the 'GridSearchCV' class specifies the number of folds for cross-validation.
# - The best model found by grid search can be accessed using the 'best_estimator_' attribute of the grid search object.
You can test this code for the data set in the testing environment.
You may use the following parameter values for this purpose.
Source Variables: CPU_frequency, RAM_GB, Storage_GB_SSD, CPU_core, OS, GPU and Category
Target Variable: Price
Set of values for alpha: 0.0001, 0.001, 0.01, 0.1, 1, 10
Cross Validation: 4-fold
Polynomial Feature order: 2
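The values above could be wired into a grid search along the following lines. This is a sketch under stated assumptions: the source columns are taken to be numeric, the polynomial order is fixed at 2 inside a pipeline rather than searched, and the random demo frame is a hypothetical stand-in for the lab data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

SOURCES = ["CPU_frequency", "RAM_GB", "Storage_GB_SSD",
           "CPU_core", "OS", "GPU", "Category"]
ALPHAS = [0.0001, 0.001, 0.01, 0.1, 1, 10]

# Random stand-in data (NOT the lab data set); assumes numeric columns.
rng = np.random.default_rng(0)
n = 120
demo = pd.DataFrame({
    "CPU_frequency": rng.uniform(0.4, 1.0, n),
    "RAM_GB": rng.choice([4, 8, 16], n),
    "Storage_GB_SSD": rng.choice([128, 256], n),
    "CPU_core": rng.choice([3, 5, 7], n),
    "OS": rng.choice([1, 2], n),
    "GPU": rng.choice([1, 2, 3], n),
    "Category": rng.choice([1, 2, 3, 4, 5], n),
})
demo["Price"] = (1000 * demo["CPU_frequency"] + 40 * demo["RAM_GB"]
                 + rng.normal(0, 50, n))

# Polynomial order 2 is fixed; only the ridge penalty 'alpha' is searched.
pipeline = make_pipeline(PolynomialFeatures(degree=2), Ridge())
param_grid = {"ridge__alpha": ALPHAS}

# 4-fold cross-validation, as specified in the lab settings.
grid = GridSearchCV(pipeline, param_grid, cv=4)
grid.fit(demo[SOURCES], demo["Price"])

print("Best alpha:", grid.best_params_["ridge__alpha"])
print("Best mean cross-validated R^2:", grid.best_score_)
```

For a regressor, GridSearchCV scores each candidate with R^2 by default, so best_score_ is the mean R^2 across the four validation folds for the selected alpha.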
Conclusion
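Congratulations! You have completed the lab on model development.
With this, you have learned how to use generative AI to create Python code that can build linear regression, polynomial regression, pipeline-based, and grid-searched ridge regression models and evaluate them with MSE and R^2.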
Author(s)
Abhishek Gagneja