Automate_ChatGPT_Prompts_for_Data_Science_with_Python_Enhanced_Coding
Automate_ChatGPT_Prompts_for_Data_Science_with_Python_Enhanced_Coding
POL CAMARILLO
About the Authors
Table of Contents
Table of Contents
About the Authors
Table of Contents
Automate ChatGPT Prompts for Data Science with Python
Chapter 1: ChatGPT Prompts for Data Science (Tried, Tested, and Rated)
Automate data science tasks with ChatGPT
How to Read this
Table of Contents
Write python
Explain code
Optimize code
Format code
Translate code from one language to another
Explain concepts
Suggest ideas
Troubleshoot problem
Write SQL
Write other Code
Miscellaneous
Chapter 2: WRITE PYTHON
1. Train Classification Model
2. Automatic Machine Learning
3. Tune Hyperparameter
4. Explore Data
5. Generate Data
6. Write Regex
7. Train Time Series
8. Address Imbalance Data
9. Get Feature Importance
10. Visualize Data with Matplotlib
11. Visualize Image Grid Matplotlib
12. Explain Model with Lime
13. Explain Model with Shap
14. Write Multithreaded Functions
15. Compare Function Speed
16. Create NumPy Array
17. Write Unit Test
18. Validate Column
Chapter 3: EXPLAIN CODE
19. Explain Python
20. Explain SQL
21. Explain Google Sheets Formula
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed
23. Optimize Pandas
24. Optimize Pandas Again
25. Optimize Python
26. Optimize SQL
27. Simplify Python
Chapter 5: FORMAT CODE
28. Write Documentation
29. Improve Readability
30. Format SQL
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS
32. Translate Python to R
33. Translate R to Python
Chapter 7: EXPLAIN CONCEPTS
34. Explain to Five-Year-Old
35. Explain to Undergraduate
36. Explain to Professor
37. Explain to Business Stakeholder
38. Explain Like StackOverflow
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases
40. Suggest Dataset
41. Suggest Portfolio Ideas
42. Suggest Resources
43. Suggest Time Complexity
44. Suggest Feature Engineering
45. Suggest A/B Testing Steps
46. Career Coaching
Chapter 9: TROUBLESHOOT PROBLEM
47. Correct Own ChatGPT Code
48. Correct Python Code
49. Correct SQL Code
50. Troubleshoot PowerBI Model
Chapter 10: WRITE SQL
51. Create Running Average
52. Solve Leetcode Question
Chapter 11: WRITE OTHER CODE
53. Write Google Sheets Formula
54. Write R
55. Write Shell
56. Write VBA
Chapter 12: MISCALLANEOUS
57. Format Tables
58. Summarize Book
59. Summarize Paper
60. Provide Emotional Support (My Favourite)
Closing
I rated 60 ChatGPTfunctions for Data Science. Use these prompts and ask
ChatGPT to write, and explain code, optimize data science code. It can also
explain data science concepts, suggest ideas, and troubleshoot problems.
Table of Contents
Write python
Explore Data
Generate Data
Write Regex
Validate Column
Explain code
Explain Python
Explain SQL
Optimize code
Optimize Pandas
Optimize Python
Optimize SQL
Format code
Simplify Python
Write Documentation
Improve Readability
Format SQL
Translate Python to R
Translate R to Python
Explain concepts
Explain to Five-Year-Old
Explain to Undergraduate
Explain to Professor
Suggest ideas
Suggest Dataset
Suggest Resources
Career Coaching
Troubleshoot problem
Write SQL
Write R
Write Shell
Write VBA
Miscellaneous
Format Tables
Summarize Book
Summarize Paper
Prompt: I want you to act as a data scientist and code for me. I have a
dataset of [describe dataset]. Please build a machine learning model that
Next, we will handle missing values and encode categorical variables, if necessary.
We will also need to ensure that the data is in the correct format for modeling. This
may involve tasks such as scaling numerical columns or one'hot encoding
categorical variables.
Now that the data is cleaned and prepared, we can split ft into training and testing
sets and build a machine learning model.
Q Coftycoda
# Build model
from skiearn.ensemble import RandomForestClassifier
model - RandomForestClassifierO
model.fitQLtrain, y_train)
Finally, w can evaluate the model on the testing set.
0 Copy code
# Evaluate model
fror skiearn.metrics import accuracy_score
This is good because it provides enough code to start off a model. It is also
technically sound and correct. However, it’s lacking because it only uses
accuracy for evaluation.
i want you to act as an automatic machine learning (AutoML) hot using TPOT for me. I am
working on a model that predicts whether person will default on a credit card loan In
October Please write python code to find the best classification model with the highest
AUG score on the test set.
Certainly! Here is some code that you can use to find the best class if ication model
using TPOT:
Copy code
Lmpor-t pandas a* pd
from skIe □ rn.madeI_s electIon Import t r a i n_t e st_s p I it
from tpot import TPtJTClossifter
I’m not at all surprised that ChatGPT excels here since the code is pretty
much only boilerplate code. This can be copied and pasted from the
documentation as is.
3. Tune Hyperparameter
Prompt: I want you to act as a data scientist and code for me. I have trained
a [model name]. Please write the code to tune the hyper parameters.
J I want you to act as a data scientist and code for me. I have trained a scikit-learn decision
tree. Ptease write the code to tune the hyper parameters.
To tune the hyperparameters of a decision tree model using scikit-learn, we can use
the GridSearchCV' function from the * s k leo r n. nod el_ sei ection' module.
This function allows ms to specify a range of values for each hyper parameter, and it
will train and evaluate a model for each combination of hyperparameter values.
# Import libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model-selection impart GridSearchCV
# Load data
data = pd.read_csv( do m. csv’)
X = data.drop(1 targe. , 0x15=1)
y = dataftarget’]
it Create a decision tree classifier
model = DecisionTreeClassifierO
Not bad at all, and the code runs! Grid search is the most popular method
for hyperparameter tuning, so it’s no surprise that it came up on top.
However, note that grid search goes through the search space exhaustively.
The larger the search space, the longer time it will take. An alternative is the
randomized search, which implements a randomized search over
parameters. Once assigned a budget, randomized search takes the same
amount of time regardless of the search space.
4. Explore Data
Prompt: I want you to act as a data scientist and code for me. I have a
dataset of [describe dataset]. Please write code for data visualisation and
exploration.
# Summary statistics of the numerical col inns
data.describe()
5. Generate Data
Prompt: I want you to act as a fake data generator. I need a dataset that has
x rows and y columns: [insert column names]
fcl I want you to act as a fake data generator. I need a dataset that 1,000 rows and 3
columns: "customerjd", "product", and "revenue".
Certainly!. Here is some example code that generates a fake dataset with 1,000 rows
and three columns: "cu5tomer_id"h "product", and "revenue":
□ copy code
i'par: randan
import pondos os pd
6. Write Regex
Surel Here Is a regex In python that will match strings that start with an
alphanumeric character, contain the word "data", contain the word “science", and
end with an exclamation point;
Q Copy code
import re
‘ \ *': This matches any number of any characters (except a newline), including
none.
The 'mat ch .string' function will return 'True' if the string matches the regex,
Prompt: I want you to act as a data scientist and code for me. I have a time
series dataset [describe dataset]. Please build a machine learning model that
predict [target variable]. Please use [time range] as train and [time range] as
validation.
I want you to act as a data scientist and code for me. I have a time series dataset of sales
revenue from 2Q19 to 2022. Please build a machine learning model that predict sales
revenue for 12 months in 2023, Please use 2019 to 2021 data as train and 2022 data as
validation set. Please do not write code for data visualisation and exploration.
To build a machine learning model for forecasting future sales revenue, we will first
need to import the necessary libraries and load the data into our environment
Q Copy cade
# Invert libraries
import pandas as pd
import numpy as np
# Load data
data = pd.read_csvC'data.csv')
Next, we will handle missing values and encode categorical variables, if necessary.
We will also need to ensure that the data is in the correct format for modeling. This
may involve tasks such as scaling numerical columns and creating lag variables.
Now that the data is cleaned and prepared, we can split it into training and
Q Copy code
# Build model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressorO
model. ruin, y_troirf)
Q Copy code
# Evaluate model
from skiearn.metrics inp rt mean_absolute_error
I want you to act as a coder, I have trained a machine learning model on an imbalanced
dataset. The predictor variable is the column "scam". The dataset has 1% scam dataset
and 99% non-scam dataset. In python, how do I eve ream pie and/or undersample my
data?
To oversample the minority class in your dataset, you can use the ‘ SMOTE"
(Synthetic Minority Oversampling Technique) method, which is available in the
'imbleorn' library. Here Is an example of how to use it:
method from the ' imblearn' library. Here is an example of how to use it:
Pl Copy code
ChatGPT used the imblearn library to write boilerplate code that randomly
under and oversamples the dataset.
The code is sound, but I would nitpick on its understanding of over and
undersampling.
Thus, it should have attempted to split the dataset into train and test sets
first, before performing the under and oversampling.
Certainly! Here is some code that you can use to find the most important features in
a decision tree model:
Q Cop/code
import nuffipy as np
# Load your model and the data that you used to train it
model = # Load your model
X_train ■ # Load your training data
for f in rangeCX_train.shapef:]};
printC'Xd. feature %d £%O” % £f + 1, indicesffj,
importances[indicesCfJ]
ChatGPT correctly provided the code for plotting the feature importance of
a decision tree. It even ranked the feature importances from the most
important to the least important. Kudos!
Prompt: I want you to act as a coder in python. I have a dataset [name] with
columns [name]. [Describe graph requirements]
L1 I want you to act as a coder in python. I have an dataset called iris_df. It has the columns
Petal Length, Petal Width, Sepal Length, Sepal Width and Species Type. On the left, I
want a graph that depicts the relationship between Petal Width and Petal Length. On the
right, I want a graph that depicts the relationship between Sepal Width and Sepal Length.
For both graphs, please differentiate between the different species. Please make It
visually pleasing and colorblind-friendly. Please use matplotlib, ensure that the graph is
large enough for a presentation, and save the output to a file called "out put .png"
Certainly! Here is some code that should accomplish what you've described:
Q Copy code
# Plot the relationship between Sepal Width and Sepal Length on the
right
for spec Ln set(species);
spec_df = irisudf[iris.jdf[’Species Type ] == spec]
axZ.scatter(spec_df['Sepal '■ dth ], spec_df[ Sepal Length’],
label=spec)
ax2,set_xlabel( epal Width )
ax2.set^yl abel('Sepal Length')
ax2.legendC)
I want you to act as a coder. Il have a folder of images. The subdirectories are "TRAIN”
and ’’TEST'". Within "Train” and "Test" are HO foiders from "O'" to '"Q". Within each folder,
there are JPGr JPEG and PNG images. Write code that randomly selects 20 images from
the TRAIN subdirectory and display them in a 4 by 5 grid. Repeat the same for the test
dataset.
Surer Hero is some code that should accomplish what you've described:;
Q Copy code
import os
import randOrti
import mdtplCjit I ib . pypltjt ci.% pit
from mat pl Pt lib■, Image import imreaicl
Again, ChatGPT can provide the correct code for plotting grid of images
with Matplotlib.
l want you to act as a data scientist and explain the model's results. I have trained a
scikit-learn K-nearest-neighbour model and I would like to explain the output using LIME.
Please write the code.
Certainly! Here is some code that you can use to explain the results of a scikit-learn
Q Copy code
import lime
import lineAine_tabular
import numpy as np
# Load your model and the data that you used to train it
model - # Load your model
X_train = # Load your training data
X.test - # Load the test data that you want to explain
This code will create a LIME explainer object, which you can use to generate an
explanation for a single instance in your test dataset. The explanation will consist of
a list of the most important features that contributed to the model's prediction,
along with the effect of each feature on the prediction.
Prompt: I want you to act as a data scientist and explain the model’s results.
I have trained a scikit-learn XGBoost model and I would like to explain the
output using a series of plots with Shap. Please write the code.
I want you to act as a data scientist and explain the model's results. I have trained a
scikit-learn XG Boost model and I would like to explain the output using a series of plots
with Shap. Please write the code.
Certainly! Here is same code that you can use to create a series of plots using Shap
model:
Q Copy Cddt
import Shop
import matplotlib.pyplot as pit
# Load your model and the data that you used to train it
model - # Load your model
X_troin = # Load your training data
# Loop through the features and create a plot for each one
for i in rainge(X_train.shape[l]):
# Create a subplot
pit. subplot (Strain .shape[l], 1, i+ )
This will create two threads, 'threadl' and 'thread?', and start them
simultaneously. The 'print_time' function will be executed by both threads, with
the 'threadName' and 'delay' arguments being passed to it from the 'args'
parameter in the 'Thread' constructor.
I’m pleasantly surprised to find that this code actually works without any
modifications. I can foresee myself doing this often in the future.
Function 1:
defcoinputeKa, b);
return (a + b) / (a * bj
Function 2;
Import numpy as np
def compute2(a, b):
return np.divide(np.sum([a,b]np.multipliy(a,b))
n CO0y code
import time
import statistics
far 1 in range(num_repeats
t i mes 1.. a ppe ndC mea sli r e_ ti me C fu nd., a, b}}
timesZ .append<measLire_ti.meCfuncZ , a,
# Example usage
run_experiinentCcoinputel, computeZ, 10, 203
ChatGPT is able to write code that compares two function speeds with
built-in packages.
The code runs and is correct. Though, I would rather ChatGPT use
the %%timeit magic in Python instead of writing verbose code.
Prompt: I want you to act as a data scientist. I need to create a numpy array.
This numpy array should have the shape of (x,y,z). Please initialize the
numpy array with random values.
I want you to act as a data scientist. I need to create a numpy array. This numpy array
should have the shape of (10, 3, 4, 5), Please initialize the numpy array with random
values between 1 to 5.
Sure! Here's how you can create a numpy array with the desired shape and
initialized with random values between 1 and 5:
Q Copy code
■impart; numpy 05 np
This will create a numpy array with a shape of (TO, 3, 4, 5) and fill it with random
integers between 1 and 5 (inclusive).
Prompt: I want you to act as a software developer. Please write unit tests for
the function [Insert function]. The test cases are: [Insert test cases]
Prompt: I want you to act as a data scientist. Please write code to test if that
my pandas Dataframe [insert requirements here]
Chapter 3: EXPLAIN CODE
19. Explain Python
I want you to act as a code explainer. What is this code doing? reconstructed =
np.c lip (reconstructed, 0, 1)
pit. imshow(np. transpose (reconstructed, (1, 2, 0)))
pit.showQ
' 'np . dipt reconstructed f 0, ID This function clips the values in the
' reconstructed’ array so that all values less than 0 are set to 0, and all values
greater than 1 are set to 1. This is likely being done to ensure that the values in
the array are within the valid range for pixel Intensity values (which are usually
between O and 1).
■ 'np.transposefreconstructed, fl, Z, : This function transposes
the dimensions of the ' reconstructed ' array. The dimensions of an image
array are usually ' (hei ght, width, channels}', but the ' inishowO'
function expects the dimensions to be(width, height, channels.}The
' transpose^} ' function reorders the dimensions of the array so that it has the
correct shape for ’imshowf)'.
* 'pit,imshow( This function displays the image stored in the transposed
'reconstructed' array.
■ 'pit. showO': This function displays the figure created by ' imshow(}'.
Prompt: I want you to act as a data science instructor. Can you please
explain to me what this SQL code is doing? [Insert SQL code]
L] I want you to act as a data science instructor. Can ycu please explain to me what this SQL
code Is doing? SELECT year, revenue, SUM(revenue] OVER (ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM table
■ ' SELECT year, revenue'; This clause specifies the columns that you want to
include in the output. In this case, the output will include the "year11 and
■ revenue" columns,
■ 'SLM(revenue) OVER {RONS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW}': This is a window function that calculates the sum of the
"revenue" column for each row in the table. The 'OVERO ’ clause defines the
window over which the function will operate, in this case the ' ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW' window. This window includes all
rows from the start of the table ('UNBOUNDED PRECEDING') up to the current
row ('CURRENT ROW ).
’ ' FROM table': This clause specifies the table that you want to retrieve data
from. In this case, the table is simply named "table".
Overall, this SQL code is selecting the "year" and "revenue" columns from the
"table" table, and calculating the sum of the "revenue" column for each raw in the
table. The output will include a column with the sum of the "revenue" column for
each row, as well as the original "year" and "revenue" columns.
ki I want you to act as a Google Sheets formula explainer. Explain the following Google
Sheets command. =filter(C2:C12, A2:A12=F2)
The ' FILTER' function in Google Sheets is used to filter a range of cells based on
specified criteria and return the resulting rows. The function has the following
syntax:
* ' range ': This is the range of cells that you want to filter. In this case, the range
is C2;C1Z',
" ' condition!', ’ conditionZ', etc.: These are the conditions that you want to
use to filter the range. In this case, the condition is 'AZ:A12-F2', which
specifies that you want to include only the rows where the value in column A is
equal to the value in cell F2.
So, in this case, the ' FILTER' function is being used to filter the range ' C2:C12'
based on the condition 'AZ : A1Z-FZ which means that only the rows where the
value in column A is equal to the value in cell F2 will be included in the resulting
dataset. The resulting dataset will include all of the values from column C that meet
this criteria.
Note that the 'FILTER' function is an array function, which means that you'll need
to enter It using Ctrl + Shift + Enter (on Windows) pr Command + Shift + Enter (on
Mac). This will cause the formula to be enclosed in curly braces {} to indicate that
it's an array formula. rf->
ChatGPT is able to explain the FILTER function well. I like that it provided
the generalize code snippet like one would see in the documentation, and
proceeded to explain the general snippet with the specific arguments
provided by the users.
In particular, the line the "Filter" function is being used to filter the range
"C2:C12" based on the condition "A2:A12=F2" is an excellent explanation
is contextualized to the user’s arguments.
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed
I want you to act as a software developer. Please help me improve the time complexity of
the code below, sum=O.
for i in renge(y-tl):
sum+=i
return Eum
To improve the time complexity of the code you provided, you could use a
mathematical formula to calculate the sum Of the integers from O toy', rather than
using a loop. This would eliminate the need to loop through the integers and add
them up one by one, and would allow you to calculate the sum in a single step.
Here is hew you could rewrite the code to use a mathematical formula to calculate
the sum:
Q Copy code
(y
This function calculates the sum of the integers from O to ’ y ’ using the formula
' fn • <n + 120 / 2', which is the formula for the sum of an arithmetic series.
This formula has a time complexity of 0(1). because it performs a fixed number of
operations regardless of the value of 'y'.
Prompt: I want you to act as a code optimizer. Can you point out what’s
wrong with the following Pandas code and optimize it? [Insert code here]
I want you to act as a code optimizer Can you point out what's wrong with the following
Pandas code and optimize it? df = pd,read_csv("titles.csv")
# Mutation-DON'T DO THIS
df_bad = dtqueryfTuntime > 30 & type = 'SHOW”)
df_bad [''score'] = df_bad[['1mdb_score1\" tmdb_score"]].sum(axis=1)
df_bad = df_bad[["seasonsnf "score"]]
df_bad = df_bad.groupby("seasons").agg( ["count", "mean"])
df_bad - df_bad.droplevel(axis=1, level=0)
df.bad = df_bad.qijery ("count > 10")
There’s nothing inherently wrong with the code you've provided, but there are a few
suggestions I can make to optimize it:
1, Consider using the' i si n 0' method to filter the dataframe instead of using the
query method. This can be faster and more readable, especially if you have
multiple conditions.
For example:
3 Copy code
Prompt: I want you to act as a code optimizer. Can you point out what’s
wrong with the following Pandas code and optimize it? [Insert code here]
| I want you to act as a code optimizer. Can you optimize this Pandas code for me?
vals = []
for Ind, row in df. iter rows ():
country = rOwfprodCCuntryT]
if country in top3:
vals.append("top3’)
elif country in topIO:
vals.append ("topic")
elif country in top20:
vals.append ("top 20")
else;
vals .append ("other'1)
df["prod_country_ranl<"] = vals
0 Here are a few suggestions for optimizing the code you provided:
1. Instead of using the ' i terrowsO' method to loop through the rows of the
dataframe, you can use the ' apply()' method to apply a function to each row.
This can be more efficient and easier to read. For example:
Q Copy code
Prompt: I want you to act as a code optimizer. The code is poorly written.
How do I correct it? [Insert code here]
3, You can also use list comprehensions to create a new list that combines the
elements of'numbers' and 'letters':
Q Copy code
Which of these approaches is the best choice will depend on your specific needs
and the context in which the code is being used.
Prompt: I want you to act as an SQL code optimizer. The following code is
slow. Can you help me speed it up? [Insert SQL]
I want vou to act as an SQL code optimizer. The following code is stow. Can you help me
speed it up? SELECT
FROM TeslTable
WHERE DATEPART(YEAR, SartleMyDate) = '2O2T;
Here are some suggestions to optimize the performance of the above SQL code:
1. Use an index on the ' Som<?MyDatc' column; An Index on tho ' SomcMyDate '
column will allow the database to quickly find rows that match the condition in
the ' WHERE ' clause, rather than having to scar the entire table.
2. Use a covered index; If the "TestTable' table has a largo number of columns
and you only need to retrieve a few of them, you can create a "covered Index11
that includes al] the columns you need. This will allow the database to retrieve
the data from the index itself, rather than having to look up the rows in the table.
3. Use the 'EXTRACTO' function instead of ' DATEPARTO ': The ' EMTHACTO'
function may be slightly faster than ' DATE PART Q J ' for extracting the year from
a date. Here is how you can rewrite the query using 'EXTRACTQ’:
Q Copy code
SELECT *
FROM TestTable
WHERE EXTRACT<¥EAR FROM ScmeMyDot= 2021;
The first two suggestions here are valid suggestions for optimization.
Howeve, the third tip is incorrect. In fact, DATEPART is actually faster than
EXTRACT. For that reason, I’ll have to dock stars off for this category.
Prompt: I want you to act as a code simplifier. Can you simplify the
following code?
If you’ve written spaghetti code in Python, ChatGPT might be able to help
you simplify it.
Chapter 5: FORMAT CODE
28. Write Documentation
def func'CyJ:
■ IF II
Parameters:
y (int): The upper bound of the range of integers to be
Sumned.
Returns:
float: The sum of the integers from 0 to y,
sum-0.
for i in rangefy+j.}:
sum+-i
return sum
Prompt: I want you to act as a code analyzer. Can you improve the
following code for readability and maintainability? [Insert code]
U1 I want you to act as a code analyzer. Can you improve the following code for readability
and maintainability?
def list (a,b):
randomCamelCase = a
ra n do m_s nake_ca se-b
return [randomCamelCase, random_snake_case]
Prompt: I want you to act as a SQL formatter. Please format the following
SQL code. Please convert all reserved keywords to uppercase [Insert
requirements]. [Insert Code]
The code is formatted nicely and neatly with the correct indentation and
casing.
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS
Prompt: I want you to act as a coder and write SQL code for MySQL. What
is the equivalent of PostgreSQL’s DATE_TRUNC for MySQL?_________
L| I want you to act as a coder and write SQL code for MySQL. What is the equivalent of
DATE_TRUNC for MySQL?
MySQL does not have a built-in function equivalent to PostgreSQL’s ' DATE_TRUNC'
function, which allows you to truncate a date or timestamp to a specified precision.
However, you can achieve similar results using the ' DATE_FORMAT' function, which
allows you to format a date or timestamp as a string using a variety of format
specifiers.
For example, to truncate a timestamp to the nearest hour, you can use the following
query:
SELECT OATE_FORMAT(timestamp.column, AS
truncated-timestamp
from tabl.e_name;
Prompt: I want you to act as a code translator. Can you please convert the
following code from python to R? [Insert code]____________________
kJ I want you to act as a code translator. Can you please convert the following code from
python to R?
@ Certainly! In Rd you can use the ' bind„rows' function from the dplyr' library to
concatenate multiple data frames along the rows (axis=O). Here is the equivalent
code tn R:
A Copy cotie
library(dpiyr}
This code first loads the ’ dplyr' library and then uses the ' bind_rows' function
to concatenate the data frames ' df_l', ' df_2',' df_n' along the rows. The
resulting data frame will have all the rows from the input data frames, in the order
they were specified.
Prompt: I want you to act as a code translator. Can you please convert the
following code from R to python? [Insert code]
J I want you to act as a code translator. Can you please convert the following code from R
to python?
spread (
df, key = ,toy_namel
value = 'value_name1
)
Certainly! In python, you can use the ’ pivot' method of a pandas DataFrame to
spread a column into multiple columns. Here is the equivalent code in python using
pandas:
This code uses the 'pivot' method to reshape the data frame by spreading the
'key_name’ column into multiple columns and using the 'vfllue_name' column as
the values in the new columns. The resulting data frame will have one column for
each unique value in the 'value_name' column, and the values in the new columns
will come from the ' value' column.
Imagine you have a bunch of pictures cf different animals, and you want to know
which ones are dogs. You could show the pictures to a person, and they would be
able to tell you which ones are dogs by looking for certain features, like four legs, a
wagging tail, and a furry face. A neural network does something similar, but it does it
all by itself, without any help from a person.
To do this, a neural network has a lot of little parts called "neurons," which are kind
of like tiny brains. These neurons are arranged in layers, and they work together to
analyze the patterns in the pictures. When they see a pattern that looks like a dog,
they carl say "Hey, this looks like a dog!1,1
Neural networks are really good at finding patterns, and they can be used for all
sorts of things, like recognizing faces in photos, understanding speech, and even
playing games like chess. They’re one of the most powerful tods that data scientists
have to solve all kinds of problems.
Each neuron receives input from other neurons or from external sources, and
combines these inputs using weights that reflect the importance of each input. The
neuron then applies an activation function to the combined input to produce an
output, which is passed on to other neurons in the next layer.
The weights and activation functions of the neurons are learned from training data,
allowing the neural network to learn complex patterns and relationships.
Neural networks can be used fora wide variety of tasks, such as image
classification, language translation, and time series prediction. They are particularly
well-suited to tasks that Involve large and complex datasets, and are able to learn
and adapt as they are exposed to more data.
professor.
Sure! A neural network is a type of machine (earning algorithm that is inspired by the
structure and function of the brain. It is composed of a large number of
interconnected processing nodes, called "neurons," which are organized into layers.
Each neuron receives input from other neurons, and uses this input to make a
decision or prediction based on the patterns it sees in the data.
Neural networks are typically used to solve problems that involve pattern recognition
or classification. For example, a neural network might be trained to recognize
handwritten digits based on a large dataset of labeled examples, or to classify
emails as spam or non-spam based on their content.
One of the key advantages of neural networks is that they are able to learn and
adapt to new data over time, without the need for explicit programming. This makes
them particularly useful fortasks that Involve complex, non-ilnear relationships, or
for situations where the desired output is not easily defined in advance.
Neurai networks are widely used in a variety of fields, including image and speech
recognition, natural language processing, and finance. They have also been used to
achieve state-of-the-art results on a number of benchmarks and real-world tasks.
For example, let's say you have a company that sells products online, and you want
to figure out which of your customers are most likely to buy a new product that you
just released. You could use a neural network to analyze data about your customers'
past purchases, their demographic information, and other factors that might be
relevant. Based on this analysis, the neural network could identify patterns that
seem to be associated with a higher likelihood of making a purchase, and It could
use these patterns to predict which customers are most likely to buy the new
product.
This is just one example of how neural networks can be used to solve problems in
business. They can also be used for things like predicting which employees are most
likely to leave the company, identifying fraudulent activity, and optimizing supply
chain operations, among many other applications.
Overall, neural networks are a powerful toot that can help businesses make better
decisions and improve their operations by finding patterns and insights in data that
would be difficult or impossible for a human to identify. rfh
In SQL, constraints are used to specify rules for the data in a table. If a constraint is
violated, the database will return an error and the transaction will be rolled back.
Constraints are used to ensure the integrity and correctness of the data in a
database.
* NOT NULL: This constraint specifies that a column cannot contain a NULL value.
■ UNIQUE: This constraint specifies that all the values in a column must be unique.
' PRIMARY KEY: This constraint specifies a column or a set of columns that
uniquely identifies each row in a table. A table can have only one primary key.
' FOREIGN KEY: This constraint specifies that the values In a column must match
the values in a column of a different table (the parent table). This is used to
enforce referential integrity.
* CHECK: This constraint specifies a condition that the values in a column must
satisfy..
* DEFAULT: This constraint specifies a default value for a column. If no value is
specified for the column when inserting a row, the default value will be used.
Here is an example of how constraints can be defined in SQL:
In this example, the 'Customers' table has four columns: 'Customerld', 'Name',
the table, the ' Name' column is defined as NOT NULL, the ' Email' column is
defined as UNIQUE, and the'Age' column is defined with a CHECK constraint that
ensures that the values in the column are greater than zero.
Not only is the answer correct and succinct, it also came with relevant
information (the different types of constraints) and the code snippets.
One tip that I have here is to explicitly prompt ChatGPT for code snippets,
sample tables and outputs for a more comprehensive answer.
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases
I want you to act as a software developer. Please help me catch edge cases for this
function
def computefa, b):
return (a+b)/(a*b)
<S^ Certainly! Here are some edge cases you might want to consider when testing the
'compute' function:
* When 'o' and *b' are both zero. In this case, the function will raise a
'ZeroDivisionEr ror', since you cannot divide by zero.
* When 'a' is zero and ' b' is non-zero. In this case, the function will also raise a
'ZeroOivisionError'.
* When 'a' and 'b' are both negative. This might produce a different result than
when and 'b' are both positive, depending on the values.
* When ' a' is negative and ' b ‘ is positive, or vice versa. This might also produce
a different result than when ’a' and 'b' are both positive or both negative.
* When 'o' is very large and ft' is very small, or vice versa. This might produce a
different result than when 'o' and 'b' are both large or both small.
These are just a few examples of edge cases you might want to consider.
Depending on the specific requirements of your function, there might be other edge
cases to consider as well.
For a relatively simple function, ChatGPT is able to correctly and
exhaustively suggests the edge cases. It exhaustively considered all cases
without being prompted on what the cases are.
Prompt: I want you to act as a data science career coach. I want to build a
predictive model for [...]. At the same time, I would like to showcase my
knowledge in [...]. Can you please suggest the five most relevant datasets
for my use case?
| I want you to act as a data science career coach. I want to build a predictive model for
movies. At the same time, I would like to showcase my knowledge in computer vision. Car
you please suggest the five most relevant datasets for my use case?
@ Sure! Here are five potential datasets that you might find useful for building a
1. The Movie Lens dataset: This dataset contains movie ratings and tag
recommendations from the MovieLens website, and includes information about
3. The Netflix Prize dataset: This dataset contains movie ratings and other data
from the Netflix Prize competition, which challenged participants to build a
recommendation system for movies.
4. The TMDB 5000 Movie dataset: This dataset contains information about movies,
including their budget, revenue, and cast, as well as movie posters and other
images.
5. The Film Popularity dataset from the UCi Machine Learning Repository: This
dataset contains information about the popularity of movies, including their gross
However, note that I have only tested relatively simple use cases. For more
complex use cases that requires niche datasets, ChatGPT might not do well.
Sure! Here are five potential portfolio projects that could be relevant for a data
other infrastructure to support its electric vehicles. You could build a project that
helps optimize the design and placement of these stations based on factors such
sources.
machine learning model that predicts when maintenance will be needed, so that
which will need to be able to navigate complex traffic environments. You could
build a model that predicts traffic flow in different locations, which could be used
This is one of my favourite prompts. Its suggestions are relevant not only to
myself, but also to the industry that I am gunning for, and also for my
dream company. It does not suggest any generic projects like the Titanic or
Iris project. Instead, it makes salient suggestions based on its knowledge of
the company (Tesla, in this case), the industry (data science), and my
background in engineering.
I love this feature and I hope more people will try it out!
Prompt: I want you to act as a data science coach. I would like to learn
about [topic]. Please suggest 3 best specific resources. You can
include [specify resource type]
I want you to act as a data science coach. 1 would like to learn about data engineering.
Please suggest 3 best specific resources. You can include websites, courses and books.
Sure! Here are three resources that I would recommend for learning about data
engineering:
1. Online Courses: There are many online courses available that cover data
engineering concepts and best practices. Some options include "Data
Engineering on Google Cloud Platform” on Coursera and "Data Engineering
Fundamentals” on Pluralsight.
2. Books: There are also several books that are great for learning about data
engineering. Some options include ’’Designing Data-Intensive Applications1' 2by3
Martin Kleppmann and "Building Data Pipelines with Python1' by Alec Watson.
3. We bsites: The re a re ma ny we bs ites th at offer tuto ria I sd a rticl es, and ot he r
resources on data engineering. Some options include the "Data Engineering1'
section of the Dataquest blog and the "Data Engineering1' section of the
Databricks blog.
This one is for those looking to learn a new skill. I find that the resources
that it has suggested are not only relevant, but also critically acclaimed. It
also does not provide an overwhelming list of resources and instead provide
only a few. I like this because it certainly provides a good starting point for
anyone hunting for new resources.
Prompt: I want you to act as a software developer. Please compare the time
complexity of the two algorithms below. [Insert two functions]
fa.I I want you to act as a software developer Please compare the time complexity of the two
algorithms below, def fund (y):
sum=0.
for i in range(y+1):
sum+=i
return sum
def func2(y):
return y/2'(1+y)
Asa software developer, I would say that the time complexity of the * fund1
function is O(n), where n is the value of the input 'yr This is because the function
performs a single loop that runs from 0 to 'y', and the time taken to execute the
The time complexity of the ' func2' function is 0(1), because the function
performs a fixed number of operations regardless of the value of the input 'y'. The
time taken to execute the function is constant, and does not depend on the size of
the input.
Therefore, the 'funcZ' function is more efficient than the 'fund' function,
because it has a lower time complexity. The 'fund' function will always be faster
For this very simple example, ChatGPT is able to deduce the time
complexity.
Certainly! Here are a few ideas for features that you could engineer for this machine
learning problem:
1. Payment history: You could create a new feature that captures the overall
payment history of the borrower. For example, you could calculate the average
payment amount over the past six months, or count the number of months in
which the borrower paid the minimum amount due.
2. Credit utilization: You could calculate the credit utilization ratio for each
borrower, which is the ratio of the total amount of credit being used to the total
credit limit. This can be a useful feature because high credit utilization is often
associated with higher risk of default.
3. Payment to bill ratio: You could create a new feature that captures the ratio of
payment amount to bill amount for each month. This can give you an idea of how
much of the bill the borrower is paying off each month, which can be a useful
predictor of default risk.
Certainly! Here are the steps for designing an A/B test to compare the effectiveness
of a blue landing page background versus a red landing page background, including
1, Sampling:
random sample, or you could stratify the sample based on factors such as
' Divide the sample into two groups: the "treatment" group, which will see the
red background, and the"control" group, which will see the blue background,
it is important to ensure that the two groups are comparable in terms of the
' Implement the changes to the landing page for the treatment group. Make
sure to keep all other elements of the page the same for both groups, so that
* Track the key metric that you are interested in, such as the conversion rate
(the percentage of visitors who take a desired action, such as filling out a
form or making a purchase}. You will need to have a way to track this metric
* Run the experiment for a sufficient length of time to allow for a sufficient
depend on the size of the sample and the expected effect size.
■ Calculate the difference in the key metric between the treatment and control
groups. This will be the '’effect size1’ of the change in background color.
the means of two independent samples (the treatment and control groups).
■ If the p-value of the t-test is less than your predetermined significance level
(usually 0.05), you can conclude that the difference is statistically significant,
and that the change in background color had an effect on the key metric.
While ChatGPT’s steps for A/B testing is correct. I did find it a requirement
to explicitly ask it to include the statistical test required.
£ I want you to act as a career advisor I am looking for a role as a data engineer in Amazon
as a fresh graduate. I have an undergraduate degree in chemical engineering and have
self-studied data science through MOOCs. How do I land the role and with what
resources exactly in 6 months?
Sure, I'd be happy to help you out. As a data engineer at Amazon, you'll be
responsible for designing, building, maintaining, and troubleshooting data pipelines.
It's a role that combines software engineering with data management and analysis
skills.
languages such as Python and SQL, as well as tools like Hadoop, Spark, and
AWS. Consider taking online courses or earning certifications to enhance your
skills.
2. Gain practical experience: While a degree and MOOCs are a good start, practical
experience is crucial for landing a data engineering role. Consider working on
data-related projects in your spare time, or interning at a company that uses data
engineering. This will give you the opportunity to apply your skills and build a
portfolio of work to show potential employers.
3. Network: Networking can be a powerful tool for finding job opportunities. Attend
industry events and meetups, join online forums and groups, and reach out to
professionals in the field to learn more about data engineering and make
connections. You never know who might be able to help you find your next job.
4. Tailor your resume and cover letter: When applying for a data engineering role at
Amazon, make sure to highlight your relevant skills and experience. Include any
projects you've worked on, any relevant coursework or certifications, and any
relevant work experience. Customize your resume and cover letter for each job
you apply to, and make sure to highlight how your skills and experience make
you a strong fit for the role.
I have no surprises that ChatGPT did well here. It is able to regurgigate oft-
repeated advice for career changers. While this is technically correct, I
didn’t find it overwhelming impressive.
Side note: One can combine this tip with Tip 41 (Suggest portfolio ideas)
and 42 (Suggest dataset) when building one’s portfolio.
Chapter 9: TROUBLESHOOT
PROBLEM
47. Correct Own ChatGPT Code
Prompt: Your above code is wrong. [Point out what is wrong]. Can you try
again?
Your above terminal command is wrong. There is no "-T" option for cp. Can you try
again?
@ I apologize for the error in my previous answer. The ' -T' option is not a valid option
for the 'cp' command. To recursively copy all files from "Desktop/filesT1 to a folder
that has not been created "Desktop/files2" in the Linux terminal, you can use the
'cp' command with the '-r' option and the 'mkdir -p' command, like this:
mkdir -p Desktop/filesZ
cp -r Desktop/filesl/* Desktop/filesZ/
ChatGPT is quick to admit its mistakes and correct its code. In the above
example, I asked ChatGPT to create a shell command. I realized that it does
not actually run, so I pointed out what was wrong. ChatGPT is able to
provide the correct code on the second attempt.
This is good enough if you know where the bug is exactly. But in cases
where you do not know where the bug is, ChatGPT might not be able to
pinpoint where the fault is. The onus is still on the coder to figure things
out.
def add_from_1_to_k(k):
if{k>0}:
result = k+add_from_1_to_k(k-1)
return result
There is an error in the code you provided because it is missing a return statement
for the case where ' k' is 0 or negative. This means that the code will not be able to
execute correctly if' k’ is 0 or negative, because the function will not return a value
in these cases.
To fix this error, you can add a return statement for the case where ' k' is 0 or
negative, like this:
def add_fron_l_to_k(k):
if k <= 0:
return 0
else:
result - k + odd_f
return result
Prompt: I want you to act as a SQL code corrector. This code does not run
in [your DBMS, e.g. PostgreSQL]. Can you correct it for me? [SQL code
here]
H I want you to act as a SQL code corrector. This code does net run. Can you correct it for
me? SELECT name
FROM dish
WHERE name = 'Prawn Salad’
GROUP BY name
HAVING count(*} = 1
LIMIT 5 ORDER BY name ;
It looks like there is a problem with the syntax of your ' SELECT' statement. The
'LIMIT' clause should come before the ' ORDER BY' clause. Try the following
instead:
Q Copy code
SELECT name
FROM dish
WHERE name ” 'Prawn Salad’
GROUP BY name
HAVING countC*) - 1
ORDER BY name
LIMIT 5;
This should correct the syntax error and allow the query to run successfully,
It is difficult to determine if there are any problems with the Emission table without
more context. However, here are a few things to consider when working with this
table in Power Bl:
1. Data types: Make sure that the data types for each column are appropriate for
the data they contain. For example, the "Timestamp" column should have a data
type of "Date/Tlme", and the "Value" column should have a numeric data type
such as "Decimal Number” or ’’Whole Number".
2. Data quality: Check for any errors or inconsistencies in the data. For example,
you might want to ensure that all values in the "Unit" column are spelled
correctly, or that the ‘‘EmissionType" column does not contain any null values.
4 Performance: Consider the size of the Emission table and the complexity of your
I
measures and visuals. If the table is very large or your measures and visuals are
very complex, it could impact the performance of your Power Bl report.
Overall, the key is to understand the data in the Emission table and how you want to
use it in your Power Bl report. This will help you identify any potential problems and
come up with solutions to address them.
Mathias Halkjrer Petersen had a good chat with ChatGPT on this topic. He
showed that ChatGPT is able to troubleshoot PowerBI models effectively
when more context is provided. It is also able to write the models relatively
independently.
Chapter 10: WRITE SQL
51. Create Running Average
Prompt: I want you to act as a data scientist and write SQL code for me. I
have a table with two columns [Insert column names]. I would like to
calculate a running average for [which value]. What is the SQL code that
works for PostgreSQL 14?
I want you to act as * data scientist and write SQL code for me- I have a table with two
columns ''date'1, "product" and "revenue", I would like to calculate a running evernge for
each product. What Is the SQL code that works for PostgreSQL 14?
Here is the SQL code that you can use to calculate the running average for each
product in PostgreSQL 14:
C Copy code
SELECT
dote T
product ,
revenue .
AVGCrevenue) OVER C
PARTITION BY product
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
3 as rmntnf)_avfl
FROM sales_table
ORDER BY dale:
This code will do the following:
1. Select the ' date', 'product', ' revenue', and ' running_avg' columns from
the 'sales^table' table,
2. Calculate the running average of the ' revenue' column for each product using
a window function,
3. The ' AVGQ' function Is a window function that calculates the average of a set
of rows* The 'OVE R(}' clause specifies the window for the window function*
The 'PARTITION BY' clause divides the rows of the table into partitions to
which the window function is applied. In this case, the rows are partitioned by
’ product', The 'ORDER BY' clause specifies the order in which the rows are
processed within each partition. In this case, the rows are ordered by date'.
The 'ROWS' keyword specifies that the window is defined in terms of rows, and
the 'BETWEEN' keyword specifies the start and end of the window,
4. The ' running.avg' column will contain the average of all 'revenue' values
from the start of the partition up to the current row, for each product.
5. The resulting table will be ordered by'date'.
One tip that I have is to provide ChatGPT with the exact version and
flavour of SQL. This allows it to create versions of the data.
I would encourage practitioners to create test ChatGPT with their own SQL
questions and see if it creates a correct response.
Prompt: Assume you are given the tables... with the columns. Output the
following...
1 Assume that you are given the table below containing information on viewership by
device type (where the three types are laptop, tablet, and phone). Define "mobile'' as the
sum of tablet and phone viewership numbers. Write a query to compare the viewership
on laptops versus mobile devices.
Output the total viewership for laptop and mobile devices in the format of “laptop_views"
and "mobile_views'. The viewership table has the columns huser_td", "device_type", and
"view_time"
To compare the viewership on laptops versus mobile devices, you can. use the
following SQL query:
Q copy cods
SELECT
SUM(CASE WHEN device_type - laptop' THEM view_time ELSE 0 END)
AS laptop_vi€W5 f
SUMfCASL WHEN device_type IN Ctablet , 'phone) THEN vicw_time
ELSE 0 END) AS mobile.views
FROM viewership
This query uses a ' CASE' expression to sum the ’'view_time’' values for laptop and
mobile devices separately. The " SUM' function is then used to calculate the total
viewership for each device type.
Prompt: I want you to act as a bot that generates Google Sheets formula.
Please generate a formula that [describe requirements]
I want you to act as a bot that generates Google Sheets formula. Please generate a
formula that puts data from another sheet into current sheet.
Here is a formula that puts data from another sheet into the current sheet:
Q Copy cade
This formula uses the ’ IMPORTRANGE‘ function, which imports data from another
sheet. The first argument is the "sheetjd" of the sheet that you want to import data
from. The sheetjd is a unique identifier for the sheet, and you can find it in the URL
of the sheet when it's open in your browser. For example, if the URL of the sheet is
'https://docs,google.coin/spreadsheets/d/obcl23/edit' , thesheetjd is
abcl23'.
The second argument is the range of cells that you want to import. This can be a
single cell (e.g., "AT1). a range of cells (e.g., "AfiBIO"), or a named range.
For example, if you want to import the data from cells A1 through B10 in a sheet with
the sheetjd abc!23you would use the fol I owing formula:
Q Cop/code
Note that to use the 'INPORTRANGE' function, you'll need to grant access to the
sheet that you want to import data from. To do this, you can go to the "Data" menu
and select "import data." Then, choose the sheet that you want to import data from
and click "Allow access."
Once you've granted access, you can use the' importrange' function to import
data from the other sheet into the current sheet. The imported data will be
automatically updated whenever the source data is changed. £
ChatGPT excels at acting as a lookup table for all commands. As you can
see, I asked it for a function that pulls one sheet and puts it into another. It
faithfully provided the correct importrange.
54. Write R
Prompt: I want you to act as a data scientist using R. Can you write an R
script that [Insert requirement here]
I want you tn act as a data scientist using Ft. Can you write an R script that imports the iris
dataset and uses ggplots to perform data visualization. I would like to use a violinplot to
compare the distribution of sepal length between flower species.
Certainly! Mere is an R script that imparts the iris dataset and uses ggpiotS to create
a violinplot comparing the distribution of sepal length between flower species:
L] Copy cade
This script loads the ggplot2 library and the iris dataset, and then creates a
violinplot using the ‘geom_violin ' function. The plot shows the distribution of
sepal length for each flower species, with the width of the violins indicating the
density of the data at each point.
ChatGPT sure excelled at this basic R command. This code runs and indeed
creates the plot as requested.
However, do note that I did not test ChatGPT on R coding extensively. I can
imagine it failing or producing incorrect codes for more complex cases.
Prompt: I want you to act as a Linux terminal expert. Please write the code
to [describe requirements]
I want you to act as a Linux terminal expert. Please write the code to create a directory
called "new folder", and create a shell file "txt.sh", It should contain the command to list
all fifes in "new folder1'
To create a directory called "new folder" and a Shell file "txt.sh" that contains the
command to list all files in "new folder", you can use the following commands in the
Linux terminal;
Q Copy code
These are the correct commands for the requested instructions. It even
provided a line for granting the necessary permission to the file without me
explicitly asking it to
However, note that I cherrypicked this example out of all the prompts for
writing shell commands. In particular, when I asked ChatGPT to copy files
from a folder recursively, it gave an incorrect output that could not be
executed. In its defense, it’s able to correct its own code (See “47. Correct
Own ChatGPT Code” above.)
Yes, I can write a VBA code that will automatically print all sheet names in an Excel
workbook. Here is a sample code that you can use:
Pl Copy code
Sub PrintSheetNamesO
Dim ws As Worksheet
For Each ws In ThisWorkbook.Worksheets
Debug.Print ws.Name
Next ws
End Sub
To use this code, copy and paste it into the VBA editor in Excel, and then run it by
pressing the "Run" button or by using the F5 key. The code will loop through ail the
worksheets in the current workbook and print the name of each sheet to the debug
window.
Chapter 12: MISCALLANEOUS
57. Format Tables
I want you to act as a document formatter for me. Can you please format the following
into a nice table for me to place in Google Docs?
1 January 2021 Friday New Year’s Day New Year’s Day
12 February 2021 Friday Chinese New Year Chinese New Year
13 February 2021 Saturday Chinese New Year Chinese New Year
2 April 2021 Friday Good Friday Good Friday
1 May 2021 Saturday Labour Day Labour Day
Prompt: I want you to act as a technical book summarizer. Can you please
summarize the book [name] with 5 main points?
U I want you to act as a technical book summarizer. Can you please summarize the book
Storytelling With Data: A Data Visualization Guide for Business Professionals with 5 main
points?
@ Certainly! Here are five main points from "Storytelling with Data: A Data Visualization
Knaflic’s book is a gold mine of data storytelling tips. I personally feel that
the summary was too generic and did not capture the tips sufficiently.
59. Summarize Paper
I want you to act as an academic. Please summarise the paper "Distilling the knowledge
in a neural network" in simple terms in one paragraph.
"Distilling the knowledge in a neural network" is a paper that presents a method for
compressing a large, complex neural network model into a smaller, more efficient
model that can be deployed in real-world applications. The authors propose a
method called "knowledge distillation" in which the large model is used to train the
smaller model, effectively transferring the knowledge from the large model to the
smaller one. The smaller model is able to achieve similar performance to the large
model, but with fewer parameters and faster inference times. This method has been
shown to be effective in a variety of tasks, including image classification and
machine translation.
However, note that this is possible because enough people have written
reviews or summaries of the above paper. It will not be able to summarize
papers that are not as popular well.
Also, ChatGPT does not have an understanding of the world beyond 2021.
Thus, it will not be able to help you understand new papers.
@ I'm here to help in any way I can. It can be frustrating when we put
a lot of time and effort into a project and don't see the results we
were hoping for, but it's important to remember that it's a normal
part of the process and that setbacks can often lead to valuable
learning experiences.
Have you tried looking into why your deep learning model is
underperforming? it could be that the data you are using is not
well-suited fora deep learning model, or that there are issues with
the way you have designed and trained the model. It can be helpful
to try to identify what might be causing the issue and then see if
there are any changes you can make to address it. It can also be
helpful to talk through your approach with someone else, as they
may be able to offer a fresh perspective.
Will ChatGPT replace data scientists? No, I don’t think so. At least not yet.
However someone using ChatGPT might.