Automate ChatGPT Prompts for Data Science With Python_ Enhanced Coding for the Modern Python Developer
Table of Contents
About the Authors
Automate ChatGPT Prompts for Data Science with Python
Chapter 1: ChatGPT Prompts for Data Science (Tried, Tested, and Rated)
Automate data science tasks with ChatGPT
How to Read this
Table of Contents
Write python
Explain code
Optimize code
Format code
Translate code from one language to another
Explain concepts
Suggest ideas
Troubleshoot problem
Write SQL
Write other Code
Miscellaneous
Chapter 2: WRITE PYTHON
1. Train Classification Model
2. Automatic Machine Learning
3. Tune Hyperparameter
4. Explore Data
5. Generate Data
6. Write Regex
7. Train Time Series
8. Address Imbalance Data
9. Get Feature Importance
10. Visualize Data with Matplotlib
11. Visualize Image Grid Matplotlib
12. Explain Model with Lime
13. Explain Model with Shap
14. Write Multithreaded Functions
15. Compare Function Speed
16. Create NumPy Array
17. Write Unit Test
18. Validate Column
Chapter 3: EXPLAIN CODE
19. Explain Python
20. Explain SQL
21. Explain Google Sheets Formula
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed
23. Optimize Pandas
24. Optimize Pandas Again
25. Optimize Python
26. Optimize SQL
27. Simplify Python
Chapter 5: FORMAT CODE
28. Write Documentation
29. Improve Readability
30. Format SQL
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS
32. Translate Python to R
33. Translate R to Python
Chapter 7: EXPLAIN CONCEPTS
34. Explain to Five-Year-Old
35. Explain to Undergraduate
36. Explain to Professor
37. Explain to Business Stakeholder
38. Explain Like StackOverflow
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases
40. Suggest Dataset
41. Suggest Portfolio Ideas
42. Suggest Resources
43. Suggest Time Complexity
44. Suggest Feature Engineering
45. Suggest A/B Testing Steps
46. Career Coaching
Chapter 9: TROUBLESHOOT PROBLEM
47. Correct Own ChatGPT Code
48. Correct Python Code
49. Correct SQL Code
50. Troubleshoot PowerBI Model
Chapter 10: WRITE SQL
51. Create Running Average
52. Solve Leetcode Question
Chapter 11: WRITE OTHER CODE
53. Write Google Sheets Formula
54. Write R
55. Write Shell
56. Write VBA
Chapter 12: MISCELLANEOUS
57. Format Tables
58. Summarize Book
59. Summarize Paper
60. Provide Emotional Support (My Favourite)
Closing
I rated 60 ChatGPT functions for data science. Use these prompts to ask
ChatGPT to write, explain, and optimize data science code. It can also
explain data science concepts, suggest ideas, and troubleshoot problems.
Chapter 2: WRITE PYTHON
1. Train Classification Model
Prompt: I want you to act as a data scientist and code for me. I have a dataset
of [describe dataset]. Please build a machine learning model that predicts [target
variable].
ChatGPT is able to provide satisfactory code that:
This is good because it provides enough code to start off a model. It is also
technically sound and correct. However, it falls short because it evaluates the
model with accuracy alone, which can be misleading when the classes are imbalanced.
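To illustrate why accuracy alone can mislead, here is a minimal sketch using only the standard library. The labels are hypothetical (95 negatives and 5 positives), and the "model" simply predicts the majority class every time. The function names are my own and not from any library.

```python
def classification_summary(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

summary = classification_summary(y_true, y_pred)
print(summary)
```

The accuracy looks high, yet recall is zero because no positive case is ever found. When asking ChatGPT for a classification model, it may be worth explicitly requesting metrics beyond accuracy.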
2. Automatic Machine Learning
Prompt: I want you to act as an automatic machine learning (AutoML) bot using
TPOT for me. I am working on a model that predicts [...]. Please write python
code to find the best classification model with the highest AUC score on the test
set.
ChatGPT provided good code (it runs!) that:
I’m not at all surprised that ChatGPT excels here since the code is pretty much
only boilerplate code. This can be copied and pasted from the documentation as
is.
3. Tune Hyperparameter
Prompt: I want you to act as a data scientist and code for me. I have trained
a [model name]. Please write the code to tune the hyperparameters.
For a decision tree classifier, ChatGPT wrote code that:
Not bad at all, and the code runs! Grid search is the most popular method for
hyperparameter tuning, so it’s no surprise that it came up on top.
However, note that grid search goes through the search space exhaustively, so the
larger the search space, the longer it takes. An alternative is randomized search,
which samples a fixed number of parameter combinations from the search space.
Once assigned a budget (the number of combinations to try), randomized search
takes roughly the same amount of time regardless of the size of the search space.
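The difference in budget can be sketched without training any model. The search space below is hypothetical (the keys mirror common decision tree parameters, but nothing here calls scikit-learn). In scikit-learn, the corresponding tools are GridSearchCV and RandomizedSearchCV, where the budget is the n_iter argument.

```python
import itertools
import random

# Hypothetical search space for a decision tree classifier.
search_space = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_split": [2, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}


def grid_candidates(space):
    """Enumerate every combination, as grid search would."""
    keys = list(space)
    combos = itertools.product(*(space[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


def random_candidates(space, n_iter, seed=0):
    """Sample a fixed number of combinations, as randomized search would."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()}
            for _ in range(n_iter)]


grid = grid_candidates(search_space)
sampled = random_candidates(search_space, n_iter=10)

# The grid grows with the product of the option counts (5 * 4 * 2 here),
# while the random sample stays at the chosen budget.
print(len(grid), len(sampled))
```

Adding one more parameter with five options would multiply the grid by five, while the randomized budget would stay at ten model fits.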
4. Explore Data
Prompt: I want you to act as a data scientist and code for me. I have a dataset
of [describe dataset]. Please write code for data visualisation and exploration.
For an exploration, ChatGPT wrote code that:
5. Generate Data
Prompt: I want you to act as a fake data generator. I need a dataset that has x rows
and y columns: [insert column names]
ChatGPT produced code that generated the data exactly as per my requirements.
It only used the pandas library and Python's built-in random library,
making it portable. I actually did use the synthetic dataset for other tasks.
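As a rough idea of what such a generator can look like, here is a minimal sketch that uses only the built-in random module. The column names and value ranges are hypothetical, not the ones from my prompt. The resulting list of dictionaries can be passed to pandas.DataFrame if a DataFrame is needed.

```python
import random


def make_fake_rows(n_rows, seed=42):
    """Generate n_rows of synthetic customer records (hypothetical columns)."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    cities = ["Singapore", "London", "Berlin", "Toronto"]
    rows = []
    for i in range(n_rows):
        rows.append({
            "customer_id": i + 1,
            "age": rng.randint(18, 80),
            "city": rng.choice(cities),
            "monthly_spend": round(rng.uniform(10, 500), 2),
        })
    return rows


rows = make_fake_rows(5)
for row in rows:
    print(row)
```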
6. Write Regex
However, note that I have cherry-picked this example. Before this, I had tried
asking ChatGPT to create other, more complex regex patterns, but it failed. It even
sounded confident when it explained the regex, but the regex it produced didn’t
work.
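Because a confident explanation does not guarantee a working pattern, it helps to check any generated regex against examples that should and should not match. The sketch below uses a hypothetical pattern for dates such as 2023-01-31. It is not the pattern from my test, and the helper name is my own.

```python
import re

# Hypothetical pattern: YYYY-MM-DD with basic range checks on month and day.
pattern = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

should_match = ["2023-01-31", "1999-12-01"]
should_not_match = ["2023-13-01", "2023-1-31", "20230131", "2023-01-32"]


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of (problem, text) pairs; empty means all checks passed."""
    failures = []
    for text in should_match:
        if not pattern.fullmatch(text):
            failures.append(("missed", text))
    for text in should_not_match:
        if pattern.fullmatch(text):
            failures.append(("false match", text))
    return failures


failures = check_pattern(pattern, should_match, should_not_match)
print(failures or "all checks passed")
```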
7. Train Time Series
Prompt: I want you to act as a data scientist and code for me. I have a time series
dataset [describe dataset]. Please build a machine learning model that
predicts [target variable]. Please use [time range] as train and [time range] as
validation.
ChatGPT provided code that:
Alright, here’s a controversial one. ChatGPT is not able to properly write code for
time series validation.
Here’s why: Without the prompt “please use data from A to B as training. Please
use data from Y to Z as validation”, ChatGPT uses sklearn’s train_test_split to
randomly split the data points into train and validation. This is a huge no-no
because the train_test_split function randomly splits the data with no regard for
the temporal order of the data points, so the model gets to train on data from the
future. Instead, it should have used all data points from an earlier timeframe as
train and a later timeframe as test.
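A chronological split can be sketched in a few lines. The records and cutoff date below are hypothetical, and the helper name is my own. For cross-validation, scikit-learn also provides TimeSeriesSplit, which keeps each validation fold later than its training fold.

```python
from datetime import date, timedelta


def chronological_split(records, cutoff):
    """Split (date, value) records so that training data precedes validation data."""
    ordered = sorted(records, key=lambda record: record[0])
    train = [record for record in ordered if record[0] < cutoff]
    valid = [record for record in ordered if record[0] >= cutoff]
    return train, valid


# Hypothetical daily series: ten days starting on 1 January 2023.
start = date(2023, 1, 1)
records = [(start + timedelta(days=i), float(i)) for i in range(10)]

train, valid = chronological_split(records, cutoff=date(2023, 1, 8))
print(len(train), len(valid))
```

Every training date falls before every validation date, which is the property a random split does not guarantee.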
8. Address Imbalance Data
Prompt: I want you to act as a coder. I have trained a machine learning model on
an imbalanced dataset. The predictor variable is the column [Insert column
name]. In python, how do I oversample and/or undersample my data?
ChatGPT used the imblearn library to write boilerplate code that randomly
undersamples and oversamples the dataset.
The code is sound, but I would nitpick its understanding of over- and
undersampling. Resampling should be applied to the training data only, so that
the test set keeps the original class distribution.
Thus, it should have split the dataset into train and test sets first,
before performing the under- and oversampling.
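The order of operations can be sketched with the standard library alone. The dataset is hypothetical (10 positives out of 100 rows), the function name is my own, and the split is a plain random split rather than a stratified one. With imblearn, the equivalent would be to fit a sampler such as RandomOverSampler on the training portion only.

```python
import random
from collections import Counter


def split_then_oversample(rows, label_key, test_fraction=0.2, seed=0):
    """Hold out a test set first, then randomly oversample the training set only."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Group training rows by class label.
    groups = {}
    for row in train:
        groups.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in groups.values())

    # Duplicate random rows from smaller classes until every class reaches target.
    balanced_train = []
    for group in groups.values():
        balanced_train.extend(group)
        balanced_train.extend(rng.choices(group, k=target - len(group)))
    return balanced_train, test


# Hypothetical imbalanced dataset: 10 churned customers out of 100.
rows = [{"id": i, "churned": 1 if i < 10 else 0} for i in range(100)]
balanced_train, test = split_then_oversample(rows, label_key="churned")

train_counts = Counter(row["churned"] for row in balanced_train)
print(train_counts, len(test))
```

Because the test rows are set aside before any duplication, no copy of a test row can end up in the training data.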
9. Get Feature Importance
Prompt: I want you to act as a data scientist and explain the model’s results. I
have trained a decision tree model and I would like to find the most important
features. Please write the code.
ChatGPT correctly provided the code for plotting the feature importance of a
decision tree. It even ranked the feature importances from the most important to
the least important. Kudos!
10. Visualize Data with Matplotlib
Prompt: I want you to act as a coder in python. I have a dataset [name] with
columns [name]. [Describe graph requirements]
ChatGPT could provide the correct code for plotting visualizations in Matplotlib.
11. Visualize Image Grid Matplotlib
Prompt: I want you to act as a coder. I have a folder of images. [Describe how
files are organised in directory] [Describe how you want images to be printed]
Again, ChatGPT can provide the correct code for plotting a grid of images with
Matplotlib.
Again, I provided convoluted and long instructions, but ChatGPT could follow
them to a tee. (I personally loathe creating charts with Matplotlib. It is extremely
flexible but writing code in it is rather painful.)
Personally, I find this helpful because I do not need to refer to the documentation
for the boilerplate code. Overall, ChatGPT provided accurate code that does not
violate any principles.
13. Explain Model with Shap
Prompt: I want you to act as a data scientist and explain the model’s results. I
have trained a scikit-learn XGBoost model and I would like to explain the output
using a series of plots with Shap. Please write the code.
14. Write Multithreaded Functions
Prompt: I want you to act as a coder. Can you help me parallelize this code across
threads in python?
I’m pleasantly surprised to find that this code actually works without any
modifications. I can foresee myself doing this often in the future.
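For reference, a common way to parallelize work across threads is concurrent.futures.ThreadPoolExecutor. The sketch below is my own minimal illustration, not the code from the exchange above. The fetch function is hypothetical, and the sleep call stands in for a network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Hypothetical I/O-bound task; the sleep stands in for waiting on a resource."""
    time.sleep(0.05)
    return item * 2


items = list(range(8))

# pool.map runs fetch concurrently and returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, items))

print(results)
```

Threads help most with I/O-bound work. For CPU-bound work in CPython, the global interpreter lock usually makes process-based parallelism a better fit.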
15. Compare Function Speed
Prompt: I want you to act as a software developer. I would like to compare the
efficiency of two algorithms that perform the same thing in python. Please write
code that helps me run an experiment that can be repeated for 5 times. Please
output the runtime and other summary statistics of the experiment. [Insert
functions]
ChatGPT is able to write code that compares two function speeds with built-in
packages.
The code runs and is correct. That said, I would rather ChatGPT use
the %%timeit magic in IPython/Jupyter instead of writing verbose code.
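Outside a notebook, the built-in timeit module gives a compact alternative. This is a minimal sketch with two hypothetical functions that compute the same sum. The benchmark helper is my own, and the timings will vary by machine.

```python
import statistics
import timeit


def sum_with_loop(n):
    """Add the numbers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n):
    """Add the numbers 0..n-1 with the built-in sum."""
    return sum(range(n))


def benchmark(func, *args, repeat=5, number=100):
    """Time func(*args): `repeat` runs of `number` calls each, in seconds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return {
        "min": min(timings),
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings),
    }


loop_stats = benchmark(sum_with_loop, 1000)
builtin_stats = benchmark(sum_with_builtin, 1000)
print("loop:", loop_stats)
print("builtin:", builtin_stats)
```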
16. Create NumPy Array
Prompt: I want you to act as a data scientist. I need to create a numpy array. This
numpy array should have the shape of (x,y,z). Please initialize the numpy array
with random values.
17. Write Unit Test
Prompt: I want you to act as a software developer. Please write unit tests for the
function [Insert function]. The test cases are: [Insert test cases]
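To give a sense of the kind of output to expect from this prompt, here is a minimal sketch with the built-in unittest module. The normalize function and its test cases are hypothetical stand-ins for [Insert function] and [Insert test cases].

```python
import unittest


def normalize(values):
    """Hypothetical function under test: scale values to the 0-1 range."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTests(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zero(self):
        self.assertEqual(normalize([3, 3, 3]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalize([])


# Run the tests programmatically so the snippet also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```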
18. Validate Column
Prompt: I want you to act as a data scientist. Please write code to test that my
pandas DataFrame [insert requirements here]
Chapter 3: EXPLAIN CODE
19. Explain Python
Prompt: I want you to act as a code explainer. What is this code doing? [Insert
code]
I am extremely pleased with ChatGPT’s response here. Not only is it able to
explain the concepts behind each function, it is also able to guess why the
programmer is performing a certain action. For example, it explained the
line np.clip(reconstructed, 0, 1) by saying that “this is likely done to ensure that
the values are within the valid range for pixel intensity value”, which is
absolutely correct.
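The effect of that line is easy to see on a small array. The values below are hypothetical and assume pixel intensities scaled to the 0-1 range.

```python
import numpy as np

# Hypothetical reconstructed image with values slightly outside [0, 1].
reconstructed = np.array([[-0.2, 0.3],
                          [0.9, 1.4]])

# Values below 0 become 0, values above 1 become 1, the rest are unchanged.
clipped = np.clip(reconstructed, 0, 1)
print(clipped)
```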
20. Explain SQL
Prompt: I want you to act as a data science instructor. Can you please explain to
me what this SQL code is doing? [Insert SQL code]
Again, a correct and an awesome explanation. Much like its explanations for
Python, ChatGPT is able to understand and explain SQL code.
21. Explain Google Sheets Formula
Prompt: I want you to act as a Google Sheets formula explainer. Explain the
following Google Sheets command. [Insert formula]
ChatGPT is able to explain the FILTER function well. I like that it provided the
generalized code snippet like one would see in the documentation, and proceeded
to explain the general snippet with the specific arguments provided by the user.
In particular, the line the "Filter" function is being used to filter the range
"C2:C12" based on the condition "A2:A12=F2" is an excellent explanation that is
contextualized to the user’s arguments.
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed
Prompt: I want you to act as a software developer. Please help me improve the
time complexity of the code below. [Insert code]
A correct optimization made by ChatGPT from O(n) to O(1). However, note that
this is a simple example. You might want to test it on more complex cases to see
if this holds up.
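As an illustration of this kind of optimization, here is a hypothetical example (not the code I submitted): summing the integers from 1 to n with a loop takes O(n) time, while the closed-form formula takes O(1).

```python
def sum_to_n_loop(n):
    """O(n): add each integer from 1 to n in turn."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_formula(n):
    """O(1): use the closed form n * (n + 1) / 2."""
    return n * (n + 1) // 2


# The two versions should agree before the faster one is trusted.
for n in (0, 1, 10, 1000):
    assert sum_to_n_loop(n) == sum_to_n_formula(n)
print(sum_to_n_formula(1000))
```

Checking the optimized version against the original on a few inputs, as above, is a cheap way to verify whatever ChatGPT suggests.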
23. Optimize Pandas
Prompt: I want you to act as a code optimizer. Can you point out what’s wrong
with the following Pandas code and optimize it? [Insert code here]
I personally think these suggestions are particularly salient. It pointed out that
there is nothing wrong with the code (a correct observation) and moved on to
suggest good code practices.
24. Optimize Pandas Again
Prompt: I want you to act as a code optimizer. Can you point out what’s wrong
with the following Pandas code and optimize it? [Insert code here]
Of course, ChatGPT is able to catch iterrows as an anti-pattern in Pandas. Love this
optimization.
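For readers unfamiliar with the anti-pattern, here is a minimal sketch on a hypothetical DataFrame (not the code from my prompt). Looping with iterrows creates a Series for every row, whereas the vectorized version operates on whole columns at once.

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Anti-pattern: row-by-row iteration.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["quantity"])

# Vectorized alternative: multiply the two columns directly.
df["total"] = df["price"] * df["quantity"]

print(totals_loop)
print(df["total"].tolist())
```

Both versions give the same numbers. The vectorized form is shorter and typically much faster on large DataFrames.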
25. Optimize Python
Prompt: I want you to act as a code optimizer. The code is poorly written. How
do I correct it? [Insert code here]
26. Optimize SQL
Prompt: I want you to act as an SQL code optimizer. The following code is slow.
Can you help me speed it up? [Insert SQL]
The first two suggestions here are valid suggestions for optimization. However,
the third tip is incorrect. In fact, DATEPART is actually faster than EXTRACT.
For that reason, I’ll have to dock stars off for this category.
27. Simplify Python
Prompt: I want you to act as a code simplifier. Can you simplify the following
code?
If you’ve written spaghetti code in Python, ChatGPT might be able to help you
simplify it.
Chapter 5: FORMAT CODE
28. Write Documentation
29. Improve Readability
Prompt: I want you to act as a code analyzer. Can you improve the following
code for readability and maintainability? [Insert code]
30. Format SQL
Prompt: I want you to act as a SQL formatter. Please format the following SQL
code. Please convert all reserved keywords to uppercase [Insert
requirements]. [Insert Code]
The code is formatted nicely and neatly with the correct indentation and casing.
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS
Prompt: I want you to act as a coder and write SQL code for MySQL. What is the
equivalent of PostgreSQL’s DATE_TRUNC for MySQL?
32. Translate Python to R
Prompt: I want you to act as a code translator. Can you please convert the
following code from python to R? [Insert code]
33. Translate R to Python
Prompt: I want you to act as a code translator. Can you please convert the
following code from R to python? [Insert code]
Again, a correct translation here. However, note that this is a very straightforward
function. ChatGPT might fail on more complex code blocks.
Chapter 7: EXPLAIN CONCEPTS
Overall, I love ChatGPT’s ability to explain concepts correctly with an awareness
of the audience. Through the examples below, you can see how ChatGPT is able
to explain different concepts to audiences of different levels. When explaining a
difficult concept to a child, it uses an appropriate analogy. When explaining the
concept to a professor, it uses the appropriate tone and language. When
explaining the concept to a business stakeholder, it draws parallels to business
jargon and contexts. Five stars for each of these.
34. Explain to Five-Year-Old
Prompt: I want you to act as a data science instructor. Explain [concept] to a
five-year-old.
35. Explain to Undergraduate
One tip that I have here is to explicitly prompt ChatGPT for code snippets,
sample tables and outputs for a more comprehensive answer.
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases
Prompt: I want you to act as a software developer. Please help me catch edge
cases for this function [insert function]
For a relatively simple function, ChatGPT is able to suggest the edge cases
correctly. It considered all the cases exhaustively without being
prompted on what the cases are.
40. Suggest Dataset
Prompt: I want you to act as a data science career coach. I want to build a
predictive model for [...]. At the same time, I would like to showcase my
knowledge in [...]. Can you please suggest the five most relevant datasets for my
use case?
I like ChatGPT’s dataset suggestions because they are customized to my use case.
I have also tested this prompt for different personas and use cases, and the
suggestions it gives are consistently excellent.
However, note that I have only tested relatively simple use cases. For more
complex use cases that require niche datasets, ChatGPT might not do well.
41. Suggest Portfolio Ideas
This is one of my favourite prompts. Its suggestions are relevant not only to
myself, but also to the industry that I am gunning for, and also for my dream
company. It does not suggest any generic projects like the Titanic or Iris project.
Instead, it makes salient suggestions based on its knowledge of the company
(Tesla, in this case), the industry (data science), and my background in
engineering.
I love this feature and I hope more people will try it out!
42. Suggest Resources
Prompt: I want you to act as a data science coach. I would like to learn
about [topic]. Please suggest 3 best specific resources. You can include [specify
resource type]
This one is for those looking to learn a new skill. I find that the resources that it
has suggested are not only relevant, but also critically acclaimed. It also does not
provide an overwhelming list of resources and instead provides only a few. I like
this because it provides a good starting point for anyone hunting for new
resources.
43. Suggest Time Complexity
Prompt: I want you to act as a software developer. Please compare the time
complexity of the two algorithms below. [Insert two functions]
For this very simple example, ChatGPT is able to deduce the time complexity.
44. Suggest Feature Engineering
Prompt: I want you to act as a data scientist and perform feature engineering. I
am working on a model that predicts [insert feature name]. There are
columns: [Describe columns]. Can you suggest features that we can engineer for
this machine learning problem?
I absolutely loved ChatGPT’s response here. The features it suggested are sound
and actually useful for the use case.
46. Career Coaching
Prompt: I want you to act as a career advisor. I am looking for a role as a [role
name]. My background is [...]. How do I land the role and with what resources
exactly in 6 months?
I am not surprised that ChatGPT did well here. It is able to regurgitate
oft-repeated advice for career changers. While this is technically correct, I didn’t
find it overwhelmingly impressive.
Side note: One can combine this tip with Tip 41 (Suggest Portfolio Ideas) and
Tip 40 (Suggest Dataset) when building one’s portfolio.
Chapter 9: TROUBLESHOOT PROBLEM
47. Correct Own ChatGPT Code
Prompt: Your above code is wrong. [Point out what is wrong]. Can you try again?
ChatGPT is quick to admit its mistakes and correct its code. In the above
example, I asked ChatGPT to create a shell command. I realized that it does not
actually run, so I pointed out what was wrong. ChatGPT is able to provide the
correct code on the second attempt.
This is good enough if you know where the bug is exactly. But in cases where
you do not know where the bug is, ChatGPT might not be able to pinpoint where
the fault is. The onus is still on the coder to figure things out.
49. Correct SQL Code
Prompt: I want you to act as a SQL code corrector. This code does not run
in [your DBMS, e.g. PostgreSQL]. Can you correct it for me? [SQL code here]
A correct explanation for a simple query. However, upon further testing, I did
find ChatGPT’s understanding of complex SQL queries to be somewhat lacking.
It’s not able to debug more complex SQL queries containing multiple tables.
50. Troubleshoot PowerBI Model
Prompt: I want you to act as a PowerBI modeler. Here are the details of my current
project. [Insert details]. Do you see any problems with the table?
ChatGPT is able to suggest relevant improvements to the PowerBI model. It also
admitted its shortcoming (“It’s difficult to determine if there are any problems
without more context”).
Mathias Halkjær Petersen had a good chat with ChatGPT on this topic. He
showed that ChatGPT is able to troubleshoot PowerBI models effectively when
more context is provided. It is also able to write the models relatively
independently.
Chapter 10: WRITE SQL
51. Create Running Average
Prompt: I want you to act as a data scientist and write SQL code for me. I have a
table with two columns [Insert column names]. I would like to calculate a running
average for [which value]. What is the SQL code that works for PostgreSQL 14?
ChatGPT did provide a good response to the request here! It correctly provided
the code for creating running averages in SQL. It also provided a correct
explanation and breakdown.
One tip that I have is to provide ChatGPT with the exact version and flavour of
SQL. This allows it to tailor the code to that dialect.
I would encourage practitioners to test ChatGPT with their own SQL
questions and see if it creates a correct response.
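A running average is typically written with a window function. The sketch below is a minimal illustration that runs the query through Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table, column names, and values are hypothetical. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The same AVG(...) OVER (...) clause is valid PostgreSQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2023-01-01", 10.0), ("2023-01-02", 20.0), ("2023-01-03", 60.0)],
)

# Average of all rows from the first date up to and including the current row.
query = """
SELECT sale_date,
       amount,
       AVG(amount) OVER (
           ORDER BY sale_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_avg
FROM daily_sales
ORDER BY sale_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```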
52. Solve Leetcode Question
Prompt: Assume you are given the tables… with the columns… Output the
following…
ChatGPT could solve an introductory Leetcode question (Difficulty: Simple) on
SQL. The above code is sound and answers the question correctly.
However, it failed to produce runnable SQL code for a Leetcode question with
medium difficulty when I tested it.
Chapter 11: WRITE OTHER CODE
53. Write Google Sheets Formula
Prompt: I want you to act as a bot that generates Google Sheets formula. Please
generate a formula that [describe requirements]
ChatGPT excels at acting as a lookup table for all commands. As you can see, I
asked it for a function that pulls one sheet and puts it into another. It faithfully
provided the correct IMPORTRANGE formula.
54. Write R
Prompt: I want you to act as a data scientist using R. Can you write an R script
that [Insert requirement here]
ChatGPT sure excelled at this basic R command. This code runs and indeed
creates the plot as requested.
However, do note that I did not test ChatGPT on R coding extensively. I can
imagine it failing or producing incorrect code for more complex cases.
55. Write Shell
Prompt: I want you to act as a Linux terminal expert. Please write the code
to [describe requirements]
These are the correct commands for the requested instructions. It even provided a
line for granting the necessary permission to the file without me explicitly asking
it to.
However, note that I cherry-picked this example out of all the prompts for writing
shell commands. In particular, when I asked ChatGPT to copy files from a folder
recursively, it gave an incorrect output that could not be executed. In its defense,
it’s able to correct its own code (See “47. Correct Own ChatGPT Code” above.)
56. Write VBA
Prompt: I want you to act as an Excel VBA developer. Can you write a VBA
that [Insert function here]?
Chapter 12: MISCELLANEOUS
57. Format Tables
Prompt: I want you to act as a document formatter. Please format the following
into a nice table for me to place in Google Docs? [insert text table here]
The output speaks for itself. It is able to convert a table that I copied from the
internet into a nicely formatted table.
However, ChatGPT cannot offer any customization to the table. I asked it to
change the colors and font, but it was not able to.
58. Summarize Book
Prompt: I want you to act as a technical book summarizer. Can you please
summarize the book [name] with 5 main points?
Knaflic’s book is a gold mine of data storytelling tips. I personally feel that the
summary was too generic and did not capture the tips sufficiently.
59. Summarize Paper
However, note that this is possible because enough people have written reviews
or summaries of the above paper. It will not be able to summarize less popular
papers well.
Also, ChatGPT does not have an understanding of the world beyond 2021. Thus,
it will not be able to help you understand new papers.
60. Provide Emotional Support (My Favourite)
Prompt: I want you to provide emotional support to me. [Explain problem here.]
Alright, I’m no psychologist, so I do not think I am qualified to judge this. That
said, I personally think that this is a response that is both empathetic and
practical. It also showed a good understanding of the data science process.
Closing
ChatGPT performs well in explaining things and providing boilerplate code.
However, it falters when it comes to debugging or creating complex code
snippets.
Will ChatGPT replace data scientists? No, I don’t think so. At least not yet.
However, someone using ChatGPT might.