
Automate_ChatGPT_Prompts_for_Data_Science_with_Python_Enhanced_Coding

The document outlines a guide for automating ChatGPT prompts specifically for data science tasks using Python. It includes various chapters detailing prompts for writing and optimizing code, explaining concepts, suggesting ideas, troubleshooting, and more, with practical examples provided. The author, Thomas Carroll, emphasizes the utility of these prompts in enhancing productivity and efficiency in data science projects.

Uploaded by

Gregory Vodopyan
Copyright
© All Rights Reserved

AUTOMATE CHATGPT PROMPTS FOR DATA SCIENCE WITH PYTHON

POL CAMARILLO
About the Authors

Thomas Carroll is currently a professional software developer. After
spending almost a decade working as a teacher in a prestigious school in his
hometown, he decided to devote himself to his greatest passion,
programming. Having dedicated his initial working life to teaching, he
decided, in addition to teaching through private courses, to write some
programming books for beginners. If you want to know more about
programming and machine learning, I can help you.

Table of Contents

About the Authors
Table of Contents
Automate ChatGPT Prompts for Data Science with Python
Chapter 1: ChatGPT Prompts for Data Science (Tried, Tested, and Rated)
Automate data science tasks with ChatGPT
How to Read this
Table of Contents
Write Python
Explain code
Optimize code
Format code
Translate code from one language to another
Explain concepts
Suggest ideas
Troubleshoot problem
Write SQL
Write other Code
Miscellaneous
Chapter 2: WRITE PYTHON
1. Train Classification Model
2. Automatic Machine Learning
3. Tune Hyperparameter
4. Explore Data
5. Generate Data
6. Write Regex
7. Train Time Series
8. Address Imbalance Data
9. Get Feature Importance
10. Visualize Data with Matplotlib
11. Visualize Image Grid Matplotlib
12. Explain Model with Lime
13. Explain Model with Shap
14. Write Multithreaded Functions
15. Compare Function Speed
16. Create NumPy Array
17. Write Unit Test
18. Validate Column
Chapter 3: EXPLAIN CODE
19. Explain Python
20. Explain SQL
21. Explain Google Sheets Formula
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed
23. Optimize Pandas
24. Optimize Pandas Again
25. Optimize Python
26. Optimize SQL
27. Simplify Python
Chapter 5: FORMAT CODE
28. Write Documentation
29. Improve Readability
30. Format SQL
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS
32. Translate Python to R
33. Translate R to Python
Chapter 7: EXPLAIN CONCEPTS
34. Explain to Five-Year-Old
35. Explain to Undergraduate
36. Explain to Professor
37. Explain to Business Stakeholder
38. Explain Like StackOverflow
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases
40. Suggest Dataset
41. Suggest Portfolio Ideas
42. Suggest Resources
43. Suggest Time Complexity
44. Suggest Feature Engineering
45. Suggest A/B Testing Steps
46. Career Coaching
Chapter 9: TROUBLESHOOT PROBLEM
47. Correct Own ChatGPT Code
48. Correct Python Code
49. Correct SQL Code
50. Troubleshoot PowerBI Model
Chapter 10: WRITE SQL
51. Create Running Average
52. Solve Leetcode Question
Chapter 11: WRITE OTHER CODE
53. Write Google Sheets Formula
54. Write R
55. Write Shell
56. Write VBA
Chapter 12: MISCELLANEOUS
57. Format Tables
58. Summarize Book
59. Summarize Paper
60. Provide Emotional Support (My Favourite)
Closing

Automate ChatGPT Prompts for Data Science with Python
Chapter 1: ChatGPT Prompts for Data Science (Tried, Tested, and Rated)
Automate data science tasks with ChatGPT

I rated 60 ChatGPT prompts for data science. Use these prompts to ask
ChatGPT to write, explain, and optimize data science code. It can also
explain data science concepts, suggest ideas, and troubleshoot problems.

How to Read this


The table of contents contains all 60 prompts. Click on the links in the
table of contents to navigate to each prompt, ChatGPT’s response, and the
rating.

Table of Contents
Write Python

Train Classification Model

Automatic Machine Learning


Tune Hyperparameter

Explore Data

Generate Data

Write Regex

Train Time Series

Address Imbalance Data

Get Feature Importance

Visualize Data with Matplotlib

Visualize Image Grid Matplotlib

Explain Model with Lime

Explain Model with Shap

Write Multithreaded Functions

Compare Function Speeds

Create NumPy Array

Write Unit Test

Validate Column
Explain code

Explain Python

Explain SQL

Explain Google Sheets Formula

Optimize code

Improve Code Speed

Optimize Pandas

Optimize Pandas Again

Optimize Python

Optimize SQL

Format code

Simplify Python

Write Documentation

Improve Readability

Format SQL

Translate code from one language to another


Translate Between DBMS

Translate Python to R

Translate R to Python

Explain concepts

Explain to Five-Year-Old

Explain to Undergraduate

Explain to Professor

Explain to Business Stakeholder

Explain Like Stackoverflow

Suggest ideas

Suggest Edge Cases

Suggest Dataset

Suggest Portfolio Ideas

Suggest Resources

Suggest Time Complexity


Suggest Feature Engineering

Suggest A/B Testing Steps

Career Coaching

Troubleshoot problem

Correct ChatGPT Code

Correct Python Code

Correct SQL Code

Troubleshoot PowerBI Model

Write SQL

Create Running Average

Solve Leetcode Question

Write other Code

Write Google Sheets Formula

Write R

Write Shell

Write VBA
Miscellaneous

Format Tables

Summarize Book

Summarize Paper

Provide Emotional Support


Chapter 2: WRITE PYTHON
1. Train Classification Model

Prompt: I want you to act as a data scientist and code for me. I have a
dataset of [describe dataset]. Please build a machine learning model that
predicts [target variable].
Next, we will handle missing values and encode categorical variables, if necessary.

We will also need to ensure that the data is in the correct format for modeling. This
may involve tasks such as scaling numerical columns or one-hot encoding
categorical variables.

# Handle missing values (numeric_only=True skips non-numeric columns, which pandas 2.x rejects here)
data = data.fillna(data.mean(numeric_only=True))

# One-hot encode categorical variables
data = pd.get_dummies(data)

# Scale numerical columns
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data[['num_column_1', 'num_column_2', 'num_column_3']] = scaler.fit_transform(
    data[['num_column_1', 'num_column_2', 'num_column_3']])

Now that the data is cleaned and prepared, we can split it into training and testing
sets and build a machine learning model.

# Split data into training and testing sets
from sklearn.model_selection import train_test_split

X = data.drop('churn', axis=1)
y = data['churn']
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2, random_state=42)

# Build model
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
Finally, we can evaluate the model on the testing set.

# Evaluate model
from sklearn.metrics import accuracy_score

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print('Accuracy:', accuracy)

ChatGPT is able to provide satisfactory code that:

performs the usual preprocessing (scaling the data and encoding
categorical variables)

drops the target column from the features

performs the train-test split

evaluates the model

This is good because it provides enough code to start off a model. It is also
technically sound and correct. However, it falls short because it only uses
accuracy for evaluation, which can look deceptively high when one class (for
example, customers who do not churn) dominates the dataset.
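To go beyond accuracy, a minimal sketch along the following lines reports precision, recall, F1, and ROC AUC instead. It runs on a synthetic, imbalanced dataset generated with `make_classification` (an assumption standing in for the churn data), and it stratifies the split so both sets keep the same class ratio.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a churn dataset (assumption: ~10% churners)
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# stratify=y keeps the churn ratio the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Precision, recall and F1 per class, plus a threshold-free ranking metric
print(classification_report(y_test, predictions))
auc = roc_auc_score(y_test, probabilities)
f1 = f1_score(y_test, predictions)
print(f"ROC AUC: {auc:.3f}  F1 (churn class): {f1:.3f}")
```

ROC AUC uses the predicted probabilities, so it does not depend on the default 0.5 decision threshold, while the per-class report shows how well the minority class is actually caught.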

2. Automatic Machine Learning

Prompt: I want you to act as an automatic machine learning (AutoML) bot
using TPOT for me. I am working on a model that predicts [...]. Please
write Python code to find the best classification model with the highest
AUC score on the test set.

I want you to act as an automatic machine learning (AutoML) bot using TPOT for me. I am
working on a model that predicts whether a person will default on a credit card loan in
October. Please write Python code to find the best classification model with the highest
AUC score on the test set.

Certainly! Here is some code that you can use to find the best classification model
using TPOT:

import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Load the data
df = pd.read_csv("credit_card_default_data.csv")

# Extract the target column
y = df["default"]

# Extract the features
X = df.drop("default", axis=1)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2)

# Create a TPOT classifier that optimizes ROC AUC
tpot = TPOTClassifier(scoring="roc_auc", verbosity=2)

# Fit the classifier to the training data
tpot.fit(X_train, y_train)

# Score the classifier on the test set
print(tpot.score(X_test, y_test))

# Export the best model as a Python script
tpot.export("best_model.py")

ChatGPT provided good code (it runs!) that:

performs the train-test split

instantiates the AutoML classifier

fits the classifier to the training data

exports the best pipeline as a Python script

I’m not at all surprised that ChatGPT excels here, since the code is pretty
much only boilerplate. It can be copied and pasted from the
documentation as is.
3. Tune Hyperparameter

Prompt: I want you to act as a data scientist and code for me. I have trained
a [model name]. Please write the code to tune the hyperparameters.

I want you to act as a data scientist and code for me. I have trained a scikit-learn decision
tree. Please write the code to tune the hyperparameters.

To tune the hyperparameters of a decision tree model using scikit-learn, we can use
the 'GridSearchCV' class from the 'sklearn.model_selection' module.
This class allows us to specify a range of values for each hyperparameter, and it
will train and evaluate a model for each combination of hyperparameter values.

# Import libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Load data
data = pd.read_csv('data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Create a decision tree classifier
model = DecisionTreeClassifier()

# Define the hyperparameter grid
param_grid = {'max_depth': [2, 4, 6, 8, 10],
              'min_samples_leaf': [2, 4, 6, 8, 10]}

# Create the grid search object (5-fold cross-validation)
grid_search = GridSearchCV(model, param_grid, cv=5,
                           scoring='accuracy')

# Fit the grid search object to the training data
grid_search.fit(X, y)

# Get the best hyperparameters
print('Best max_depth:', grid_search.best_params_['max_depth'])
print('Best min_samples_leaf:',
      grid_search.best_params_['min_samples_leaf'])

For a decision tree classifier, ChatGPT wrote code that:

performs a grid search

reports the best parameters

Not bad at all, and the code runs! Grid search is the most popular method
for hyperparameter tuning, so it’s no surprise that it came out on top.

However, note that grid search goes through the search space exhaustively:
the larger the search space, the longer it takes. An alternative is
randomized search, which evaluates a fixed number of parameter combinations
sampled from the search space. Given a fixed budget (number of iterations),
randomized search takes roughly the same amount of time regardless of the
size of the search space.
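A minimal sketch of randomized search with scikit-learn's `RandomizedSearchCV`, under stated assumptions: the built-in breast cancer dataset stands in for your data, and the parameter ranges and the budget of 20 iterations are illustrative choices. The parameters are given as `scipy.stats.randint` distributions, so widening a range does not increase the number of models trained.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Built-in binary dataset standing in for your own data (assumption)
X, y = load_breast_cancer(return_X_y=True)

# Distributions instead of fixed lists: the space can grow without costing more time
param_distributions = {
    "max_depth": randint(2, 30),         # integers 2..29
    "min_samples_leaf": randint(1, 50),  # integers 1..49
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=20,          # the budget: only 20 sampled candidates are evaluated
    cv=5,
    scoring="roc_auc",
    random_state=0,     # reproducible sampling of candidates
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV ROC AUC:", round(search.best_score_, 3))
```

Because `n_iter` fixes the number of candidates, the cost here is 20 candidates × 5 folds = 100 cross-validation fits plus one final refit, however large the ranges are.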
4. Explore Data

Prompt: I want you to act as a data scientist and code for me. I have a
dataset of [describe dataset]. Please write code for data visualisation and
exploration.
import matplotlib.pyplot as plt
import seaborn as sns

# Summary statistics of the numerical columns
data.describe()

# Count plots of the categorical columns
for col in data.select_dtypes(include='object').columns:
    sns.countplot(x=col, data=data)
    plt.show()

# Correlation matrix of the numerical columns
sns.heatmap(data.select_dtypes(exclude='object').corr(), annot=True)
plt.show()

# Box plots of the numerical columns by target
for col in data.select_dtypes(exclude='object').columns:
    sns.boxplot(x='churn', y=col, data=data)
    plt.show()

# Bar plots of the categorical columns by target
for col in data.select_dtypes(include='object').columns:
    sns.barplot(x='churn', y=col, data=data)
    plt.show()

For exploration, ChatGPT wrote code that:

used data.describe() to provide descriptive statistics

created count plots to visualize the distribution of each categorical
variable

created a heatmap to show the correlation between numerical variables

examined the relationship between each numerical variable and the target
variable with box plots

examined the relationship between each categorical variable and the
target variable with bar plots

The above choices of charts are indeed appropriate.
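The relationship between a categorical column and the target can also be read as a table rather than a bar plot. Here is a minimal pandas-only sketch, using a made-up six-row dataset with hypothetical `contract` and `tenure` columns and a 0/1 `churn` target, that computes the same relationships numerically:

```python
import pandas as pd

# Hypothetical mini churn dataset (assumption: 'churn' is coded 0/1)
data = pd.DataFrame({
    "contract": ["monthly", "monthly", "yearly", "yearly", "monthly", "two-year"],
    "tenure": [1, 3, 24, 30, 2, 48],
    "churn": [1, 1, 0, 0, 0, 0],
})

# Numerical column vs target: summary statistics split by churn
tenure_by_churn = data.groupby("churn")["tenure"].describe()
print(tenure_by_churn)

# Categorical column vs target: churn rate within each category
churn_rate = data.groupby("contract")["churn"].mean().sort_values(ascending=False)
print(churn_rate)

# The same information as a row-normalized contingency table
print(pd.crosstab(data["contract"], data["churn"], normalize="index"))
```

Because the target is 0/1, the group mean is directly the churn rate per category, which is often easier to compare than bar heights.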

5. Generate Data

Prompt: I want you to act as a fake data generator. I need a dataset that has
x rows and y columns: [insert column names]

I want you to act as a fake data generator. I need a dataset that has 1,000 rows and 3
columns: "customer_id", "product", and "revenue".

Certainly! Here is some example code that generates a fake dataset with 1,000 rows
and three columns: "customer_id", "product", and "revenue":

import random
import pandas as pd

# Set the product names
products = ["Product A", "Product B", "Product C", "Product D",
            "Product E", "Product F"]

# Set the customer names
customers = ["Customer 1", "Customer 2", "Customer 3", "Customer 4",
             "Customer 5"]

# Collect rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []

# Generate 1,000 rows of fake data
for i in range(1000):
    # Select a random product
    product = random.choice(products)

    # Select a random customer
    customer = random.choice(customers)

    # Generate a random revenue amount (illustrative bounds)
    revenue = random.randint(10, 1000)

    # Add the data to the list of rows
    rows.append({"customer_id": customer, "product": product,
                 "revenue": revenue})

# Build the dataframe in one step
df = pd.DataFrame(rows, columns=["customer_id", "product", "revenue"])

# Preview the data
df.head()

ChatGPT produced code that generated the data exactly as I specified. It
only used pandas and Python’s built-in random module, making it portable.
I actually did use the synthetic dataset for other tasks.
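If you do not need a row-by-row loop, NumPy can generate the same kind of dataset in one vectorized step. A minimal sketch, assuming the same product and customer names as in the prompt's example and illustrative revenue bounds of 10 to 1,000; the seeded generator makes the fake data reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded so the fake data is reproducible
n_rows = 1_000

products = [f"Product {c}" for c in "ABCDEF"]
customers = [f"Customer {i}" for i in range(1, 6)]

df = pd.DataFrame({
    "customer_id": rng.choice(customers, size=n_rows),
    "product": rng.choice(products, size=n_rows),
    # Illustrative revenue bounds (assumption): integers from 10 to 1,000 inclusive
    "revenue": rng.integers(10, 1_001, size=n_rows),
})
print(df.head())
```

Each column is drawn in a single call, which scales to millions of rows far better than appending rows one at a time.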

6. Write Regex

Prompt: I want you to act as a coder. Please write me a regex in Python
that [describe regex]
I want you to act as a coder. Please write me a regex in Python that would match any
string that starts with an alphanumeric character, contains the word "data", contains the
word "science", and ends with an exclamation point.

Sure! Here is a regex in Python that will match strings that start with an
alphanumeric character, contain the word "data", contain the word "science", and
end with an exclamation point:

import re

def match_string(string):
    regex = r"^[a-zA-Z0-9].*data.*science.*!$"
    return bool(re.match(regex, string))

This regex uses the following elements:

'^': This matches the start of the string.

'[a-zA-Z0-9]': This matches any alphanumeric character.

'.*': This matches any number of any characters (except a newline), including
none.

'data': This matches the string "data".

'science': This matches the string "science".

'!': This matches the exclamation point.

'$': This matches the end of the string.

The 'match_string' function will return 'True' if the string matches the regex,
and 'False' if it does not.

Here, ChatGPT provided Python code with a regex that meets the stated
requirements.
However, note that I have cherry-picked this example. Before this, I had
tried asking ChatGPT to create other, more complex regex patterns, but it
failed. It even sounded confident when it explained the regex, but the regex
it produced didn’t work.
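One caveat worth testing before trusting a generated regex: ChatGPT's pattern assumes "data" appears before "science", and because `[a-zA-Z0-9]` consumes the first character, a string that begins with "data" (for example "data science!") does not match. A minimal sketch of an order-independent alternative uses lookaheads and `re.fullmatch`:

```python
import re

# Lookaheads (?=...) check for "data" and "science" anywhere, in either order,
# without consuming characters; the rest enforces the first and last character.
PATTERN = re.compile(r"(?=.*data)(?=.*science)[a-zA-Z0-9].*!")

def match_string(string):
    # fullmatch anchors the pattern to the whole string, so no ^...$ is needed
    return PATTERN.fullmatch(string) is not None

for s in ["data science!", "I love science and data!", "_data science!", "data science"]:
    print(repr(s), match_string(s))
```

Like ChatGPT's version, it treats "data" as a substring (so "database" also counts); wrap the words in `\b` word boundaries if only whole words should match.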

7. Train Time Series

Prompt: I want you to act as a data scientist and code for me. I have a time
series dataset [describe dataset]. Please build a machine learning model that
predicts [target variable]. Please use [time range] as train and [time range] as
validation.

I want you to act as a data scientist and code for me. I have a time series dataset of sales
revenue from 2019 to 2022. Please build a machine learning model that predicts sales
revenue for 12 months in 2023. Please use 2019 to 2021 data as train and 2022 data as
validation set. Please do not write code for data visualisation and exploration.

To build a machine learning model for forecasting future sales revenue, we will first
need to import the necessary libraries and load the data into our environment.

# Import libraries
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv('data.csv')
Next, we will handle missing values and encode categorical variables, if necessary.
We will also need to ensure that the data is in the correct format for modeling. This
may involve tasks such as scaling numerical columns and creating lag variables.

# Handle missing values (numeric_only=True skips non-numeric columns, which pandas 2.x rejects here)
data = data.fillna(data.mean(numeric_only=True))

# One-hot encode categorical variables
data = pd.get_dummies(data)

# Scale numerical columns (replace [...] with the numerical columns to scale)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data[[...]] = scaler.fit_transform(data[[...]])

# Create lag variables: sales from 1 to 12 months earlier
for i in range(1, 13):
    data['lag_{}'.format(i)] = data.sales.shift(i)
data = data.dropna()

Now that the data is cleaned and prepared, we can split it into training and
validation sets and build a machine learning model.

# Split data into training and validation sets
train = data[data.date < '2022']
valid = data[data.date >= '2022']

X_train = train.drop('sales', axis=1)
y_train = train['sales']
X_valid = valid.drop('sales', axis=1)
y_valid = valid['sales']
To build the model, we will use a random forest regressor from the 'sklearn'
library.

# Build model
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X_train, y_train)

Finally, we can evaluate the model on the validation set.

# Evaluate model
from sklearn.metrics import mean_absolute_error

predictions = model.predict(X_valid)
mae = mean_absolute_error(y_valid, predictions)
print('Mean Absolute Error:', mae)

ChatGPT provided code that:

preprocessed the data using standard scaling

created lag variables

trained a RandomForestRegressor

used mean absolute error for evaluation

Alright, here’s a controversial one. ChatGPT is not able to properly write
code for time series validation.
Here’s why: without the explicit instruction “please use data from A to B as
training; please use data from Y to Z as validation”, ChatGPT uses
sklearn’s train_test_split to randomly split the data points into train and
validation sets. This is a huge no-no because train_test_split shuffles the
data with no regard for the temporal order of the observations, so
information from the future leaks into training. Instead, it should have used
all data points from an earlier timeframe as train and a later timeframe as test.
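When a single cutoff date is not enough, scikit-learn's `TimeSeriesSplit` gives several folds in which the training rows always come before the validation rows. A minimal sketch on a synthetic monthly sales series (an assumption standing in for the 2019 to 2022 data), using three lag features:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic monthly sales, Jan 2019 to Dec 2022 (assumption: trend plus noise)
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", "2022-12-01", freq="MS")
sales = 100 + np.arange(len(dates)) * 2 + rng.normal(0, 5, len(dates))
data = pd.DataFrame({"date": dates, "sales": sales})

# Lag features: sales from the previous 1 to 3 months
for lag in range(1, 4):
    data[f"lag_{lag}"] = data["sales"].shift(lag)
data = data.dropna().reset_index(drop=True)

X = data[[f"lag_{lag}" for lag in range(1, 4)]]
y = data["sales"]

# Each fold trains on an earlier window and validates on the window right after it
tscv = TimeSeriesSplit(n_splits=3)
fold_maes = []
for train_idx, valid_idx in tscv.split(X):
    # Training rows always precede validation rows, so no future data leaks in
    assert train_idx.max() < valid_idx.min()
    model = RandomForestRegressor(random_state=0)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict(X.iloc[valid_idx])
    fold_maes.append(mean_absolute_error(y.iloc[valid_idx], preds))

print("MAE per fold:", [round(m, 2) for m in fold_maes])
```

Unlike `train_test_split`, `TimeSeriesSplit` never shuffles, so every validation score reflects forecasting forward in time.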

8. Address Imbalance Data

Prompt: I want you to act as a coder. I have trained a machine learning
model on an imbalanced dataset. The target variable is the
column [Insert column name]. In Python, how do I oversample and/or
undersample my data?

I want you to act as a coder. I have trained a machine learning model on an imbalanced
dataset. The target variable is the column "scam". The dataset has 1% scam records
and 99% non-scam records. In Python, how do I oversample and/or undersample my
data?

To oversample the minority class in your dataset, you can use the 'SMOTE'
(Synthetic Minority Oversampling Technique) method, which is available in the
'imblearn' library. Here is an example of how to use it:

from imblearn.over_sampling import SMOTE

# Split the dataset into features and labels
X = df.drop('scam', axis=1)
y = df['scam']

# Oversample the minority class using SMOTE
smote = SMOTE()
X_over, y_over = smote.fit_resample(X, y)
To undersample the majority class, you can use the 'RandomUnderSampler'
method from the 'imblearn' library. Here is an example of how to use it:

from imblearn.under_sampling import RandomUnderSampler

# Split the dataset into features and labels
X = df.drop('scam', axis=1)
y = df['scam']

# Undersample the majority class using RandomUnderSampler
rus = RandomUnderSampler()
X_under, y_under = rus.fit_resample(X, y)

ChatGPT used the imblearn library to write boilerplate code that oversamples
the minority class with SMOTE and randomly undersamples the majority class.

The code is sound, but I would nitpick its understanding of over- and
undersampling.

Undersampling and oversampling should only be done on the train
dataset. They should not be done on the entire dataset, which includes the test
dataset. Otherwise, the evaluation will be done on a resampled test set that
no longer reflects the actual distribution of the real dataset.

Thus, it should have split the dataset into train and test sets
first, before performing the under- and oversampling.
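The split-then-resample order can be sketched as follows. This is a minimal example under stated assumptions: the data is synthetic with roughly 1% positives, and it swaps in plain random oversampling via `sklearn.utils.resample` (rather than imblearn's SMOTE) so that it needs only scikit-learn. The test set is split off first and never resampled.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the scam data: 10,000 rows, roughly 1% positives (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)

# 1. Split FIRST, stratified so the test set keeps the real ~1% scam rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Oversample the minority class of the TRAINING set only,
#    drawing with replacement until it matches the majority count
minority = y_train == 1
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,
    n_samples=int((~minority).sum()),
    random_state=0,
)
X_train_bal = np.vstack([X_train[~minority], X_min_up])
y_train_bal = np.concatenate([y_train[~minority], y_min_up])

print("positive rate in train, before:", round(y_train.mean(), 4),
      "after:", y_train_bal.mean())
print("positive rate in test (untouched):", round(y_test.mean(), 4))
```

The same order applies with imblearn: call `fit_resample` on `X_train, y_train` only, then evaluate on the untouched test set.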

9. Get Feature Importance


Prompt: I want you to act as a data scientist and explain the model’s results.
I have trained a decision tree model and I would like to find the most
important features. Please write the code.

I want you to act as a data scientist and explain the model's results. I have trained a
decision tree model and I would like to find the most important features. Please write the
code.

Certainly! Here is some code that you can use to find the most important features in
a decision tree model:

import numpy as np

# Load your model and the data that you used to train it
model = ...  # Load your trained decision tree
X_train = ...  # Load your training data

# Get the feature importances from the model
importances = model.feature_importances_

# Sort the feature importances in descending order
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print('Feature ranking:')
for f in range(X_train.shape[1]):
    print('%d. feature %d (%f)' % (f + 1, indices[f],
          importances[indices[f]]))

ChatGPT correctly provided the code for extracting the feature importances of
a decision tree. It even ranked the features from the most
important to the least important. Kudos!
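Printing column indices makes the ranking hard to read. A minimal sketch that pairs each importance with its column name, assuming the training data is a pandas DataFrame (the iris dataset stands in for yours):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for "your training data" (assumption)
iris = load_iris(as_frame=True)
X_train, y_train = iris.data, iris.target

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pair each importance with its column name and rank from most to least important
ranking = pd.Series(model.feature_importances_, index=X_train.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking)
```

A decision tree's impurity-based importances sum to 1, so each value reads as that feature's share of the total impurity reduction.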

10. Visualize Data with Matplotlib

Prompt: I want you to act as a coder in Python. I have a dataset [name] with
columns [name]. [Describe graph requirements]
I want you to act as a coder in Python. I have a dataset called iris_df. It has the columns
Petal Length, Petal Width, Sepal Length, Sepal Width and Species Type. On the left, I
want a graph that depicts the relationship between Petal Width and Petal Length. On the
right, I want a graph that depicts the relationship between Sepal Width and Sepal Length.
For both graphs, please differentiate between the different species. Please make it
visually pleasing and colorblind-friendly. Please use matplotlib, ensure that the graph is
large enough for a presentation, and save the output to a file called "output.png".

Certainly! Here is some code that should accomplish what you've described:

import matplotlib.pyplot as plt

# Extract data for Petal Width and Petal Length
petal_width = iris_df['Petal Width']
petal_length = iris_df['Petal Length']
species = iris_df['Species Type']

# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

# Plot the relationship between Petal Width and Petal Length on the left
for spec in set(species):
    spec_df = iris_df[iris_df['Species Type'] == spec]
    ax1.scatter(spec_df['Petal Width'], spec_df['Petal Length'],
                label=spec)
ax1.set_xlabel('Petal Width')
ax1.set_ylabel('Petal Length')
ax1.legend()

# Plot the relationship between Sepal Width and Sepal Length on the right
for spec in set(species):
    spec_df = iris_df[iris_df['Species Type'] == spec]
    ax2.scatter(spec_df['Sepal Width'], spec_df['Sepal Length'],
                label=spec)
ax2.set_xlabel('Sepal Width')
ax2.set_ylabel('Sepal Length')
ax2.legend()

# Save the figure to a file
fig.savefig('output.png')

# Show the plot
plt.show()

ChatGPT could provide the correct code for plotting visualizations in
Matplotlib.

I am particularly impressed because I provided very long and rather
convoluted instructions, and ChatGPT followed them to a tee. Here is the
output from the code:
11. Visualize Image Grid Matplotlib

Prompt: I want you to act as a coder. I have a folder of images. [Describe
how files are organised in directory] [Describe how you want images to be
printed]

I want you to act as a coder. I have a folder of images. The subdirectories are "TRAIN"
and "TEST". Within "TRAIN" and "TEST" are 10 folders from "0" to "9". Within each folder,
there are JPG, JPEG and PNG images. Write code that randomly selects 20 images from
the TRAIN subdirectory and displays them in a 4 by 5 grid. Repeat the same for the test
dataset.

Sure! Here is some code that should accomplish what you've described:

import os
import random
import matplotlib.pyplot as plt
from matplotlib.image import imread

# Set the base directory
base_dir = "path/to/your/folder"

# Set the subdirectories
train_dir = os.path.join(base_dir, 'TRAIN')
test_dir = os.path.join(base_dir, 'TEST')

# Set the number of images to display
num_images = 20

# Image extensions to pick up (lower() also catches .JPG, .PNG, etc.)
extensions = ('.jpg', '.jpeg', '.png')

# Select random images from the train directory (2 from each of the 10 folders)
train_images = []
for i in range(10):
    subdir = os.path.join(train_dir, str(i))
    files = [os.path.join(subdir, f) for f in os.listdir(subdir)
             if f.lower().endswith(extensions)]
    train_images += random.sample(files, num_images // 10)

# Select random images from the test directory (2 from each of the 10 folders)
test_images = []
for i in range(10):
    subdir = os.path.join(test_dir, str(i))
    files = [os.path.join(subdir, f) for f in os.listdir(subdir)
             if f.lower().endswith(extensions)]
    test_images += random.sample(files, num_images // 10)

# Display the train images in a 4 by 5 grid
fig, ax = plt.subplots(4, 5, figsize=(20, 20))
for i, image_path in enumerate(train_images):
    ax[i // 5][i % 5].imshow(imread(image_path))
plt.show()

# Display the test images in a 4 by 5 grid
fig, ax = plt.subplots(4, 5, figsize=(20, 20))
for i, image_path in enumerate(test_images):
    ax[i // 5][i % 5].imshow(imread(image_path))
plt.show()

Again, ChatGPT can provide the correct code for plotting a grid of images
with Matplotlib.

Again, I provided convoluted and long instructions, but ChatGPT could
follow them to a tee. (I personally loathe creating charts with Matplotlib. It
is extremely flexible, but writing code in it is rather painful.)

12. Explain Model with Lime


Prompt: I want you to act as a data scientist and explain the model’s results.
I have trained a [library name] model and I would like to explain the output
using LIME. Please write the code.

I want you to act as a data scientist and explain the model's results. I have trained a
scikit-learn K-nearest-neighbour model and I would like to explain the output using LIME.
Please write the code.

Certainly! Here is some code that you can use to explain the results of a scikit-learn
K-nearest-neighbors model using LIME (Local Interpretable Model-agnostic
Explanations):

import lime
import lime.lime_tabular
import numpy as np

# Load your model and the data that you used to train it
model = ...  # Load your model
X_train = ...  # Load your training data (a NumPy array)
X_test = ...  # Load the test data that you want to explain
feature_names = ...  # List of feature names
class_names = ...  # List of class names

# Create a LIME explainer object
explainer = lime.lime_tabular.LimeTabularExplainer(X_train,
    feature_names=feature_names, class_names=class_names,
    kernel_width=3)

# Choose an instance to explain
i = ...  # Choose an index of the test instance that you want to explain
x = X_test[i]

# Explain the prediction for this instance
exp = explainer.explain_instance(x, model.predict_proba,
    num_features=5, top_labels=1)

# Print the explanation for the top predicted label
print(exp.as_list(label=exp.available_labels()[0]))

This code will create a LIME explainer object, which you can use to generate an
explanation for a single instance in your test dataset. The explanation will consist of
a list of the most important features that contributed to the model's prediction,
along with the effect of each feature on the prediction.

ChatGPT provided code for explaining the model with
LimeTabularExplainer and explained an instance with the correct
commands.

Personally, I find this helpful because I do not need to refer to the
documentation for the boilerplate code. Overall, ChatGPT provided
accurate code that does not violate any principles.

13. Explain Model with Shap

Prompt: I want you to act as a data scientist and explain the model’s results.
I have trained a scikit-learn XGBoost model and I would like to explain the
output using a series of plots with Shap. Please write the code.
I want you to act as a data scientist and explain the model's results. I have trained a
scikit-learn XGBoost model and I would like to explain the output using a series of plots
with Shap. Please write the code.

Certainly! Here is some code that you can use to create a series of plots using Shap
(SHapley Additive exPlanations) to explain the results of a scikit-learn XGBoost
model:

import shap
import matplotlib.pyplot as plt

# Load your model and the data that you used to train it
model = ...  # Load your model
X_train = ...  # Load your training data

# Create a SHAP explainer object
explainer = shap.TreeExplainer(model)

# Explain the model's predictions on the training data
shap_values = explainer.shap_values(X_train)

# Loop through the features and create a plot for each one
for i in range(X_train.shape[1]):
    # Create a subplot
    ax = plt.subplot(X_train.shape[1], 1, i + 1)

    # Plot the SHAP values of feature i against the feature's own values
    shap.dependence_plot(i, shap_values, X_train, interaction_index=None,
                         ax=ax, show=False)

# Show the plot
plt.show()

14. Write Multithreaded Functions


Prompt: I want you to act as a coder. Can you help me parallelize this code
across threads in Python?
# Start the threads
thread1.start()
thread2.start()

This will create two threads, 'thread1' and 'thread2', and start them
simultaneously. The 'print_time' function will be executed by both threads, with
the 'threadName' and 'delay' arguments being passed to it from the 'args'
parameter in the 'Thread' constructor.

I’m pleasantly surprised to find that this code actually works without any
modifications. I can foresee myself doing this often in the future.
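Only the tail of the generated code is visible above. A minimal, self-contained sketch of the same pattern, assuming `print_time` takes a thread name and a delay as described; the `counter` parameter, the `results` list, and its lock are hypothetical additions used to record what each thread did:

```python
import threading
import time

results = []                      # hypothetical: records which thread printed
results_lock = threading.Lock()   # protects results from concurrent appends

def print_time(threadName, delay, counter=3):
    """Print the current time `counter` times, sleeping `delay` seconds before each print."""
    for _ in range(counter):
        time.sleep(delay)
        with results_lock:
            results.append(threadName)
        print(f"{threadName}: {time.ctime(time.time())}")

# Create two threads that run print_time with different arguments
thread1 = threading.Thread(target=print_time, args=("Thread-1", 0.1))
thread2 = threading.Thread(target=print_time, args=("Thread-2", 0.2))

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to finish before continuing
thread1.join()
thread2.join()
```

In standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so this pattern speeds up I/O-bound work such as sleeping, network calls, or disk reads, not CPU-bound computation; for that, `concurrent.futures.ProcessPoolExecutor` or `multiprocessing` is the usual choice.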

15. Compare Function Speed

Prompt: I want you to act as a software developer. I would like to compare
the efficiency of two algorithms that perform the same thing in Python.
Please write code that helps me run an experiment that can be repeated 5
times. Please output the runtime and other summary statistics of the
experiment. [Insert functions]
I want you to act as a software developer. I would like to compare the efficiency of two
algorithms that perform the same thing in Python. Please write code that helps me run
an experiment that can be repeated 5 times. Please output the runtime and other
summary statistics of the experiment.

Function 1:
def compute1(a, b):
    return (a + b) / (a * b)
Function 2:
import numpy as np
def compute2(a, b):
    return np.divide(np.sum([a, b]), np.multiply(a, b))

As a software developer, I would suggest the following code to run an experiment
comparing the efficiency of the two algorithms, which can be repeated 5 times
and outputs the runtime and summary statistics of the experiment:


import time
import statistics

def measure_time(func, a, b):
    start = time.perf_counter()
    func(a, b)
    end = time.perf_counter()
    return end - start

def run_experiment(func1, func2, a, b, num_repeats=5):
    times1 = []
    times2 = []

    for i in range(num_repeats):
        times1.append(measure_time(func1, a, b))
        times2.append(measure_time(func2, a, b))

    mean_time1 = sum(times1) / len(times1)
    mean_time2 = sum(times2) / len(times2)
    stdev1 = statistics.stdev(times1)
    stdev2 = statistics.stdev(times2)

    print(f"{func1.__name__}: mean time = {mean_time1:.6f}s, stdev = {stdev1:.6f}s")
    print(f"{func2.__name__}: mean time = {mean_time2:.6f}s, stdev = {stdev2:.6f}s")
    if mean_time1 < mean_time2:
        print(f"{func1.__name__} is faster than {func2.__name__}")
    elif mean_time1 > mean_time2:
        print(f"{func2.__name__} is faster than {func1.__name__}")
    else:
        print(f"{func1.__name__} and {func2.__name__} have the same performance")

# Example usage
run_experiment(compute1, compute2, 10, 20)
ChatGPT is able to write code that compares the speed of two functions using
only standard-library modules.

The code runs and is correct. That said, I would have preferred ChatGPT to use
the %%timeit magic in IPython/Jupyter instead of writing verbose code.
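For comparison, here is a minimal sketch of the same experiment written with the standard library's timeit module, the mechanism behind IPython's %timeit magic. compute1 and compute2 are the two functions from the prompt. The helper name 'compare' and the repeat/number values are arbitrary choices.

```python
import statistics
import timeit
from functools import partial

import numpy as np

def compute1(a, b):
    return (a + b) / (a * b)

def compute2(a, b):
    return np.divide(np.sum([a, b]), np.multiply(a, b))

def compare(funcs, a, b, repeat=5, number=10_000):
    """Time each function; return {name: (best, mean)} in seconds per call."""
    results = {}
    for func in funcs:
        # timeit.repeat returns `repeat` totals, each covering `number` calls
        totals = timeit.repeat(partial(func, a, b), repeat=repeat, number=number)
        per_call = [total / number for total in totals]
        results[func.__name__] = (min(per_call), statistics.mean(per_call))
    return results

for name, (best, mean) in compare([compute1, compute2], 10, 20).items():
    print(f"{name}: best {best * 1e6:.3f} us/call, mean {mean * 1e6:.3f} us/call")
```

Reporting the minimum alongside the mean follows the timeit documentation's advice: the fastest run is the one least disturbed by other activity on the machine.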

16. Create NumPy Array

Prompt: I want you to act as a data scientist. I need to create a numpy array.
This numpy array should have the shape of (x,y,z). Please initialize the
numpy array with random values.

I want you to act as a data scientist. I need to create a numpy array. This numpy array
should have the shape of (10, 3, 4, 5). Please initialize the numpy array with random
values between 1 and 5.

Sure! Here's how you can create a numpy array with the desired shape and
initialized with random values between 1 and 5:


import numpy as np

# Create a numpy array with the desired shape
arr = np.random.randint(1, 6, size=(10, 3, 4, 5))

# Print the array to verify its contents
print(arr)

This will create a numpy array with a shape of (10, 3, 4, 5) and fill it with random
integers between 1 and 5 (inclusive).

I hope this helps! Let me know if you have any questions.

This simple request results in a simple function call. Here, ChatGPT
demonstrated its ability to translate human language into an appropriate
function and parameters.

The code is correct, and I have zero complaints here.
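The upper bound of np.random.randint is exclusive, which is why ChatGPT passed 6 to get integers from 1 to 5. Below is a sketch of the same array using NumPy's newer Generator interface, which the NumPy documentation recommends for new code. The seed is arbitrary, and the float variant is an extra for comparison.

```python
import numpy as np

# Seeded Generator so results are reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

# integers() excludes `high` by default, so high=6 yields values 1..5
arr = rng.integers(low=1, high=6, size=(10, 3, 4, 5))

# Continuous values in [1, 5) instead of integers
arr_float = rng.uniform(low=1.0, high=5.0, size=(10, 3, 4, 5))

print(arr.shape, arr.min(), arr.max())
```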

17. Write Unit Test

Prompt: I want you to act as a software developer. Please write unit tests for
the function [Insert function]. The test cases are: [Insert test cases]
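As an illustration of the output this prompt targets, here is a minimal unittest sketch for the 'compute' function used elsewhere in this book. The three test cases are hypothetical examples, not ones from an original prompt.

```python
import unittest

def compute(a, b):
    return (a + b) / (a * b)

class TestCompute(unittest.TestCase):
    def test_positive_integers(self):
        # (2 + 4) / (2 * 4) = 6 / 8
        self.assertAlmostEqual(compute(2, 4), 0.75)

    def test_negative_integers(self):
        # (-2 + -4) / (-2 * -4) = -6 / 8
        self.assertAlmostEqual(compute(-2, -4), -0.75)

    def test_zero_input_raises(self):
        # a * b == 0, so the division must fail
        with self.assertRaises(ZeroDivisionError):
            compute(0, 5)
```

If this is saved in a file (for example, a hypothetical test_compute.py), it can be run with python -m unittest test_compute.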

18. Validate Column

Prompt: I want you to act as a data scientist. Please write code to test whether
my pandas DataFrame [insert requirements here]
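What this prompt returns depends entirely on your requirements. Below is a minimal sketch assuming three hypothetical requirements: certain columns exist, some columns contain no nulls, and one column has no duplicates. The function name and sample data are illustrative.

```python
import pandas as pd

def validate_dataframe(df, required=(), non_null=(), unique=()):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    missing = [col for col in required if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in non_null:
        if col in df.columns and df[col].isna().any():
            problems.append(f"'{col}' contains nulls")
    for col in unique:
        if col in df.columns and df[col].duplicated().any():
            problems.append(f"'{col}' contains duplicate values")
    return problems

df = pd.DataFrame({"id": [1, 2, 2], "email": ["a@x.com", None, "c@x.com"]})
problems = validate_dataframe(df, required=["id", "email", "age"], non_null=["email"], unique=["id"])
print(problems)
```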
Chapter 3: EXPLAIN CODE
19. Explain Python

Prompt: I want you to act as a code explainer. What is this code
doing? [Insert code]

I want you to act as a code explainer. What is this code doing?
reconstructed = np.clip(reconstructed, 0, 1)
plt.imshow(np.transpose(reconstructed, (1, 2, 0)))
plt.show()

Certainly! This code appears to be displaying an image using the 'imshow()'
function from the Matplotlib library.

• 'np.clip(reconstructed, 0, 1)': This function clips the values in the
'reconstructed' array so that all values less than 0 are set to 0, and all values
greater than 1 are set to 1. This is likely being done to ensure that the values in
the array are within the valid range for pixel intensity values (which are usually
between 0 and 1).
• 'np.transpose(reconstructed, (1, 2, 0))': This function transposes
the dimensions of the 'reconstructed' array. The dimensions of an image
array are usually '(height, width, channels)', but the 'imshow()'
function expects the dimensions to be '(width, height, channels)'. The
'transpose()' function reorders the dimensions of the array so that it has the
correct shape for 'imshow()'.
• 'plt.imshow()': This function displays the image stored in the transposed
'reconstructed' array.
• 'plt.show()': This function displays the figure created by 'imshow()'.

I am extremely pleased with ChatGPT’s response here. Not only is it
able to explain what each function does, it is also able to guess why the
programmer is performing a certain action. For example, it explained the
line np.clip(reconstructed, 0, 1) by saying that “this is likely being done to ensure
that the values in the array are within the valid range for pixel intensity values”, which is
absolutely correct.

ChatGPT’s ability to understand code is good and robust, in my
opinion. For that reason, I would strongly suggest that coders who have trouble
understanding a piece of code use ChatGPT as a complement to the API
documentation.
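One detail in ChatGPT's answer deserves a double-check. Matplotlib's imshow() expects RGB data shaped (height, width, channels). A (1, 2, 0) transpose is typically used to convert a channels-first array (channels, height, width), the layout many deep learning frameworks produce, into that channels-last layout. The sketch below only inspects shapes, so it runs without a display. The array and its size are hypothetical.

```python
import numpy as np

# Hypothetical channels-first image: (channels, height, width) = (3, 32, 48)
rng = np.random.default_rng(0)
reconstructed = rng.normal(loc=0.5, scale=0.5, size=(3, 32, 48))

# Clip to [0, 1], the range imshow() expects for float RGB data
reconstructed = np.clip(reconstructed, 0, 1)

# (1, 2, 0) moves axis 0 (channels) to the end: (3, 32, 48) -> (32, 48, 3)
image = np.transpose(reconstructed, (1, 2, 0))
print(image.shape)  # (32, 48, 3)
```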

20. Explain SQL

Prompt: I want you to act as a data science instructor. Can you please
explain to me what this SQL code is doing? [Insert SQL code]
I want you to act as a data science instructor. Can you please explain to me what this SQL
code is doing? SELECT year, revenue, SUM(revenue) OVER (ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM table

Certainly! Here is an explanation of what this SQL code is doing:

• 'SELECT year, revenue': This clause specifies the columns that you want to
include in the output. In this case, the output will include the "year" and
"revenue" columns.
• 'SUM(revenue) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW)': This is a window function that calculates the sum of the
"revenue" column for each row in the table. The 'OVER()' clause defines the
window over which the function will operate, in this case the 'ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW' window. This window includes all
rows from the start of the table ('UNBOUNDED PRECEDING') up to the current
row ('CURRENT ROW').
• 'FROM table': This clause specifies the table that you want to retrieve data
from. In this case, the table is simply named "table".

Overall, this SQL code is selecting the "year" and "revenue" columns from the
"table" table, and calculating the sum of the "revenue" column for each row in the
table. The output will include a column with the sum of the "revenue" column for
each row, as well as the original "year" and "revenue" columns.

Again, a correct and awesome explanation. Much like its explanations
for Python, ChatGPT is able to understand and explain SQL code.

I like its explanation of the line SUM(revenue) OVER (ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Note
that it’s not only able to provide a high-level explanation of what the line is
doing, but also a breakdown of what the OVER(), UNBOUNDED
PRECEDING and CURRENT ROW clauses are doing.
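To see the running total in action, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name 'sales' and its rows are hypothetical; 'table' is avoided because it is a reserved word. The sketch also adds ORDER BY year inside OVER(). Without it, the order in which rows are accumulated is not guaranteed, so the "running" sum is not well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2020, 100.0), (2021, 150.0), (2022, 120.0)],
)

query = """
SELECT year,
       revenue,
       SUM(revenue) OVER (ORDER BY year
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales
ORDER BY year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2020, 100.0, 100.0)
# (2021, 150.0, 250.0)
# (2022, 120.0, 370.0)
```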

21. Explain Google Sheets Formula


Prompt: I want you to act as a Google Sheets formula explainer. Explain the
following Google Sheets command. [Insert formula]

I want you to act as a Google Sheets formula explainer. Explain the following Google
Sheets command. =filter(C2:C12, A2:A12=F2)

The ' FILTER' function in Google Sheets is used to filter a range of cells based on
specified criteria and return the resulting rows. The function has the following
syntax:


=FILTER(range, condition1, [condition2], ...)

Here's an explanation of each of the arguments:

• 'range': This is the range of cells that you want to filter. In this case, the range
is 'C2:C12'.
• 'condition1', 'condition2', etc.: These are the conditions that you want to
use to filter the range. In this case, the condition is 'A2:A12=F2', which
specifies that you want to include only the rows where the value in column A is
equal to the value in cell F2.

So, in this case, the 'FILTER' function is being used to filter the range 'C2:C12'
based on the condition 'A2:A12=F2', which means that only the rows where the
value in column A is equal to the value in cell F2 will be included in the resulting
dataset. The resulting dataset will include all of the values from column C that meet
this criteria.

Note that the 'FILTER' function is an array function, which means that you'll need
to enter it using Ctrl + Shift + Enter (on Windows) or Command + Shift + Enter (on
Mac). This will cause the formula to be enclosed in curly braces {} to indicate that
it's an array formula.
ChatGPT is able to explain the FILTER function well. I like that it provided
the generalized syntax, as one would see in the documentation, and then
explained that general form using the specific arguments
provided by the user.

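In particular, the sentence “the FILTER function is being used to filter the range
C2:C12 based on the condition A2:A12=F2” is an excellent explanation that
is contextualized to the user’s arguments. One caveat: the closing note about
Ctrl + Shift + Enter does not apply to FILTER in Google Sheets, which returns its
results without any special key combination; that convention comes from legacy
array formulas in Excel.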
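For readers more at home in Python, a rough pandas analogue of =FILTER(C2:C12, A2:A12=F2) is a boolean mask. The column names, values, and the variable standing in for cell F2 are hypothetical.

```python
import pandas as pd

# Hypothetical sheet: column A holds keys, column C holds the values to return
sheet = pd.DataFrame({
    "A": ["x", "y", "x", "z", "x"],
    "C": [10, 20, 30, 40, 50],
})
f2 = "x"  # stands in for the value in cell F2

# Keep the C values on rows where A equals F2, like =FILTER(C2:C12, A2:A12=F2)
filtered = sheet.loc[sheet["A"] == f2, "C"]
print(filtered.tolist())  # [10, 30, 50]
```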
Chapter 4: OPTIMIZE CODE
22. Improve Code Speed

Prompt: I want you to act as a software developer. Please help me improve
the time complexity of the code below. [Insert code]

I want you to act as a software developer. Please help me improve the time complexity of
the code below.
sum=0.
for i in range(y+1):
    sum+=i
return sum

To improve the time complexity of the code you provided, you could use a
mathematical formula to calculate the sum of the integers from 0 to 'y', rather than
using a loop. This would eliminate the need to loop through the integers and add
them up one by one, and would allow you to calculate the sum in a single step.

Here is how you could rewrite the code to use a mathematical formula to calculate
the sum:


return y * (y + 1) / 2

This function calculates the sum of the integers from 0 to 'y' using the formula
'n * (n + 1) / 2', which is the formula for the sum of an arithmetic series.
This formula has a time complexity of O(1), because it performs a fixed number of
operations regardless of the value of 'y'.

ChatGPT made a correct optimization, from O(n) to O(1). However,
note that this is a simple example. You might want to test it on more
complex cases to see if this holds up.
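A quick way to check such an optimization is to compare both versions on a range of inputs. Below is a minimal sketch; the function names are hypothetical. It uses integer division '//', which is safe because y * (y + 1) is always even, and which keeps the result an int.

```python
def sum_loop(y):
    # Original O(n) approach: add 0, 1, ..., y one at a time
    total = 0
    for i in range(y + 1):
        total += i
    return total

def sum_formula(y):
    # O(1) closed form for 0 + 1 + ... + y
    return y * (y + 1) // 2

for y in (0, 1, 10, 1000):
    print(y, sum_loop(y), sum_formula(y))

print(sum_formula(100))  # 5050
```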
23. Optimize Pandas

Prompt: I want you to act as a code optimizer. Can you point out what’s
wrong with the following Pandas code and optimize it? [Insert code here]
I want you to act as a code optimizer. Can you point out what's wrong with the following
Pandas code and optimize it? df = pd.read_csv("titles.csv")

# Mutation - DON'T DO THIS
df_bad = df.query("runtime > 30 & type == 'SHOW'")
df_bad["score"] = df_bad[["imdb_score", "tmdb_score"]].sum(axis=1)
df_bad = df_bad[["seasons", "score"]]
df_bad = df_bad.groupby("seasons").agg(["count", "mean"])
df_bad = df_bad.droplevel(axis=1, level=0)
df_bad = df_bad.query("count > 10")

There’s nothing inherently wrong with the code you've provided, but there are a few
suggestions I can make to optimize it:

1. Consider using the 'isin()' method to filter the dataframe instead of using the
query method. This can be faster and more readable, especially if you have
multiple conditions.

For example:


df_filtered = df[df["runtime"].isin([30, 40, 50]) &
                 df["type"].isin(["SHOW", "MOVIE"])]
I personally think these suggestions are particularly salient. It pointed out
that there is nothing wrong with the code (a correct observation) and moved
on to suggest good coding practices.
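Note that the 'isin()' example is not equivalent to the original filter. 'runtime > 30' keeps every runtime above 30, whereas isin([30, 40, 50]) keeps only those three exact values. The 'Mutation - DON'T DO THIS' comment refers to rebuilding df_bad step by step. A common alternative is a single method chain built with assign(). This avoids intermediate variables and the SettingWithCopyWarning that assigning a column to a filtered frame can raise. The sketch below uses hypothetical random data in place of titles.csv and boolean masks in place of query().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titles.csv
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "type": rng.choice(["SHOW", "MOVIE"], size=n),
    "runtime": rng.integers(10, 120, size=n),
    "imdb_score": rng.uniform(1, 10, size=n),
    "tmdb_score": rng.uniform(1, 10, size=n),
    "seasons": rng.integers(1, 4, size=n),
})

# One chain, no intermediate variables and no in-place column assignment
df_good = (
    df.loc[lambda d: (d["runtime"] > 30) & (d["type"] == "SHOW")]
    .assign(score=lambda d: d[["imdb_score", "tmdb_score"]].sum(axis=1))
    .loc[:, ["seasons", "score"]]
    .groupby("seasons")
    .agg(["count", "mean"])
    .droplevel(level=0, axis=1)  # ("score", "count") -> "count"
    .loc[lambda d: d["count"] > 10]
)
print(df_good)
```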

24. Optimize Pandas Again

Prompt: I want you to act as a code optimizer. Can you point out what’s
wrong with the following Pandas code and optimize it? [Insert code here]
I want you to act as a code optimizer. Can you optimize this Pandas code for me?
vals = []
for ind, row in df.iterrows():
    country = row["prod_country1"]
    if country in top3:
        vals.append("top3")
    elif country in top10:
        vals.append("top10")
    elif country in top20:
        vals.append("top20")
    else:
        vals.append("other")
df["prod_country_rank"] = vals

Here are a few suggestions for optimizing the code you provided:

1. Instead of using the 'iterrows()' method to loop through the rows of the
dataframe, you can use the 'apply()' method to apply a function to each row.
This can be more efficient and easier to read. For example:


def classify_country(row):
    country = row["prod_country1"]
    if country in top3:
        return "top3"
    elif country in top10:
        return "top10"
    elif country in top20:
        return "top20"
    else:
        return "other"

df["prod_country_rank"] = df.apply(classify_country, axis=1)


Of course, ChatGPT is able to catch iterrows() as an anti-pattern in Pandas.
Love this optimization.
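apply(axis=1) is more readable than iterrows(), but it still calls a Python function once per row. For simple membership rules, a vectorized alternative is np.select. It evaluates whole columns at once and, for each row, picks the first condition that matches, mirroring the if/elif chain. Below is a sketch with hypothetical country lists and data.

```python
import numpy as np
import pandas as pd

# Hypothetical country lists (each tier contains the previous one)
top3 = ["US", "IN", "GB"]
top10 = top3 + ["JP", "KR", "CA", "FR", "ES", "DE", "MX"]
top20 = top10 + ["BR", "IT", "AU", "CN", "AR", "TR", "CO", "PH", "NG", "EG"]

df = pd.DataFrame({"prod_country1": ["US", "JP", "BR", "ZZ", "GB"]})

conditions = [
    df["prod_country1"].isin(top3),
    df["prod_country1"].isin(top10),
    df["prod_country1"].isin(top20),
]
choices = ["top3", "top10", "top20"]

# First matching condition wins, exactly like the if/elif chain; no match -> "other"
df["prod_country_rank"] = np.select(conditions, choices, default="other")
print(df["prod_country_rank"].tolist())  # ['top3', 'top10', 'top20', 'other', 'top3']
```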

25. Optimize Python

Prompt: I want you to act as a code optimizer. The code is poorly written.
How do I correct it? [Insert code here]
3. You can also use a list comprehension to create a new list that combines the
elements of 'numbers' and 'letters':


combined = [(number, letter) for number, letter in zip(numbers, letters)]

Which of these approaches is the best choice will depend on your specific needs
and the context in which the code is being used.
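Because zip() already yields (number, letter) tuples, that comprehension can be shortened further. A quick check with hypothetical inputs:

```python
numbers = [1, 2, 3]
letters = ["a", "b", "c"]

# List comprehension as suggested in the answer
combined = [(number, letter) for number, letter in zip(numbers, letters)]

# zip already produces the tuples, so list() gives the same result directly
combined_direct = list(zip(numbers, letters))

print(combined == combined_direct)  # True
print(combined)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```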

26. Optimize SQL

Prompt: I want you to act as an SQL code optimizer. The following code is
slow. Can you help me speed it up? [Insert SQL]
I want you to act as an SQL code optimizer. The following code is slow. Can you help me
speed it up? SELECT *
FROM TestTable
WHERE DATEPART(YEAR, SomeMyDate) = '2021';

Here are some suggestions to optimize the performance of the above SQL code:

1. Use an index on the 'SomeMyDate' column: An index on the 'SomeMyDate'
column will allow the database to quickly find rows that match the condition in
the 'WHERE' clause, rather than having to scan the entire table.
2. Use a covered index: If the 'TestTable' table has a large number of columns
and you only need to retrieve a few of them, you can create a "covered index"
that includes all the columns you need. This will allow the database to retrieve
the data from the index itself, rather than having to look up the rows in the table.
3. Use the 'EXTRACT()' function instead of 'DATEPART()': The 'EXTRACT()'
function may be slightly faster than 'DATEPART()' for extracting the year from
a date. Here is how you can rewrite the query using 'EXTRACT()':


SELECT *
FROM TestTable
WHERE EXTRACT(YEAR FROM SomeMyDate) = 2021;

The first two suggestions here are valid optimizations.
However, the third tip is incorrect. In fact, DATEPART is actually faster than
EXTRACT. For that reason, I’ll have to dock stars off for this category.
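A related point, independent of which function is faster: wrapping the column in a function generally prevents the database from seeking through an index on SomeMyDate. Both DATEPART(YEAR, SomeMyDate) and EXTRACT(YEAR FROM SomeMyDate) do this. So suggestion 1 pays off most when the filter is rewritten as a date range on the raw column. The sketch below illustrates this with Python's built-in sqlite3 module. SQLite has no DATEPART, so strftime() plays that role here. The rows and the index name are hypothetical, and the exact query-plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (id INTEGER PRIMARY KEY, SomeMyDate TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_somemydate ON TestTable (SomeMyDate)")
conn.executemany(
    "INSERT INTO TestTable (SomeMyDate, amount) VALUES (?, ?)",
    [("2020-12-31", 1.0), ("2021-03-15", 2.0), ("2021-11-02", 3.0), ("2022-01-01", 4.0)],
)

# Function applied to the column: the index cannot be searched (non-sargable)
func_query = "SELECT * FROM TestTable WHERE strftime('%Y', SomeMyDate) = '2021'"

# Equivalent half-open range on the raw column: the index can be searched
range_query = ("SELECT * FROM TestTable "
               "WHERE SomeMyDate >= '2021-01-01' AND SomeMyDate < '2022-01-01'")

def plan(sql):
    # The last field of each EXPLAIN QUERY PLAN row is a human-readable description
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(func_query))   # expect a SCAN of TestTable
print(plan(range_query))  # expect a SEARCH using idx_somemydate
```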

27. Simplify Python

Prompt: I want you to act as a code simplifier. Can you simplify the
following code?
If you’ve written spaghetti code in Python, ChatGPT might be able to help
you simplify it.
Chapter 5: FORMAT CODE
28. Write Documentation

Prompt: I want you to act as a software developer. Please provide
documentation for func1 below. [Insert function]
I want you to act as a software developer. Please provide documentation for func1 below.
def func1(y):
    sum=0.
    for i in range(y+1):
        sum+=i
    return sum

Here is the documentation for the 'func1' function:


def func1(y):
    """
    This function calculates the sum of the integers from 0 to y
    (inclusive).

    Parameters:
    y (int): The upper bound of the range of integers to be
    summed.

    Returns:
    float: The sum of the integers from 0 to y.
    """
    sum=0.
    for i in range(y+1):
        sum+=i
    return sum

ChatGPT was able to provide a correct, nicely formatted docstring. It also
provides a one-liner that captures the essence of the function (“calculates
the sum of the integers”). This is neat, and I foresee myself doing this a lot.
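One way to keep a docstring like this honest is to include a runnable example that Python's doctest module can check. Below is a sketch in NumPy docstring style. The name sum_to is hypothetical, and the function starts from the integer 0, so it returns an int rather than a float.

```python
def sum_to(y):
    """Return the sum of the integers from 0 to y (inclusive).

    Parameters
    ----------
    y : int
        Upper bound of the range; assumed to be a non-negative integer.

    Returns
    -------
    int
        The sum 0 + 1 + ... + y.

    Examples
    --------
    >>> sum_to(4)
    10
    >>> sum_to(0)
    0
    """
    total = 0
    for i in range(y + 1):
        total += i
    return total
```

Running python -m doctest on the file (add -v for a report) executes the >>> lines and flags any output that no longer matches.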
29. Improve Readability

Prompt: I want you to act as a code analyzer. Can you improve the
following code for readability and maintainability? [Insert code]
I want you to act as a code analyzer. Can you improve the following code for readability
and maintainability?
def list(a,b):
    randomCamelCase = a
    random_snake_case=b
    return [randomCamelCase, random_snake_case]

Certainly! Here is the improved code for readability and maintainability:


def get_list(a, b):
    random_camel_case = a
    random_snake_case = b
    return [random_camel_case, random_snake_case]

I am slightly disappointed here because I had hoped ChatGPT would help me
remove the redundant variables in the function. That said, its suggestions
did help with readability.
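For reference, the simplification the author was hoping for could look like this sketch (the docstring is an addition). The temporary variables are gone, and the name no longer shadows Python's built-in list.

```python
def get_list(a, b):
    """Return the two arguments as a two-element list."""
    return [a, b]

print(get_list(1, "b"))  # [1, 'b']
```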

30. Format SQL

Prompt: I want you to act as a SQL formatter. Please format the following
SQL code. Please convert all reserved keywords to uppercase [Insert
requirements]. [Insert Code]
The code is formatted nicely and neatly with the correct indentation and
casing.
Chapter 6: TRANSLATE CODE
31. Translate Between DBMS

Prompt: I want you to act as a coder and write SQL code for MySQL. What
is the equivalent of PostgreSQL’s DATE_TRUNC for MySQL?
I want you to act as a coder and write SQL code for MySQL. What is the equivalent of
DATE_TRUNC for MySQL?

MySQL does not have a built-in function equivalent to PostgreSQL’s ' DATE_TRUNC'
function, which allows you to truncate a date or timestamp to a specified precision.
However, you can achieve similar results using the ' DATE_FORMAT' function, which
allows you to format a date or timestamp as a string using a variety of format
specifiers.

For example, to truncate a timestamp to the nearest hour, you can use the following
query:


SELECT DATE_FORMAT(timestamp_column, '%Y-%m-%d %H:00:00') AS
truncated_timestamp
FROM table_name;

While ChatGPT is able to translate the DATE_TRUNC function in this
example, it failed on other instances. In particular, it failed to translate
Redshift’s PIVOT into a query that contains a series of CASE WHEN expressions.
Thus, this translation might work well only for simple use cases.
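For cases like PIVOT, a hand translation into conditional aggregation is usually the portable route: one SUM(CASE WHEN ...) per output column, grouped by the row key. The sketch below runs in SQLite via Python's sqlite3 so it is self-contained. The sales table, its columns, and its rows are hypothetical. The SELECT itself uses only standard syntax that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("east", "Q1", 10.0), ("east", "Q2", 20.0),
    ("west", "Q1", 5.0), ("west", "Q2", 7.5), ("west", "Q2", 2.5),
])

# Pivot quarters into columns: one conditional aggregate per output column
query = """
SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1,
       SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2
FROM sales
GROUP BY region
ORDER BY region
"""
pivoted = conn.execute(query).fetchall()
print(pivoted)  # [('east', 10.0, 20.0), ('west', 5.0, 10.0)]
```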
32. Translate Python to R

Prompt: I want you to act as a code translator. Can you please convert the
following code from python to R? [Insert code]
I want you to act as a code translator. Can you please convert the following code from
python to R?

pd.concat([df_1, ..., df_n], axis=0)

Certainly! In R, you can use the 'bind_rows' function from the 'dplyr' library to
concatenate multiple data frames along the rows (axis=0). Here is the equivalent
code in R:


library(dplyr)

bind_rows(df_1, ..., df_n)

This code first loads the 'dplyr' library and then uses the 'bind_rows' function
to concatenate the data frames 'df_1', 'df_2', ..., 'df_n' along the rows. The
resulting data frame will have all the rows from the input data frames, in the order
they were specified.

This is a correct translation! However, note that this is a very
straightforward function. ChatGPT might fail on more complex code
blocks.
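On the Python side, here is a small check of what pd.concat does with mismatched columns, using hypothetical data. Like dplyr's bind_rows, it stacks rows in the order given. Columns missing from one input are filled with missing values (NaN in pandas, NA in R).

```python
import pandas as pd

df_1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df_2 = pd.DataFrame({"id": [3], "value": [30], "note": ["new"]})

# Stack rows in order; ignore_index renumbers the result 0..n-1
combined = pd.concat([df_1, df_2], axis=0, ignore_index=True)
print(combined)
```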

33. Translate R to Python

Prompt: I want you to act as a code translator. Can you please convert the
following code from R to python? [Insert code]
I want you to act as a code translator. Can you please convert the following code from R
to python?
spread(
  df, key = 'key_name',
  value = 'value_name'
)

Certainly! In python, you can use the 'pivot' method of a pandas DataFrame to
spread a column into multiple columns. Here is the equivalent code in python using
pandas:


df.pivot(index='key_name', columns='value_name', values='value')

This code uses the 'pivot' method to reshape the data frame by spreading the
'key_name' column into multiple columns and using the 'value_name' column as
the values in the new columns. The resulting data frame will have one column for
each unique value in the 'value_name' column, and the values in the new columns
will come from the 'value' column.

Again, a correct translation here. However, note that this is a very
straightforward function. ChatGPT might fail on more complex code
blocks.
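On closer inspection, the argument mapping in that answer is worth checking. In tidyr, spread(df, key = 'key_name', value = 'value_name') turns each distinct value of key_name into a new column and fills those columns from value_name. The remaining columns identify the rows. The pandas counterpart therefore passes key_name as columns and value_name as values. Below is a sketch with a hypothetical 'id' column and data.

```python
import pandas as pd

# Hypothetical long-format data: key_name holds future column names, value_name the cells
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "key_name": ["height", "weight", "height", "weight"],
    "value_name": [170, 65, 180, 80],
})

# spread(df, key = 'key_name', value = 'value_name') corresponds to:
wide = df.pivot(index="id", columns="key_name", values="value_name")
print(wide)
```

Calling wide.reset_index() turns 'id' back into an ordinary column, which matches the shape spread() returns more closely.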
Chapter 7: EXPLAIN CONCEPTS
Overall, I love ChatGPT’s ability to explain concepts correctly with an
awareness of the audience. Through the examples below, you can see how
ChatGPT is able to explain different concepts to audiences of different
backgrounds. When explaining a difficult concept to a child, it uses an appropriate
analogy. When explaining the concept to a professor, it uses the appropriate
tone and language. When explaining the concept to a business stakeholder,
it draws parallels to business jargon and contexts. Five stars for each of
these.

34. Explain to Five-Year-Old

Prompt: I want you to act as a data science instructor. Explain [concept] to a
five-year-old.
I want you to act as a data science instructor. Explain what a neural network is to a
five-year-old.

Sure! A neural network is a type of computer program that is designed to recognize
patterns. It does this by looking at lots of examples and trying to find things that
they have in common.

Imagine you have a bunch of pictures of different animals, and you want to know
which ones are dogs. You could show the pictures to a person, and they would be
able to tell you which ones are dogs by looking for certain features, like four legs, a
wagging tail, and a furry face. A neural network does something similar, but it does it
all by itself, without any help from a person.

To do this, a neural network has a lot of little parts called "neurons," which are kind
of like tiny brains. These neurons are arranged in layers, and they work together to
analyze the patterns in the pictures. When they see a pattern that looks like a dog,
they can say "Hey, this looks like a dog!"

Neural networks are really good at finding patterns, and they can be used for all
sorts of things, like recognizing faces in photos, understanding speech, and even
playing games like chess. They’re one of the most powerful tools that data scientists
have to solve all kinds of problems.

35. Explain to Undergraduate

Prompt: I want you to act as a data science instructor. Explain [concept] to
an undergraduate.
I want you to act as a data science instructor. Explain what a neural network is to an
undergraduate student.

A neural network is a type of machine learning model that is inspired by the
structure and function of the brain. It consists of layers of interconnected "neurons,"
which process and transmit information.

Each neuron receives input from other neurons or from external sources, and
combines these inputs using weights that reflect the importance of each input. The
neuron then applies an activation function to the combined input to produce an
output, which is passed on to other neurons in the next layer.

The weights and activation functions of the neurons are learned from training data,
allowing the neural network to learn complex patterns and relationships.

Neural networks can be used for a wide variety of tasks, such as image
classification, language translation, and time series prediction. They are particularly
well-suited to tasks that involve large and complex datasets, and are able to learn
and adapt as they are exposed to more data.
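The weighted-sum-plus-activation step described above can be written in a few lines of NumPy. This is a minimal sketch of a forward pass only. The weights are hand-picked for illustration (training would learn them), and ReLU is one common choice of activation function.

```python
import numpy as np

def relu(z):
    # Activation: keep positive values, zero out negative ones
    return np.maximum(0.0, z)

def dense_layer(x, weights, bias, activation=relu):
    # Each neuron takes a weighted sum of its inputs plus a bias,
    # then applies the activation function
    return activation(x @ weights + bias)

x = np.array([1.0, 2.0])                 # two input features
W1 = np.array([[0.5, -1.0, 0.25],
               [0.5,  0.5, 0.25]])       # 2 inputs -> 3 hidden neurons
b1 = np.array([0.0, -0.5, -1.0])
W2 = np.array([[1.0], [1.0], [2.0]])     # 3 hidden neurons -> 1 output
b2 = np.array([0.0])

hidden = dense_layer(x, W1, b1)          # outputs passed on to the next layer
output = dense_layer(hidden, W2, b2, activation=lambda z: z)  # linear output neuron
print(hidden, output)
```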

36. Explain to Professor

Prompt: I want you to act as a data science instructor. Explain [concept] to a
professor.
I want you to act as a data science instructor. Explain what a neural network is to a
professor.

Sure! A neural network is a type of machine learning algorithm that is inspired by the
structure and function of the brain. It is composed of a large number of
interconnected processing nodes, called "neurons," which are organized into layers.
Each neuron receives input from other neurons, and uses this input to make a
decision or prediction based on the patterns it sees in the data.

Neural networks are typically used to solve problems that involve pattern recognition
or classification. For example, a neural network might be trained to recognize
handwritten digits based on a large dataset of labeled examples, or to classify
emails as spam or non-spam based on their content.

One of the key advantages of neural networks is that they are able to learn and
adapt to new data over time, without the need for explicit programming. This makes
them particularly useful for tasks that involve complex, non-linear relationships, or
for situations where the desired output is not easily defined in advance.

Neural networks are widely used in a variety of fields, including image and speech
recognition, natural language processing, and finance. They have also been used to
achieve state-of-the-art results on a number of benchmarks and real-world tasks.

37. Explain to Business Stakeholder

Prompt: I want you to act as a data science instructor. Explain [concept] to a
business stakeholder.
I want you to act as a data science instructor. Explain what a neural network is to a
non-technical business stakeholder in a company.

Certainly! A neural network is a type of computer program that is designed to
recognize patterns in data. It's like a "virtual brain" that can learn from examples and
make decisions or predictions based on what it has learned.

For example, let's say you have a company that sells products online, and you want
to figure out which of your customers are most likely to buy a new product that you
just released. You could use a neural network to analyze data about your customers'
past purchases, their demographic information, and other factors that might be
relevant. Based on this analysis, the neural network could identify patterns that
seem to be associated with a higher likelihood of making a purchase, and it could
use these patterns to predict which customers are most likely to buy the new
product.

This is just one example of how neural networks can be used to solve problems in
business. They can also be used for things like predicting which employees are most
likely to leave the company, identifying fraudulent activity, and optimizing supply
chain operations, among many other applications.

Overall, neural networks are a powerful tool that can help businesses make better
decisions and improve their operations by finding patterns and insights in data that
would be difficult or impossible for a human to identify.

38. Explain Like StackOverflow

Prompt: I want you to act as an answerer on StackOverflow. You can
provide code snippets, sample tables and outputs to support your
answer. [Insert technical question]
I want you to act as an answerer on StackOverflow. You can provide code snippets,
sample tables and outputs to support your answer. What is the use of constraints in SQL?

In SQL, constraints are used to specify rules for the data in a table. If a constraint is
violated, the database will return an error and the transaction will be rolled back.
Constraints are used to ensure the integrity and correctness of the data in a
database.

There are several types of constraints that can be defined in SQL:

• NOT NULL: This constraint specifies that a column cannot contain a NULL value.
• UNIQUE: This constraint specifies that all the values in a column must be unique.
• PRIMARY KEY: This constraint specifies a column or a set of columns that
uniquely identifies each row in a table. A table can have only one primary key.
• FOREIGN KEY: This constraint specifies that the values in a column must match
the values in a column of a different table (the parent table). This is used to
enforce referential integrity.
• CHECK: This constraint specifies a condition that the values in a column must
satisfy.
• DEFAULT: This constraint specifies a default value for a column. If no value is
specified for the column when inserting a row, the default value will be used.
Here is an example of how constraints can be defined in SQL:


CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    Email VARCHAR(255) UNIQUE,
    Age INT CHECK (Age > 0)
);

In this example, the 'Customers' table has four columns: 'CustomerId', 'Name',
'Email', and 'Age'. The 'CustomerId' column is defined as the primary key of
the table, the 'Name' column is defined as NOT NULL, the 'Email' column is
defined as UNIQUE, and the 'Age' column is defined with a CHECK constraint that
ensures that the values in the column are greater than zero.

Not only is the answer correct and succinct, it also came with relevant
information (the different types of constraints) and the code snippets.

One tip that I have here is to explicitly prompt ChatGPT for code snippets,
sample tables and outputs for a more comprehensive answer.
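To see these constraints reject bad data, the table can be recreated in SQLite through Python's sqlite3 module. SQLite is used only because it ships with Python, and error messages differ between database systems. The customer rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    Email VARCHAR(255) UNIQUE,
    Age INT CHECK (Age > 0)
)
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Ada', 'ada@example.com', 36)")

# Each row below violates exactly one constraint
bad_rows = {
    "duplicate primary key": (1, "Bob", "bob@example.com", 40),
    "NULL name": (2, None, "x@example.com", 30),
    "duplicate email": (3, "Cy", "ada@example.com", 25),
    "non-positive age": (4, "Di", "di@example.com", 0),
}

rejected = []
for label, row in bad_rows.items():
    try:
        conn.execute("INSERT INTO Customers VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(label)
        print(f"{label}: {exc}")
```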
Chapter 8: SUGGEST IDEAS
39. Suggest Edge Cases

Prompt: I want you to act as a software developer. Please help me catch
edge cases for this function [insert function]

I want you to act as a software developer. Please help me catch edge cases for this
function
def compute(a, b):
    return (a+b)/(a*b)

Certainly! Here are some edge cases you might want to consider when testing the
'compute' function:

• When 'a' and 'b' are both zero. In this case, the function will raise a
'ZeroDivisionError', since you cannot divide by zero.
• When 'a' is zero and 'b' is non-zero. In this case, the function will also raise a
'ZeroDivisionError'.
• When 'a' and 'b' are both negative. This might produce a different result than
when 'a' and 'b' are both positive, depending on the values.
• When 'a' is negative and 'b' is positive, or vice versa. This might also produce
a different result than when 'a' and 'b' are both positive or both negative.
• When 'a' is very large and 'b' is very small, or vice versa. This might produce a
different result than when 'a' and 'b' are both large or both small.

These are just a few examples of edge cases you might want to consider.
Depending on the specific requirements of your function, there might be other edge
cases to consider as well.
For a relatively simple function, ChatGPT is able to correctly and
thoroughly suggest edge cases. It came up with all of these cases
without being told what they are.
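A quick way to verify a list like this is to call the function on each case. Below is a sketch with hypothetical inputs. It includes two situations the list does not spell out: b = 0 with a non-zero a (also a ZeroDivisionError), and a = -b (a zero result). The last pair shows floating-point overflow: a * b becomes inf, so the function silently returns 0.0 instead of about 2e-200.

```python
def compute(a, b):
    return (a + b) / (a * b)

# Hypothetical probe inputs covering the listed cases plus a few extras
edge_cases = [(0, 0), (0, 5), (5, 0), (-2, -4), (-2, 4), (3, -3), (1e200, 1e200)]

results = {}
for a, b in edge_cases:
    try:
        results[(a, b)] = compute(a, b)
    except ZeroDivisionError:
        results[(a, b)] = "ZeroDivisionError"
    print(f"compute({a}, {b}) -> {results[(a, b)]}")
```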

40. Suggest Dataset

Prompt: I want you to act as a data science career coach. I want to build a
predictive model for [...]. At the same time, I would like to showcase my
knowledge in [...]. Can you please suggest the five most relevant datasets
for my use case?
I want you to act as a data science career coach. I want to build a predictive model for
movies. At the same time, I would like to showcase my knowledge in computer vision. Can
you please suggest the five most relevant datasets for my use case?

Sure! Here are five potential datasets that you might find useful for building a
predictive model for movies:

1. The MovieLens dataset: This dataset contains movie ratings and tag
recommendations from the MovieLens website, and includes information about
the movies, users, and ratings.


2. The IMDb dataset: This dataset contains information about movies and TV
shows, including cast, crew, and plot summaries.

3. The Netflix Prize dataset: This dataset contains movie ratings and other data
from the Netflix Prize competition, which challenged participants to build a
recommendation system for movies.

4. The TMDB 5000 Movie dataset: This dataset contains information about movies,
including their budget, revenue, and cast, as well as movie posters and other
images.

5. The Film Popularity dataset from the UCI Machine Learning Repository: This
dataset contains information about the popularity of movies, including their gross
earnings and IMDb ratings.

I like ChatGPT’s dataset suggestions because they are customized to my use
case. I have also tested this prompt for different personas and use cases, and
the suggestions it gives are consistently excellent.

However, note that I have only tested relatively simple use cases. For more
complex use cases that require niche datasets, ChatGPT might not do well.

41. Suggest Portfolio Ideas


Prompt: I want you to act as a data science coach. My background is
in [...] and I would like to [career goal]. I need to build a portfolio of data
science projects that will help me land a role in [...] as a [...]. Can you
suggest five specific portfolio projects that will showcase my expertise
in [...] and are of relevance to [company]?
I want you to act as a data science coach. I have a degree in civil engineering and I would
like to make a switch into data science. I know I need to build a portfolio of data science
projects that will help me land a role in Tesla as a data scientist. Can you suggest five
specific portfolio projects that will showcase my expertise in civil engineering and are of
relevance to Tesla?

Sure! Here are five potential portfolio projects that could be relevant for a data science role at Tesla and showcase your expertise in civil engineering:

1. Infrastructure optimization: Tesla is building a network of charging stations and other infrastructure to support its electric vehicles. You could build a project that helps optimize the design and placement of these stations based on factors such as traffic patterns, population density, and the availability of renewable energy sources.

2. Predictive maintenance: Tesla's vehicles and charging stations are complex systems that require regular maintenance to operate efficiently. You could build a machine learning model that predicts when maintenance will be needed, so that it can be scheduled in advance and downtime can be minimized.

3. Traffic flow prediction: Tesla is working on developing self-driving vehicles, which will need to be able to navigate complex traffic environments. You could build a model that predicts traffic flow in different locations, which could be used to help autonomous vehicles navigate more efficiently.

This is one of my favourite prompts. Its suggestions are relevant not only to me, but also to the field I am gunning for and to my dream company. It does not suggest generic projects like the Titanic or Iris projects. Instead, it makes salient suggestions based on its knowledge of the company (Tesla, in this case), the field (data science), and my background in engineering.

I love this feature and I hope more people will try it out!

42. Suggest Resources

Prompt: I want you to act as a data science coach. I would like to learn
about [topic]. Please suggest 3 best specific resources. You can
include [specify resource type]

I want you to act as a data science coach. I would like to learn about data engineering.
Please suggest 3 best specific resources. You can include websites, courses and books.

Sure! Here are three resources that I would recommend for learning about data
engineering:

1. Online Courses: There are many online courses available that cover data engineering concepts and best practices. Some options include "Data Engineering on Google Cloud Platform" on Coursera and "Data Engineering Fundamentals" on Pluralsight.
2. Books: There are also several books that are great for learning about data engineering. Some options include "Designing Data-Intensive Applications" by Martin Kleppmann and "Building Data Pipelines with Python" by Alec Watson.
3. Websites: There are many websites that offer tutorials, articles, and other resources on data engineering. Some options include the "Data Engineering" section of the Dataquest blog and the "Data Engineering" section of the Databricks blog.
This one is for those looking to learn a new skill. I find that the resources it suggests are not only relevant, but also critically acclaimed. It also does not provide an overwhelming list of resources; instead, it provides only a few. I like this because it gives anyone hunting for new resources a good starting point.

43. Suggest Time Complexity

Prompt: I want you to act as a software developer. Please compare the time
complexity of the two algorithms below. [Insert two functions]
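I want you to act as a software developer. Please compare the time complexity of the two algorithms below.

def func1(y):
    # Add up 0, 1, ..., y one number at a time
    total = 0
    for i in range(y + 1):
        total += i
    return total

def func2(y):
    # Same sum via the closed-form formula y(1 + y)/2
    return y / 2 * (1 + y)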

As a software developer, I would say that the time complexity of the 'func1' function is O(n), where n is the value of the input 'y'. This is because the function performs a single loop that runs from 0 to 'y', and the time taken to execute the function increases linearly with the value of 'y'.

The time complexity of the 'func2' function is O(1), because the function performs a fixed number of operations regardless of the value of the input 'y'. The time taken to execute the function is constant, and does not depend on the size of the input.

Therefore, the 'func2' function is more efficient than the 'func1' function, because it has a lower time complexity. The 'func2' function will always be faster than the 'func1' function, regardless of the size of the input.

For this very simple example, ChatGPT is able to deduce the time
complexity.
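ChatGPT's answer is easy to sanity-check empirically. The sketch below is an illustration, not part of the original exchange: it times both functions on growing inputs with Python's standard 'timeit' module. It uses 'y * (y + 1) // 2' as an integer-returning variant of 'func2', so that the two results can be compared exactly. Exact timings will vary by machine, but 'func1' should slow down roughly in proportion to 'y', while 'func2' should stay roughly flat.

```python
import timeit


def func1(y):
    """O(n): add 0, 1, ..., y one number at a time."""
    total = 0
    for i in range(y + 1):
        total += i
    return total


def func2(y):
    """O(1): closed-form sum y(y + 1)/2, with integer division so results match func1 exactly."""
    return y * (y + 1) // 2


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # Both functions must agree before their speed is worth comparing
        assert func1(n) == func2(n)
        t1 = timeit.timeit(lambda: func1(n), number=20)
        t2 = timeit.timeit(lambda: func2(n), number=20)
        print(f"y={n:>7}: func1 {t1:.5f}s  func2 {t2:.5f}s")
```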

44. Suggest Feature Engineering

Prompt: I want you to act as a data scientist and perform feature engineering. I am working on a model that predicts [insert feature name]. There are columns: [Describe columns]. Can you suggest features that we can engineer for this machine learning problem?
I want you to act as a data scientist and perform feature engineering. I am working on a
model that predicts whether a person will default on a credit card loan in October. There
are columns: LIMIT_BAL (credit card limit), SEX, EDUCATION, MARRIAGE, AGE,
BILL_AMT[1-6] (amount of bill statement for April to September), PAY_AMT[1-6] (payment
amount for April to September). Can you suggest features that we can engineer for this
machine learning problem?

Certainly! Here are a few ideas for features that you could engineer for this machine
learning problem:

1. Payment history: You could create a new feature that captures the overall
payment history of the borrower. For example, you could calculate the average
payment amount over the past six months, or count the number of months in
which the borrower paid the minimum amount due.
2. Credit utilization: You could calculate the credit utilization ratio for each
borrower, which is the ratio of the total amount of credit being used to the total
credit limit. This can be a useful feature because high credit utilization is often
associated with higher risk of default.
3. Payment to bill ratio: You could create a new feature that captures the ratio of
payment amount to bill amount for each month. This can give you an idea of how
much of the bill the borrower is paying off each month, which can be a useful
predictor of default risk.
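The three suggestions translate fairly directly into code. The sketch below assumes the data sits in a pandas DataFrame with the column names from the prompt (LIMIT_BAL, BILL_AMT1 to BILL_AMT6, PAY_AMT1 to PAY_AMT6); the new column names (AVG_PAY_AMT, AVG_UTILIZATION, PAY_TO_BILL1 and so on) are hypothetical. The "months paid at the minimum" idea is left out because the prompt lists no minimum-due column, and utilization uses the average bill across the six months so that it does not depend on which BILL_AMT column corresponds to which month.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add payment-history, utilization and payment-to-bill features (new column names are hypothetical)."""
    out = df.copy()
    bill_cols = [f"BILL_AMT{i}" for i in range(1, 7)]
    pay_cols = [f"PAY_AMT{i}" for i in range(1, 7)]

    # 1. Payment history: average amount paid over the six months
    out["AVG_PAY_AMT"] = out[pay_cols].mean(axis=1)

    # 2. Credit utilization: average billed amount relative to the credit limit
    out["AVG_UTILIZATION"] = out[bill_cols].mean(axis=1) / out["LIMIT_BAL"]

    # 3. Payment-to-bill ratio for each month; NaN where the bill is zero
    for i in range(1, 7):
        bill = out[f"BILL_AMT{i}"]
        out[f"PAY_TO_BILL{i}"] = out[f"PAY_AMT{i}"] / bill.where(bill != 0)

    return out


if __name__ == "__main__":
    # Two made-up customers, purely for illustration
    row = {"LIMIT_BAL": 1000}
    row.update({f"BILL_AMT{i}": 500 for i in range(1, 7)})
    row.update({f"PAY_AMT{i}": 100 for i in range(1, 7)})
    sample = pd.DataFrame([row, {**row, "BILL_AMT1": 0}])
    print(engineer_features(sample)[["AVG_PAY_AMT", "AVG_UTILIZATION", "PAY_TO_BILL1"]])
```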

I absolutely loved ChatGPT’s response here. The features it suggested are sound and actually useful for the use case.

45. Suggest A/B Testing Steps

Prompt: I want you to act as a statistician. [Describe context] Please design an A/B test for this purpose. Please include the concrete steps on which statistical test I should run.
I want you to act as a statistician. My web designer has changed the landing page’s
background from blue to red. Please design an A/B test for this purpose. Please include
the concrete steps on which statistical test I should run.

Certainly! Here are the steps for designing an A/B test to compare the effectiveness of a blue landing page background versus a red landing page background, including which statistical test to use:

1. Sampling:

* Choose a representative sample of visitors to your website. This could be a random sample, or you could stratify the sample based on factors such as age, gender, or geographic location to ensure that the sample is representative of your target audience.

* Divide the sample into two groups: the "treatment" group, which will see the red background, and the "control" group, which will see the blue background. It is important to ensure that the two groups are comparable in terms of the characteristics of the visitors.

* Set the sample size to be large enough to detect a statistically significant difference between the two groups, if one exists.


2. Running the experiment:

* Implement the changes to the landing page for the treatment group. Make sure to keep all other elements of the page the same for both groups, so that any differences in the outcomes can be attributed to the background color.

* Track the key metric that you are interested in, such as the conversion rate (the percentage of visitors who take a desired action, such as filling out a form or making a purchase). You will need to have a way to track this metric for both the treatment and control groups.

* Run the experiment for a sufficient length of time to allow for a sufficient number of visitors to be included in the sample. The length of time will depend on the size of the sample and the expected effect size.

3. Analyzing the results:

* Calculate the difference in the key metric between the treatment and control groups. This will be the "effect size" of the change in background color.

* Use a two-sample t-test to determine whether the observed difference is statistically significant. This test is appropriate because you are comparing the means of two independent samples (the treatment and control groups).

* If the p-value of the t-test is less than your predetermined significance level (usually 0.05), you can conclude that the difference is statistically significant, and that the change in background color had an effect on the key metric.

While ChatGPT’s steps for A/B testing are correct, I found that I had to explicitly ask it to include the statistical test required.
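The "sample size" and "statistical test" steps can be made concrete with Python's standard library. The sketch below is an illustration under stated assumptions, not ChatGPT's answer: it treats the key metric as a binary conversion (converted or not), uses the usual normal-approximation formula for the per-group sample size, and uses a two-proportion z-test. The z-test is a common choice for conversion rates and is closely related to the t-test ChatGPT suggested; with large samples the two usually agree closely. The baseline and target rates and the visitor counts are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Visitors needed in EACH group to detect p_control vs p_treatment (two-sided test)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STANDARD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical: blue page converts at 10%; we want to detect a lift to 12% on the red page
    print("Visitors per group:", sample_size_per_group(0.10, 0.12))
    # Hypothetical counts: control (blue) vs treatment (red)
    z, p = two_proportion_z_test(400, 4000, 480, 4000)
    print(f"z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```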

46. Career Coaching

Prompt: I want you to act as a career advisor. I am looking for a role as a [role name]. My background is [...]. How do I land the role and with what resources exactly in 6 months?

I want you to act as a career advisor. I am looking for a role as a data engineer at Amazon
as a fresh graduate. I have an undergraduate degree in chemical engineering and have
self-studied data science through MOOCs. How do I land the role and with what
resources exactly in 6 months?

Sure, I'd be happy to help you out. As a data engineer at Amazon, you'll be responsible for designing, building, maintaining, and troubleshooting data pipelines. It's a role that combines software engineering with data management and analysis skills.

To land a data engineering role at Amazon, you’ll need to demonstrate your technical skills, problem-solving abilities, and attention to detail. Here are a few things you can do to improve your chances:

1. Continue to build your technical skills: As a chemical engineering graduate, you have a strong foundation in math and science, which will be useful in data engineering. However, you'll also need to be proficient in programming languages such as Python and SQL, as well as tools like Hadoop, Spark, and AWS. Consider taking online courses or earning certifications to enhance your skills.
2. Gain practical experience: While a degree and MOOCs are a good start, practical
experience is crucial for landing a data engineering role. Consider working on
data-related projects in your spare time, or interning at a company that uses data
engineering. This will give you the opportunity to apply your skills and build a
portfolio of work to show potential employers.
3. Network: Networking can be a powerful tool for finding job opportunities. Attend
industry events and meetups, join online forums and groups, and reach out to
professionals in the field to learn more about data engineering and make
connections. You never know who might be able to help you find your next job.
4. Tailor your resume and cover letter: When applying for a data engineering role at
Amazon, make sure to highlight your relevant skills and experience. Include any
projects you've worked on, any relevant coursework or certifications, and any
relevant work experience. Customize your resume and cover letter for each job
you apply to, and make sure to highlight how your skills and experience make
you a strong fit for the role.

I was not surprised that ChatGPT did well here. It is able to regurgitate oft-repeated advice for career changers. While this advice is technically correct, I didn’t find it overwhelmingly impressive.

Side note: One can combine this tip with Tip 41 (Suggest portfolio ideas)
and Tip 40 (Suggest dataset) when building one’s portfolio.
Chapter 9: TROUBLESHOOT PROBLEM
47. Correct Own ChatGPT Code

Prompt: Your above code is wrong. [Point out what is wrong]. Can you try
again?
Your above terminal command is wrong. There is no "-T" option for cp. Can you try
again?

I apologize for the error in my previous answer. The '-T' option is not a valid option for the 'cp' command. To recursively copy all files from "Desktop/files1" to a folder that has not been created "Desktop/files2" in the Linux terminal, you can use the 'cp' command with the '-r' option and the 'mkdir -p' command, like this:


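# Create the destination folder (and any missing parent folders)
mkdir -p Desktop/files2
# Copy the contents of files1 recursively; note that the * glob skips hidden (dot) files
cp -r Desktop/files1/* Desktop/files2/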

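ChatGPT is quick to admit its mistakes and correct its code. In the above example, I asked ChatGPT to create a shell command. I realized that it does not actually run, so I pointed out what was wrong. ChatGPT is able to provide the correct code on the second attempt. (Strictly speaking, GNU cp, the version on most Linux distributions, does accept '-T', short for '--no-target-directory'; the BSD cp that ships with macOS does not.)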

This is good enough if you know where the bug is exactly. But in cases
where you do not know where the bug is, ChatGPT might not be able to
pinpoint where the fault is. The onus is still on the coder to figure things
out.
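Since this book works in Python, it is worth knowing that the same recursive copy can be done without any shell flags at all. A minimal sketch using the standard library's 'shutil.copytree': it creates the destination folder (including missing parents), copies everything recursively, and, unlike the '*' glob in the shell version, also copies hidden dot-files. The 'dirs_exist_ok=True' argument (Python 3.8+) lets it merge into a folder that already exists. The helper name 'copy_folder' is hypothetical.

```python
import shutil
from pathlib import Path


def copy_folder(src, dst):
    """Recursively copy src into dst, creating dst (and any parents) if needed."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"Source folder not found: {src}")
    # dirs_exist_ok=True (Python 3.8+) merges into dst if it already exists
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For the example above, the call would be copy_folder(Path.home() / "Desktop" / "files1", Path.home() / "Desktop" / "files2").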

48. Correct Python Code

Prompt: I want you to act as a software developer. This code is supposed to [expected function]. Please help me debug this Python code that cannot be run. [Insert function]
I want you to act as a software developer. This code is supposed to add numbers from 1
to k. Please help me debug this Python code that cannot be run.

def add_from_1_to_k(k):
    if (k > 0):
        result = k + add_from_1_to_k(k - 1)
        return result

There is an error in the code you provided because it is missing a return statement for the case where 'k' is 0 or negative. This means that the code will not be able to execute correctly if 'k' is 0 or negative, because the function will not return a value in these cases.

To fix this error, you can add a return statement for the case where 'k' is 0 or negative, like this:


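def add_from_1_to_k(k):
    # Base case: there is nothing to add when k is 0 or negative
    if k <= 0:
        return 0
    else:
        # Recursive case: k plus the sum of 1..k-1
        result = k + add_from_1_to_k(k - 1)
        return result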

I provided ChatGPT with a simple, incorrect function, and it correctly told me why it was wrong.
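ChatGPT's fix works, but one limitation is worth knowing: each recursive call adds a stack frame, and CPython's default recursion limit (about 1,000 frames, see 'sys.getrecursionlimit()') means the recursive version raises RecursionError for large k. A loop-based sketch of the same function avoids that. The function name here is hypothetical, and 'sum(range(1, k + 1))' or the closed form k(k + 1)/2 from Tip 43 would work just as well.

```python
def add_from_1_to_k_iterative(k):
    """Add the numbers 1..k with a loop; returns 0 when k <= 0, like the recursive fix."""
    total = 0
    for i in range(1, k + 1):  # empty range when k <= 0
        total += i
    return total


if __name__ == "__main__":
    print(add_from_1_to_k_iterative(10))      # 55
    print(add_from_1_to_k_iterative(10_000))  # deep enough to break the recursive version
```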

49. Correct SQL Code

Prompt: I want you to act as a SQL code corrector. This code does not run
in [your DBMS, e.g. PostgreSQL]. Can you correct it for me? [SQL code
here]
I want you to act as a SQL code corrector. This code does not run. Can you correct it for
me?
SELECT name
FROM dish
WHERE name = 'Prawn Salad'
GROUP BY name
HAVING count(*) = 1
LIMIT 5 ORDER BY name;

It looks like there is a problem with the syntax of your 'SELECT' statement. The 'LIMIT' clause should come before the 'ORDER BY' clause. Try the following instead:


SELECT name
FROM dish
WHERE name = 'Prawn Salad'
GROUP BY name
HAVING count(*) = 1
ORDER BY name
LIMIT 5;

This should correct the syntax error and allow the query to run successfully.

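The corrected query is right for this simple case, although the explanation states the clause order backwards: 'ORDER BY' must come before 'LIMIT', exactly as the corrected query has it. Upon further testing, I also found ChatGPT’s understanding of complex SQL queries to be somewhat lacking. It is not able to debug more complex SQL queries containing multiple tables.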
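A quick way to check a suggested SQL fix is to run it against a throwaway database. The sketch below uses Python's built-in 'sqlite3' module and an in-memory 'dish' table with a few made-up rows. SQLite stands in for your actual database here, so dialect details may differ, but the clause-order rule is the same: the original query is rejected with a syntax error, while the corrected one runs.

```python
import sqlite3

# The query as originally written: LIMIT before ORDER BY
BROKEN = """
SELECT name FROM dish
WHERE name = 'Prawn Salad'
GROUP BY name
HAVING count(*) = 1
LIMIT 5 ORDER BY name;
"""

# ChatGPT's corrected version: ORDER BY before LIMIT
FIXED = """
SELECT name FROM dish
WHERE name = 'Prawn Salad'
GROUP BY name
HAVING count(*) = 1
ORDER BY name
LIMIT 5;
"""


def make_demo_db():
    """In-memory 'dish' table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dish (name TEXT)")
    conn.executemany("INSERT INTO dish VALUES (?)", [("Prawn Salad",), ("Caesar Salad",)])
    return conn


def try_query(conn, sql):
    """Return the query's rows, or the error message if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    conn = make_demo_db()
    print("broken:", try_query(conn, BROKEN))
    print("fixed: ", try_query(conn, FIXED))
    conn.close()
```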

50. Troubleshoot PowerBI Model

Prompt: I want you to act as a PowerBI modeler. Here are the details of my current project. [Insert details]. Do you see any problems with the table?
I want you to act as a PowerBI modeler. Here are the details of my current project.
Table: Emission
- Column: Timestamp (datetime)
- Column: Value (string)
- Column: Unit (string)
- Column: EmissionType (string)
- Column: Entity (string)

Do you see any problems with the table?

It is difficult to determine if there are any problems with the Emission table without more context. However, here are a few things to consider when working with this table in Power BI:

1. Data types: Make sure that the data types for each column are appropriate for the data they contain. For example, the "Timestamp" column should have a data type of "Date/Time", and the "Value" column should have a numeric data type such as "Decimal Number" or "Whole Number".
2. Data quality: Check for any errors or inconsistencies in the data. For example, you might want to ensure that all values in the "Unit" column are spelled correctly, or that the "EmissionType" column does not contain any null values.

4. Performance: Consider the size of the Emission table and the complexity of your measures and visuals. If the table is very large or your measures and visuals are very complex, it could impact the performance of your Power BI report.

Overall, the key is to understand the data in the Emission table and how you want to use it in your Power BI report. This will help you identify any potential problems and come up with solutions to address them.

ChatGPT is able to suggest relevant improvements to the PowerBI model. It also admitted its shortcoming (“It’s difficult to determine if there are any problems without more context”).

Mathias Halkjær Petersen had a good chat with ChatGPT on this topic. He showed that ChatGPT is able to troubleshoot PowerBI models effectively when more context is provided. It is also able to write the models relatively independently.
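ChatGPT's first point, that 'Value' is stored as a string, is often easiest to fix before the data reaches Power BI. A minimal pandas sketch, assuming the Emission table is available as a DataFrame with the columns listed in the prompt (the sample rows are made up): 'pd.to_numeric' with errors="coerce" turns unparseable values into NaN so they can be inspected instead of silently staying as text, and 'pd.to_datetime' parses the timestamps. The helper name 'clean_emission' is hypothetical.

```python
import pandas as pd


def clean_emission(df: pd.DataFrame) -> pd.DataFrame:
    """Cast Emission columns to proper types; unparseable values become NaT/NaN."""
    out = df.copy()
    out["Timestamp"] = pd.to_datetime(out["Timestamp"], errors="coerce")
    out["Value"] = pd.to_numeric(out["Value"], errors="coerce")
    # Trim stray whitespace in text categories (e.g. "tCO2e " vs "tCO2e")
    for col in ["Unit", "EmissionType", "Entity"]:
        out[col] = out[col].str.strip()
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({
        "Timestamp": ["2023-01-01 00:00", "2023-01-02 00:00", "2023-01-03 00:00"],
        "Value": ["12.5", "7", "n/a"],
        "Unit": ["tCO2e", "tCO2e ", "tCO2e"],
        "EmissionType": ["Scope 1", "Scope 2", "Scope 1"],
        "Entity": ["Plant A", "Plant B", "Plant A"],
    })
    cleaned = clean_emission(raw)
    print(cleaned.dtypes)
    # Rows whose Value could not be parsed, for manual review
    print(cleaned[cleaned["Value"].isna()])
```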
Chapter 10: WRITE SQL
51. Create Running Average

Prompt: I want you to act as a data scientist and write SQL code for me. I
have a table with two columns [Insert column names]. I would like to
calculate a running average for [which value]. What is the SQL code that
works for PostgreSQL 14?

I want you to act as a data scientist and write SQL code for me. I have a table with three
columns "date", "product" and "revenue". I would like to calculate a running average for
each product. What is the SQL code that works for PostgreSQL 14?

Here is the SQL code that you can use to calculate the running average for each
product in PostgreSQL 14:


SELECT
    date,
    product,
    revenue,
    -- Average of this product's revenue from its first row up to the current row (by date)
    AVG(revenue) OVER (
        PARTITION BY product
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg
FROM sales_table
ORDER BY date;
This code will do the following:

1. Select the 'date', 'product', 'revenue', and 'running_avg' columns from the 'sales_table' table.
2. Calculate the running average of the 'revenue' column for each product using a window function.
3. The 'AVG()' function is a window function that calculates the average of a set of rows. The 'OVER()' clause specifies the window for the window function. The 'PARTITION BY' clause divides the rows of the table into partitions to which the window function is applied. In this case, the rows are partitioned by 'product'. The 'ORDER BY' clause specifies the order in which the rows are processed within each partition. In this case, the rows are ordered by 'date'. The 'ROWS' keyword specifies that the window is defined in terms of rows, and the 'BETWEEN' keyword specifies the start and end of the window.
4. The 'running_avg' column will contain the average of all 'revenue' values from the start of the partition up to the current row, for each product.
5. The resulting table will be ordered by 'date'.

ChatGPT did provide a good response to the request here! It correctly provided the output for creating running averages in SQL. It also provided a correct explanation and breakdown.

One tip that I have is to provide ChatGPT with the exact version and flavour of SQL. This allows it to tailor the syntax to your database.

I would encourage practitioners to test ChatGPT with their own SQL questions and check whether it produces a correct response.
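Following that advice, a query like the one above can be smoke-tested locally before it goes anywhere near production. The sketch below runs the same window function through Python's built-in 'sqlite3' module on a handful of made-up rows. SQLite (version 3.25 or later, which supports window functions) stands in for PostgreSQL 14 here, which is an assumption about your setup; 'product' is added to the final ORDER BY only to make the output order deterministic.

```python
import sqlite3

RUNNING_AVG_SQL = """
SELECT
    date,
    product,
    revenue,
    AVG(revenue) OVER (
        PARTITION BY product
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg
FROM sales_table
ORDER BY date, product;
"""


def running_averages(rows):
    """Load (date, product, revenue) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales_table (date TEXT, product TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", rows)
        return conn.execute(RUNNING_AVG_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [  # made-up sales data
        ("2023-01-01", "A", 10.0),
        ("2023-01-02", "A", 20.0),
        ("2023-01-03", "A", 30.0),
        ("2023-01-01", "B", 5.0),
        ("2023-01-02", "B", 15.0),
    ]
    for row in running_averages(sample):
        print(row)
```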

52. Solve Leetcode Question

Prompt: Assume you are given the tables... with the columns. Output the
following...
Assume that you are given the table below containing information on viewership by device type (where the three types are laptop, tablet, and phone). Define "mobile" as the sum of tablet and phone viewership numbers. Write a query to compare the viewership on laptops versus mobile devices.

Output the total viewership for laptop and mobile devices in the format of "laptop_views" and "mobile_views". The viewership table has the columns "user_id", "device_type", and "view_time".

To compare the viewership on laptops versus mobile devices, you can use the following SQL query:


SELECT
    SUM(CASE WHEN device_type = 'laptop' THEN view_time ELSE 0 END) AS laptop_views,
    SUM(CASE WHEN device_type IN ('tablet', 'phone') THEN view_time ELSE 0 END) AS mobile_views
FROM viewership;

This query uses a 'CASE' expression to sum the 'view_time' values for laptop and mobile devices separately. The 'SUM' function is then used to calculate the total viewership for each device type.

ChatGPT could solve an introductory Leetcode question (Difficulty: Simple) on SQL. The above code is sound and answers the question correctly.

However, it failed to produce runnable SQL code for a Leetcode question with medium difficulty when I tested it.
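The 'SUM(CASE WHEN ... THEN ... ELSE 0 END)' pattern is known as conditional aggregation: a single pass over the table, adding each row's value to whichever total its condition selects. Spelled out in plain Python over a list of (user_id, device_type, view_time) tuples, a hypothetical in-memory stand-in for the viewership table, the logic looks like this:

```python
MOBILE_DEVICES = {"tablet", "phone"}


def compare_viewership(viewership):
    """Mirror the SQL: one pass, each row's view_time added to the total its CASE selects."""
    laptop_views = 0
    mobile_views = 0
    for _user_id, device_type, view_time in viewership:
        if device_type == "laptop":
            laptop_views += view_time
        elif device_type in MOBILE_DEVICES:
            mobile_views += view_time
        # Any other device type adds nothing, like ELSE 0 in the SQL
    return {"laptop_views": laptop_views, "mobile_views": mobile_views}


if __name__ == "__main__":
    rows = [(1, "laptop", 30), (2, "phone", 10), (3, "tablet", 5), (4, "laptop", 20)]
    print(compare_viewership(rows))  # {'laptop_views': 50, 'mobile_views': 15}
```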
Chapter 11: WRITE OTHER CODE
53. Write Google Sheets Formula

Prompt: I want you to act as a bot that generates Google Sheets formulas.
Please generate a formula that [describe requirements]
I want you to act as a bot that generates Google Sheets formulas. Please generate a
formula that puts data from another sheet into the current sheet.

Here is a formula that puts data from another sheet into the current sheet:


=IMPORTRANGE("sheet_id", "range")

This formula uses the 'IMPORTRANGE' function, which imports data from another sheet. The first argument is the "sheet_id" of the sheet that you want to import data from. The sheet_id is a unique identifier for the sheet, and you can find it in the URL of the sheet when it's open in your browser. For example, if the URL of the sheet is 'https://docs.google.com/spreadsheets/d/abc123/edit', the sheet_id is 'abc123'.

The second argument is the range of cells that you want to import. This can be a single cell (e.g., "A1"), a range of cells (e.g., "A1:B10"), or a named range.

For example, if you want to import the data from cells A1 through B10 in a sheet with the sheet_id abc123, you would use the following formula:

=IMPORTRANGE("abc123", "A1:B10")

Note that to use the 'IMPORTRANGE' function, you'll need to grant access to the sheet that you want to import data from. To do this, you can go to the "Data" menu and select "import data." Then, choose the sheet that you want to import data from and click "Allow access."

Once you've granted access, you can use the 'IMPORTRANGE' function to import data from the other sheet into the current sheet. The imported data will be automatically updated whenever the source data is changed.

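ChatGPT excels at acting as a lookup table for commands. As you can see, I asked it for a function that pulls data from one sheet into another, and it faithfully provided the correct IMPORTRANGE formula. Its access instructions are less reliable, though: in practice, the first time the formula connects to another spreadsheet, the cell shows a #REF! error with an "Allow access" prompt, rather than anything under a "Data" menu.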

However, again, I cherry-picked this example. ChatGPT fails when asked to provide more complex formulas. In particular, I asked it to provide a formula that outputs the number of Mondays in 2023. It failed to produce a formula that works. Yet, it still confidently explained it as though it had performed splendidly.
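When ChatGPT hands back a date formula like that, it helps to know the right answer independently. A short Python check using the standard 'datetime' module counts the days in a year that fall on a given weekday ('date.weekday()' numbers Monday as 0 and Sunday as 6). 2023 began and ended on a Sunday, so it has 53 Sundays but only 52 Mondays, which is a useful number to test any spreadsheet formula against. The helper name 'count_weekday' is hypothetical.

```python
from datetime import date, timedelta


def count_weekday(year, weekday):
    """Count the days in `year` whose date.weekday() equals `weekday` (Monday == 0, Sunday == 6)."""
    day = date(year, 1, 1)
    count = 0
    while day.year == year:
        if day.weekday() == weekday:
            count += 1
        day += timedelta(days=1)
    return count


if __name__ == "__main__":
    print(count_weekday(2023, 0))  # Mondays in 2023 → 52
```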

54. Write R

Prompt: I want you to act as a data scientist using R. Can you write an R
script that [Insert requirement here]
I want you to act as a data scientist using R. Can you write an R script that imports the iris
dataset and uses ggplot2 to perform data visualization? I would like to use a violinplot to
compare the distribution of sepal length between flower species.

Certainly! Here is an R script that imports the iris dataset and uses ggplot2 to create a violinplot comparing the distribution of sepal length between flower species:


# Load the ggplot2 library
library(ggplot2)

# Load the iris dataset
data("iris")

# Create a violinplot of sepal length by species
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin()

This script loads the ggplot2 library and the iris dataset, and then creates a violinplot using the 'geom_violin()' function. The plot shows the distribution of sepal length for each flower species, with the width of the violins indicating the density of the data at each point.

ChatGPT certainly excelled at this basic R task. This code runs and indeed creates the plot as requested.

However, do note that I did not test ChatGPT on R coding extensively. I can imagine it failing or producing incorrect code for more complex cases.

55. Write Shell

Prompt: I want you to act as a Linux terminal expert. Please write the code
to [describe requirements]
I want you to act as a Linux terminal expert. Please write the code to create a directory
called "new folder", and create a shell file "txt.sh". It should contain the command to list
all files in "new folder".

To create a directory called "new folder" and a shell file "txt.sh" that contains the command to list all files in "new folder", you can use the following commands in the Linux terminal:


# Create the directory
mkdir new_folder

# Change into the new directory
cd new_folder

# Create the shell file containing the ls command
echo "ls" > txt.sh

# Make the shell file executable
chmod +x txt.sh

These are the correct commands for the requested instructions. It even provided a line for granting the necessary permission to the file without me explicitly asking it to.

However, note that I cherry-picked this example out of all the prompts for writing shell commands. In particular, when I asked ChatGPT to copy files from a folder recursively, it gave an incorrect output that could not be executed. In its defense, it was able to correct its own code (see “47. Correct Own ChatGPT Code” above).
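For comparison, here is a minimal Python sketch of the same task using 'pathlib'. Two details differ from ChatGPT's commands and are assumptions about what the request meant: the folder is named "new folder" exactly as requested (ChatGPT's version used new_folder), and the script runs 'ls' on its own directory via "$(dirname "$0")", so it lists the folder's contents no matter where it is run from. The function name 'create_listing_script' is hypothetical.

```python
import stat
from pathlib import Path


def create_listing_script(base_dir="."):
    """Create base_dir/'new folder'/txt.sh, a script that lists the contents of 'new folder'."""
    folder = Path(base_dir) / "new folder"
    folder.mkdir(parents=True, exist_ok=True)

    script = folder / "txt.sh"
    # Shebang plus an ls on the script's own directory (quoted because of the space)
    script.write_text('#!/bin/sh\nls "$(dirname "$0")"\n')

    # Equivalent of chmod +x: add execute permission for user, group and others
    mode = script.stat().st_mode
    script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script
```

Calling create_listing_script() creates the folder and script in the current working directory; running "./new folder/txt.sh" (quoted, because of the space) then lists the folder's contents.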

56. Write VBA


Prompt: I want you to act as an Excel VBA developer. Can you write a
VBA that [Insert function here]?
I want you to act as an Excel VBA developer. Can you write a VBA that automatically prints
all sheet names?

Yes, I can write a VBA code that will automatically print all sheet names in an Excel
workbook. Here is a sample code that you can use:


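Sub PrintSheetNames()
    Dim ws As Worksheet
    ' Loop over every worksheet in the workbook that contains this macro
    For Each ws In ThisWorkbook.Worksheets
        ' Write the sheet name to the Immediate (debug) window
        Debug.Print ws.Name
    Next ws
End Sub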

To use this code, copy and paste it into the VBA editor in Excel, and then run it by pressing the "Run" button or by using the F5 key. The code will loop through all the worksheets in the current workbook and print the name of each sheet to the debug window.
Chapter 12: MISCELLANEOUS
57. Format Tables

Prompt: I want you to act as a document formatter. Can you please format the following into a nice table for me to place in Google Docs? [insert text table here]

I want you to act as a document formatter for me. Can you please format the following
into a nice table for me to place in Google Docs?
1 January 2021 Friday New Year’s Day New Year’s Day
12 February 2021 Friday Chinese New Year Chinese New Year
13 February 2021 Saturday Chinese New Year Chinese New Year
2 April 2021 Friday Good Friday Good Friday
1 May 2021 Saturday Labour Day Labour Day

Here is the formatted table for you:

Date          Day        Holiday            Description
1 January     Friday     New Year's Day     New Year's Day
12 February   Friday     Chinese New Year   Chinese New Year
13 February   Saturday   Chinese New Year   Chinese New Year
2 April       Friday     Good Friday        Good Friday
1 May         Saturday   Labour Day         Labour Day

The output speaks for itself. It is able to convert a table that I copied from
the internet into a nicely formatted table.

However, ChatGPT cannot offer any customization to the table. I asked it to change the colors and font, but it was not able to.

58. Summarize Book

Prompt: I want you to act as a technical book summarizer. Can you please
summarize the book [name] with 5 main points?

I want you to act as a technical book summarizer. Can you please summarize the book
"Storytelling with Data: A Data Visualization Guide for Business Professionals" with 5 main
points?

Certainly! Here are five main points from "Storytelling with Data: A Data Visualization Guide for Business Professionals" by Cole Nussbaumer Knaflic:

1. The purpose of data visualization is to communicate information clearly and effectively through graphical means.
2. It is important to consider the context in which the data will be presented and the
audience that will be viewing it.
3. Good data visualization should include appropriate and effective use of charts,
graphs, and other visual encodings.
4. The design and aesthetics of the visualization are important, but function should
be the primary focus.
5. It is important to clearly label and annotate the visualization so that the information being conveyed is easily understood.

Knaflic’s book is a gold mine of data storytelling tips. I personally feel that
the summary was too generic and did not capture the tips sufficiently.
59. Summarize Paper

Prompt: I want you to act as an academic. Please summarise the paper [...] in simple terms in one paragraph.

I want you to act as an academic. Please summarise the paper "Distilling the knowledge
in a neural network" in simple terms in one paragraph.

"Distilling the knowledge in a neural network" is a paper that presents a method for
compressing a large, complex neural network model into a smaller, more efficient
model that can be deployed in real-world applications. The authors propose a
method called "knowledge distillation" in which the large model is used to train the
smaller model, effectively transferring the knowledge from the large model to the
smaller one. The smaller model is able to achieve similar performance to the large
model, but with fewer parameters and faster inference times. This method has been
shown to be effective in a variety of tasks, including image classification and
machine translation.

I asked ChatGPT to summarize a relatively old paper, “Distilling the knowledge in a neural network”. It is able to explain the most important concepts in a relatively straightforward and jargon-free manner.

However, note that this is likely possible because enough people have written reviews or summaries of the above paper. It will not summarize less popular papers as well.

Also, at the time of writing, ChatGPT has no knowledge of the world beyond 2021. Thus, it will not be able to help you understand new papers.
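The core mechanism that ChatGPT's summary glosses over is "soft targets". In the paper, the large model's logits are passed through a softmax with a temperature T greater than 1, which spreads probability over wrong-but-plausible classes, and the small model is trained to match those softened probabilities. The paper scales this term by T² and combines it with the ordinary loss on the true labels; the sketch below leaves that second term out. It is a minimal pure-Python illustration for a single example with made-up logits, not the paper's training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft targets and the student's softened predictions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    cross_entropy = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return cross_entropy * temperature ** 2


if __name__ == "__main__":
    teacher = [6.0, 2.0, 1.0]  # made-up logits for a 3-class problem
    for t in (1.0, 4.0):
        print(f"T={t}: teacher soft targets {[round(p, 3) for p in softmax(teacher, t)]}")
    print("loss, student matches teacher:", soft_target_loss(teacher, teacher))
    print("loss, uninformed student:     ", soft_target_loss([0.0, 0.0, 0.0], teacher))
```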

60. Provide Emotional Support (My Favourite)


Prompt: I want you to provide emotional support to me. [Explain problem
here.]

I want you to provide emotional support to me. I am a data scientist who has been training my deep learning model for 3 months, and my model is still underperforming a linear regression model.

I'm here to help in any way I can. It can be frustrating when we put a lot of time and effort into a project and don't see the results we were hoping for, but it's important to remember that it's a normal part of the process and that setbacks can often lead to valuable learning experiences.

Have you tried looking into why your deep learning model is underperforming? It could be that the data you are using is not well-suited for a deep learning model, or that there are issues with the way you have designed and trained the model. It can be helpful to try to identify what might be causing the issue and then see if there are any changes you can make to address it. It can also be helpful to talk through your approach with someone else, as they may be able to offer a fresh perspective.

In the meantime, it's important to take care of yourself and try to find ways to manage any stress or frustration you may be feeling. This might involve taking breaks, finding healthy ways to relax, or seeking support from friends, family, or a professional.

Alright, I’m no psychologist, so I do not think I am qualified to judge this. That said, I personally think that this is a response that is both empathetic and practical. It also showed a good understanding of the data science process.
Closing
ChatGPT performs well in explaining things and providing boilerplate
code. However, it falters when it comes to debugging or creating complex
code snippets.

Will ChatGPT replace data scientists? No, I don’t think so. At least not yet.
However, someone using ChatGPT might.
