LAB 4 - Amazon Comprehend
In input_data_config:
S3Uri: Replace <S3_INPUT_GOES_HERE> with the test_url that was defined previously.
InputFormat: Replace <INPUT_FORMAT_GOES_HERE> with ONE_DOC_PER_LINE.
In output_data_config:
S3Uri: Replace <S3_OUTPUT_GOES_HERE> with the s3_output_location.
For data_access_role_arn: Replace arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE with the Amazon Resource Name (ARN) from the Lab Details file.
input_data_config = {
    'S3Uri': 'S3_INPUT_GOES_HERE',
    'InputFormat': 'INPUT_FORMAT_GOES_HERE'
}
output_data_config = {
    'S3Uri': 'S3_OUTPUT_GOES_HERE'
}
data_access_role_arn = 'arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE'
### BEGIN_SOLUTION
input_data_config = {
    'S3Uri': test_url,
    'InputFormat': 'ONE_DOC_PER_LINE'
}
output_data_config = {
    'S3Uri': s3_output_location
}
data_access_role_arn = 'arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE'
### END_SOLUTION
Now that you have defined the job parameters, start the sentiment detection job.
response = comprehend.start_sentiment_detection_job(
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    DataAccessRoleArn=data_access_role_arn,
    JobName='movie_sentiment',
    LanguageCode='en'
)
print(response['JobStatus'])
The following cell will loop until the job is completed. (This step might take a
few minutes to complete.)
%%time
import time

job_id = response['JobId']
while True:
    job_status = comprehend.describe_sentiment_detection_job(JobId=job_id)
    if job_status['SentimentDetectionJobProperties']['JobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print('.', end='')
    time.sleep(15)

print(job_status['SentimentDetectionJobProperties']['JobStatus'])
When the job is complete, you can retrieve the job details by calling the
describe_sentiment_detection_job function.
output = comprehend.describe_sentiment_detection_job(JobId=job_id)
print(output)
In the OutputDataConfig section, you should see the S3Uri. Extracting that URI
gives you the location of the results file, which you must download from Amazon S3.
You can then use the results to calculate metrics in the same way that you did for
the batch transform results earlier in this course.
import boto3

comprehend_output_file = output['SentimentDetectionJobProperties']['OutputDataConfig']['S3Uri']
comprehend_bucket, comprehend_key = comprehend_output_file.replace("s3://", "").split("/", 1)

s3r = boto3.resource('s3')
s3r.meta.client.download_file(comprehend_bucket, comprehend_key, 'output.tar.gz')
import json
import tarfile

# The downloaded file is a tar.gz archive that contains a file named 'output';
# extract it before reading
with tarfile.open('output.tar.gz') as tar:
    tar.extractall()

with open('output', 'r') as myfile:
    data = myfile.readlines()
Parse each line as JSON and add the results to a list.
results = []
for line in data:
    json_data = json.loads(line)
    results.append([json_data['Line'], json_data['Sentiment']])
Convert the list to a pandas DataFrame.
import pandas as pd

c = pd.DataFrame.from_records(results, index='index', columns=['index', 'sentiment'])
c.head()
The results contain NEGATIVE, POSITIVE, NEUTRAL, and MIXED labels instead of
numerical values. To compare these results to your test data, map them to the
numerical values that are used in the test labels. The index in the returned results
is also out of order; the sort_index function fixes this issue.
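The mapping cell itself does not appear in this excerpt. The following is a minimal sketch of the idea, using hypothetical results and an assumed label convention (1 = positive, 0 = negative; NEUTRAL and MIXED folded into 0 for illustration only). Adjust the mapping to match how your test labels were actually encoded.

```python
import pandas as pd

# Hypothetical results in the same shape that the earlier loop produces:
# [line number, sentiment label], possibly out of order
results = [[2, 'NEGATIVE'], [0, 'POSITIVE'], [1, 'POSITIVE']]
c = pd.DataFrame.from_records(results, index='index', columns=['index', 'sentiment'])

# Map the string labels to numbers (assumed convention: 1 = positive, 0 = negative)
c['sentiment'] = c['sentiment'].map({'POSITIVE': 1, 'NEGATIVE': 0,
                                     'NEUTRAL': 0, 'MIXED': 0})

# Restore the original line order so the predictions align with the test labels
c = c.sort_index()
print(c['sentiment'].tolist())  # [1, 1, 0]
```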
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(test_labels, c['sentiment'])
TN = cm[0, 0]
FP = cm[0, 1]
FN = cm[1, 0]
TP = cm[1, 1]
# Sensitivity, hit rate, recall, or true positive rate
Sensitivity = float(TP)/(TP+FN)*100
# Specificity or true negative rate
Specificity = float(TN)/(TN+FP)*100
# Precision or positive predictive value
Precision = float(TP)/(TP+FP)*100
# Negative predictive value
NPV = float(TN)/(TN+FN)*100
# Fall out or false positive rate
FPR = float(FP)/(FP+TN)*100
# False negative rate
FNR = float(FN)/(TP+FN)*100
# False discovery rate
FDR = float(FP)/(TP+FP)*100
# Overall accuracy
ACC = float(TP+TN)/(TP+FP+FN+TN)*100
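As a quick sanity check on the formulas above, the following self-contained sketch computes the same quantities on toy labels (illustrative data only, not the lab output) by counting outcomes directly, which is what confusion_matrix does in the binary case:

```python
# Toy labels for illustration only (not from the lab data)
test_labels = [0, 0, 0, 1, 1, 1]
predictions = [0, 1, 0, 1, 1, 0]

pairs = list(zip(test_labels, predictions))
# These counts correspond to cm[0,0], cm[0,1], cm[1,0], and cm[1,1]
TN = sum(1 for t, p in pairs if t == 0 and p == 0)
FP = sum(1 for t, p in pairs if t == 0 and p == 1)
FN = sum(1 for t, p in pairs if t == 1 and p == 0)
TP = sum(1 for t, p in pairs if t == 1 and p == 1)

Sensitivity = float(TP) / (TP + FN) * 100
Specificity = float(TN) / (TN + FP) * 100
ACC = float(TP + TN) / (TP + FP + FN + TN) * 100
print(TN, FP, FN, TP)    # 2 1 1 2
print(round(ACC, 2))     # 66.67
```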