LAB 4 - Amazon Comprehend

The document describes how to configure and run a sentiment analysis job using Amazon Comprehend. It explains how to define the input and output configurations, start the job, check the status, and retrieve and analyze the results. The results are mapped to numerical values and compared to the test data using a confusion matrix and various performance metrics.

Uploaded by Rotenda Mantsha

Challenge: Configuring the Amazon Comprehend job parameters

In the next cell, configure the Amazon Comprehend job parameters.

In input_data_config:
- S3Uri: Replace <S3_INPUT_GOES_HERE> with the test_uri that was defined previously.
- InputFormat: Replace <INPUT_FORMAT_GOES_HERE> with ONE_DOC_PER_LINE.

In output_data_config:
- S3Uri: Replace <S3_OUTPUT_GOES_HERE> with the s3_output_location.

For data_access_role_arn: Replace arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE with the Amazon Resource Name (ARN) from the Lab Details file.
input_data_config = {
    'S3Uri': '<S3_INPUT_GOES_HERE>',
    'InputFormat': '<INPUT_FORMAT_GOES_HERE>'
}

output_data_config = {
    'S3Uri': '<S3_OUTPUT_GOES_HERE>'
}

data_access_role_arn = 'arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE'

### BEGIN_SOLUTION
input_data_config = {
    'S3Uri': test_uri,
    'InputFormat': 'ONE_DOC_PER_LINE'
}

output_data_config = {
    'S3Uri': s3_output_location
}

data_access_role_arn = 'arn:aws:iam::899882598055:role/service-role/c67833a1330685l3262871t1w-ComprehendDataAccessRole-1EAP15HQRX9QE'
### END_SOLUTION
Now that you have defined the job parameters, start the sentiment detection job.

response = comprehend.start_sentiment_detection_job(
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    DataAccessRoleArn=data_access_role_arn,
    JobName='movie_sentiment',
    LanguageCode='en'
)

print(response['JobStatus'])
The following cell will loop until the job is completed. (This step might take a
few minutes to complete.)

%%time
import time

job_id = response['JobId']
while True:
    job_status = comprehend.describe_sentiment_detection_job(JobId=job_id)
    if job_status['SentimentDetectionJobProperties']['JobStatus'] in ['COMPLETED', 'FAILED']:
        break
    else:
        print('.', end='')
        time.sleep(15)

print(comprehend.describe_sentiment_detection_job(JobId=job_id)['SentimentDetectionJobProperties']['JobStatus'])
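The cell above is the generic poll-until-terminal-state pattern. The following self-contained sketch shows the same pattern with a stubbed describe function standing in for the Comprehend client (the stub and its status sequence are invented for illustration):

```python
import time

# Stub that returns the next status on each call; a real job would be polled
# through comprehend.describe_sentiment_detection_job(JobId=job_id)
statuses = iter(['SUBMITTED', 'IN_PROGRESS', 'COMPLETED'])

def describe_job():
    return {'JobStatus': next(statuses)}

while True:
    status = describe_job()['JobStatus']
    if status in ['COMPLETED', 'FAILED']:
        break
    time.sleep(0.01)  # the lab sleeps 15 seconds between real API calls

print(status)  # COMPLETED
```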
When the job is complete, you can return the details from the job by calling the
describe_sentiment_detection_job function.

output = comprehend.describe_sentiment_detection_job(JobId=job_id)
print(output)
In the OutputDataConfig section, you should see the S3Uri. Extracting that URI will
give you the file that you must download from Amazon S3. You can use the results to
calculate metrics in the same way that you calculated the results from a batch
transformation by using an algorithm.

import boto3

comprehend_output_file = output['SentimentDetectionJobProperties']['OutputDataConfig']['S3Uri']
comprehend_bucket, comprehend_key = comprehend_output_file.replace("s3://", "").split("/", 1)

s3r = boto3.resource('s3')
s3r.meta.client.download_file(comprehend_bucket, comprehend_key, 'output.tar.gz')

# Extract the tar file
import tarfile

tf = tarfile.open('output.tar.gz')
tf.extractall()
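The open-and-extract flow can be exercised end to end with an in-memory archive. The archive built here is synthetic (the lab extracts the real output.tar.gz downloaded from S3):

```python
import io
import tarfile

# Build a tiny gzipped tar in memory containing a file named 'output'
payload = b'{"Line": 0, "Sentiment": "POSITIVE"}\n'
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as archive:
    info = tarfile.TarInfo(name='output')
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Reopen the archive and list its members, as extractall() would extract them
buf.seek(0)
with tarfile.open(fileobj=buf, mode='r:gz') as archive:
    names = archive.getnames()

print(names)  # ['output']
```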
The extracted file should be named output. Read the lines in this file.

import json

data = ''
with open('output', 'r') as myfile:
    data = myfile.readlines()
Add the lines to an array.

results = []
for line in data:
    json_data = json.loads(line)
    results.append([json_data['Line'], json_data['Sentiment']])
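Each line of the output file is a standalone JSON record. A self-contained sketch with invented records shows the fields the loop above relies on:

```python
import json

# Hypothetical records mirroring the JSON Lines shape of the Comprehend output
sample_lines = [
    '{"File": "test.csv", "Line": 1, "Sentiment": "NEGATIVE"}',
    '{"File": "test.csv", "Line": 0, "Sentiment": "POSITIVE"}',
]

results = []
for line in sample_lines:
    json_data = json.loads(line)
    results.append([json_data['Line'], json_data['Sentiment']])

print(results)  # [[1, 'NEGATIVE'], [0, 'POSITIVE']]
```

Note that the Line numbers arrive out of order, which is why the DataFrame is sorted by index later.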
Convert the array to a pandas dataframe.

c = pd.DataFrame.from_records(results, index='index', columns=['index', 'sentiment'])
c.head()
The results contain the labels NEGATIVE, POSITIVE, NEUTRAL, and MIXED instead of numerical values. To compare these results to your test data, map them to numerical values, as shown in the following cell. The index in the returned results is also out of order; the sort_index function fixes this issue.

class_mapper = {'NEGATIVE': 0, 'POSITIVE': 1, 'NEUTRAL': 2, 'MIXED': 3}

c['sentiment'] = c['sentiment'].replace(class_mapper)
c = c.sort_index()
c.head()
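To see the mapping and reordering in isolation, here is the same replace-then-sort applied to a toy DataFrame (the labels and index order here are invented):

```python
import pandas as pd

# Toy frame with string labels arriving out of index order
c_demo = pd.DataFrame({'sentiment': ['POSITIVE', 'NEGATIVE', 'MIXED', 'NEUTRAL']},
                      index=[2, 0, 3, 1])

class_mapper = {'NEGATIVE': 0, 'POSITIVE': 1, 'NEUTRAL': 2, 'MIXED': 3}
c_demo['sentiment'] = c_demo['sentiment'].replace(class_mapper)  # strings -> ints
c_demo = c_demo.sort_index()                                     # restore line order

print(c_demo['sentiment'].tolist())  # [0, 2, 1, 3]
```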
# Build list to compare for Amazon Comprehend
test_2 = test.reset_index()
test_3 = test_2.sort_index()
test_labels = test_3.iloc[:,2]
You can display a confusion matrix by using the plot_confusion_matrix function. Because Amazon Comprehend also includes NEUTRAL and MIXED in the results, the chart will look different.

plot_confusion_matrix(test_labels, c['sentiment'])
The existing function to print metrics won't work because you have too many data
dimensions. The following code cell will calculate the same values.

cm = confusion_matrix(test_labels, c['sentiment'])

TN = cm[0,0]
FP = cm[0,1]
FN = cm[1,0]
TP = cm[1,1]

# Sensitivity, hit rate, recall, or true positive rate
Sensitivity = float(TP)/(TP+FN)*100
# Specificity or true negative rate
Specificity = float(TN)/(TN+FP)*100
# Precision or positive predictive value
Precision = float(TP)/(TP+FP)*100
# Negative predictive value
NPV = float(TN)/(TN+FN)*100
# Fall out or false positive rate
FPR = float(FP)/(FP+TN)*100
# False negative rate
FNR = float(FN)/(TP+FN)*100
# False discovery rate
FDR = float(FP)/(TP+FP)*100
# Overall accuracy
ACC = float(TP+TN)/(TP+FP+FN+TN)*100

print(f"Sensitivity or TPR: {Sensitivity}%")
print(f"Specificity or TNR: {Specificity}%")
print(f"Precision: {Precision}%")
print(f"Negative Predictive Value: {NPV}%")
print(f"False Positive Rate: {FPR}%")
print(f"False Negative Rate: {FNR}%")
print(f"False Discovery Rate: {FDR}%")
print(f"Accuracy: {ACC}%")
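The formulas above can be checked on toy labels. This pure-Python sketch (the labels are invented) tallies the same confusion-matrix counts and a few of the metrics by hand:

```python
# Toy binary labels: 0 = negative, 1 = positive
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

# Confusion-matrix counts, tallied directly from the label pairs
TN = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 3
FP = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
FN = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1
TP = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 5

sensitivity = TP / (TP + FN) * 100        # 5/6 of positives recovered
specificity = TN / (TN + FP) * 100        # 3/4 of negatives recovered
accuracy = (TP + TN) / len(y_true) * 100  # 8/10 correct overall

print(round(sensitivity, 1), specificity, accuracy)  # 83.3 75.0 80.0
```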
