NLP - Project 2
©Great Learning. Proprietary content. All Rights Reserved. Unauthorised use or distribution prohibited
AIML MODULE PROJECT
AIML module projects are designed to provide detailed hands-on practice that integrates theoretical knowledge with actual practical implementation.
AIML module projects are designed to be scored using a predefined rubric-based system.
SEQUENTIAL NLP
AIML module project Parts I and II consist of industry-based NLP datasets that can be used to design a text classifier using sequential NLP models.
TOTAL SCORE: 60
PART ONE: PROJECT BASED
TOTAL SCORE: 30
• DOMAIN: Digital content and entertainment industry
• CONTEXT: The objective of this project is to build a text classification model that analyses customers' sentiments based on their reviews in the IMDB database. The model is a deep learning model consisting of an embedding layer followed by a classification algorithm that predicts each customer's sentiment.
• DATA DESCRIPTION: The dataset contains 50,000 movie reviews from IMDB, labelled by sentiment (positive/negative). The reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers). For convenience, the words are indexed by their frequency in the dataset, meaning the word with index 1 is the most frequent word. Use the first 20 words from each review to speed up training, with a maximum vocabulary size of 10,000. As a convention, "0" does not stand for a specific word; instead, it is used to encode any unknown word.
• PROJECT OBJECTIVE: Build a sequential NLP classifier that can use input text parameters to determine customer sentiment.
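The data description above can be sketched in plain Python. This is a minimal illustration, assuming each review is already a list of integer word indexes; the helper name `encode_review` and the choice to pad short reviews with 0 at the end are assumptions, not part of the brief. Indexes at or above the vocabulary size are mapped to 0, following the brief's convention that 0 encodes any unknown word.

```python
from typing import List

MAX_LEN = 20          # use only the first 20 words of each review (from the brief)
VOCAB_SIZE = 10_000   # maximum vocabulary size (from the brief)
UNKNOWN = 0           # "0" encodes any unknown word (from the brief)


def encode_review(word_indexes: List[int],
                  max_len: int = MAX_LEN,
                  vocab_size: int = VOCAB_SIZE) -> List[int]:
    """Hypothetical helper: truncate, clip the vocabulary and pad one review."""
    # Keep the first max_len words; any index outside 1..vocab_size-1 becomes UNKNOWN.
    clipped = [i if 0 < i < vocab_size else UNKNOWN for i in word_indexes[:max_len]]
    # Assumption: pad short reviews with 0 at the end so every row has max_len entries.
    return clipped + [UNKNOWN] * (max_len - len(clipped))


# Toy review: index 15000 is outside the 10,000-word vocabulary.
print(encode_review([1, 14, 22, 15000]))  # prints [1, 14, 22, 0] followed by 16 padding zeros
```

Note that padding with 0 makes padded positions indistinguishable from unknown words; whether that matters depends on the model chosen.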
Hint: The aim here is to import the text and process it in such a way that it can be taken as an input to the ML/NN classifiers. Be analytical and experimental in trying new approaches to design the best model.
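As one possible starting point for the hint above, the NumPy sketch below illustrates the "embedding layer followed by a classification algorithm" idea: each word index looks up a learned vector, the vectors are averaged, and a sigmoid output gives P(positive). It is an assumption-laden baseline, not the expected solution: averaging ignores word order and stands in for a sequential layer such as an LSTM, synthetic data replaces the IMDB reviews, and all names, sizes and hyperparameters are hypothetical. In practice a framework layer such as Keras's `Embedding` followed by an `LSTM` and a sigmoid `Dense` output would typically be used.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMB_DIM, SEQ_LEN = 20, 8, 20   # toy sizes (hypothetical)


def make_toy_data(n):
    """Synthetic reviews: label 1 uses word indexes 1-9, label 0 uses indexes 10-19."""
    y = rng.integers(0, 2, size=n)
    x = np.where(y[:, None] == 1,
                 rng.integers(1, 10, size=(n, SEQ_LEN)),
                 rng.integers(10, 20, size=(n, SEQ_LEN)))
    return x, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Parameters: embedding table E (one row per word index) and a logistic output layer (w, b).
E = rng.normal(0, 0.5, size=(VOCAB, EMB_DIM))
w = rng.normal(0, 0.5, size=EMB_DIM)
b = 0.0


def forward(x):
    h = E[x].mean(axis=1)          # (batch, EMB_DIM): average of the word embeddings
    return h, sigmoid(h @ w + b)   # probability of the positive class


x_train, y_train = make_toy_data(200)
lr = 1.0
for epoch in range(300):
    h, p = forward(x_train)
    dz = (p - y_train) / len(y_train)   # gradient of mean binary cross-entropy w.r.t. the logits
    grad_w = h.T @ dz
    grad_b = dz.sum()
    dh = np.outer(dz, w) / SEQ_LEN      # each position contributes 1/SEQ_LEN to the mean
    grad_E = np.zeros_like(E)
    # Scatter-add each position's gradient into the embedding row it looked up.
    np.add.at(grad_E, x_train, np.broadcast_to(dh[:, None, :], x_train.shape + (EMB_DIM,)))
    E -= lr * grad_E
    w -= lr * grad_w
    b -= lr * grad_b

x_test, y_test = make_toy_data(100)
_, p_test = forward(x_test)
accuracy = float(((p_test >= 0.5) == y_test).mean())
print(f"toy test accuracy: {accuracy:.2f}")
```

Swapping the averaging step for a recurrent layer is what would make this a sequential model; the embedding lookup and sigmoid output would stay the same.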
6. Use the designed model to print the prediction on any one sample.
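Step 6 can be illustrated with two small helpers: one that turns word indexes back into text for display, and one that turns a sigmoid probability into a sentiment label. Both are hypothetical; `index_to_word` stands in for whatever index-to-word mapping the project builds, and the 0.5 threshold is an assumption.

```python
from typing import Dict, Sequence


def decode_review(indexes: Sequence[int], index_to_word: Dict[int, str]) -> str:
    """Map word indexes back to words.

    Index 0 (unknown words per the brief, and padding here) is skipped;
    indexes missing from the mapping are shown as '?'.
    """
    return " ".join(index_to_word.get(i, "?") for i in indexes if i != 0)


def describe_prediction(probability: float, threshold: float = 0.5) -> str:
    """Turn the model's P(positive) into a readable label."""
    label = "positive" if probability >= threshold else "negative"
    return f"{label} (P(positive) = {probability:.2f})"


# Hypothetical vocabulary and model output for one sample.
index_to_word = {1: "the", 14: "movie", 22: "was", 40: "great"}
sample = [1, 14, 22, 40, 0, 0]
print(decode_review(sample, index_to_word), "->", describe_prediction(0.91))
# prints: the movie was great -> positive (P(positive) = 0.91)
```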
PART TWO: PROJECT BASED
TOTAL SCORE: 30
• DOMAIN: Social media analytics
• CONTEXT: Past studies in sarcasm detection mostly use Twitter datasets collected using hashtag-based supervision, but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to other tweets, and detecting sarcasm in these requires the availability of contextual tweets. In this hands-on project, the goal is to build a model that detects whether a sentence is sarcastic or not, using Bidirectional LSTMs.
• DATA DESCRIPTION:
The dataset is collected from two news websites, theonion.com and huffingtonpost.com.
This new dataset has the following advantages over the existing Twitter datasets:
Since news headlines are written by professionals in a formal manner, there are no spelling mistakes or informal usage. This reduces sparsity and also increases the chance of finding pre-trained embeddings.
Furthermore, since the sole purpose of TheOnion is to publish sarcastic news, we get high-quality labels with
much less noise as compared to Twitter datasets.
Unlike tweets that reply to other tweets, the news headlines obtained are self-contained. This helps in teasing apart the real sarcastic elements.
Content: Each record consists of three attributes:
is_sarcastic: 1 if the record is sarcastic otherwise 0
headline: the headline of the news article
article_link: link to the original news article. Useful in collecting supplementary data
Reference: https://github.com/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection
• PROJECT OBJECTIVE: Build a sequential NLP classifier that can use input text parameters to determine whether a headline is sarcastic.
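To make "Bidirectional LSTM" concrete, the NumPy sketch below runs one LSTM over a headline's embedded tokens from left to right and a second, separately parameterised LSTM from right to left, then concatenates the two final hidden states before a sigmoid output that gives P(sarcastic). It is a forward pass only, with random untrained weights; the gate ordering, sizes and helper names are illustrative assumptions. Keras's `Bidirectional` wrapper likewise concatenates the two directions by default (`merge_mode="concat"`), although its internal weight layout differs from this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_final_state(xs, W, U, b):
    """Run an LSTM over xs (T, D) and return the last hidden state (H,).

    W: (D, 4H) input weights, U: (H, 4H) recurrent weights, b: (4H,) bias.
    Gate blocks are ordered input, forget, output, candidate (an arbitrary choice).
    """
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x in xs:
        z = x @ W + h @ U + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # cell state: keep part of the old memory, add new content
        h = o * np.tanh(c)         # hidden state passed to the next step
    return h


def bidirectional_encode(xs, fwd_params, bwd_params):
    """Concatenate the final states of a left-to-right and a right-to-left LSTM."""
    return np.concatenate([lstm_final_state(xs, *fwd_params),
                           lstm_final_state(xs[::-1], *bwd_params)])


def init_lstm(rng, d, h):
    """Hypothetical initialiser: small random weights, zero bias."""
    return (rng.normal(0, 0.1, (d, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h))


rng = np.random.default_rng(42)
EMB_DIM, HIDDEN, T = 16, 8, 12                 # toy sizes (hypothetical)
headline = rng.normal(size=(T, EMB_DIM))       # stands in for 12 embedded headline tokens
fwd, bwd = init_lstm(rng, EMB_DIM, HIDDEN), init_lstm(rng, EMB_DIM, HIDDEN)

features = bidirectional_encode(headline, fwd, bwd)   # shape (2 * HIDDEN,)
w_out = rng.normal(0, 0.1, 2 * HIDDEN)
p_sarcastic = sigmoid(features @ w_out)               # untrained, so the value is meaningless
print(features.shape, round(float(p_sarcastic), 3))
```

The backward pass reads the headline from its last word to its first, so the concatenated features carry context from both sides of every token.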
LEARNING OUTCOME
Hands-on experience in importing, pre-processing and analysing a text dataset using Python.
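For Part Two, importing and pre-processing could look like the sketch below. It assumes the dataset file stores one JSON record per line with the `is_sarcastic`, `headline` and `article_link` attributes described in Part Two (check the actual file format before relying on this). The lowercase/regex tokenizer, the function names and the reuse of Part One's conventions (indexes assigned by frequency starting at 1, with 0 for unknown words) are assumptions made for illustration.

```python
import json
import re
from collections import Counter
from typing import Dict, List, Tuple


def load_records(lines) -> Tuple[List[str], List[int]]:
    """Parse JSON-lines records into headlines and 0/1 sarcasm labels."""
    headlines, labels = [], []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            headlines.append(record["headline"])
            labels.append(int(record["is_sarcastic"]))
    return headlines, labels


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts: List[str], max_words: int) -> Dict[str, int]:
    """Index words by frequency: 1 = most frequent; 0 is left for unknown words."""
    counts = Counter(word for text in texts for word in tokenize(text))
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_words - 1), start=1)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map words to indexes (0 if unknown), truncate to max_len and pad with 0."""
    ids = [vocab.get(word, 0) for word in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Two made-up records in the assumed format; in practice, iterate over the opened dataset file.
sample_lines = [
    '{"is_sarcastic": 1, "headline": "area man wins argument with toaster", "article_link": "https://example.com/a"}',
    '{"is_sarcastic": 0, "headline": "city council approves new budget", "article_link": "https://example.com/b"}',
]
headlines, labels = load_records(sample_lines)
vocab = build_vocab(headlines, max_words=10_000)
print(labels, encode("area man approves toaster", vocab, max_len=8))
```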
THAT’S YOU
Assume that you are working at a company that has received the above problem statement from an internal or external client. Finding the best solution to the problem statement will enhance the business/operations of your organisation/project. You are responsible for the complete delivery. Put on your best analytical thinking hat to turn the raw data into relevant insights and then into a working AIML model.
PLEASE NOTE
Designing a data-driven decision product typically follows this process:
1. Data and insights:
Warehouse the relevant data. Clean and validate the data as per the functional requirements of the problem statement. Capture and validate all possible insights from the data as per the functional requirements of the problem statement. Remember that there are numerous ways to achieve this; sticking to relevance is of utmost importance. Pre-process the data so that it can be used by the relevant AIML model.
2. AIML training:
Use the data to train and test a relevant AIML model. Tune the model to get the best possible learning out of the data. This is an iterative process in which your knowledge of the data can help you debug and improve. Different AIML models react differently and perform differently depending on the quality of the data. Baseline your best-performing model and store the learnings for future use.
Design a trigger or user interface so that the business can use the designed AIML model in future. Maintain, support and keep the model/product updated through continuous improvement/training. These updates are generally triggered by time, business needs or changes in the data.
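The train-and-test loop in step 2 can be sketched as follows: hold out part of the data, score each candidate model on the held-out part, and keep the best one as the baseline. The 20% hold-out fraction, the fixed seed, the helper names and the toy "models" (plain functions from input to label) are assumptions for illustration; in a real project the candidates would be trained networks, and a separate validation set would typically be used for tuning so that the test set stays untouched.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_indices(n: int, test_fraction: float = 0.2, seed: int = 7) -> Tuple[List[int], List[int]]:
    """Shuffle 0..n-1 reproducibly and split into train and test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_baseline(models: Dict[str, Callable[[int], int]], xs, ys, test_idx) -> Tuple[str, float]:
    """Score every candidate on the held-out rows and return the best (name, accuracy)."""
    scores = {name: accuracy([ys[i] for i in test_idx], [model(xs[i]) for i in test_idx])
              for name, model in models.items()}
    best = max(scores, key=scores.get)   # first candidate wins ties
    return best, scores[best]


# Toy data: the label is 1 when x is even (purely illustrative).
xs = list(range(100))
ys = [1 if x % 2 == 0 else 0 for x in xs]
train_idx, test_idx = split_indices(len(xs))
# train_idx would be used to fit real models; the toy rules here need no training.
candidates = {"parity_rule": lambda x: int(x % 2 == 0), "always_one": lambda x: 1}
print(pick_baseline(candidates, xs, ys, test_idx))  # prints ('parity_rule', 1.0)
```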
IMPORTANT POINTERS
The project should be submitted as a single ".html" file and a single ".ipynb" file. Follow the best practices below:
• The ".html" and ".ipynb" files should be an exact match.
• The code should be pre-run, with all outputs intact.
• The code should be error-free and machine-independent, i.e. it should run on any machine without adding any extra code.
• The code should be well commented for clarity on the code design, assumptions made, approach taken, insights found and results obtained.
HAPPY LEARNING