Assignment
1. How will you treat text containing shortcut words (like bcz, u, thr, etc.)?
Ans. Shortcut words of this type are common in chats and in posts on social
networking sites, where users share their feelings informally. Some social sites,
such as Twitter, also limit the length of a post, so users have to express their
emotions with abbreviations like LOL, IMAO, BCZ, U, etc. If we have a lot of
data, we can apply the text mining model to the text directly, or we can create
our own abbreviation list for these shortcut words and use it to expand them
during pre-processing. If we don’t have much data, we can also simply remove
them from the data.
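The abbreviation-list approach can be sketched in Python as follows. This is only a minimal illustration: the dictionary entries and the function name expand_abbreviations are assumptions made for the example (for instance, "thr" is taken to mean "there"), and a real list would be far larger and built from the data at hand.

```python
import re

# Hypothetical abbreviation list; the expansions are assumptions for illustration.
ABBREVIATIONS = {
    "bcz": "because",
    "u": "you",
    "thr": "there",
}

# One pattern that matches any listed abbreviation as a whole word only,
# so the "u" inside "button" is left untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b",
    re.IGNORECASE,
)


def expand_abbreviations(text):
    """Replace every known shortcut word in text with its full form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], text)


print(expand_abbreviations("I left bcz u were not thr"))
# prints: I left because you were not there
```

The word-boundary match is the main design choice here: a plain substring replacement would also rewrite letters inside longer words.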
2. Write R and Python code to replace “bcz” with “because” in the whole text.
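Ans. A minimal sketch of the Python half only, using the standard re module. The function name replace_bcz and the sample sentence are assumptions for illustration, and matching "bcz" case-insensitively is a choice made here, not something the question specifies.

```python
import re


def replace_bcz(text):
    """Replace the whole word "bcz" (in any letter case) with "because"."""
    # \b restricts the match to the standalone word, not substrings of other words.
    return re.sub(r"\bbcz\b", "because", text, flags=re.IGNORECASE)


print(replace_bcz("I stayed home bcz it rained, bcz of the storm."))
# prints: I stayed home because it rained, because of the storm.
```

If only exact lowercase occurrences matter and word boundaries are not a concern, text.replace("bcz", "because") would also do.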
3. How do you deal with English text that has Hindi words in between?
Ans. The first thing we need is a corpus of Hindi text, which can be extracted
from the web. For the English part we can then apply techniques such as bag of
words, naive Bayes, or SVM, and for the Hindi part we can try Hindi WordNet. We
can also use a language translation tool to convert the text into English and then
do the sentiment analysis. Sentiment analysis has become an area of deep
research because of the advent of social media tools such as Twitter, Facebook,
WhatsApp, etc., which have become primary tools for collaboration. Sentiment
analysis of plain text in a single language such as English, Hindi, or Marathi is
largely a solved problem, with improvements being made each day and new
tools coming up to increase the accuracy. However, doing the same for
code-mixed script is a much more involved task, with accuracies ranging from
very low to moderate. There is a growing need for the ability to analyze mixed
script if the benefits of sentiment analysis are to be achieved for such text.
Therein lies the real problem, and a great amount of work is being done on the
subject. In this research, we focus on two combinations, English-Hindi and
English-Marathi, written in Latin characters. Latin characters were chosen
because people, including the author himself, commonly use them for
communication.
4. Write R code to connect with this public API - http://www.omdbapi.com
Ans.
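The question asks for R; what follows is instead a hedged Python sketch of the same idea, using only the standard library. It assumes the OMDb base URL http://www.omdbapi.com/ and the query parameters apikey and t (movie title). The API key is a placeholder that has to be requested from OMDb, and the function names build_omdb_url and fetch_movie are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_BASE_URL = "http://www.omdbapi.com/"


def build_omdb_url(api_key, title):
    """Build a title-lookup URL; 'apikey' and 't' are OMDb query parameters."""
    return OMDB_BASE_URL + "?" + urlencode({"apikey": api_key, "t": title})


def fetch_movie(api_key, title):
    """Call the API and return the decoded JSON response as a dict."""
    with urlopen(build_omdb_url(api_key, title), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access, so it is safe to try directly.
print(build_omdb_url("YOUR_API_KEY", "The Matrix"))

# With a real key (placeholder below), the request itself would look like:
# movie = fetch_movie("YOUR_API_KEY", "The Matrix")
# print(movie)
```

Keeping URL construction separate from the network call makes the request easy to inspect before any key is spent on it.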
5. What are the different methods to deploy a model into a production system?
Ans. Most of the time, energy, and resources are spent on training the model to
achieve the desired results, so allocating additional time and energy to decide on
the computational resources and to set up the infrastructure needed to replicate
those results in a different environment (production) at scale is a difficult task.
Overall, it’s a lengthy process that can easily take months, from the decision to
use deep learning (DL) to deploying the model. Deploying the trained model in
production is one of the challenging problems in today’s data science for
consumer-centric organizations or individuals who want their solutions to reach
a wider audience. Below are five best-practice steps that you can take when
deploying your predictive model into production. Software deployment is all of
the activities that make a software system available for use.
Following are some ways to deploy our data models into the production system: