Assignment

The document discusses handling shortcut words in text, particularly in social media contexts, and suggests using text mining models or creating abbreviation lists for preprocessing. It also addresses the challenges of analyzing English text with Hindi words, emphasizing the need for appropriate algorithms and tools for mixed scripts. Additionally, it outlines various methods for deploying machine learning models into production, including data mining tools, programming languages, and the use of PMML for predictive models.

Uploaded by

kmzdr1
Assignment

1. How will you treat text containing shortcut words (like bcz, u, thr, etc.)?

Ans. Shortcut words of this kind appear in chats and in posts on social
networking sites, where users share their feelings. Some sites, such as Twitter,
limit the length of a post, so users express themselves with shortcut words like
LOL, IMAO, BCZ, U, etc. If we have a lot of data, we can apply a text mining
model to it, or we can create our own abbreviation list for these shortcut words
and use it when pre-processing the text. If we do not have much data, we can
also simply remove them.
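The abbreviation-list approach could be sketched roughly as follows. This is a minimal illustration, not a complete solution: the `ABBREVIATIONS` dictionary and the `expand_abbreviations` helper are hypothetical names, and the expansion chosen for "thr" ("there") is an assumption, since the text does not say what it stands for.

```python
import re

# Hypothetical abbreviation list; extend it for your own corpus.
# The expansion of "thr" is an assumption made for illustration.
ABBREVIATIONS = {
    "bcz": "because",
    "u": "you",
    "thr": "there",
}

# One pattern that matches any listed abbreviation as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b",
    flags=re.IGNORECASE,
)


def expand_abbreviations(text: str) -> str:
    """Replace each listed shortcut word with its full form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)


print(expand_abbreviations("I left bcz u were not thr"))
```

The word-boundary anchors keep the single letter "u" from being replaced inside longer words such as "you" or "umbrella".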
2. Write R and Python code to replace “bcz” with “because” in the whole text.
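
No answer is given for this question in the original. On the Python side, one possible sketch uses the standard `re` module; the function name `replace_bcz` and the sample sentence are made up for illustration.

```python
import re


def replace_bcz(text: str) -> str:
    """Replace the standalone word 'bcz' (in any letter case) with 'because'."""
    return re.sub(r"\bbcz\b", "because", text, flags=re.IGNORECASE)


sample = "I stayed home bcz it rained. BCZ of that, I missed the abcz show."
print(replace_bcz(sample))
```

The replacement is always lowercase, so a sentence-initial "BCZ" becomes "because" rather than "Because". In R, a call along the lines of `gsub("\\bbcz\\b", "because", text, ignore.case = TRUE)` should behave similarly, although that is not tested here.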

3. How do you deal with the English text having Hindi words in between?

Ans. The very first thing we need is a corpus of Hindi text, which can be extracted
from the web. For the English part we can then apply algorithms such as bag of
words, naive Bayes, and SVM, and for the Hindi part we can try Hindi WordNet. We
can also use a language translation tool to convert the text into English and then
do the sentiment analysis. Sentiment analysis has become an area of deep research
due to the necessity brought about by the advent of social media tools such as
Twitter, Facebook, WhatsApp, etc., which have become primary tools for
collaboration. Sentiment analysis of plain text in English, Hindi, or Marathi is
comparatively well studied, with improvements being made each day and new tools
coming up to increase accuracy. However, doing the same for code-mixed script is
a much more involved task, with accuracies ranging from very low to moderate.
There is a growing need to build the ability to analyze mixed script if the
benefits of sentiment analysis are to be achieved for such text. Therein lies the
real problem, and a great amount of work is being done on the subject. In this
research, we focus on two combinations, English-Hindi and English-Marathi, written
in Latin characters. These characters were chosen because people, including the
author himself, commonly use them for communication.
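
A first pre-processing step for such mixed text is to label each token by language, so that English and Hindi words can be routed to different tools. The sketch below is only a toy illustration: the `ROMAN_HINDI` word list and the `tag_tokens` helper are hypothetical, and a real system would need a much larger lexicon or a trained language-identification model.

```python
import re

# Hypothetical, tiny lexicon of romanized Hindi words (illustration only).
ROMAN_HINDI = {"hai", "nahi", "bahut", "accha", "kya", "mera", "yaar"}

# Devanagari Unicode block: U+0900 to U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Label each whitespace-separated token as 'hi' (Hindi) or 'en' (other)."""
    tagged = []
    for token in text.split():
        word = token.strip(".,!?").lower()
        if DEVANAGARI.search(word) or word in ROMAN_HINDI:
            tagged.append((token, "hi"))
        else:
            tagged.append((token, "en"))
    return tagged


print(tag_tokens("This movie is bahut accha yaar!"))
```

Anything not recognized as Hindi falls back to "en", which is a simplifying assumption; misspelled or unlisted Hindi words would be mislabeled.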
4. Write R code to connect with this public API - http://www.omdbapi.com

Ans.
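
The question asks for R, and no answer is given in the original. As a hedged illustration of the same request, here is a minimal Python sketch using only the standard library. It assumes you have obtained your own API key from the OMDb site; `YOUR_API_KEY` is a placeholder, and `build_url` and `fetch_movie` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.omdbapi.com/"


def build_url(title: str, api_key: str) -> str:
    """Build a title-lookup URL using the 't' and 'apikey' query parameters."""
    return BASE_URL + "?" + urlencode({"t": title, "apikey": api_key})


def fetch_movie(title: str, api_key: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(build_url(title, api_key), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; call fetch_movie() with a real key to query the API.
print(build_url("The Matrix", "YOUR_API_KEY"))
```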

5. What are the different methods to deploy a model into a production system?

Ans. Most of the time, energy, and resources go into training the model to
achieve the desired results, so allocating additional time and energy to choose
the computational resources and set up the infrastructure needed to reproduce
those results at scale in a different environment (production) is a difficult
task. Overall, it is a lengthy process that can easily take months, from the
decision to use deep learning (DL) to deploying the model. One of the
challenging problems in data science today is deploying the trained model in
production for consumer-centric organizations or individuals who want their
solutions to reach a wider audience. Software deployment is all of the
activities that make a software system available for use.
The following are some ways to deploy a predictive model into a production system:

• Data mining tools
  ◦ RevoDeployR
  ◦ Orange
  ◦ Weka
  ◦ R
  ◦ XLMiner, etc.
• Programming languages
  ◦ Java
  ◦ C
  ◦ Visual Basic
  ◦ Python
  ◦ R
• Database and SQL scripts
  ◦ T-SQL
  ◦ PL/SQL
• Windows Azure
PMML: PMML stands for "Predictive Model Markup Language". It is an XML-based
standard for representing predictive solutions. A PMML file may contain a myriad
of data transformations (pre- and post-processing) as well as one or more
predictive models.
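
To make the idea concrete, the sketch below embeds a small hand-written PMML document for a toy linear model (y = 1.0 + 2.0 * x) and scores one record from it. The model, the `PMML_DOC` string, and the `score` helper are all invented for illustration. The helper reads only the intercept and the coefficients and ignores everything else PMML can express, so it is no substitute for a real PMML scoring engine.

```python
import xml.etree.ElementTree as ET

# Namespace of the PMML 4.4 schema, used for element lookups below.
NS = {"p": "http://www.dmg.org/PMML-4_4"}

# Hand-written toy model for illustration: y = 1.0 + 2.0 * x
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score(pmml_text: str, row: dict) -> float:
    """Evaluate the first regression table: intercept + sum(coefficient * value)."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return result


print(score(PMML_DOC, {"x": 3.0}))
```

Because the model lives in a plain XML file, the tool that trains it and the system that scores it do not have to share a programming language.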
Schedulers: Scheduling is a decision-making process that plays an important role
in most manufacturing and service industries. It is used in procurement and
production, in transportation and distribution, and in information processing and
communication. The scheduling function usually uses mathematical techniques or
heuristic methods to allocate limited resources to the processing of tasks. A
proper allocation of resources enables the company to optimize its objectives and
achieve its goals. Resources may be machines in a workshop, runways at an
airport, or crews at a construction site. Tasks may be operations in a workshop,
takeoffs and landings at an airport, or stages in a construction project. Each
task may have a priority level, an earliest possible starting time, and a due
date. The objectives may also take many forms, such as minimizing the time to
complete all tasks or minimizing the worst performance of the schedule.
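
As a small worked example of a heuristic scheduling rule, the sketch below orders tasks on a single resource by earliest due date (EDD) and reports the completion time of the last task and the maximum lateness. The `Task` class, the `schedule_edd` helper, and the sample tasks are hypothetical, and the sketch ignores priorities and earliest starting times to stay short.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration: int  # processing time, in arbitrary time units
    due: int       # due date, in the same units


def schedule_edd(tasks):
    """Order tasks by due date; return (order, makespan, maximum lateness)."""
    order = sorted(tasks, key=lambda t: t.due)
    clock = 0
    max_lateness = float("-inf")
    for task in order:
        clock += task.duration  # the task finishes at the current clock value
        max_lateness = max(max_lateness, clock - task.due)
    return [t.name for t in order], clock, max_lateness


tasks = [Task("A", 3, 10), Task("B", 2, 4), Task("C", 4, 7)]
print(schedule_edd(tasks))
```

For this input the tasks run in the order B, C, A and finish at time 9. The maximum lateness is -1, meaning every task completes at least one time unit before its due date.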
