Data Digest - June 2022 Edition

Uploaded by

VAIBHAV PATIL

DATA DIGEST

BITS & BYTES


JUNE
2022 EDITION
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
June 2022 Edition

WHAT’S INSIDE?

Spotlight

Great Learning Journey

MU SIGMA Hackathon

That’s a Good Question!

Data Science at Work

Discover

What’s New?

Data Science Crossword


SPOTLIGHT
Devising Schemes based on Customer Segmentation for Gaming Industry

This project was presented by Akshata Salvi, Dhanshree Gaikwad, Neha Bhangale, and Pranati Jha, who were enrolled in the PGPDSBA course. Their paper was selected for the AICTE Sponsored Online International Conference on Data Science, Machine Learning and Its Applications, and a follow-up paper was published in the conference journal. Let us learn about the project in brief.

The problem statement was to increase customer engagement over the gaming platforms.

In today’s world, insights from customer interactions are of primary importance in providing a valuable customer experience and top-notch customer service. Customer segmentation can help businesses understand customer needs, purchase trends and interests. These attributes can aid in creating customized marketing plans, packages and advertisement campaigns specific to the intended groups. Gaming centres that combine sports, virtual reality, music and dining are becoming one of the fastest-growing industries in the entertainment sector. They are always on the lookout for opportunities to increase footfall. With a diverse product line-up of video games, arcade games, virtual reality games, simulation games, karting, etc., along with dining options, the gaming centres serve as a hub for leisure and refreshment.

Let us understand how the study was conducted and the approach used to address the problem.

The study focused on new combinations that can offer customers much more variety in terms of games, along with attractive deals and packages, thereby increasing the centres’ profit. The goal here is to work on customer segmentation to enhance customer engagement. Using the Recency, Frequency, Monetary (RFM) analysis technique, three customer segments were identified: Gold, Silver and Bronze.

Now let us understand the key observations.

Each segment’s buying pattern was analyzed to suggest the optimum combination of games and customized packages for that segment. The Association Rule Mining technique was used to extract information from the customers’ transaction history. Initial data exploration reported that Arcade games contribute approximately 50% of the total revenue for the centre. With ‘Pacman’ being one of the highest-grossing games under Arcade, one recommendation for the Gold tier is to combine it with ‘SuperKeeper’ to boost revenue. Another recommendation is ‘[…] Fridays’, where various combinations of games can be played at certain price points, to increase footfall on Fridays, which happens to be the least popular day of the week.

These conclusions were useful in formulating customer engagement strategies for the future.
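To make the RFM idea concrete, here is a minimal sketch in Python. It is not the project's actual method. The transaction data, the rank-based scoring and the equal three-way split into tiers are all assumptions made for illustration, and the names `rfm_table`, `rank_score` and `segment` are hypothetical.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, visit_date, amount_spent).
TRANSACTIONS = [
    ("C1", date(2022, 5, 28), 1200.0),
    ("C1", date(2022, 5, 20), 800.0),
    ("C1", date(2022, 4, 30), 950.0),
    ("C2", date(2022, 5, 10), 400.0),
    ("C2", date(2022, 3, 15), 300.0),
    ("C3", date(2022, 1, 5), 150.0),
]
TIERS = ("Gold", "Silver", "Bronze")


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        days_ago = (today - day).days
        recency, frequency, monetary = table.get(customer, (days_ago, 0, 0.0))
        table[customer] = (min(recency, days_ago), frequency + 1, monetary + amount)
    return table


def rank_score(table, index, higher_is_better):
    """Score customers 1..n on one RFM column; n is the best score."""
    worst_first = sorted(
        table, key=lambda c: table[c][index], reverse=not higher_is_better
    )
    return {customer: rank + 1 for rank, customer in enumerate(worst_first)}


def segment(table):
    """Sum the three rank scores and split customers into three tiers."""
    r = rank_score(table, 0, higher_is_better=False)  # fewer days since last visit is better
    f = rank_score(table, 1, higher_is_better=True)   # more visits is better
    m = rank_score(table, 2, higher_is_better=True)   # more spend is better
    totals = {c: r[c] + f[c] + m[c] for c in table}
    best_first = sorted(totals, key=totals.get, reverse=True)
    n = len(best_first)
    # Assumption: tiers are equal thirds of the ranked customer list.
    return {c: TIERS[min(i * 3 // n, 2)] for i, c in enumerate(best_first)}


rfm = rfm_table(TRANSACTIONS, today=date(2022, 6, 1))
segments = segment(rfm)
print(segments)
```

In practice the tier boundaries would be chosen from the data (for example by quantiles of each score) rather than by a fixed three-way split.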


GREAT LEARNING JOURNEY

VIPIN BHATIA
PGPDSBA ALUMNUS

I graduated in Mining Engg. from IIT (ISM) Dhanbad in 2004. I have learnt a lot while working in various roles such as Engg. (Mines) in core mining operations and then into sales, followed by marketing in the mining engineering domain.

Currently, I am working as a Regional Manager with Epiroc based out of Chennai and handling a team, shaping them to achieve great heights. My journey has been a phenomenal learning experience and provided great exposure. Read about my journey with Great Learning’s PGP Data Science and Business Analytics Course in my own words.

I felt that the biggest challenge was learning an entirely different subject including Python, which […] this challenge and motivated myself to learn something new that is going to dominate the world in terms of skills and opportunities.

I knew that Great Lakes is a reputed institute. However, due to the current scenario, the learning had to be online. But this never proved to be a bottleneck, as the mentor was great, the mentored learning sessions were fruitful and I got to interact with a good bunch of batch mates. Overall, it was a happy-go-lucky kind of experience for me.

The mentored learning sessions made the learning convenient and provided the much-needed guidance to cruise through all the modules smoothly. All the sessions were creative and informative and served as an ocean of knowledge. The mentor taught in a way that ensured concepts were clear. The mentor had a key role to play, as due to his efforts and teaching skills the course subjects seemed extremely understandable.

I have been using various data science concepts in my current work process. This includes techniques learnt from courses such as predictive modelling and time series forecasting, among many others.

My friendly advice to people seeking a transition or upskilling will be to stay focused and enjoy the learning path. The course and mentoring sessions will surely help you learn the key data science […] learnt a lot during my time and from the subjects taught, which are facilitating my professional growth.


MU SIGMA HACKATHON
May was marked by a mega hackathon event, the Great Learning Mu Sigma Hackathon. It ran for 4 days, from 19th May to 22nd May. The competition was intense and there was a great deal of enthusiasm amongst the learners. More than 1,400 learners registered for the event, and more than 10,800 submissions were made over multiple rounds of iteration. The participants were always on their toes as the leaderboard kept changing dynamically. The event offered rewards worth INR 25k for the Top 3 performers (15k, 7.5k and 2.5k respectively) and interview opportunities for the Top 10 Great Learners on the leaderboard. Such events are not to be missed, as there is something for everyone who participates. Participation in itself is highly useful, as it greatly improves students’ confidence in their abilities and accelerates their learning. We have also captured some heartwarming experiences from the top performers.

Rakesh Patnala says “The Mega Hackathon was very challenging as well as fun to participate in. Checking the leaderboard to get to the top made it more exciting. It was my first Hackathon and would always be a very memorable experience. Thank you Mu Sigma and Great Learning for the wonderful opportunity and hope we will get to witness more such events in the future.”

Marripally Ravikumar says “It was a great learning experience. It provided a good chance to review the concepts and apply them to improve the metrics. I got an opportunity to explore ensemble learning algorithms like Random Forest, Extreme Gradient Boosting, Light Gradient Boosting and CatBoost to compete with other participants. I explored feature engineering techniques like dropping insignificant features, creating new features and reducing sub-levels in categorical variables. The real-time leaderboard in the hackathon raised spirits. It was a big challenge to enter into the top 5.”

Paushali Dutta says “The Mu Sigma hackathon was a great experience which challenged my skills as a data science student. I enjoyed solving the case study because it enabled me to put my knowledge to practice. Hackathons like these promote healthy competition between like-minded individuals to build a model with the best optimal solution. Eagerly looking forward to more interesting opportunities like this in the future. It was indeed a great learning experience.”

Shrikanth Madan says “I participated in a hackathon conducted by Mu Sigma between 19th and 22nd of May. It was very competitive and I had to be constantly on my toes to secure a good place. The competitive nature of the hackathon gave me a lot of learning both technically and psychologically. In the end, I was only able to secure the 6th position, but in my mind I knew that I had come a long way, getting into the course without an ounce of coding knowledge to make it to the top 10. It was truly a phenomenal event which made me push beyond my limit.”


THAT’S A GOOD QUESTION!


In this edition, we will be focusing on why logistic regression is considered a regression technique. Let us hear it from our mentors.

Netali Agarwal says that Linear Regression and Logistic Regression are two machine learning algorithms that every data scientist would have stumbled on at least once in their journey. Linear Regression is used to solve regression problems and Logistic Regression is used to solve classification problems. Oh! Then why is it called Logistic Regression instead of Logistic Classification?

Let us learn about the fundamentals before we answer that. Data is the key element here, whether structured or unstructured. Structured data refers to data organised in a tabular format, well-defined and easily readable, while unstructured data refers to raw data stored in an unstructured manner. To make it easily understandable, we will focus on structured data. Structured data can be information such as employee details, attendance records, bank customer details, etc. Based on whether labelled historical data is available, machine learning techniques are classified as Supervised or Unsupervised.

Supervised Learning is where the machine is supervised based on labelled data. This means that some data is already tagged with the correct answer. This is used to train the machine to learn patterns and then predict on a new set of examples. The Unsupervised Learning technique doesn’t have labelled data at all. Instead, the data is grouped based on similar patterns. But the question arises: why are we discussing all this? The reason is that supervised learning is further divided into two categories, Regression and Classification. Right, now it makes sense.

Regression is a supervised learning technique where the target (the dependent or output variable) is continuous. For example, measures like height, weight, length, price, etc. are continuous values.
Some common use cases where Regression is used:
1. Predicting the stock price
2. Finding the bonus/variable pay for employees based on salary, experience, designation, etc.
3. Predicting the price of pre-owned cars

Classification is also a supervised learning technique, where the target variable is discrete. That means the output variable is a class, such as yes or no, 0 or 1, etc.
Some common use cases where Classification is used:
1. Finding whether a borrower will default on the loan or not
2. Finding whether an email is spam or not
3. Finding if a person is at risk of heart disease or not

That is the difference between Regression and Classification. Next, let us ask ourselves why Logistic Regression is called Regression although it solves Classification problems.
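As a small illustration of a regression problem with a continuous target, the sketch below fits a straight line by ordinary least squares to predict the price of a pre-owned car from its age. The data points and the names `fit_line`, `car_age_years` and `price_lakh` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical labelled data: car age in years, resale price in lakh INR.
car_age_years = [1, 2, 3, 4, 5]
price_lakh = [9.0, 8.0, 7.0, 6.0, 5.0]

b0, b1 = fit_line(car_age_years, price_lakh)
predicted_price = b0 + b1 * 6  # continuous prediction for a 6-year-old car
print(b0, b1, predicted_price)
```

The prediction is a number on a continuous scale. A classification model would instead return one of a fixed set of labels, such as "default" or "no default".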


In Linear Regression, we predict the dependent variable Y based on the linear relationship between the X and Y variables. The equation for linear regression is given below:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$

where,
$Y$ = dependent (output) variable
$X_1, X_2, \dots, X_n$ = independent variables
$b_0$ = Y-intercept
$b_1, b_2, \dots, b_n$ = slope coefficients

In Logistic Regression, we do the same thing but with the small addition of a function known as the Sigmoid Function, $\mathrm{Sigmoid}(z) = \frac{1}{1 + e^{-z}}$, to predict the output Y.

$$Y = \mathrm{Sigmoid}(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n)$$

Logistic Regression is regressing for the probability of a categorical output. Thus, both Linear and Logistic Regression use the same concept.

Linear regression gives a continuous value of output Y for a given input X, whereas logistic regression gives a continuous value of P(Y=1) for a given input X, which is later converted to Y=0 or Y=1 based on a threshold value. That’s the reason logistic regression has ‘Regression’ in its name.

Rakesh Ambudkar says Logistic Regression is purely used for classification. The word ‘Regression’ appears in the name, which creates a lot of confusion. Prediction is only about the classes and not any values. The classes are binary, such as Class 1 or Class 2, default or non-default, malignant or benign. It is purely used for binary classification, though with a small change, using the concept of One-vs-Rest (OVR), several binary models can be combined to predict the desired class. This is how Logistic Regression can be used for multi-class classification. Internally the algorithm remains a binary classifier, but combined this way it acts as a multi-class classifier. The reason it is called logistic regression is that it is a two-stage process.

It starts from Linear Regression concepts. It finds the linear regression, and then the model is converted into a classification model. So it actually uses two steps to come up with this model. It is a method which is based on Linear Regression, but it is used to predict classes rather than values. The response variable is categorical and not a continuous number. The target variable will be defaulters, non-defaulters, etc. As mentioned, it can also handle multi-class classification with a tweak such as the One-vs-Rest (OVR) concept.
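The One-vs-Rest idea can be sketched as follows: one binary logistic model is kept per class, each scores the input with the sigmoid of a linear combination, and the class with the highest score wins. The coefficients, class labels and the name `predict_ovr` below are hypothetical. In a real project the coefficients would be learnt from labelled data rather than typed in.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-fitted coefficients (b0, b1, b2) for three binary
# "this class vs the rest" logistic models.
OVR_MODELS = {
    "low_risk": (2.0, -1.5, -0.5),
    "medium_risk": (0.0, 0.2, 0.1),
    "high_risk": (-3.0, 1.0, 1.2),
}


def predict_ovr(features, models=OVR_MODELS):
    """Score the input with every binary model and pick the best class."""
    scores = {}
    for label, (b0, *weights) in models.items():
        z = b0 + sum(w * x for w, x in zip(weights, features))
        scores[label] = sigmoid(z)  # P(this class) according to its own model
    best = max(scores, key=scores.get)
    return best, scores


label, scores = predict_ovr([3.0, 2.0])
print(label, scores)
```

Each individual model is still a binary classifier. Only the final comparison across models turns the set into a multi-class predictor.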


Internally, it behaves as a binary classifier. To understand logistic regression, let’s take up an example. Say one of the leading banks has hired a data scientist, and the management expects you to develop a model to predict whether a loan applicant is a defaulter or non-defaulter from the available information. The assumption is that we have past data on people who were defaulters or not.

Say one of the independent variables is expenditure. The higher the expenditure, the greater the chance of the loan applicant defaulting. There exists a positive correlation between the expenditure and the default, which is measured in terms of probability (probability of default and probability of non-default). Let’s represent the expenditure on the x-axis and the probability of default on the y-axis. A linear relationship can be established between them by building a model like Y = mx + c. However, the values of Y tend to move beyond 1 towards +infinity and below 0 towards -infinity, whereas probability values have to be in the range of 0 to 1 and cannot be negative or beyond 1.

To overcome this problem, we make the linear model go through a transformation. We first build a linear model, Y = mx + c, where m is the slope and Y is plotted on the y-axis, and then send this linear model through a transformation. We take the best-fit line as we do in Linear Regression and then substitute it into the transformation function, which gives

$$p = \frac{1}{1 + e^{-(mx + c)}}$$

with the probability p as the dependent variable. This is how logistic regression responds.

The function $p = \frac{1}{1 + e^{-(mx + c)}}$ is called the Logistic function. This logistic function transforms the linear model into a probability model. Hence, it gets the name regression, because it actually does regression and then transforms the regression into a probability.
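The effect of the transformation can be checked numerically. The sketch below compares the raw linear score with the logistic probability for a few expenditure values and then applies a 0.5 threshold to obtain a class. The slope `M`, intercept `C`, threshold and expenditure values are assumptions made for illustration and are not fitted to any real data.

```python
import math

M, C = 0.04, -2.0  # hypothetical slope and intercept (expenditure in thousands)


def linear_score(x, m=M, c=C):
    """Plain linear model Y = mx + c; can fall below 0 or rise above 1."""
    return m * x + c


def logistic_probability(x, m=M, c=C):
    """Logistic function p = 1 / (1 + e^-(mx + c)); always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(m * x + c)))


def classify(x, threshold=0.5):
    """Convert the probability into a class using a threshold."""
    if logistic_probability(x) >= threshold:
        return "defaulter"
    return "non-defaulter"


for expenditure in (0, 100, 200):
    print(
        expenditure,
        round(linear_score(expenditure), 3),
        round(logistic_probability(expenditure), 3),
        classify(expenditure),
    )
```

The linear score leaves the 0 to 1 range at both ends, while the logistic probability stays inside it for every input. That is what allows it to be read as a probability of default.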


DATA SCIENCE AT WORK


RAJAT PRAKASH SINGH
PGPDSBA JUNE 2021

“[…] optimized the design parameters of a railway project using descriptive and bivariate data analysis”

Rajat Prakash Singh works as a Signal Function Design Engineer at Hitachi India. He works on the development of railway control systems, including design, testing and error analysis. The process involves adhering to specific guidelines to ensure adequate safety as per client and station requirements. The design phase consists of obtaining inputs from the concerned authority of the Indian Railway and the Research Design and Standards Organization (RDSO). Thereafter, the team works on deploying the design. The work is divided into various phases that are interdependent.

1. Business Analysis
2. System Analysis and Estimation
3. Development Stage
4. Testing Stage

Business Analysis: It started with the business analysis team taking requirements and then analyzing them. The key activities here were identification of the budget to be allocated and pre-estimation, such as the timeline for completion, keeping in mind testing, designing, etc., for the final output.

System and Estimation Analysis: Next, the client and business analysis team finalized the project estimations while focusing on development, testing, and deployment considerations.

Development and Testing: Using these estimations, the design team utilised the inputs supplied. The team developed the circuitry and incorporated the standards as communicated. For compliance and safety, the team tested the design as per a standard checklist after the client’s verification. This design was shared for site acceptance and modifications. After the changes were made, approval was taken and the design was forwarded for deployment. Rajat analysed how the time taken for testing the design and final verification could be reduced. He used descriptive and bivariate analysis on data obtained from the business analysis team. The objective was to derive meaningful insights as required for design and development.

It was observed that processes like procedural testing, checklist testing and finalisation, design verification and incorporation of the changes suggested by the testing team were time-consuming. Rajat, in coordination with various teams, created checklists based on the analysis done. He also created a centroid value for error analysis on basic types of errors for the design team, along with a set of guidelines to help in testing. These guidelines were crucial for inspecting the design during the development stage.

These guidelines helped the design team in the empanelment of the pilot project and ensured timely completion. In addition, the design was much more reliable compared to the previous versions, with a reduced budget. The approach helped Rajat and his team to deliver projects with an increased […]


SATHEESH M
PGPDSBA DECEMBER 2021

“[…] useful insights from automobile manufacturer’s data by applying various data science tools and techniques”

Satheesh works as a Data Analyst intern in the marketing department of ZF Friedrichshafen AG Commercial Vehicle Control System and completed his bachelor’s in mechanical engineering. He was assigned the responsibility of drawing insights from data covering Indian vehicle manufacturers from 2015 to 2021. The data was obtained from the Society of Indian Automobile Manufacturers (SIAM) and was largely unstructured. Let us learn how Satheesh was able to derive useful information from this data.

Satheesh used MS Excel to restructure the data and created 27,000+ rows and 18 columns. As per him, it was challenging to handle such a large volume of data and identify how to start the […] segments such as two-wheeler, three-wheeler, passenger carriers and commercial vehicles. The data had several anomalies such as missing and erroneous information.

Initially, Exploratory Data Analysis (EDA) was used to clean the data and organise it. Tools such as Power BI and Jupyter notebook were used for data visualisation and quarterly analysis. In addition, Satheesh created five interactive dashboards for analysis and applied forecasting techniques to production, domestic sales and export processes for the years 2023 and 2024. He was able to deduce that manufacturing was comparatively higher during the October to March period. It was also noted that domestic sales and exports peaked during the same period. The key manufacturing contributors were analysed using scatter plots for every vehicle segment. The trends were also studied and it was understood that the export […] pandemic but the domestic sales fell drastically.

Satheesh applied multiple linear regression models as learnt in the predictive modelling course. The concepts were useful in helping him transform all the independent variables by applying label encoding and then fit a multiple linear regression against a target variable. The predictive approach helped him learn about how […] accordingly. He also used text analytics to study which vehicle type has been a major contributor to sales and is a growing customer preference.

Satheesh made use of his learnings and utilised techniques such as EDA, data visualization, encoding, feature engineering, machine learning and text analytics. He also presented the business insights to his manager with the help of a report. He is thankful to the PGPDSBA program for helping him with the required skillsets, which have boosted his confidence. Satheesh also believes that these learnings will help him in the long run and in getting the desired job opportunity.
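The label encoding step mentioned in this case study can be sketched in a few lines of Python. This is a generic illustration and not Satheesh's actual code. The helper names `fit_label_encoder` and `encode` are hypothetical, and the category values are simply the vehicle segments named in the article.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[value] for value in values]


# Vehicle segments as they might appear in one column of the data.
segments = [
    "two-wheeler",
    "commercial vehicle",
    "three-wheeler",
    "two-wheeler",
    "passenger carrier",
]

mapping = fit_label_encoder(segments)
encoded = encode(segments, mapping)
print(mapping)
print(encoded)
```

Label encoding gives the categories an arbitrary numeric order, which a linear regression will treat as meaningful. When the categories have no natural order, one-hot encoding is a common alternative.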


DISCOVER

Python Main Function and Examples with Code

Python has earned a reputation as one of the most popular and in-demand programming languages to learn in the world of software technology. To excel in Python, it is essential to understand and learn each aspect of the language. The Python main function is a very important aspect of Python.
This article will provide you with deep insights into the main function in Python programming. Let’s start by understanding more about the terms from the following link:

https://www.mygreatlearning.com/blog/python-main/

Length of List in Python

A Python list is a collection data type in which we can store our entries. These entries are ordered and can be changed. Python’s built-in len() function helps find the length of an array, object, list, etc. A list can store a sequence of various types of data. There are alternatives to the list, but the list is the most reliable option. A Python list has a length that denotes the total number of items, or the size of the list.
To know more in detail, check out the link below:

https://www.mygreatlearning.com/blog/python-length-of-list/

How to Remove an Item From List Python

Python is one of the most popular programming languages in today’s market. We can use Python for Machine Learning, Artificial Intelligence, Data Mining, Data Analysis, Software Development, Web Development, etc. The reason behind that is the […] functionalities is a List that helps programmers to a great extent. Today we will learn about how we can remove items from a List in Python.
But before moving on, let’s learn about what lists are and why we use them from the link:

https://www.mygreatlearning.com/blog/remove-item-from-list-python/
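The three topics above fit into one short script. The sketch below is not taken from the linked articles. It only uses standard Python features (`len()`, `list.remove()`, `list.pop()`, `del`, and the `if __name__ == "__main__":` idiom), and the function names `count_items` and `remove_items` are hypothetical.

```python
def count_items(entries):
    """len() is a built-in function that returns the number of items."""
    return len(entries)


def remove_items(entries):
    """Show three common ways to remove items; works on a copy of the list."""
    items = list(entries)
    items.remove("b")     # remove the first occurrence of a value
    popped = items.pop()  # remove and return the last item
    del items[0]          # remove the item at a given index
    return items, popped


def main():
    entries = ["a", "b", "c", "b", "d"]
    print("length:", count_items(entries))
    print("after removals:", remove_items(entries))


# Run main() only when this file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

Note that `remove()` raises a `ValueError` if the value is not in the list, and `pop()` raises an `IndexError` on an empty list, so real code usually checks first or handles the exception.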


WHAT’S NEW
Anaconda acquires PythonAnywhere for cloud expansion

PythonAnywhere, a cloud-based Python development and hosting ecosystem, has been bought by Anaconda Inc., maker of the most well-known data science platform in the world. As a full-featured, cloud-based Python development environment, PythonAnywhere frees Python developers from the onerous task of managing infrastructure and enables them to easily build web applications within a cloud-based Python environment, a vital step in sharing and collaborating within distributed teams.

Python developers now have access to a cloud-based environment with notebooks, tools, and an easy method to connect with their team. PythonAnywhere’s expertise enables Anaconda to better support its user community of over 30 million people. The acquisition comes shortly after Anaconda released PyScript, an open-source framework for running Python applications inside HTML.

AI-powered robots introduced in Coimbatore airport

Innovation, according to the Airports Authority of India (AAI), has a big impact on the civil aviation industry and has the power to improve airport operations and amenities for passengers.

To improve the travelling experience for passengers, the Airports Authority of India has installed AI-based robots at the Coimbatore airport. The robots move throughout the terminal building of the airport and approach travellers to greet them and ask whether they require assistance. Passengers can use them to access a variety of airport services and get guidance to specific destinations within the airport. Travellers who wish to speak with airport staff can also do so through the robots, which connect them to staff over a video call.

Developer’s job could be made easier by GitHub’s AI-powered Copilot

Over the previous 12 months, more than 1.2 million developers registered to use the GitHub Copilot preview. The tool will continue to be free for verified students and maintainers of well-known open-source projects. According to GitHub, Copilot is now writing over 40% of the code in files where it is enabled.

To recommend additional lines of code and functions given the context of the current code, Copilot is driven by an AI model called Codex that has been trained on billions of lines of public code. In response to a developer’s description of what they wish to do (such as “Say hello world”), Copilot can surface a method or solution based on its knowledge base and the situation at hand. Developers can cycle through suggestions for Python, JavaScript, TypeScript, Ruby, Go, and many other programming languages using Copilot, and then accept, reject, or manually amend them. Copilot responds to changes made by developers by suggesting unit tests that match implementation code and by matching specific coding styles to auto-fill boilerplate or repetitive code patterns.


DATA SCIENCE CROSSWORD

ACROSS
4. Identify the data type from the code: L = [2, 54, 'javatpoint', 5]
5. Python supports the creation of anonymous functions at runtime, using a construct called?
6. Who developed Python Programming Language?
9. In which language is Python written?
10. Method used inside the class in Python language.

DOWN
1. Module in the Python standard library that parses options received from the command line.
2. Used to define a block of code in Python language.
3. Example of user-defined data type.
7. Keyword used for function in Python language.
8. Output of the code: >>> 'javatpoint'[5:]
language.


LEARNING BIRD CHIRPS:

The key to pursuing excellence is to embrace an


organic, long-term learning process, and not to live
in a shell of static, safe mediocrity. Usually, growth
comes at the expense of previous comfort or safety.

- Josh Waitzkin


THE EDITORIAL TEAM:

Surbhi Bhandari Srijan Purang Abhirup Dey Mack Donald
