
A Report on

2023
Final Year Internship
In
SkillVertex, Bangalore & Feynn Labs, Guwahati

(January 2023 - June 2023)


Internship Report

Submitted by:
Shubradip Ghosh

MSc. in Biostatistics & Epidemiology Reg No: RA2122021010009


Mentor Name: Dr. Prakash M,
Associate Professor, SPH, SRMIST

Submitted to:

SRM SCHOOL OF PUBLIC HEALTH


SRM Institute of Science and Technology
Kattankulathur-603203
DECLARATION

I, Shubradip Ghosh, hereby declare that this internship report, titled A
Report on Final Year Internship in SkillVertex & FeynnLabs, has been uniquely
prepared by me after the completion of a total of five months' virtual internship
with both organizations.
I also confirm that this report has been prepared solely to meet my academic
requirement for the partial fulfilment of the degree of MSc. in Biostatistics &
Epidemiology and has not been/will not be submitted elsewhere for any other purpose.
………………………………………………………………
Shubradip Ghosh,

RA2122021010009,
MSc. In Biostatistics & Epidemiology,
School of Public Health,
Signature
SRM Institute of Science & Technology.

I certify that the above declaration is true to the best of my knowledge and belief.

Organization Guides:
Sanjay Basumatary, Founder @ Feynn Labs | work4ai.com
Mayank Gathole, Academic Head @ SkillVertex

Mentor:
Dr. Prakash M.,
School of Public Health,
SRMIST.

Dean:
Dr. Hari Singh,
School of Public Health,
SRMIST.
ACKNOWLEDGEMENT

I would first like to express my sincere gratitude to my mentor, Dr. Prakash M., Associate
Professor, for his constant and invaluable advice and ideas throughout my internship in all areas. I
want to thank my campus dean, Dr. Hari Singh, as well as all the professors and staff of the
School of Public Health at SRMIST for their constant support.
I would like to extend my sincere gratitude to Mayank Gathole, Academic Head at SkillVertex, for
providing me with this opportunity.
It is with great pleasure that I extend my heartfelt thanks to Sanjay Basumatary, founder of Feynn
Labs|work4AI.com, who, despite his busy schedule, took the time to listen to me, give me advice,
and keep me on the right track.
The internships with both organisations gave me a fantastic opportunity for learning
and professional growth, and I consider myself lucky to have been given the chance
to join them. I am also appreciative of the opportunity to get to know so
many lovely people and professionals who guided me during my internship.
My father's and friends' unwavering love, inspiration, and support have allowed me to finish
my obligations on time, and I am incredibly appreciative of them.
I want to thank everyone who supported me during my internship and send them my best
wishes.
TABLE OF CONTENTS

Sl.No  Particulars                            Page No.
1      Declaration                            01
2      Acknowledgement                        02
3      Summary                                06
4      Introduction                           07-08
5      Organization Profile                   08-10
6      Description of Department              10
7      Life Expectancy Prediction Project     11-14
8      References                             15
9      Market Segmentation                    16-18
10     EV Market Segmentation                 18-22
11     Online Cab Booking Project             23-30
12     Learnings                              31
13     Challenges                             32
14     Achievements                           33
15     Conclusion                             34
16     Annexure-A                             35
17     Internship Certificates                36-38


SUMMARY
This internship report is based on a five-month internship programme that I successfully
completed at SkillVertex from January 5th to March 5th, 2023, and at Feynn Labs from
January 27th to June 27th, 2023, as part of the MSc. in Biostatistics & Epidemiology
programme at SRMIST's School of Public Health. As someone completely new to a practical,
real-life work environment, every hour spent on projects from diverse fields provided me
with valuable experience, all of which was beneficial to my career.

This internship report describes the major projects I completed for academic and
non-academic purposes. The purpose of this document is to identify and describe the data
collected, the projects completed, and the experience gained, with a focus on my
accomplishments.

Finally, the internship has been a fantastic learning opportunity. My practical knowledge,
AI/ML work, communication skills, familiarity with different software and statistical tools,
and data analysis skills have all improved. I learned about my strengths, which will prove
beneficial in the future, as well as my weaknesses, which I can work on to improve.

I attained new skills and abilities, as well as the chance to improve my practical knowledge
of concepts learned in the School of Public Health. Many of my learning goals were
achieved. The work I did during my internship gave me new perspectives and inspired me
to pursue a career in the statistical field. It taught me important lessons that will undoubtedly
aid me in my future employment, as well as how to prepare for my career path and how to
manage my time.
1.1 INTRODUCTION:

This report provides a compilation of my five-month internship, completed as part of the
Master's programme curriculum at SRM Institute of Science and Technology's School of Public
Health. The internship was carried out at SkillVertex, Bengaluru, Karnataka 560102, and
Feynn Labs, Guwahati, Assam.

I was hired by SkillVertex as an intern. I began my internship on January 5th, 2023 and
completed it on March 5th, 2023. I was then hired by Feynn Labs as an intern on 27th
January 2023 and will complete that internship by the end of June 2023. My internship
supervisor at SkillVertex was Mayank Gathole, Academic Head. When I joined, I was
assigned a life expectancy project, on which I worked in the Python programming
language for two months (January 5th to March 5th). I then continued my internship with
the other organization, Feynn Labs, where I was assigned several market segmentation
projects for the remaining four months (February 27th to June 27th). Both internships
were on a virtual/remote basis. On the market segmentation projects I worked under the
supervision of Sanjay Basumatary, Founder @ Feynn Labs | work4ai.com. I consider
myself lucky to have had Sanjay Basumatary and Mayank Gathole as my supervisors, since
they would always find time for my questions and were willing to help.

I was provided with the opportunity to work on four different projects. Initially, I worked
on the life expectancy project, which involved the Python programming language, AI/ML
tools, data science, advanced statistics, cleaning, organizing, and collecting data, and
submitting a final report to the SkillVertex portal. At Feynn Labs, I worked on AI
Product/Service Prototyping, Market Segmentation using Machine Learning and Data
Analysis (EV Market & Online Cab Booking), and AI Product/Service Business and
Financial Modelling.
In those five months, I gained sufficient knowledge and experience. I can't
emphasize enough how pleasant all of my coworkers are, and how they all
contributed to the positive atmosphere at work and made me feel welcome from
the first day of my internship from both the organizations.
OBJECTIVES OF THE INTERNSHIP
1. To improve my written and oral communication skills, especially when working
collaboratively on a team.
2. To gain confidence in myself and become more flexible when addressing new and
unfamiliar tasks.
3. To understand the functioning and working conditions of an organization.
4. To gain practical knowledge in applying data analysis skills, techniques and theory
through working with professionals.
5. To see if this kind of work is a possibility for my future career.
6. To see what skills and knowledge I still need to work in a professional environment.
7. To acquire the ability to work as part of a team and respond effectively to the ideas of
experts.
8. To gain good knowledge of research studies and project management skills, and to
build a network.
9. To practice problem-based learning in an authentic supervised environment.
10. To solve problems in an effective/creative manner in a challenging position.

1.2 ORGANIZATION PROFILE:

SkillVertex

SkillVertex is an edtech organization that aims to provide upskilling and training to students as
well as working professionals by delivering a diverse range of programs in accordance with their
needs and future aspirations. In line with emerging industrial requirements and technologies, it
also assists with career development, additional counselling, guidance and mentorship in the
respective domains.

Industrial simulation
Students learn from industry tutors and undergo practical, hands-on learning for a better start to
their corporate careers.
Advanced programs are specifically curated to help people grow in their careers or switch fields
of work and become industry-ready. Courses are designed with expert insight to provide
advanced knowledge, upskilling and real-time experience to college or university students and
entry-level professionals. These programs are tutored by industry experts who have hands-on
expertise in their domain of work. Participants undergo intense training, from scratch to
advanced, in their specified domain; work on real-time projects; and attain hands-on experience
and practical knowledge in technical and managerial domains, with additional career-growth
support including placement assistance, resume building, personality development and mock
interviews.

Projects: Life Expectancy Calculation

Business Problem
Life expectancy is affected by various factors. The WHO wishes to predict life expectancy and
determine which factors have a significant impact. From this project, the WHO would be able to give a
country its life expectancy and suggestions on which factors to focus on to improve it. Based on a
publicly available WHO dataset, this research analyses the variables that affect life expectancy. Data
were gathered for the years 2000 to 2015. A country's development status (developed versus
developing), GDP, population, schooling years, alcohol use, BMI, government health spending, health
spending per unit of GDP, coverage of various immunizations, thinness, measles cases, HIV/AIDS
deaths, and the mortality rates of adults, children, and infants were among the factors examined.
Data were carefully examined (horizontally and vertically), cleansed, and transformed during the
processing stage. Using the bagged-trees technique, missing values were imputed. Box-and-whisker
plots, histograms, and multiple factor analysis (MFA) were used in exploratory data analysis (EDA)
to explore and mine the trends within the data. MFA is an unsupervised machine learning method.
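The bagged-trees imputation described above can be sketched with scikit-learn's IterativeImputer driven by a tree ensemble. This is an illustrative stand-in, not the project's actual code, and the column names are hypothetical rather than the real WHO dataset schema.

```python
# Illustrative tree-based imputation of missing values, similar in spirit
# to the bagged-trees technique described above. Columns are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor

df = pd.DataFrame({
    "gdp":             [100.0, 200.0, np.nan, 400.0, 500.0],
    "schooling":       [8.0, 10.0, 12.0, np.nan, 16.0],
    "life_expectancy": [60.0, 65.0, 70.0, 75.0, np.nan],
})

# Each feature with missing values is modelled as a function of the other
# features using an ensemble of randomized trees, then predicted and filled in.
imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=10, random_state=0),
    random_state=0,
)
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(int(filled.isna().sum().sum()))  # no missing values remain
```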

Steps:

1. Problem formulation: Clearly define the problem you want to solve. In this case, the goal
is to predict life health expectancy based on certain factors or variables.
2. Data collection: Gather relevant data that will help you build your predictive model. This
data may include factors like age, gender, lifestyle habits, socioeconomic status,
healthcare access, and any other variables you believe could influence life health
expectancy.
Dataset has been provided with us by the organization.

3. Data preprocessing: Clean the collected data by handling missing values, outliers, and
inconsistencies. Convert categorical variables into numerical representations, normalize
or scale numerical features, and perform any other necessary data transformations.
4. Exploratory data analysis (EDA): Perform EDA to understand the patterns, relationships,
and distributions within the data. Visualize the data using graphs, histograms, scatter
plots, etc., and calculate summary statistics. Identify any interesting insights or
correlations between variables that may help in predicting life health expectancy.
5. Feature selection/engineering: Select the most relevant features that have a significant
impact on life health expectancy. Use statistical techniques, correlation analysis, or
domain knowledge to determine which features to include in your model. Additionally,
create new features by combining or transforming existing ones if it helps improve
predictive performance.
6. Model selection: Choose an appropriate machine learning algorithm to build your
predictive model. Some common algorithms for regression tasks include linear
regression, decision trees, random forests, support vector machines (SVM), or neural
networks. Consider the nature of your data and the complexity of the problem when
selecting the model.
7. Model training: Split your data into training and testing sets. Use the training set to train
your chosen model on the selected features. The model will learn the underlying patterns
and relationships between the features and the target variable (life health expectancy).
8. Model evaluation: Evaluate the performance of your model using appropriate metrics
such as mean squared error (MSE), root mean squared error (RMSE), mean absolute
error (MAE), or R-squared. Compare the predicted values with the actual values from the
testing set. Adjust your model or try different algorithms if the performance is not
satisfactory.
9. Model optimization: Fine-tune your model to improve its performance. Adjust
hyperparameters (e.g., learning rate, regularization parameters) using techniques like grid
search or random search. Cross-validation can help assess the stability and generalization
of your model.
10. Model deployment: Once you are satisfied with your model's performance, deploy it in a
production environment. Make sure to package the model with any necessary
preprocessing steps and provide clear instructions on how to use it to predict life health
expectancy.
11. Monitoring and maintenance: Continuously monitor the performance of your deployed
model. Collect new data over time to keep your model up-to-date and retrain it
periodically to maintain its accuracy. Monitor for concept drift or changes in the data
distribution that may require model updates.
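Steps 3 to 8 above can be sketched in a few lines with scikit-learn. The data here is synthetic and the predictor names are illustrative, since the actual WHO dataset is not reproduced in this report.

```python
# Minimal sketch of the split/train/evaluate steps above, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
n = 200
schooling = rng.uniform(5, 18, n)       # illustrative predictors
health_spend = rng.uniform(1, 10, n)
X = np.column_stack([schooling, health_spend])
# Synthetic "life expectancy" with a known linear relationship plus noise
y = 45 + 1.2 * schooling + 0.8 * health_spend + rng.normal(0, 1, n)

# Step 7: split the data and train the model on the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Step 8: evaluate on the held-out test set with RMSE and R-squared
pred = model.predict(X_test)
rmse = mean_squared_error(y_test, pred) ** 0.5
r2 = r2_score(y_test, pred)
print(round(rmse, 2), round(r2, 2))
```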

Results:
The results of a multiple linear regression model that passed the assumption tests
indicated that education (coeff. est.: 1.15), total government health spending (coeff.
est.: 0.08), BMI (coeff. est.: 0.03), GDP (coeff. est.: 0.00004), and diphtheria
(coeff. est.: 0.03) and polio (coeff. est.: 0.02) vaccinations are positively
correlated significant variables (p < 0.05). Similar findings are supported by a
partial analysis of the full longitudinal multilevel model, which also found that if a
country had a lower starting life expectancy in the year 2000, it would have a faster
rate of life expectancy improvement between 2000 and 2015 (intercept-slope corr. =
-0.55). This suggests an improvement in life expectancy between 2000 and 2015 in
developing nations that started with lower life expectancy. The full model also
concludes that average human life expectancy rises by 0.25 years per year, with a
confidence interval ranging from 0.16 years to 0.34 years (p < 0.001).

Application:
We can apply this prediction model in various fields to predict human life expectancy with
minimal human labour. It can be used by insurance companies to make predictions about their
customers, and by other companies whose products affect human life, or to keep a work cycle
running given the expected life span of its workers. It can be used by doctors to understand
biological differences and curative measures for affected patients. It can also be used by
governments to make decisions for human welfare, and to take appropriate measures to control
population growth and other factors that negatively affect the life span of a country's people.
It also directs the utilization of the growth in human resources and skill sets acquired by
people over many years. It could help make people more aware of their general health and its
improvement or deterioration over time, which may motivate them to make healthier lifestyle
choices. Machine learning is a promising field, with new research published every day. This
study broadens the scope for ecologists and medical scientists, and allows people to understand
their biological status in a desired region without depending on experts.

Future Scope:
This application has a very bright future and a large number of applicable cases; it needs to
be upgraded and incorporated into school and college syllabi for welfare and knowledge
purposes. It provides insights into the various factors, and the levels of each, required to
keep life expectancy as high as expected. It can be used to suggest good health practices and
lifestyles to users based on their daily activities, and to provide suggestions for exercises
to improve their health. Pharmaceutical companies can check which diseases impact more people,
and therefore life expectancy, and manufacture medicine accordingly. We could even call it a
time machine that predicts the life of someone who has not been born yet, based on his/her
country's adult mortality, population, under-5 deaths, thinness (1-5 years), alcohol use, HIV,
hepatitis B, GDP, percentage expenditure, and other factors. As technology grows faster than
ever, and as the world leans towards greater manpower needs, it is necessary to improve the
factors that can extend the life expectancy of its people.

References:
o Free and Open Machine Learning, Release 1.0.1, by Maikel Mardjan
o Online resources: https://developer.ibm.com/tutorials/how-to-create-a-node-redstarterapplication/
o About API: https://github.com/watson-developer-cloud/nodered-labs
o Introduction to Machine Learning: https://www.youtube.com/watch?v=s7wmiS2mSXY&feature=youtu.be
o Data collection: https://developer.ibm.com/technologies/machinelearning/series/learningpath-machine-learning-fordevelopers/
o Dataset: https://www.kaggle.com/kumarajarshi/life-expectancy-who
o Appendix, source code: https://github.com/SmartPracticeschool/llSPS-INT-1519-Predicting-LifeExpectancyusing-MachineLearning/blob/master/Predicting%20Life%20Expectan
Feynn Labs

Feynn Labs Services is the B2B unit of Feynn Labs, which provides Artificial
Intelligence integrated services such as Market Analysis and Segmentation, the
Smart Regions Program, and AI Development & Integration Services to businesses
using Data Analysis and Machine Learning.
Under its Smart Regions initiative, Feynn Labs aims to bring innovations in deep
tech such as IoT, Machine Learning, Data Analysis, Blockchain etc. to a region and
meet challenges in the fields of housing, traffic, mobility, social and
environmental issues, employment, education etc.
Its primary mission is to build a premier chain of institutes in India where students
will experiment with and apply what they learn, hand in hand.
This vision is implemented through a "Project-based Top-Down Learning"
approach, focusing on frontier technologies like Artificial Intelligence
(AI), IoT, Augmented Reality (AR), Blockchain, Quantum Computing etc.

Clients: Services are aimed at small businesses and startups that start with a
limited budget and can't afford to perform full-scale market analysis and
segmentation for their products and services.
Framework
Feynn Labs combines modern Machine Learning and Data Analysis tools with expert human
supervision to provide the best services to its clients. The framework is structured uniquely
to bring forth the best segmentation results and hence boost sales of clients' products.

Feynn Labs Ecosystem

The ecosystem represents the complete structure Feynn Labs aims to create with its
various functioning units. A completed Feynn Labs Ecosystem will maximize and
multiply the "education, innovation, economy" of any region tenfold.

Milestones:
• 1000+ students trained
• 700+ AI Product/Service prototypes formulated
• 15+ clients served
Projects involved during internship:
I. Case study of Market Segmentation and AI Product/Service Prototyping,
II. Market Segmentation using Machine Learning and Data Analysis (EV Market &
Online Cab Booking),
III. AI Product/Service Business and Financial Modelling.
EDA
We start the exploratory data analysis with some analysis drawn from the data without
Principal Component Analysis, and some with Principal Component Analysis, on the dataset
obtained by combining all the data we have. PCA is a statistical process that converts
observations of correlated features into a set of linearly uncorrelated features with the help
of an orthogonal transformation. These new transformed features are called the principal
components. The process helps reduce the dimensions of the data, making classification,
regression, or any other form of machine learning more cost-effective.
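As a minimal sketch of the orthogonal transformation described above (on synthetic data, not the actual EV dataset):

```python
# PCA converts correlated features into linearly uncorrelated components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
base = rng.normal(size=(100, 1))
# Three strongly correlated features plus one independent feature
X = np.column_stack([
    base,
    2 * base + rng.normal(scale=0.1, size=(100, 1)),
    -base + rng.normal(scale=0.1, size=(100, 1)),
    rng.normal(size=(100, 1)),
])

X_std = StandardScaler().fit_transform(X)  # standardize before PCA
pca = PCA()
scores = pca.fit_transform(X_std)          # the principal components
# The first component captures the shared variance of the correlated trio
print(np.round(pca.explained_variance_ratio_, 2))
```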

Comparison of cars in our data

For the electric vehicle market, one of the most important keys is charging.
Correlation Matrix: A correlation matrix is simply a table that displays the correlation
coefficients for different variables. It is best used with variables that demonstrate a linear
relationship with each other. The matrix depicts the correlation between all possible pairs of
values through the heatmap in the figure below. The relationship between two variables is
usually considered strong when their correlation coefficient value is larger than 0.7.
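The correlation computation itself is a one-liner in pandas; the toy columns below are illustrative stand-ins, not the actual dataset.

```python
# Pairwise Pearson correlations, with the 0.7 "strong" threshold from the text.
import pandas as pd

df = pd.DataFrame({
    "top_speed":  [150, 200, 180, 250, 160],
    "range_km":   [300, 450, 400, 600, 320],
    "price_lakh": [20, 60, 45, 150, 25],
})

corr = df.corr()             # correlation matrix (what the heatmap visualizes)
strong = corr.abs() > 0.7    # flag strong linear relationships
print(corr.round(2))
```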
Figure 5: Correlation Matrix for the dataset
Now we can see what types of cars are most needed by customers, and over the past 10 years
there has been rapid growth in electric vehicle usage in India.

Figure 6: Electric Cars sales in India


Figure 7: Correlation matrix plot for loadings
Scree Plot: A scree plot is a common method for determining the number of PCs to be retained via
graphical representation. It is a simple line-segment plot that shows the eigenvalue for each
individual PC, with the eigenvalues on the y-axis and the number of factors on the x-axis. It
always displays a downward curve. Most scree plots look broadly similar in shape, starting high
on the left, falling rather quickly, and then flattening out at some point. This is because the
first component usually explains much of the variability, the next few components explain a
moderate amount, and the later components only explain a small fraction of the overall
variability. The scree plot criterion looks for the "elbow" in the curve and selects all
components just before the line flattens out. The proportion-of-variance plot: the selected PCs
should be able to describe at least 80% of the variance.
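The 80% criterion can be applied in code by accumulating the explained variance ratios; the data below is synthetic, built with roughly two underlying dimensions.

```python
# Retain the smallest number of PCs whose cumulative explained variance >= 80%.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))            # two true underlying dimensions
mix = rng.normal(size=(2, 6))
X = latent @ mix + rng.normal(scale=0.05, size=(200, 6))  # six observed features

ratios = PCA().fit(X).explained_variance_ratio_  # eigenvalue shares, descending
cumulative = np.cumsum(ratios)
n_keep = int(np.searchsorted(cumulative, 0.80)) + 1
print(n_keep)  # number of PCs needed to reach at least 80% of the variance
```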
Figure 8: Scree Plot for our Dataset

Extracting Segments
Dendrogram
This technique is specific to the agglomerative hierarchical method of clustering, which starts
by considering each point as a separate cluster and joins points to clusters in a hierarchical
fashion based on their distances. To get the optimal number of clusters for hierarchical
clustering, we make use of a dendrogram, a tree-like chart that shows the sequence of merges or
splits of clusters. If two clusters are merged, the dendrogram joins them in the graph, and the
height of the join is the distance between those clusters. As shown in the figure, we can choose
the optimal number of clusters based on the hierarchical structure of the dendrogram. As
highlighted by other cluster validation metrics, four to five clusters can be considered for the
agglomerative hierarchical method as well.
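A sketch of the agglomerative procedure on synthetic data: SciPy's linkage matrix is exactly what a dendrogram such as Figure 9 is drawn from, and cutting the tree yields the cluster labels.

```python
# Agglomerative hierarchical clustering via a linkage matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Two well-separated blobs of 2-D points
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])

Z = linkage(X, method="ward")   # bottom-up merges; dendrogram(Z) would plot them
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at 2 clusters
print(sorted(set(labels.tolist())))
```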
Figure 9: Dendrogram Plot for our Dataset

Elbow Method
The elbow method is a popular method for determining the optimal number of clusters. It is based
on calculating the within-cluster sum of squared errors (WSS) for different numbers of clusters
(k) and selecting the k at which the change in WSS first starts to diminish. The idea behind the
elbow method is that the explained variation changes rapidly for a small number of clusters and
then slows down, leading to an elbow formation in the curve. The elbow point is the number of
clusters we can use for our clustering algorithm.
The KElbowVisualizer function fits the KMeans model for a range of cluster values between 2 and
8. As shown in the figure, the elbow point is reached and highlighted by the function itself. The
function also informs us, through the green line, how much time was needed to fit the model for
each number of clusters.
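What KElbowVisualizer computes can be reproduced directly with scikit-learn by fitting KMeans for each k and recording the inertia (the WSS); the blob data below is synthetic.

```python
# Within-cluster sum of squares (inertia) for k = 2..8, the elbow-method curve.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(2, 9)
}
# WSS decreases as k grows; the elbow is where the drop flattens out
print({k: round(v, 1) for k, v in inertias.items()})
```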
Figure 10: Evaluating the clusters using distortion

Figure 11: Evaluating the clusters using silhouette


Figure 12: Evaluating the clusters using calinski-harabasz

Analysis and Approaches used for Segmentation


Clustering
Clustering is one of the most common exploratory data analysis techniques used to get an
intuition about the structure of the data. It can be defined as the task of identifying subgroups
in the data such that data points in the same subgroup (cluster) are very similar while data
points in different clusters are very different. In other words, we try to find homogeneous
subgroups within the data such that data points in each cluster are as similar as possible
according to a similarity measure such as Euclidean-based distance or correlation-based distance.
The decision of which similarity measure to use is application-specific. Clustering analysis can
be done on the basis of features, where we try to find subgroups of samples based on features, or
on the basis of samples, where we try to find subgroups of features based on samples.

K-Means Algorithm
The K-means algorithm is an iterative algorithm that tries to partition the dataset into
pre-defined, distinct, non-overlapping subgroups (clusters) where each data point belongs to only
one group. It tries to make the intra-cluster data points as similar as possible while keeping
the clusters as different (far apart) as possible. It assigns data points to a cluster such that
the sum of the squared distances between the data points and the cluster's centroid (the
arithmetic mean of all the data points belonging to that cluster) is at a minimum. The less
variation we have within clusters, the more homogeneous (similar) the data points are within the
same cluster.
The way the k-means algorithm works is as follows:
• Specify the number of clusters K.
• Initialize centroids by first shuffling the dataset and then randomly selecting K data
points for the centroids without replacement.
• Keep iterating until there is no change to the centroids, i.e. the assignment of data points
to clusters is no longer changing.

The approach k-means follows to solve the problem is expectation-maximization. The E-step assigns
the data points to the closest cluster; the M-step computes the centroid of each cluster. Below is
a breakdown of how we can solve it mathematically.
The objective function is:

J = \sum_{i=1}^{m} \sum_{k=1}^{K} w_{ik} \lVert x_i - \mu_k \rVert^2   (1)

where w_{ik} = 1 if point x_i is assigned to cluster k and 0 otherwise. The M-step sets the
derivative with respect to each centroid to zero:

\frac{\partial J}{\partial \mu_k} = -2 \sum_{i=1}^{m} w_{ik} (x_i - \mu_k) = 0
\quad \Rightarrow \quad
\mu_k = \frac{\sum_{i=1}^{m} w_{ik} x_i}{\sum_{i=1}^{m} w_{ik}}
The k-means clustering algorithm performs the following tasks:

• Specify the number of clusters K.
• Initialize centroids by first shuffling the dataset and then randomly selecting K data
points for the centroids without replacement.
• Compute the sum of the squared distances between data points and all centroids.
• Assign each data point to the closest cluster (centroid).
• Compute the centroids of the clusters by taking the average of all the data points that
belong to each cluster.
• Keep iterating until there is no change to the centroids, i.e. the assignment of data points
to clusters is no longer changing.
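The steps listed above can be sketched from scratch in NumPy; this is illustrative only (the project itself used scikit-learn's KMeans).

```python
# From-scratch k-means following the listed steps: random init, assignment
# (E-step), centroid update (M-step), repeat until the centroids stop moving.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: randomly select k data points as centroids, without replacement
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # stop when centroids no longer move
            break
        centroids = new
    return labels, centroids

# Four well-separated synthetic blobs in 2-D
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 0.2, (20, 2)) for c in (0, 3, 6, 9)])
labels, centroids = kmeans(X, k=4)
print(labels.shape, centroids.shape)
```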

According to the elbow method, here we take K=4 clusters to train the KMeans model. The derived
clusters are shown in the following figure.
Prediction of Prices of the Most Used Cars
Linear regression is a machine learning algorithm based on supervised learning. It performs a
regression task: regression models predict a target value based on independent variables, and
are mostly used for finding relationships between variables and for forecasting. Here we use a
linear regression model to predict the prices of different electric cars from different
companies. X contains the independent variables and y is the dependent price to be predicted. We
train our model with a 4:6 split of the data, i.e. 40% of the data is used to train the model.
The LinearRegression().fit(Xtrain, ytrain) command is used to fit the dataset to the model. The
values of the intercept, coefficients, and cumulative distribution function (CDF) are described
in the figure.
After training the model, we test it on the remaining 60% of the data. The obtained results are
checked using a scatter plot between the predicted values and the original test data for the
dependent variable; the points lie close to a straight line, as shown in the figure, and the
density function is also normally distributed.
The metrics of the algorithm (mean absolute error, mean squared error and root mean squared
error) are described in the figure below:
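The setup described above can be sketched as follows; the feature names are illustrative stand-ins for the EV dataset columns, and the 4:6 split is reproduced with test_size=0.6.

```python
# Linear regression price prediction with a 40%/60% train/test split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(5)
top_speed = rng.uniform(120, 260, 150)       # illustrative features
range_km = rng.uniform(200, 600, 150)
X = np.column_stack([top_speed, range_km])
# Synthetic price with a known linear relationship plus noise
price = 2.0 * top_speed + 0.5 * range_km + rng.normal(0, 10, 150)

# test_size=0.6 keeps 40% for training, matching the 4:6 split in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.6, random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# The evaluation metrics named in the text
mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5
print(round(mae, 1), round(rmse, 1))
```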

Profiling and Describing the Segments


Sorting the top speeds and maximum range according to price with head(), we can
view the pie chart.
Pie Chart:
Target Segments Analysis:
From the analysis we can see that the optimal target segment should belong to the following
categories:

• EVs with 5 seats dominate the market, while EVs with 2 seats are fewer in number.
• Top speed & range: Across a large area of the market, cost depends on the top speed and
maximum range of the cars.
• Efficiency: Most of the segments have high efficiency.
• Price: From the above analysis, the price range is between 16,00,000 and 1,80,00,000.
• Maharashtra, Gujarat, Tamil Nadu, Karnataka and Andhra Pradesh are among the top
states with the majority of EV 2wheelers while Assam, Himachal Pradesh, Sikkim, J&K
with the least.
• Uttar Pradesh, Assam and Bihar are among the top states with the majority of EV 3-
wheelers while the remaining states don't seem to depend on the same.
• Maharashtra, Delhi, Karnataka, Kerala and Andhra Pradesh are among the top states
with the majority of EV 4-wheelers while the remaining states have less number of EV 4-
wheelers.
• Maharashtra, Gujarat, Karnataka, Kerala, Uttar Pradesh, Rajasthan, and Andhra
Pradesh are among the top states with the majority of EV charging stations sanctioned while
the remaining states have less number of the same.
• Rajasthan, Madhya Pradesh, Maharashtra, Karnataka, Uttar Pradesh are among the
top states with the majority of retail outlets for EV charging while the remaining states have
less number of the same.
• Tesla, Audi, Volkswagen, Nissan and Skoda top the list of EVs with the maximum
number of models in the Indian automobile market.
• SUV and hatchback body types form the majority, while station wagons and MPVs are the
minority.
• Based on the number of seats, Tesla, Mercedes and Nissan have the maximum
number of seats and Smart the minimum.
• Based on acceleration, EVs from Renault, Seat and Smart are the top performers, while
Tesla, Lucid and Porsche don't make the list.
• Based on the speed parameter, EVs from Tesla, Lucid and Porsche are the top performers,
while Renault, Smart and SEAT do not make the list.
• Based on range (km), Lucid, Lightyear and Tesla have the highest range and Smart
the lowest.

Customizing the Marketing Mix

The marketing mix refers to the set of actions, or tactics, that a company uses to promote its brand or product
in the market. The 4Ps make up a typical marketing mix: Price, Product, Promotion and Place.
• Price: refers to the value that is put on a product. It depends on the segment targeted,
the ability of customers to pay, supply and demand, and a host of other direct and
indirect factors.
• Product: refers to the product actually being sold – In this case, the service. The
product must deliver a minimum level of performance; otherwise even the best work on the
other elements of the marketing mix won’t do any good.
• Place: refers to the point of sale. In every industry, catching the eye of the consumer
and making it easy for them to buy is the main aim of a good distribution or 'place' strategy.
Retailers pay a premium for the right location. In fact, the mantra of a successful retail
business is 'location, location, location'.
• Promotion: this refers to all the activities undertaken to make the product or service
known to the user and trade. This can include advertising, word of mouth, press reports,
incentives, commissions and awards to the trade. It can also include consumer schemes,
direct marketing, contests and prizes.

All the elements of the marketing mix influence each other. Together they make up the business plan for a
company and, handled right, can give it great success. Getting the marketing mix right needs a lot of
understanding, market research and consultation with several people, from users to trade to manufacturing
and several others.

Github Links:
Shubradip Ghosh: https://github.com/shubradip/customer-segmentation-

3RD PROJECT:
AI Product Service Prototype Development and
Business/Financial Modelling

-Step 1: Prototype Selection


Team Leads have to choose one Prototype Idea from the Prototype reports of all the
teammates, based on the following 3 criteria:

a. Feasibility: Product/Service can be developed in short term future. (2-3 years)


b. Viability: Product/Service should be relevant or able to survive in long term future. (20-30
years)
c. Monetization: Product/Service should be monetizable directly. (indirectly monetizable
Product/Service should be dropped for this Project)

-Step 2: Prototype Development


Small-scale code implementation/model building of the Prototype.
Project reports already containing code implementation/model building can SKIP this part.
-Step 3: Business Modelling
Developing a business model for the AI Product/Service

Reference links on Business Models:

https://templates.office.com/en-in/Search/results?query=business+plan

https://www.investopedia.com/terms/b/businessmodel.asp#:~:text=For%20instance%2C%20dir
ect%20sales%2C%20franchising,sporting%20organizations%20like%20the%20NBA.

https://alcorfund.com/insight/18-business-model-example-explained/
-Step 4: Financial Modelling (equation) with Machine Learning & Data Analysis

a. Identify which Market your product/service will be launched into


b. Collect some data/statistics regarding that Market online.
c. Perform forecasts/predictions on that Market using regression models or time series
forecasting (alternately collect existing Statistics if you are unable to find appropriate data or
perform time series)
(https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-series-
analysis/
https://www.analyticsvidhya.com/blog/2021/10/machine-learning-for-stock-market-prediction-
with-step-by-step-implementation/)

d. Design Financial Equation corresponding to that Market Trend.

Description:
Suppose a market is growing linearly; design a linear financial model y = m·x(t) + c, where y =
total profit, m = pricing of your product, x(t) = total sales (the market as a function of time),
and c = production, maintenance and other costs.
If a market is growing exponentially, design an exponential market trend (the financial model
will be an exponential equation).
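The linear case can be sketched with a simple least-squares fit; the sales and profit figures below are invented purely for illustration, and m and c are recovered as the slope and intercept of the fitted line:

```python
# Illustrative only: fitting the linear financial model y = m*x(t) + c
# described above. All figures are invented for demonstration.
import numpy as np

x = np.array([100, 200, 300, 400, 500])       # total sales x(t) over time
y = np.array([1500, 2500, 3500, 4500, 5500])  # total profit y

# Fit a degree-1 polynomial: slope ~ pricing m, intercept ~ fixed term c.
m, c = np.polyfit(x, y, 1)
print(m, c)  # ≈ 10 and 500 for this invented data
```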

1.0 Problem Statement-


The core difficulty is correctly estimating shoe demand in order to manage inventories and boost sales.
Retailers must make sure they have the right number of shoes available to match customer demand while
avoiding overstocking, which can result in excess inventory and lower profit margins. Predicting customer
demand for various shoe types is difficult because demand can change based on factors such as seasonality,
fashion trends, and consumer preferences. As a result, there may be stockouts of popular shoe types or an
overabundance of less popular styles in the marketplace. Once precise sales prediction models have been
created, retailers can improve their inventory management, ensuring that they have the appropriate
quantity of shoes on hand to fulfil customer demand while reducing the risk of stockouts or surplus inventory.
As a result, the company may see an uptick in sales, higher customer satisfaction, and larger profit margins.

2.0 Business Needs Assessment-


Given the pandemic, which caused many people to buy goods online rather than in stores, retail stores
and vendors have seen a significant decline in sales. It is therefore vital for them to manage their
inventory appropriately and maximise their selling strategies by investing in more in-demand items and
offering promotions on products that are typically bought together, in order to increase their earnings and
avoid such losses in the future. The pandemic has also significantly changed consumer tastes.
With this method, we intend to give small enterprises useful data insights and income-generating
possibilities.

3.0 Target Specification-


Here are some target specifications for the footwear market in India:

1. Demographics: All age groups, genders, and economic levels are represented in the target
population for the Indian footwear industry. The youth demographic, which includes college
students and young professionals, is nonetheless the largest consumer category.
2. Style and design: Indian shoppers are seeking fashionable, comfortable shoes that complement
their personal style and way of life. The market is flooded with popular designs, hues, and patterns
that draw inspiration from conventional Indian culture.
3. Comfort and quality: Indian consumers are increasingly searching for supportive, comfy footwear.
Consumers think carefully about quality and durability when choosing footwear.
4. Price range: The Indian footwear industry has both high-end and entry-level products. Although
customers are ready to pay more for luxury brands, there is still a sizable market for reasonably
priced footwear that offers good value.
5. Distribution channels: While offline retail establishments still hold the lion's share of the Indian
footwear market, online retail platforms are quickly gaining ground. In India, footwear businesses
are rapidly using e-commerce websites, social media networks, and mobile applications as
distribution methods.
4.0 External Search-
I took help from a few websites to analyse the need for this system and to find how such systems are
currently used across the globe. A few of these websites are mentioned below:

1. Shoe Sales Prediction


2. Advantages of sales prediction
3. India Footwear Market

5.0 Benchmarking -
Several footwear brands with online retail operations, such as Adidas, Nike, and Fila, employ similar
strategies to boost sales while also giving customers a pleasant shopping experience. Benchmarking often
entails comparing project procedures and performance indicators against successful recently completed
projects or industry best standards and practices. For this, it is necessary to continually look for ways to
incorporate better methods that produce better outcomes.

6.0 Applicable Patents-


1. Indian Patent No. 301150 - Method for generating customer profiles and providing personalized
recommendations for online shopping.
2. Indian Patent No. 343764 - System and method for predicting product demand and optimizing
inventory management.
3. Indian Patent No. 342021 - System and method for predicting customer behavior and providing
targeted marketing recommendations.
4. Indian Patent No. 335441 - System and method for predicting sales trends and optimizing pricing
strategies.
5. Indian Patent No. 345236 - System and method for analyzing customer feedback and improving
product recommendations.

7.0 Applicable Regulations-


Here are few regulations related to the sales forecasting model:

1. Data collection and Privacy of Regulations of Customers.


2. Employment Schemes and laws created by government
3. Quality Control Orders (QCO) for leather and non-leather footwear
4. Data privacy laws: The use of customer data to develop a sales forecasting model must
comply with data privacy laws and regulations, such as the Personal Data Protection Bill
(PDPB) and the General Data Protection Regulation (GDPR) in the European Union.
5. Intellectual property laws: The sales forecasting model may incorporate proprietary
algorithms, software, or other intellectual property that is subject to copyright or patent
laws. The use of such intellectual property must comply with relevant laws and
regulations, and may require licensing agreements or other legal arrangements.
6. Antitrust laws: The use of a sales forecasting model must comply with antitrust laws and
regulations that prohibit anti-competitive practices, such as price-fixing or market
allocation agreements.
7. Consumer protection laws: The use of a sales forecasting model must comply with
consumer protection laws and regulations that prohibit false or misleading advertising or
other deceptive practices.
8. Factory licence: A factory licence is a must-have requirement for production-based
entities running a footwear business under The Factories Act, 1948.

8.0 Applicable Constraints-


There are a few challenges that need to be addressed before developing the actual solution, some of which
are listed below:

1. Lack of initial data on which to run the algorithms.


2. Convincing all vendors to use this technique of selling over traditional means.
3. Vendors' lack of technical knowledge.

9.0 Concept Generation and Development-


Following are some steps:-
1. Understand the problem: The first step is to understand the problem and what data you have
available. What type of shoes are being sold? How often are they sold? What factors affect sales,
such as seasonality or promotions? What data do you have available to work with?
2. Identify variables: Once you understand the problem and data available, identify the variables that
affect shoe sales. These may include time, seasonality, promotions, price, and inventory levels.
3. Gather data: Collect historical sales data for the shoes in question, as well as any other relevant
data, such as promotions or changes in pricing. You may also need to gather external data such as
weather or economic indicators if they have a significant impact on shoe sales.
4. Clean and prepare data: The data you collect may be incomplete or contain errors. Clean and
prepare the data so it can be used for analysis. This may include removing outliers, imputing
missing data, and normalizing the data.
5. Choose a forecasting model: There are several types of forecasting models that can be used for
time series data, including ARIMA, SARIMA, Prophet, and LSTM. Choose a model that is
appropriate for your data and problem.
6. Train the model: Once you have chosen a model, train it using historical data. This will enable the
model to learn the patterns and relationships in the data.
7. Evaluate the model: After training the model, evaluate its performance using metrics such as mean
squared error or mean absolute error. If the model is not performing well, adjust the parameters or
try a different model.
8. Make predictions: Once the model is trained and validated, use it to make predictions for future
shoe sales. These predictions can be used for inventory planning, pricing decisions, and marketing
campaigns.
9. Monitor performance: As new data becomes available, continue to monitor the model's
performance and update it as needed to ensure it remains accurate.
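Steps 5–8 above can be sketched in a few lines. The sketch below uses a simple linear-trend model as a stand-in for the ARIMA/SARIMA/Prophet/LSTM options mentioned in step 5, and the monthly sales figures are invented:

```python
# Hedged sketch of steps 5-8: fit a model on historical (invented) monthly
# shoe sales, evaluate on a hold-out period, then forecast ahead.
# A linear trend stands in for the more capable models listed in step 5.
import numpy as np

sales = np.array([120, 130, 138, 151, 160, 172, 179, 190, 201, 212, 220, 233])
t = np.arange(len(sales))

# Step 6: train on the first 9 months; keep the last 3 for evaluation.
train_t, test_t = t[:9], t[9:]
slope, intercept = np.polyfit(train_t, sales[:9], 1)

# Step 7: evaluate with mean absolute error on the hold-out months.
pred = slope * test_t + intercept
mae = np.mean(np.abs(sales[9:] - pred))

# Step 8: forecast the next 3 months.
future = slope * np.arange(12, 15) + intercept
print(round(mae, 2), np.round(future, 1))
```

Step 9 would then correspond to refitting as new monthly data arrives.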
10.0 Final Product Prototype-

Rehan Roy:
https://github.com/Rehan20/Feynn-Labs-Internship/tree/main/Project%203

Shubradip Ghosh::
https://github.com/shubradip/customer-segmentation-
/blob/main/Time%20Series%20Forecasting%20Prediction%20for%20Shoe-Sales.py

Amit Pathak:
https://github.com/sap1996/final-project

A. Feasibility-

Within a few months, this project can be developed and made available to the public as SaaS.
B. Viability-

As the shoe industry expands in India and throughout the world, there will always be small firms that can
use this service to improve their sales and data-warehousing capabilities. So the service will be viable in
the long run, although enhancements will be required as new technologies are developed.

C. Monetization-
This service can be immediately published as a service that businesses may utilise, making it directly
monetizable.

11.0 Business Model-


It is ideal to use a Subscription-based business model for this service, wherein certain initial services will
be provided free of charge to help customers stick around and expand the customer base. According to the
user demands and user kinds, they will subsequently be charged a membership fee in order to keep using
the service for their business, which will also help increase user conversion rates by offering customisation
options appropriate for each group.

1. Basic Plan: Access to fundamental features such as inventory-management and demand-
forecasting tools, as well as historical sales-data analysis. With a low monthly fee, it is
appropriate for small enterprises.
2. Pro Plan: This plan may include more sophisticated capabilities such as real-time sales-data
analysis, tailored customer suggestions, and targeted marketing tools. With a higher monthly
fee, it is appropriate for bigger companies or those seeking more sophisticated services.
3. Enterprise Plan: Designed for large companies with specific requirements, this plan may be altered
to add features like specialised support, API access, and bespoke machine learning models. Based
on the particular needs of the business, the price for this plan may be adjusted.

12.0 Conclusion-
In this study, a variety of forecasting techniques are utilised to estimate how much of the product will be sold
in the future. The prediction error will be used to choose the forecasting technique. The more precise the
forecasting process, the lower the forecast error. This study's particular goal was to determine the most
accurate quantitative forecasting technique for the shoe industry based on practical usability and accuracy.
This study found that, because of factors like weather and particular festivals, many real-life forecasting
situations are more complicated and challenging. These predictions can offer useful information for
production, facility monitoring, seasonal employment, and short- and long-term planning.

Monthly Internship Report:

JANUARY

ACTIVITIES:

Programmatic Activities/Orientation/Induction:
Daily lectures by industry experts on the SkillVertex Data Science program. Learned how a combination of
multiple disciplines uses statistics, data analysis, and machine learning to analyse data and extract
knowledge and insights from it.
First month focusing on:
o data gathering, analysis and decision-making;
o finding patterns in data through analysis, and making future predictions;
using Data Science to help companies with:
o better decisions (should we choose A or B?)
o predictive analysis (what will happen next?)
o pattern discoveries (finding patterns, or maybe hidden information, in the data)
Documents/Reports Submitted:
o Decision Tree and Random Forest
o Classification with Logistic Regression and Naive Bayes
o Naive Bayes doc
o KNN
o Regression Using Module
o Polynomial Regression from scratch
o Linear Regression
o Preprocessing and EDA
o Visualisation basics
o Pandas basics
o NumPy
o Math-and-Random-Module
o Modules and packages
o Functions and something extra
o For and while loops, useful operators
o Statements: IF ELSE
o Comparison operators
o Lists, tuples, dictionaries and sets
o Numbers, variables and strings
o Minor project: Loan Amount Prediction
Future projects assigned:
A project on predicting life expectancy and writing a report

FEBRUARY

ACTIVITIES:
Programmatic Activities/Orientation/Induction:
Daily lectures by industry experts on the SkillVertex Data Science program. Learned how statistics, data
analysis, and machine learning combine to analyse data and extract knowledge and insights from it for a
final dashboard.
2nd Project:
Predicting Life Expectancy
Business Problem
Life expectancy is affected by various factors. WHO wishes to predict life expectancy and determine
which factors have a significant impact. From this project, WHO would be able to give a country its life
expectancy and suggestions on which factors to focus on to improve it. Based on a publicly available
WHO dataset, this research analyses the variables that affect life expectancy. Data were gathered for the
years 2000 to 2015. The factors examined included a country's development status (developed versus
developing), GDP, population, years of schooling, alcohol use, BMI, government health spending, health
spending per unit of GDP, coverage of various immunisations, thinness, measles cases, HIV/AIDS
deaths, and the mortality rates of adults, children, and infants. During the processing stage, the data were
carefully examined (horizontally and vertically), cleansed, and transformed. Missing values were imputed
using the bagged-trees technique. Box-and-whisker plots, histograms, and multiple factor analysis (MFA)
were used in exploratory data analysis (EDA) to explore and mine the trends within the data. MFA is an
unsupervised machine learning method.

The results of a multiple linear regression model that passed the assumption tests indicated that education
(coeff. est.: 1.15), total government health spending (coeff. est.: 0.08), BMI (coeff. est.: 0.03), GDP
(coeff. est.: 0.00004), diphtheria vaccination (coeff. est.: 0.03) and polio vaccination (coeff. est.: 0.02)
are positively correlated significant variables (p < 0.001). Similar findings are supported by a partial
study of the full longitudinal multilevel model, which also found that if a country had a lower starting
life expectancy in the year 2000, it would have a faster rate of life expectancy improvement between
2000 and 2015 (intercept-slope corr. = -0.55). This suggests an improvement between 2000 and 2015 in
life expectancy in developing nations that had been linked to lower life expectancy. The full model also
concludes that average human life expectancy rises by 0.25 years per year, with a confidence interval
ranging from 0.16 years to 0.34 years (p < 0.001).
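As an illustration of the multiple linear regression approach described above (not the actual WHO data; the predictors, effect sizes, and noise below are invented):

```python
# Hedged sketch of multiple linear regression: recover coefficients for
# invented predictors of life expectancy via ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200
schooling = rng.uniform(2, 16, n)          # invented years of schooling
spending = rng.uniform(1, 10, n)           # invented health spending
life_exp = 50 + 1.15 * schooling + 0.8 * spending + rng.normal(0, 0.5, n)

# Design matrix with an intercept column; solve ordinary least squares.
X = np.column_stack([np.ones(n), schooling, spending])
coef, *_ = np.linalg.lstsq(X, life_exp, rcond=None)
print(np.round(coef, 2))  # intercept and the two slope estimates
```

The fitted slopes approximate the coefficients used to generate the data, which is the same logic behind the coefficient estimates reported above.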

Documents/Reports Submitted:
• A report on the life expectancy model
• Python notebook coding file
• Dashboard

Feynn Labs internship report for the month of February:


ACTIVITIES:
A simple study task on market segmentation: the 10 steps of market segmentation.
1 Market Segmentation
1.1 Strategic and Tactical Marketing
1.2 Definitions of Market Segmentation
1.3 The Benefits of Market Segmentation
1.4 The Costs of Market Segmentation
2 Market Segmentation Analysis
2.1 The Layers of Market Segmentation Analysis
2.2 Approaches to Market Segmentation Analysis
2.2.1 Based on Organisational Constraints
2.2.2 Based on the Choice of (the) Segmentation Variable(s)
2.3 Data Structure and Data-Driven Market Segmentation Approaches
2.4 Market Segmentation Analysis
Step-by-Step Part II Ten Steps of Market Segmentation Analysis
3 Step 1: Deciding (not) to Segment
3.1 Implications of Committing to Market Segmentation
3.2 Implementation Barriers
3.3 Step 1 Checklist
4 Step 2: Specifying the Ideal Target Segment
4.1 Segment Evaluation Criteria
4.2 Knock-Out Criteria
4.3 Attractiveness Criteria
4.4 Implementing a Structured Process
4.5 Step 2
5 Step 3: Collecting Data
5.1 Segmentation Variables
5.2 Segmentation Criteria
5.2.1 Geographic Segmentation
5.2.2 Socio-Demographic Segmentation
5.2.3 Psychographic Segmentation
5.2.4 Behavioural Segmentation
5.3 Data from Survey Studies
5.3.1 Choice of Variables
5.3.2 Response Options
5.3.3 Response Styles
5.3.4 Sample Size
5.4 Data from Internal Sources
5.5 Data from Experimental Studies
6 Step 4: Exploring Data
6.1 A First Glimpse at the Data
6.2 Data Cleaning
6.3 Descriptive Analysis
6.4 Pre-Processing
6.4.1 Categorical Variables
6.4.2 Numeric Variables
6.5 Principal Components Analysis
7 Step 5: Extracting Segments
7.1 Grouping Consumers
7.2 Distance-Based Methods
7.2.1 Distance Measures
7.2.2 Hierarchical Methods
7.2.3 Partitioning Methods
7.2.4 Hybrid Approaches
7.3 Model-Based Methods
7.3.1 Finite Mixtures of Distributions
7.3.2 Finite Mixtures of Regressions
7.3.3 Extensions and Variations
7.4 Algorithms with Integrated Variable Selection
7.4.1 Biclustering Algorithms
7.4.2 Variable Selection Procedure for Clustering Binary Data (VSBD)
7.4.3 Variable Reduction: Factor-Cluster Analysis
7.5 Data Structure Analysis
7.5.1 Cluster Indices
7.5.2 Gorge Plots
7.5.3 Global Stability Analysis
7.5.4 Segment Level Stability Analysis
8 Step 6: Profiling Segments
8.1 Identifying Key Characteristics of Market Segments
8.2 Traditional Approaches to Profiling Market Segments
8.3 Segment Profiling with Visualisations
8.3.1 Identifying Defining Characteristics of Market Segments
8.3.2 Assessing Segment Separation
9 Step 7: Describing Segments
9.1 Developing a Complete Picture of Market Segments
9.2 Using Visualisations to Describe Market Segments
9.2.1 Nominal and Ordinal Descriptor Variables
9.2.2 Metric Descriptor Variables
9.3 Testing for Segment Differences in Descriptor Variables
9.4 Predicting Segments from Descriptor Variables
9.4.1 Binary Logistic Regression
9.4.2 Multinomial Logistic Regression
9.4.3 Tree-Based Methods
10 Step 8: Selecting (the) Target Segment(s)
10.1 The Targeting Decision
10.2 Market Segment Evaluation
11 Step 9: Customising the Marketing Mix
11.1 Implications for Marketing Mix Decisions.
11.2 Product
11.3 Price
11.4 Place
11.5 Promotion

Documents Submitted:
1. Summary of fundamentals of Market Segmentation (what the team has covered during study in
the 10-day period) - Group Task
2. (Practice purpose) Replication of the McDonalds Case Study in Python (whichever part of the
code is possible) - Individual Task (even for team members) - with github link.

MARCH-APRIL
1. Problem Statement ( EV Market)

You are a team working under an Electric Vehicle Startup. The Startup is still deciding in which
vehicle/customer space it will develop its EVs.

You have to analyse the Electric Vehicle market in India using Segmentation analysis and come up with a
feasible strategy to enter the market, targeting the segments most likely to use Electric vehicles.

(CUSTOMER/VEHICLE/B2B) SEGMENTS: Apart from Geographic, Demographic, Psychographic,


Behavioral segments, teams can consider different CATEGORIES of Segments for the Segmentation Tasks,
based on AVAILABILITY OF DATA. Market Segmentation comes with a wide scope of possibilities, and
the Segments created can change based on the different datasets collected.

DO NOTE that not every MARKET has Geographic, Demographic, Psychographic and Behavioral data
easily available, and there is going to be a lot of research required in the DATA Collection Tasks.

Reports submitted:

-Data Analysis report and elaboration on how our team arrived at the chosen geographic, demographic,
psychographic, behavioral factors, or other segments interns came up with after analysis and research.

0. Fermi Estimation (Breakdown of Problem Statement)


1. Data Sources (Data Collection - all Team Members should perform Data Collection Task.)
2. Data Pre-processing (Steps and Libraries used)
3. Segment Extraction (ML techniques used)
4. Profiling and describing potential segments
5. Selection of target segment
6. Customizing the Marketing Mix
7. (for Business Markets)Potential customer base in the early market, thereby calculating the potential sale
(profit) in the early market (Potential Customer Base * Your Target Price Range = Potential Profit).
8. The MOST OPTIMAL MARKET SEGMENTS to open in the market as per your Market Research and
Segmentation
9. Link to github profile with codes and datasets well documented.
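The "Segment Extraction (ML techniques used)" step can be illustrated with a small k-means sketch; the two-feature customer data below is invented, not drawn from the project's datasets:

```python
# Hedged sketch of segment extraction: a tiny k-means implementation
# clustering invented two-feature customer data into k segments.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means: assign points to nearest center, recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance of every point to every center -> nearest center.
        labels = ((X[:, None, :] - centers) ** 2).sum(axis=-1).argmin(axis=1)
        # Move each center to the mean of its points (keep it if empty).
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 2)),   # invented segment A
               rng.normal(5, 0.5, (30, 2))])  # invented segment B
labels, centers = kmeans(X, k=2)
print(centers.round(1))
```

In the actual project a library implementation such as `sklearn.cluster.KMeans` would typically be used; the loop above just makes the assign-then-update logic explicit.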

MAY-JUNE

ACTIVITIES:
Programmatic Activities/Orientation/Induction:
Small-scale B2B problem solving; communication medium: Discord; Feynn
Labs ML Internship program 2023 (batch Jan 2023). Learned how statistics,
data analysis, and machine learning combine to analyse data and extract
knowledge and insights from it for a final dashboard.

4th Project:
Business Problem
Problem Statement (Online Vehicle Booking Market)
You are a team working under an Online Vehicle Booking Product Startup. Due
to heavy competition in Cab booking from Ola and Uber in India, the startup is
looking for an alternate segment which can generate them early foot in the
market and revenue.
You have to analyse the Vehicle market in India using Segmentation analysis
and come up with a feasible strategy to enter the market, targeting the segments
where there can be possible profit by offering Vehicle booking service.
(CUSTOMER/VEHICLE/B2B) SEGMENTS: Apart from Geographic,
Demographic, Psychographic and Behavioral segments, teams can consider
different CATEGORIES of Segments for the Segmentation Tasks, based on
AVAILABILITY OF DATA. Market Segmentation comes with a wide scope of
possibilities, and the Segments created can change based on the different
datasets collected.

DO NOTE that not every MARKET has Geographic, Demographic,
Psychographic and Behavioral data easily available, and there is going to be a
lot of research required in the DATA Collection Tasks.

STRATEGY
- Analysis of which location in India is most suitable to create the early market
in accordance with Innovation Adoption Life Cycle. (Read more @
https://en.wikipedia.org/wiki/Technology_adoption_life_cycle)
- Which demographic, psychographic, behavioural or other factors your team
will target based on Data Analysis of available datasets. In the event of
unavailability of proper datasets, how your team will base your decisions as
accurate and unbiased as possible.
-Strategic pricing range of products with understanding of early market
psychographics.
DATA COLLECTION KEYWORDS: General Vehicle Type Data, Vehicle
Industry Data, Online Cab booking statistics, etc.

TEAM REPORT
-Data Analysis report and elaboration on how your team arrived at the chosen
geographic, demographic, psychographic, behavioral factors, or other segments
interns came up with after analysis and research.
- Team Members Name (only those who contributed)
0. Fermi Estimation (Breakdown of Problem Statement)
1. Data Sources (Data Collection - all Team Members should perform Data
Collection Task.)
2. Data Pre-processing (Steps and Libraries used)
3. Segment Extraction (ML techniques used)
4. Profiling and describing potential segments
5. Selection of target segment
6. Customizing the Marketing Mix
7. (for Business Markets)Potential customer base in the early market, thereby
calculating the potential sale (profit) in the early market (Potential Customer
Base * Your Target Price Range = Potential Profit).
8. The MOST OPTIMAL MARKET SEGMENTS to open in the market as per
your Market Research and Segmentation

9. Link to github profile with codes and datasets well documented.


2. Challenges
2.1. Communication and Collaboration
One of the key challenges faced in virtual internships is effective communication and collaboration.
Without physical proximity, interns often struggle to establish clear lines of communication with their
supervisors and team members. Virtual communication platforms may not fully replicate the benefits of
face-to-face interactions, leading to misunderstandings, delays, and reduced productivity.
2.2. Limited Hands-on Experience
Machine learning projects require extensive hands-on experience with data analysis, model development,
and testing. However, virtual interns may face limitations in accessing the necessary datasets, computing
resources, and real-world applications. These constraints hinder their ability to gain practical experience
and apply theoretical knowledge effectively.
2.3. Time Management and Work-Life Balance
Virtual internships often blur the boundaries between personal and professional life, making it challenging
for interns to manage their time effectively. With flexible work hours and potential distractions at home,
interns may struggle to maintain a healthy work-life balance, impacting their productivity and overall
satisfaction with the internship experience.
3. Achievements
Despite the aforementioned challenges, virtual internships on machine learning projects have also yielded
significant achievements. The following accomplishments highlight the positive outcomes that can be
attained:
3.1. Enhanced Remote Collaboration Skills
Navigating the communication barriers in virtual environments requires interns to develop strong remote
collaboration skills. Through the use of video conferences, messaging platforms, and collaborative tools,
interns have learned to adapt to remote work dynamics, improving their ability to collaborate effectively
with team members located in different time zones.
3.2. Independent Problem-Solving Abilities
Virtual internships often require interns to work independently and proactively seek solutions to
challenges. This fosters the development of critical thinking and problem-solving skills, as interns are
encouraged to explore resources, seek guidance from online communities, and find innovative approaches
to overcome hurdles encountered during their projects.
3.3. Exposure to Diverse Machine Learning Projects
Virtual internships provide interns with the opportunity to work on a wide range of machine learning
projects, often with different industries and organizations. This exposure enables interns to gain practical
insights into various domains, develop domain-specific knowledge, and broaden their understanding of
machine learning applications across different sectors.
4. Recommendations
Based on the challenges identified, the following recommendations are proposed to improve the virtual
internship experience on machine learning projects:
4.1. Clear Communication Channels: Establish dedicated communication channels that facilitate
frequent and effective interactions between interns, supervisors, and team members. Regular video
conferences, team meetings, and timely feedback can help mitigate the communication challenges
associated with virtual internships.
4.2. Provision of Resources: Ensure that interns have access to the necessary datasets, computing
resources, and tools required to carry out their projects effectively. This may involve providing remote
access to computing infrastructure, curated datasets, and virtual environments for experimentation.
4.3. Structured Guidance and Mentorship: Designate mentors or supervisors who can provide
structured guidance, support, and regular check-ins with interns. Mentorship programs can help address
the limited hands-on experience by providing interns with personalized guidance and direction throughout
their projects.
4.4. Training and Development Opportunities: Offer training programs and workshops focused on
enhancing remote collaboration skills, time management, and work-life balance. These programs can
equip interns with the necessary skills to excel in virtual work environments.
5. Conclusion
Virtual internships on machine learning projects present both challenges and achievements. While
communication barriers, limited hands-on experience, and time management issues pose significant
hurdles, interns have managed to develop remote collaboration skills, independent problem-solving
abilities, and gain exposure to diverse projects. By implementing the recommended measures,
organizations can enhance the virtual internship experience and ensure interns' growth and success in
machine learning domains.
ANNEXURE-A:
Time utilization of schedule:
S.No. | Name of the Department | Date(s) of Visit | % of time spent | People met (with designation)
1. | Data Science and Machine Learning | 05.01.2023 to 05.03.2023 | 80% | Mr. Mayank Gathole, Academic Head, SkillVertex
2. | AI/ML | 27-01-2023 to 27-06-2023 (project extended up to the mentioned date) | 80% | Mr. Sanjay Basumatary, Founder of Feynn Labs
Part-B: Internship Completion Certificate
