Shubradip Ghosh
2023
Final Year Internship
In
SkillVertex, Bangalore & FeynnLabs, Guwahati.
Submitted by:
Shubradip Ghosh
Submitted to:
I, Shubradip Ghosh, hereby declare that the presented internship report titled "A Report on Final Year Internship in SkillVertex & FeynnLabs" has been prepared solely by me after completing a total of five months of virtual internship with both organizations.
I also confirm that this report has been prepared only for the partial fulfilment of the requirements for the degree of MSc in Biostatistics & Epidemiology, and that it has not been, and will not be, submitted elsewhere for any other purpose.
………………………………………………………………
Signature
Shubradip Ghosh,
RA2122021010009,
MSc in Biostatistics & Epidemiology,
School of Public Health,
SRM Institute of Science & Technology.
I certify that the above declaration is true to the best of my knowledge and belief.
Organization Guides:
Sanjay Basumatary, Founder @ Feynn Labs | work4ai.com
Mayank Gathole, Academic Head @ SkillVertex
Mentor:
Dr. Prakash M.
School of Public Health,
SRMIST.
DEAN:
Dr. Hari Singh,
School of Public Health,
SRMIST.
ACKNOWLEDGEMENT
I would first like to express my sincere gratitude to my mentor, Dr. Prakash M., Associate Professor, for his constant and invaluable advice and ideas on every aspect of my internship. I want to thank my campus dean, Dr. Hari Singh, as well as all the professors and staff of the School of Public Health at SRMIST for their constant support.
I would like to extend my sincere gratitude to Mayank Gathole, Academic Head at SkillVertex, for
providing me with this opportunity.
It is with great pleasure that I extend my heartfelt thanks to Sanjay Basumatary, founder of Feynn Labs | work4AI.com, who, despite his busy schedule, took the time to listen to me, give me advice, and keep me on the right track.
The internships with both organisations gave me a fantastic opportunity for learning and professional growth. I therefore consider myself lucky to have been given the chance to join them. I am also grateful for the opportunity to get to know so many wonderful people and professionals who guided me during my internship.
The unwavering love, inspiration and support of my father and friends have allowed me to complete my obligations on time, and I am incredibly grateful to them.
I want to thank everyone who supported me during my internship and send them my best wishes.
TABLE OF CONTENTS
2 Acknowledgement
4 Summary
5 Introduction
7 Description of department
References
11 Learnings
12 Challenges
13 Achievements
14 Conclusion
16 Annexure-A
This internship report describes the major projects I completed for academic and non-academic purposes. Its purpose is to identify and describe the data collected, the projects completed and the experience gained, with a focus on the intern's accomplishments.
Overall, the internship has been a fantastic learning opportunity. My practical knowledge, AI/ML work, communication skills, command of different software and statistical tools, and data analysis skills have all improved. I learned about my strengths, which will prove beneficial in the future, as well as my weaknesses, which I can work on.
I acquired new skills and abilities, as well as the chance to improve my practical knowledge of concepts learned in the School of Public Health. Many of my learning goals were achieved. The work I did during my internship gave me new perspectives and inspired me to pursue a career in the statistical field. It taught me important lessons that will undoubtedly help me in my future employment, as well as how to prepare for my career path and how to manage my time.
1.1 INTRODUCTION:
I was hired by SkillVertex as an intern. I began that internship on January 5th, 2023 and completed it on March 5th, 2023. I was then hired by Feynn Labs as an intern on 27th Jan 2023, and will complete that internship by the end of June 2023. My internship supervisor at SkillVertex was Mayank Gathole, Academic Head @ SkillVertex. When I joined, I was assigned a health expectancy project, on which I worked in the Python programming language for two months (January 5th to March 5th). I then continued my internship with another organization (Feynn Labs), where I was assigned several market segmentation projects for the remaining four months (Feb 27th to June 27th). Both internships were virtual/remote. On the market segmentation projects I worked under the supervision of Sanjay Basumatary, Founder @ Feynn Labs | work4ai.com. I consider myself lucky to have had Sanjay Basumatary and Mayank Gathole as my supervisors, since they always found time for my questions and were willing to help.
I was given the opportunity to work on four different projects. First, I worked on the health expectancy project, which involved the Python programming language, AI/ML tools, data science, advanced statistics, collecting, cleaning and organizing data, and submitting a final report to the SkillVertex portal. At Feynn Labs I worked on AI Product/Service Prototyping, Market Segmentation using Machine Learning and Data Analysis (EV Market & Online Cab booking), and AI Product/Service Business and Financial Modelling.
In those five months, I gained sufficient knowledge and experience. I cannot emphasize enough how pleasant all of my coworkers at both organizations were; they all contributed to the positive atmosphere at work and made me feel welcome from the first day of my internship.
OBJECTIVES OF THE INTERNSHIP
1. To improve my written and oral communication skills, especially when working collaboratively on a team.
2. To gain confidence in myself and become more flexible when addressing new and/or unfamiliar tasks.
3. To understand the functioning and working conditions of an organization.
4. To gain practical knowledge in applying data analysis skills, techniques and theory through working with professionals.
5. To see whether this kind of work is a possibility for my future career.
6. To see what skills and knowledge I still need to work in a professional environment.
7. To acquire the ability to work as part of a team and respond effectively to the ideas of experts.
8. To gain good knowledge of research studies and project management, and to build a network.
9. To practice problem-based learning in an authentic supervised environment.
10. To solve problems in an effective and creative manner in a challenging position.
SkillVertex
SkillVertex is an edtech organization that aims to provide upskilling and training to students as well as working professionals by delivering a diverse range of programs in accordance with their needs and future aspirations. In line with emerging industrial requirements and technologies, it also assists with career development and provides additional counselling, guidance and mentorship in the respective domains.
Industrial simulation
You learn from industry tutors and gain practical, hands-on experience for a better start in the corporate world.
Advanced programs are specifically curated to help people grow in their careers, or switch fields of work, and become industry-ready. The courses are designed with expert insight to provide advanced knowledge, upskilling and real-time experience to college or university students and entry-level professionals. These programs are tutored by industry experts who have hands-on expertise in their domain of work. Undergo intensive training from scratch to an advanced level in your chosen domain. Get to work on real-time projects. Gain hands-on experience and practical knowledge in technical and managerial domains. Get additional career growth support, including placement assistance, resume building, personality development and mock interviews.
Business Problem
Life expectancy is affected by various factors. WHO wishes to predict life expectancy and determine which factors have a significant impact. From this project, WHO would be able to give a country its predicted life expectancy and suggestions on which factors to focus on to improve it. Based on a publicly available WHO dataset, this research analyses the variables that affect life expectancy. The data were gathered between 2000 and 2015. The factors examined included a country's development status (developed versus developing), GDP, population, years of schooling, alcohol use, BMI, government health spending, health spending per unit of GDP, coverage of various immunizations, thinness, measles cases, HIV/AIDS deaths, and the mortality rates of adults, children and infants.
During the processing stage, the data were carefully examined (horizontally and vertically), cleansed and transformed. Missing values were imputed using the bagged-trees technique. Box-and-whisker plots, histograms and multiple factor analysis (MFA) were used in exploratory data analysis (EDA) to explore and mine the trends within the data. MFA is an unsupervised machine learning method.
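The report does not name the software used for the bagged-trees imputation. The following minimal Python sketch shows the same idea and assumes scikit-learn is available. Each column with gaps is modelled from the other columns by an ensemble of bagged decision trees, using `IterativeImputer` with a `BaggingRegressor` estimator. The toy data frame and its column names (`schooling`, `bmi`, `gdp`, `life_expectancy`) are hypothetical stand-ins for the WHO variables, not the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (unlocks IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import BaggingRegressor

# Hypothetical toy frame standing in for a few numeric WHO columns.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "schooling": rng.normal(12, 3, n),
    "bmi": rng.normal(25, 4, n),
    "gdp": rng.lognormal(8, 1, n),
})
df["life_expectancy"] = 50 + 1.2 * df["schooling"] + 0.1 * df["bmi"] + rng.normal(0, 2, n)

# Knock out roughly 10% of the cells at random to simulate missing values.
mask = rng.random(df.shape) < 0.10
df_missing = df.mask(mask)

# Each incomplete column is regressed on the others with bagged decision trees
# (BaggingRegressor's default base estimator), repeated for a few rounds.
imputer = IterativeImputer(
    estimator=BaggingRegressor(n_estimators=25, random_state=0),
    max_iter=5,
    random_state=0,
)
imputed = pd.DataFrame(imputer.fit_transform(df_missing), columns=df.columns)
print("missing before:", int(df_missing.isna().sum().sum()))
print("missing after:", int(imputed.isna().sum().sum()))
```

Only the missing cells are filled. Observed values pass through unchanged, so the imputation cannot distort data that was actually recorded.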
Steps:
1. Problem formulation: Clearly define the problem you want to solve. In this case, the goal is to predict life expectancy based on certain factors or variables.
2. Data collection: Gather relevant data that will help you build your predictive model. This data may include factors like age, gender, lifestyle habits, socioeconomic status, healthcare access, and any other variables you believe could influence life expectancy.
The dataset was provided to us by the organization.
3. Data preprocessing: Clean the collected data by handling missing values, outliers, and
inconsistencies. Convert categorical variables into numerical representations, normalize
or scale numerical features, and perform any other necessary data transformations.
4. Exploratory data analysis (EDA): Perform EDA to understand the patterns, relationships, and distributions within the data. Visualize the data using graphs, histograms, scatter plots, etc., and calculate summary statistics. Identify any interesting insights or correlations between variables that may help in predicting life expectancy.
5. Feature selection/engineering: Select the most relevant features that have a significant impact on life expectancy. Use statistical techniques, correlation analysis, or domain knowledge to determine which features to include in your model. Additionally, create new features by combining or transforming existing ones if doing so improves predictive performance.
6. Model selection: Choose an appropriate machine learning algorithm to build your
predictive model. Some common algorithms for regression tasks include linear
regression, decision trees, random forests, support vector machines (SVM), or neural
networks. Consider the nature of your data and the complexity of the problem when
selecting the model.
7. Model training: Split your data into training and testing sets. Use the training set to train your chosen model on the selected features. The model will learn the underlying patterns and relationships between the features and the target variable (life expectancy).
8. Model evaluation: Evaluate the performance of your model using appropriate metrics
such as mean squared error (MSE), root mean squared error (RMSE), mean absolute
error (MAE), or R-squared. Compare the predicted values with the actual values from the
testing set. Adjust your model or try different algorithms if the performance is not
satisfactory.
9. Model optimization: Fine-tune your model to improve its performance. Adjust
hyperparameters (e.g., learning rate, regularization parameters) using techniques like grid
search or random search. Cross-validation can help assess the stability and generalization
of your model.
10. Model deployment: Once you are satisfied with your model's performance, deploy it in a production environment. Make sure to package the model with any necessary preprocessing steps and provide clear instructions on how to use it to predict life expectancy.
11. Monitoring and maintenance: Continuously monitor the performance of your deployed
model. Collect new data over time to keep your model up-to-date and retrain it
periodically to maintain its accuracy. Monitor for concept drift or changes in the data
distribution that may require model updates.
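Steps 3 and 6 to 9 above can be sketched in Python with scikit-learn as a single pipeline. This way, scaling and encoding are re-fitted inside every cross-validation fold. The sketch is an illustration under stated assumptions, not the project's code:

- the data are synthetic;
- the column names are hypothetical;
- a random forest with a small hyperparameter grid stands in for whichever model is finally selected.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical country-year data; column names are illustrative only.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "status": rng.choice(["Developed", "Developing"], n),
    "schooling": rng.normal(12, 3, n),
    "adult_mortality": rng.normal(150, 60, n),
    "bmi": rng.normal(25, 4, n),
})
df["life_expectancy"] = (60 + 1.1 * df["schooling"] - 0.03 * df["adult_mortality"]
                         + 4 * (df["status"] == "Developed") + rng.normal(0, 1.5, n))

X = df.drop(columns="life_expectancy")
y = df["life_expectancy"]
# Step 7: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 3: scale numeric features and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["schooling", "adult_mortality", "bmi"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["status"]),
])

# Steps 6 and 9: model inside a pipeline, tuned by grid search with 5-fold CV.
pipe = Pipeline([("prep", preprocess), ("model", RandomForestRegressor(random_state=0))])
grid = GridSearchCV(
    pipe,
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_train, y_train)

# Step 8: evaluate on the untouched test set.
pred = grid.predict(X_test)
mse = mean_squared_error(y_test, pred)
print("best params:", grid.best_params_)
print(f"MAE={mean_absolute_error(y_test, pred):.2f}  MSE={mse:.2f}  "
      f"RMSE={np.sqrt(mse):.2f}  R2={r2_score(y_test, pred):.3f}")
```

Keeping preprocessing inside the pipeline stops information from the validation folds leaking into the scaler during grid search.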
Results:
The results of a multiple linear regression model that passed the assumption tests indicated that education (Coeff. Est: 1.15), total government health spending (Coeff. Est: 0.08), BMI (Coeff. Est: 0.03), GDP (Coeff. Est: 0.00004), diphtheria vaccination (Coeff. Est: 0.03) and polio vaccination (Coeff. Est: 0.02) are significant variables that are positively correlated with life expectancy. Similar findings are supported by the full longitudinal multilevel model, which also found that countries with a lower starting life expectancy in 2000 had a faster rate of life expectancy improvement between 2000 and 2015 (intercept-slope corr = -0.55). This suggests that, between 2000 and 2015, life expectancy improved in developing nations that started with lower life expectancy. The full model also concludes that average human life expectancy rises by 0.25 years per year, with a confidence interval ranging from 0.16 years to 0.34 years (p < 0.001).
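Coefficient estimates, p-values and confidence intervals like those quoted above come from ordinary least squares. Below is a minimal NumPy/SciPy sketch of how they are computed. The data are simulated, and the true schooling coefficient of 1.15 was chosen only to echo the reported estimate. The longitudinal multilevel model is not reproduced here.

```python
import numpy as np
from scipy import stats

def ols_summary(X, y, level=0.95):
    """OLS with an intercept: estimates, standard errors, two-sided p-values and confidence intervals."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])              # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)      # least-squares coefficient estimates
    resid = y - Xd @ beta
    dof = n - (k + 1)                                  # residual degrees of freedom
    sigma2 = resid @ resid / dof                       # unbiased estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    t_stat = beta / se
    pvals = 2 * stats.t.sf(np.abs(t_stat), dof)        # H0: coefficient = 0
    t_crit = stats.t.ppf(0.5 + level / 2, dof)         # e.g. about 1.96 for a 95% interval
    return beta, se, pvals, beta - t_crit * se, beta + t_crit * se

# Simulated data: life expectancy driven by schooling and BMI, plus one irrelevant predictor.
rng = np.random.default_rng(1)
n = 500
schooling = rng.normal(12, 3, n)
bmi = rng.normal(25, 4, n)
irrelevant = rng.normal(0, 1, n)
y = 55 + 1.15 * schooling + 0.03 * bmi + rng.normal(0, 2, n)
X = np.column_stack([schooling, bmi, irrelevant])

beta, se, p, lo, hi = ols_summary(X, y)
for name, b, s, pv, l, h in zip(["intercept", "schooling", "bmi", "irrelevant"], beta, se, p, lo, hi):
    print(f"{name:>10}: coef={b:7.3f}  se={s:.3f}  p={pv:.3g}  95% CI=({l:.3f}, {h:.3f})")
```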
Application:
We can apply this predictive model in various fields to predict human life expectancy with minimal human labour. Insurance companies can use it to make predictions about their customers. So can other companies that make products affecting human life, or that need to keep their work cycle running in line with the expected life span of their workers. Doctors can use it to understand biological differences and treatment measures for affected patients. The government can also use it to make decisions for human welfare, and to take appropriate measures to control population growth and other factors that negatively affect the life span of the country's people. It can also guide the use of the growing human resources and skill sets that people acquire over many years. It could help make common people more aware of their general health and its improvement or deterioration over time, which may motivate them to make healthier lifestyle choices. Machine learning is a promising field, with new research published every day. This study opens up scope for ecologists and medical scientists. It also allows common people to learn about their own biological status, and the condition of human beings in a desired region, without depending on experts.
Future Scope:
This application has a very bright future and a large number of use cases. It should be upgraded and included in school and college syllabi for the sake of welfare and knowledge. It provides insights into the various factors, and the levels of each, required to keep life expectancy as high as expected. It can be used to suggest good health practices and lifestyles to users based on their daily activities, and to recommend exercises for improving their health. Pharmaceutical companies can check which diseases affect the most people, and therefore life expectancy, and manufacture medicines accordingly. It can even be described as a "time machine" that predicts the life expectancy of someone who has not yet been born. The prediction uses his/her country's Adult Mortality, Population, Under 5 Deaths, Thinness 1-5 Years, Alcohol, HIV, Hepatitis B, GDP, Percentage Expenditure, and other factors. Technology is growing faster than ever and the world needs more manpower, so it is necessary to improve the factors that can extend the life expectancy of a country's people.
References:
o Free and Open Machine Learning, Release 1.0.1, by Maikel Mardjan
Online Resources:
o https://developer.ibm.com/tutorials/how-to-create-a-node-redstarterapplication/
o https://github.com/watson-developer-cloud/nodered-labs
About API:
o https://www.youtube.com/watch?v=s7wmiS2mSXY&feature=youtu.be
Introduction to Machine Learning:
o https://developer.ibm.com/technologies/machinelearning/series/learningpath-machine-learning-fordevelopers/
Data Collection:
o https://www.kaggle.com/kumarajarshi/life-expectancy-who
APPENDIX Source Code:
o https://github.com/SmartPracticeschool/llSPS-INT-1519-Predicting-LifeExpectancyusing-MachineLearning/blob/master/Predicting%20Life%20Expectan
Feynn Labs
Feynn Labs Services is the B2B unit of Feynn Labs. It uses Data Analysis and Machine Learning to provide AI-integrated services to businesses, such as Market Analysis and Segmentation, the Smart Regions Program, and AI Development & Integration Services.
Under their Smart Regions initiative, they aim to bring Deep Tech innovations such as IoT, Machine Learning, Data Analysis and Blockchain to a region. These innovations are meant to meet challenges in housing, traffic, mobility, social and environmental issues, employment, education, etc.
Their primary mission is to build a premier chain of institutes in India where students experiment with and apply what they learn, hand in hand.
This vision is implemented through their “Project based Top-Down Learning” approach, focusing on frontier technologies such as Artificial Intelligence (AI), IoT, Augmented Reality (AR), Blockchain and Quantum Computing.
Clients: The services are aimed at small businesses and startups that start with a limited budget and cannot afford to perform full-scale market analysis and segmentation for their products and services.
Framework
It combines modern Machine Learning and Data Analysis tools with expert human supervision to provide the best services to clients. The framework is uniquely structured to produce the best segmentation results and hence boost sales of their clients’ products.
Milestone:
• 1000+ students trained
• 700+ AI Product/Service Prototypes Formulated
• 15+ clients served
Projects involved during internship:
I. Case study of Market Segmentation, AI Product/Service Prototyping,
II. Market Segmentation using Machine Learning and Data Analysis (EV Market & Online Cab booking),
III. AI Product/Service Business and Financial Modelling.
EDA
We begin the exploratory data analysis by analysing the data without Principal Component Analysis (PCA), and then apply PCA to the dataset obtained by combining all of the available data. PCA is a statistical process that uses an orthogonal transformation to convert observations of correlated features into a set of linearly uncorrelated features. These new transformed features are called the Principal Components. The process reduces the dimensionality of the data, making classification, regression or any other form of machine learning more cost-effective.
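Here is a short scikit-learn sketch of the PCA step. It uses a hypothetical three-feature matrix in which two features are strongly correlated. Standardising first is an assumption here, since PCA is sensitive to feature scale. The correlation matrix of the component scores shows that the new features are uncorrelated, and the explained-variance ratios show how much information the reduced set keeps.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical feature matrix: features 0 and 1 are nearly copies of each other,
# feature 2 is independent, so about two dimensions carry the information.
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0],
    0.9 * latent[:, 0] + rng.normal(scale=0.1, size=200),
    latent[:, 1],
])

X_std = StandardScaler().fit_transform(X)   # put features on a common scale first
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)                # principal-component scores (the new features)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("score correlations:\n", np.round(np.corrcoef(Z, rowvar=False), 6))
```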
Extracting Segments
Dendrogram
This technique is specific to the agglomerative hierarchical method of clustering. That method starts by treating each point as a separate cluster and then joins points to clusters in a hierarchical fashion based on their distances. To find the optimal number of clusters for hierarchical clustering, we use a dendrogram, a tree-like chart that shows the sequence of merges or splits of clusters. When two clusters are merged, the dendrogram joins them, and the height of the join is the distance between those clusters. As shown in Figure 9, we can choose the optimal number of clusters based on the hierarchical structure of the dendrogram. As other cluster validation metrics also indicate, four to five clusters can be considered for the agglomerative hierarchical method as well.
Figure 9: Dendrogram Plot for our Dataset
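The dendrogram procedure can be sketched with SciPy's `linkage`, `dendrogram` and `fcluster` functions. The sketch rests on three assumptions:

- the data are hypothetical, with three well-separated groups;
- Ward linkage is an assumed choice, because the report does not state the linkage method;
- the "largest gap in merge height" rule is one simple heuristic for reading the number of clusters off the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Hypothetical 2-D data: three well-separated groups of 30 points each.
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Agglomerative clustering: every point starts as its own cluster and the closest
# pair is merged repeatedly (Ward linkage merges the pair that least increases variance).
Z = linkage(X, method="ward")
heights = Z[:, 2]                        # merge heights (non-decreasing for Ward)

# After merge i (0-based) there are n - (i + 1) clusters. Cutting the tree inside the
# largest gap between consecutive merge heights suggests how many clusters to keep.
gaps = np.diff(heights)
suggested_k = len(X) - (int(gaps.argmax()) + 1)

labels = fcluster(Z, t=suggested_k, criterion="maxclust")   # cut the tree into k clusters
tree = dendrogram(Z, no_plot=True)   # layout data only; drop no_plot to draw with matplotlib
print("suggested number of clusters:", suggested_k)
print("cluster sizes:", np.bincount(labels)[1:])
```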
Elbow Method
The Elbow method is a popular method for determining the optimal number of clusters. It calculates the Within-Cluster Sum of Squared Errors (WSS) for different numbers of clusters (k) and selects the k at which the decrease in WSS first starts to diminish. The idea behind the elbow method is that the explained variation changes rapidly for a small number of clusters and then slows down, forming an elbow in the curve. The elbow point is the number of clusters we can use for our clustering algorithm.
The KElbowVisualizer function (from the Yellowbrick library) fits the KMeans model for values of k from 2 to 8. As shown in Figure 10, the function itself highlights the elbow point. The green line also shows how much time was needed to fit the model for each number of clusters.
Figure 10: Evaluating the clusters using Distortion
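The report uses Yellowbrick's `KElbowVisualizer`. The sketch below reproduces the underlying computation with scikit-learn's `KMeans` alone, recording the within-cluster sum of squares (`inertia_`) for k from 2 to 8 on hypothetical data with four groups. The largest-second-difference rule used to pick the elbow is an assumed heuristic. It is not necessarily the rule Yellowbrick applies.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: four groups at the corners of a square.
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0], [8.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.7, size=(50, 2)) for c in centers])

ks = range(2, 9)                    # k = 2..8, the range described above
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)         # within-cluster sum of squares ("distortion")

# Elbow heuristic: the k where the gain from adding one more cluster drops the most.
drops = -np.diff(wss)               # drops[i] = WSS(ks[i]) - WSS(ks[i + 1])
elbow_k = ks[int(np.argmax(drops[:-1] - drops[1:])) + 1]

for k, w in zip(ks, wss):
    print(k, round(w, 1))
print("elbow at k =", elbow_k)
```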
K-Means Algorithm
The K-Means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and their cluster's centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
The k-means algorithm works as follows:
• Specify the number of clusters K.
• Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
• Assign each data point to its closest centroid, then recompute each centroid as the mean of the data points assigned to it.
• Keep iterating until there is no change to the centroids, i.e. the assignment of data points to clusters is no longer changing.
The approach k-means follows to solve the problem is expectation-maximization. The E-step assigns the data points to the closest cluster. The M-step computes the centroid of each cluster. Below is a breakdown of how we can solve it mathematically.
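The objective function is:
$$J = \sum_{i=1}^{m} \sum_{k=1}^{K} w_{ik} \lVert x^{i} - \mu_{k} \rVert^{2} \qquad (1)$$
where $w_{ik} = 1$ if data point $x^{i}$ belongs to cluster $k$ and $w_{ik} = 0$ otherwise. In the E-step, each point is assigned to its nearest centroid:
$$w_{ik} = \begin{cases} 1 & \text{if } k = \arg\min_{j} \lVert x^{i} - \mu_{j} \rVert^{2} \\ 0 & \text{otherwise} \end{cases}$$
And the M-step is:
$$\frac{\partial J}{\partial \mu_{k}} = -2 \sum_{i=1}^{m} w_{ik} \left( x^{i} - \mu_{k} \right) = 0 \;\Rightarrow\; \mu_{k} = \frac{\sum_{i=1}^{m} w_{ik} x^{i}}{\sum_{i=1}^{m} w_{ik}}$$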
Following the Elbow method, we take K = 4 clusters to train the KMeans model. The derived clusters are shown in the following figure.
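The E-step and M-step above translate directly into a short NumPy implementation of k-means (Lloyd's algorithm). This is a from-scratch sketch for illustration only. The data are hypothetical, and the rule of keeping the old centroid when a cluster empties is an assumption. Library implementations such as scikit-learn's `KMeans` add smarter initialisation (k-means++) and multiple restarts.

```python
import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    """Plain k-means with initial centroids drawn from the data without replacement."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # E-step: w_ik = 1 for the centroid with the smallest squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so the centroids are fixed too
        labels = new_labels
        # M-step: mu_k = mean of the points currently assigned to cluster k.
        for j in range(k):
            members = X[labels == j]
            if len(members):          # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    objective = ((X - centroids[labels]) ** 2).sum()   # J from equation (1)
    return labels, centroids, objective

rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(40, 2)) for c in centers])
labels, centroids, J = kmeans(X, k=4)
print("cluster sizes:", np.bincount(labels, minlength=4), " J =", round(J, 2))
```

At convergence, every point is assigned to its nearest centroid and every non-empty centroid is the mean of its points. This is exactly the stationarity condition derived from the M-step.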
Prediction of Prices of the Most Used Cars
Linear regression is a machine learning algorithm based on supervised learning that performs a regression task: it predicts a target value based on independent variables. It is mostly used for finding the relationship between variables and for forecasting. Here we use a linear regression model to predict the prices of electric cars from different companies. X contains the independent variables, and y is the dependent variable (Price) to be predicted. We split the data in a 4:6 ratio, i.e. 40% of the data is used to train the model.
The LinearRegression().fit(Xtrain, ytrain) command is used to fit the model to the training data. The values of the intercept, the coefficients and the cumulative distribution function (CDF) are described in the figure.
After training the model, we test it on the remaining 60% of the data. The results are checked with a scatter plot of the predicted values against the original test values of the dependent variable. As shown in the figure, the points lie close to a straight line, and the density function is also approximately normally distributed.
The metrics of the algorithm (mean absolute error, mean squared error and root mean squared error) are described in the figure below.
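Below is a minimal scikit-learn sketch of this regression workflow. It uses the same 40% train / 60% test split and the three error metrics. The EV table, its column names and the price formula are hypothetical, made up only so that the example runs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical EV table; columns and the price formula are invented for illustration.
rng = np.random.default_rng(7)
n = 150
ev = pd.DataFrame({
    "top_speed_kmh": rng.uniform(120, 260, n),
    "range_km": rng.uniform(150, 600, n),
    "battery_kwh": rng.uniform(25, 110, n),
})
ev["price_inr"] = (2e5 * ev["battery_kwh"] + 1e4 * ev["top_speed_kmh"]
                   + 5e3 * ev["range_km"] + rng.normal(0, 5e5, n))

X = ev[["top_speed_kmh", "range_km", "battery_kwh"]]
y = ev["price_inr"]
# train_size=0.4 mirrors the 4:6 split described above (40% train, 60% test).
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.4, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)          # RMSE is the square root of MSE, in the same units as price
print("intercept:", model.intercept_)
print("coefficients:", dict(zip(X.columns, model.coef_)))
print(f"MAE={mae:,.0f}  MSE={mse:,.0f}  RMSE={rmse:,.0f}")
```

Training on only 40% of the rows is unusual; most workflows train on the larger share. The split is kept here only to match the description.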
• EVs with 5 seats dominate the market, while EVs with 2 seats are fewer in number.
• Top Speed & Range: across a large part of the market, cost depends on the top speed and maximum range of the cars.
• Efficiency: most segments consist of highly efficient EVs.
• Price: from the above analysis, the price range is between 16,00,000 and 1,80,00,000.
• Maharashtra, Gujarat, Tamil Nadu, Karnataka and Andhra Pradesh are among the top states with the majority of EV 2-wheelers, while Assam, Himachal Pradesh, Sikkim and J&K have the fewest.
• Uttar Pradesh, Assam and Bihar are among the top states with the majority of EV 3-wheelers, while the remaining states do not seem to depend on them.
• Maharashtra, Delhi, Karnataka, Kerala and Andhra Pradesh are among the top states with the majority of EV 4-wheelers, while the remaining states have fewer EV 4-wheelers.
• Maharashtra, Gujarat, Karnataka, Kerala, Uttar Pradesh, Rajasthan and Andhra Pradesh are among the top states with the majority of sanctioned EV charging stations, while the remaining states have fewer.
• Rajasthan, Madhya Pradesh, Maharashtra, Karnataka and Uttar Pradesh are among the top states with the majority of retail outlets for EV charging, while the remaining states have fewer.
• Tesla, Audi, Volkswagen, Nissan and Skoda top the list of EV brands with the maximum number of models in the Indian automobile market.
• SUV and Hatchback body types form the majority, while Station and MPV form the minority.
• Based on the number of seats, Tesla, Mercedes and Nissan have the maximum number of seats and Smart the minimum.
• Based on acceleration, EVs from Renault, SEAT and Smart are the top performers, while Tesla, Lucid and Porsche do not make it to the top.
• Based on speed, EVs from Tesla, Lucid and Porsche are the top performers, while Renault, Smart and SEAT do not make it to the top.
• Based on range (km), Lucid, Lightyear and Tesla have the highest range and Smart the lowest.
The marketing mix refers to the set of actions, or tactics, that a company uses to promote its brand or product in the market. The 4Ps make up a typical marketing mix: Price, Product, Promotion and Place.
• Price: refers to the value that is put on a product. It depends on the segment targeted, the ability of companies to pay, the ability of customers to pay, supply and demand, and a host of other direct and indirect factors.
• Product: refers to the product actually being sold; in this case, the service. The product must deliver a minimum level of performance; otherwise even the best work on the other elements of the marketing mix won’t do any good.
• Place: refers to the point of sale. In every industry, catching the eye of the consumer
and making it easy for her to buy it is the main aim of a good distribution or ’place’ strategy.
Retailers pay a premium for the right location. In fact, the mantra of a successful retail
business is ’location, location, location’.
• Promotion: this refers to all the activities undertaken to make the product or service
known to the user and trade. This can include advertising, word of mouth, press reports,
incentives, commissions and awards to the trade. It can also include consumer schemes,
direct marketing, contests and prizes.
All the elements of the marketing mix influence each other. Together they make up the business plan for a company and, handled right, can give it great success. The marketing mix requires a lot of understanding, market research and consultation with many people, from users to the trade to manufacturing and others.
GitHub Links:
Shubradip Ghosh: https://github.com/shubradip/customer-segmentation-
3RD PROJECT:
AI Product/Service Prototype Development and
Business/Financial Modelling
-Step 3: Business Modelling
Developing a business model for the AI Product/Service
https://templates.office.com/en-in/Search/results?query=business+plan
https://www.investopedia.com/terms/b/businessmodel.asp#:~:text=For%20instance%2C%20direct%20sales%2C%20franchising,sporting%20organizations%20like%20the%20NBA.
https://alcorfund.com/insight/18-business-model-example-explained/
-Step 4: Financial Modelling (equation) with Machine Learning & Data Analysis
Description:
Suppose a market is growing linearly. Design a linear financial model y = m·x(t) − c, where:
y = total profit,
m = pricing of your product,
x(t) = total sales (the market as a function of time),
c = production, maintenance and other costs (subtracted, since costs reduce profit).
If a market is growing exponentially, design an exponential market trend (the financial model will then be an exponential equation).
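As a sketch of Step 4, both market trends can be fitted with NumPy's `polyfit`. A linear market gets a straight line fitted to sales. An exponential market gets a straight line fitted to log(sales). The sales history, the price `m` and the cost `c` below are hypothetical numbers. Costs are subtracted from revenue to give profit.

```python
import numpy as np

# Hypothetical market history: total sales x(t) for periods t = 0..7.
t = np.arange(8)
sales_linear = 100 + 20 * t              # a linearly growing market
sales_exp = 50 * np.exp(0.3 * t)         # an exponentially growing market

# Linear trend x(t) = a + b*t (polyfit returns the highest power first).
b_lin, a_lin = np.polyfit(t, sales_linear, deg=1)
# Exponential trend x(t) = A*exp(r*t): a straight line fitted to log(x(t)).
r_exp, log_A = np.polyfit(t, np.log(sales_exp), deg=1)
A_exp = np.exp(log_A)

m = 1500.0      # hypothetical price per unit
c = 40_000.0    # hypothetical production/maintenance costs

def profit_linear(tt):
    return m * (a_lin + b_lin * tt) - c              # y = m*x(t) - c with a linear x(t)

def profit_exponential(tt):
    return m * A_exp * np.exp(r_exp * tt) - c        # y = m*x(t) - c with an exponential x(t)

print(f"linear trend:      x(t) = {a_lin:.1f} + {b_lin:.1f} t")
print(f"exponential trend: x(t) = {A_exp:.1f} * exp({r_exp:.3f} t)")
print("projected profit at t = 10:", round(profit_linear(10)), round(profit_exponential(10)))
```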
1. Demographics: All age groups, genders and economic levels are represented in the target population for the Indian footwear industry. Nonetheless, the youth demographic, which includes college students and young professionals, is the largest consumer category.
2. Style and design: Indian shoppers seek fashionable, comfortable shoes that complement their personal style and way of life. The market is flooded with popular designs, colours and patterns that draw inspiration from traditional Indian culture.
3. Comfort and quality: Indian consumers are increasingly looking for supportive, comfortable footwear, and they think carefully about quality and durability when choosing it.
4. Price range: The Indian footwear industry has both high-end and entry-level products. Although some customers are ready to pay more for luxury brands, there is still a sizable market for reasonably priced footwear that offers good value.
5. Distribution channels: While offline retail establishments still hold the lion's share of the Indian footwear market, online retail platforms are quickly gaining ground. Footwear businesses in India are increasingly using e-commerce websites, social media networks and mobile applications as distribution channels.
4.0 External Search-
I consulted a few websites to analyse the need for this system and to find out how it is currently used across the globe. A few of the websites are mentioned below:
5.0 Benchmarking -
Several online retailers, like Adidas, Nike, and Fila, employ similar strategies to boost sales while also giving
customers a pleasant shopping experience. Benchmarking often entails comparing project procedures and
performance indicators to either successful recently completed projects or industry best standards and
practices. For this, it is necessary to continually look for ways to incorporate better methods that provide
better outcomes.
Shubradip Ghosh:
https://github.com/shubradip/customer-segmentation-/blob/main/Time%20Series%20Forecasting%20Prediction%20for%20Shoe-Sales.py
Amit Pathak:
https://github.com/sap1996/final-project
A. Feasibility-
Within a few months, this project can be developed and made available to the public as SaaS.
B. Viability-
As the shoe industry expands in India and throughout the world, there will always be small firms that can use this service to improve their sales and data warehousing capabilities. The service should therefore remain viable in the long run, although improvements will be required as new technologies are developed.
C. Monetization-
This service can be immediately published as a service that businesses may utilise, making it directly
monetizable.
1. Basic Plan: This plan may include access to fundamental features such as inventory management and demand forecasting tools, as well as historical sales data analysis. It could have a low monthly cost, making it appropriate for small enterprises.
2. Pro Plan: This plan may come with more sophisticated capabilities, including real-time sales data analysis, tailored customer suggestions and targeted marketing tools. It could have a higher monthly cost, making it appropriate for bigger companies or those seeking more sophisticated services.
3. Enterprise Plan: Designed for large companies with specific requirements, this plan may be customised to add features such as specialised support, API access and bespoke machine learning models. Its price may be adjusted to the particular needs of the business.
12.0 Conclusion-
In this study, a variety of forecasting techniques were used to estimate future sales of the product, and the
technique was chosen by its prediction error: the lower the forecast error, the more accurate the forecasting
method. The study's particular goal was to determine the most accurate quantitative forecasting technique for
the shoe industry based on practical usability and accuracy. It found that many real-life forecasting
situations are more complicated and challenging because of factors such as weather and particular festivals.
These predictions can offer useful information for production, facility monitoring, seasonal employment,
and short-term and long-term planning.
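The selection rule described above (choose the technique with the lowest forecast error) can be sketched as a hold-out comparison. This is a minimal illustration, assuming three simple baseline methods (naive, seasonal naive, moving average) and a synthetic monthly sales series; neither the methods nor the data are taken from the study, and all function names are hypothetical.

```python
# Hedged sketch: pick a forecasting technique by hold-out error.
# The sales series is synthetic and the three methods are simple baselines,
# not necessarily the techniques used in the study.

def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=12):
    """Repeat the value observed one season (12 months) earlier."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def moving_average_forecast(history, horizon, window=3):
    """Forecast the mean of the last `window` observations."""
    avg = sum(history[-window:]) / window
    return [avg] * horizon


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def select_method(series, test_size, methods):
    """Use all but the last `test_size` points as history, score on the rest.

    Returns the name of the method with the lowest MAE and all (MAE, RMSE) scores.
    """
    train, test = series[:-test_size], series[-test_size:]
    scores = {}
    for name, forecast_fn in methods.items():
        forecast = forecast_fn(train, test_size)
        scores[name] = (mae(test, forecast), rmse(test, forecast))
    best = min(scores, key=lambda name: scores[name][0])
    return best, scores


# Three years of synthetic monthly sales with a festive-season peak,
# each year 10 units per month higher than the previous one.
base = [120, 110, 130, 150, 160, 140, 130, 125, 170, 210, 230, 180]
sales = [value + 10 * year for year in range(3) for value in base]

best, scores = select_method(
    sales,
    test_size=12,
    methods={
        "naive": naive_forecast,
        "seasonal_naive": seasonal_naive_forecast,
        "moving_average": moving_average_forecast,
    },
)
print(best)  # seasonal_naive
```

Seasonal naive wins on this toy series because the synthetic sales repeat the same festive peak every year; on real data, comparing more than one error metric (for example MAE and RMSE together) is a sensible safeguard.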
JANUARY
ACTIVITIES:
Programmatic Activities/Orientation/Induction:
Attended daily lectures by industry experts in the SkillVertex Data Science program. Learned how data
science combines multiple disciplines, namely statistics, data analysis and machine learning, to analyze data
and extract knowledge and insights from it.
The first month focused on:
o data gathering, analysis and decision-making;
o finding patterns in data through analysis and making future predictions.
Using data science to help companies with:
o better decisions (should we choose A or B?);
o predictive analysis (what will happen next?);
o pattern discovery (finding patterns, or hidden information, in the data).
Documents/Reports Submitted:
o Decision Tree and Random Forest
o Classification with Logistic Regression and Naive Bayes
o Naive Bayes doc
o KNN
o Regression Using Module
o Polynomial Regression from scratch
o Linear regression
o Preprocessing and EDA
o Visualisation basics
o Pandas basics
o NumPy
o Math-and-Random-Module
o Modules and packages
o Functions and something extra
o For and while loops, useful operators
o Statements: IF ELSE
o Comparison operators
o List, tuple, dictionaries and sets
o Numbers, variables and strings
o Minor project: Loan Amount Prediction
Future projects assigned:
A project on predicting life expectancy, with a report on the findings
FEBRUARY
ACTIVITIES:
Programmatic Activities/Orientation/Induction:
Attended daily lectures by industry experts in the SkillVertex Data Science program. Learned how data
science combines statistics, data analysis and machine learning to analyze data, extract knowledge and
insights from it, and present them in a final dashboard.
2nd Project:
Predicting Life Expectancy
Business Problem
Life expectancy is affected by various factors. WHO wishes to predict life expectancy and determine
which factors have a significant impact. From this project, WHO would be able to estimate a country's life
expectancy and suggest which factors the country should focus on to improve it. Based on a publicly
available WHO dataset, this research analyses the variables that affect life expectancy. The data were
gathered between 2000 and 2015. The factors examined included a country's development status
(developed versus developing), GDP, population, years of schooling, alcohol use, BMI, government health
spending, health spending per unit of GDP, coverage of various immunisations, thinness, measles cases,
HIV/AIDS deaths, and adult, child and infant mortality rates.
During processing, the data were carefully examined (horizontally and vertically), cleaned and
transformed. Missing values were imputed using the bagged-trees technique. Exploratory data analysis
(EDA) used box-and-whisker plots, histograms and multiple factor analysis (MFA) to explore and mine the
trends within the data; MFA is an unsupervised machine learning method.
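The report does not state which library performed the bagged-trees imputation. As a hedged sketch, scikit-learn's `IterativeImputer` can be combined with `BaggingRegressor` (whose default base learner is a decision tree), so that each incomplete column is predicted from the other columns by an ensemble of bagged trees. The three columns and all values below are synthetic stand-ins for WHO variables, not the real dataset.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Synthetic stand-ins for three WHO variables (not the real dataset).
n = 200
schooling = rng.uniform(5, 20, n)
bmi = rng.uniform(15, 60, n)
life_expectancy = 45 + 1.2 * schooling + 0.05 * bmi + rng.normal(0, 2, n)
X = np.column_stack([schooling, bmi, life_expectancy])

# Blank out roughly 10% of the entries at random.
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

# Each column with gaps is modelled from the other columns by an ensemble of
# bagged regression trees (BaggingRegressor's default base learner is a
# decision tree); the imputer cycles over the columns up to max_iter times.
imputer = IterativeImputer(
    estimator=BaggingRegressor(n_estimators=25, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)

print("remaining NaNs:", int(np.isnan(X_imputed).sum()))
```

Only the missing cells are filled; observed values are passed through unchanged, which keeps the imputation from distorting the recorded data.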
The results of a multiple linear regression model that passed the assumption tests indicated that education
(Coeff. Est: 1.15), total government health spending (Coeff. Est: 0.08), BMI (Coeff. Est: 0.03), GDP (Coeff.
Est: 0.00004), and diphtheria (Coeff. Est: 0.03) and polio (Coeff. Est: 0.02) vaccinations are significant
variables positively correlated with life expectancy (p < …). Similar findings are supported by a partial
study of the full longitudinal multilevel model, which also found that a country with a lower initial life
expectancy in 2000 had a faster rate of life expectancy improvement between 2000 and 2015
(intercept-slope corr = -0.55). This suggests that, between 2000 and 2015, life expectancy improved in
developing nations, which were linked to lower life expectancy. The full model also concludes that the
average human life expectancy rises by 0.25 years per year, with a confidence interval ranging from
0.16 years to 0.34 years (p < 0.001).
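As a worked reading of two of the reported estimates, assuming schooling is measured in years, the other predictors are held fixed, and the yearly gain from the multilevel model applies linearly over the 15 years from 2000 to 2015:

```latex
\begin{aligned}
\Delta\,\text{life expectancy (2 more years of schooling)} &= 2 \times 1.15 = 2.3 \text{ years} \\
\Delta\,\text{life expectancy (2000 to 2015)} &= 15 \times 0.25 = 3.75 \text{ years} \\
\text{range implied by } [0.16,\ 0.34] &= [\,15 \times 0.16,\ 15 \times 0.34\,] = [\,2.4,\ 5.1\,] \text{ years}
\end{aligned}
```

The first line comes from the regression model and the last two from the multilevel model, so they describe different analyses and should not be added together.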
Documents/Reports Submitted:
• A report on the life expectancy model
• Python notebook code file
• Dashboard
MARCH-APRIL
1. Problem Statement (EV Market)
You are a team working under an Electric Vehicle Startup. The Startup is still deciding in which
vehicle/customer space it will develop its EVs.
You have to analyse the Electric Vehicle market in India using Segmentation analysis and come up with a
feasible strategy to enter the market, targeting the segments most likely to use Electric vehicles.
DO NOTE that not every MARKET has Geographic, Demographic, Psychographic and Behavioral data readily
available, so a lot of research will be required in the DATA Collection Tasks.
Reports submitted:
- Data analysis report and elaboration on how our team arrived at the chosen geographic, demographic,
psychographic, behavioral or other segments that the interns came up with after analysis and research.
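The brief does not prescribe a segmentation algorithm. As one common option, k-means clustering on standardised survey features can extract candidate segments. The sketch below is a plain NumPy implementation of Lloyd's k-means with deterministic farthest-point initialisation (rather than the more usual k-means++); the feature names and the synthetic respondents are hypothetical, not data collected by the team.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


rng = np.random.default_rng(42)
# Hypothetical respondents: [age, daily commute (km), monthly income (thousand INR)]
spread = [1.5, 2.0, 4.0]
X = np.vstack([
    rng.normal([25, 30, 40], spread, size=(50, 3)),
    rng.normal([45, 8, 130], spread, size=(50, 3)),
    rng.normal([60, 3, 60], spread, size=(50, 3)),
])
# Standardise so that income does not dominate the distance computation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
labels, centroids = kmeans(Z, k=3)
for j in range(3):
    print(j, np.round(X[labels == j].mean(axis=0), 1))
```

In practice the number of clusters would be chosen with a diagnostic such as the elbow method or silhouette scores, and each cluster profiled before a target segment is selected.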
MAY-JUNE
ACTIVITIES:
Programmatic Activities/Orientation/Induction:
Small-scale B2B problem solving in the Feynn Labs ML Internship Program 2023
(batch Jan-2023), using Discord as the communication medium. Learned how
data science combines statistics, data analysis and machine learning to analyze
data, extract knowledge and insights from it, and present them in a final
dashboard.
4th Project:
Business Problem
Problem Statement (Online Vehicle Booking Market)
You are a team working under an Online Vehicle Booking Product Startup. Due
to heavy competition in cab booking from Ola and Uber in India, the startup is
looking for an alternative segment that can give it an early foothold in the
market and early revenue.
You have to analyse the Vehicle market in India using Segmentation analysis
and come up with a feasible strategy to enter the market, targeting the segments
where there can be possible profit by offering Vehicle booking service.
(CUSTOMER/VEHICLE/B2B) SEGMENTS: Apart from Geographic,
Demographic, Psychographic and Behavioral segments, teams can consider
different CATEGORIES of Segments for the Segmentation Tasks, based on
AVAILABILITY OF DATA. Market Segmentation comes with a wide scope of
possibilities, and the Segments created can change depending on the datasets
collected.
STRATEGY
- Analysis of which location in India is most suitable for creating the early market
in accordance with the Innovation Adoption Life Cycle. (Read more @
https://en.wikipedia.org/wiki/Technology_adoption_life_cycle)
- Which demographic, psychographic, behavioural or other factors your team
will target, based on data analysis of the available datasets. If proper
datasets are unavailable, how your team will keep its decisions as accurate
and unbiased as possible.
- Strategic pricing range for the products, informed by an understanding of
early-market psychographics.
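In the adoption life cycle the early market is usually taken to be the innovators plus the early adopters. The sketch below applies Rogers' standard adopter-category shares (2.5%, 13.5%, 34%, 34%, 16%) to the population expected eventually to adopt in a candidate location; the population figure and function names are illustrative assumptions, and a real location comparison would also need the segment-level data described above.

```python
# Rogers' adopter categories, as shares of the population that eventually adopts.
ADOPTER_SHARES = {
    "innovators": 0.025,
    "early_adopters": 0.135,
    "early_majority": 0.34,
    "late_majority": 0.34,
    "laggards": 0.16,
}


def adopter_breakdown(target_population):
    """Split a target population into the five adopter categories."""
    return {name: round(target_population * share) for name, share in ADOPTER_SHARES.items()}


def early_market_size(target_population):
    """Innovators plus early adopters: the early market (about 16%)."""
    share = ADOPTER_SHARES["innovators"] + ADOPTER_SHARES["early_adopters"]
    return round(target_population * share)


# Hypothetical target population for one candidate city.
print(adopter_breakdown(1_000_000))
print(early_market_size(1_000_000))
```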
DATA COLLECTION KEYWORDS: General Vehicle Type Data, Vehicle
Industry Data, Online Cab booking statistics, etc.
TEAM REPORT
- Data analysis report and elaboration on how your team arrived at the chosen
geographic, demographic, psychographic, behavioral or other segments that the
interns came up with after analysis and research.
- Team Members' Names (only those who contributed)
0. Fermi Estimation (Breakdown of Problem Statement)
1. Data Sources (Data Collection: all Team Members should perform the Data
Collection Task.)
2. Data Pre-processing (Steps and Libraries used)
3. Segment Extraction (ML techniques used)
4. Profiling and describing potential segments
5. Selection of target segment
6. Customizing the Marketing Mix
7. (For Business Markets) Potential customer base in the early market, used to
calculate the potential sale (profit) in the early market (Potential Customer
Base * Your Target Price Range = Potential Profit).
8. The MOST OPTIMAL MARKET SEGMENTS to enter, as per your Market
Research and Segmentation
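Item 7's formula, combined with the Fermi breakdown of item 0, can be sketched as a small calculator. All ratios, the population and the price range in the usage example are hypothetical placeholders, not figures from the team's report. Multiplying customers by price gives revenue, so operating costs would still have to be subtracted to reach profit.

```python
from dataclasses import dataclass


@dataclass
class FermiEstimate:
    """Top-down breakdown of the early-market customer base.

    Every ratio here is a hypothetical placeholder to be replaced with collected data.
    """
    population: int            # people in the chosen location
    share_in_segment: float    # fraction matching the target segment
    share_early_market: float  # fraction expected to adopt early

    def customer_base(self) -> int:
        return round(self.population * self.share_in_segment * self.share_early_market)


def potential_sales(customer_base, price_low, price_high):
    """Customer base x target price range, as in item 7 of the brief.

    Note: the result is revenue; costs must be subtracted to obtain profit.
    """
    return customer_base * price_low, customer_base * price_high


estimate = FermiEstimate(population=1_000_000, share_in_segment=0.2, share_early_market=0.16)
base = estimate.customer_base()
low, high = potential_sales(base, price_low=150, price_high=300)
print(base, low, high)
```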