Data Science Part 1
Note 1: Depending on the type of data given to an AI system, it is classified into three domains:
1) Data Science
2) Computer Vision
3) Natural Language Processing
Some common applications of Data Science are:
1) Internet Search
There are many search engines such as Google, Yahoo, Bing, AOL, etc. that make use of data
science algorithms to deliver the best results for our search query in a fraction of a second.
3) Website Recommendation
Many companies in the e-commerce business, like Amazon, Flipkart, eBay, Twitter, LinkedIn,
Netflix, etc., help customers find relevant products from the billions of products available with
them, based on the user's past experience, search preferences and interests. For example, if
we search for a Bluetooth speaker, we will probably get recommendations for a bunch of
speakers of popular brands available online to compare with. All this is possible only by using
Data Science as a tool.
4) Image Recognition
When we upload a picture, an automatic tag recognition system used by applications like
Facebook suggests people for us to tag. This is possible only because of Data Science.
NOTE: THERE ARE OTHER APPLICATIONS TOO; YOU CAN ALSO HAVE A LOOK AT THOSE.
Scenario
Every restaurant prepares food in bulk as they expect a good crowd to come and enjoy their food.
However, if the expectation is not met, a good amount of food gets wasted, which eventually
becomes a loss for the restaurant, as they either have to dump it or give it away to hungry people
for free. And if this daily loss is taken into account for a year, it becomes quite a big amount (a financial loss).
Now our goal is to develop an AI model/machine that can predict the quantity of food to be
prepared by the restaurant so as to minimize the wastage of food.
This is the first stage of the AI Project Cycle (Problem Scoping). In this stage we closely examine the
various factors that cause the problem, in order to build an AI-enabled project.
After clearly answering the 4Ws, the goal of the project would be 'To predict the quantity of food
dishes to be prepared for everyday consumption in restaurant buffets'.
Our next step is to acquire data for training and testing our AI model. For that, we need to collect
data as per our project (data features).
By looking at the problem statement, the data features that will be considered for the preparation of
food for the next day's buffet consumption are as follows:
After creating the system map flow, we get to know the dependency of different factors on each
other. Hence, we extract the meaningful data from the acquired data, and the following data needs
to be prepared for the model.
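As an illustration only, here is a minimal sketch (in Python, using pandas) of how the acquired data could be organized before modelling. The column names and all values are hypothetical examples and not data from these notes; only the idea of recording the dish, the quantity prepared and the quantity unconsumed comes from the scenario itself.

```python
# A minimal sketch of how the acquired restaurant data might be organized
# before modelling. Column names and values are hypothetical examples.
import pandas as pd

# Hypothetical records collected over several days for one buffet dish
records = [
    {"day": 1, "dish": "Veg Biryani", "quantity_prepared": 50, "quantity_unconsumed": 8},
    {"day": 2, "dish": "Veg Biryani", "quantity_prepared": 50, "quantity_unconsumed": 12},
    {"day": 3, "dish": "Veg Biryani", "quantity_prepared": 45, "quantity_unconsumed": 5},
]

df = pd.DataFrame(records)

# The quantity actually consumed is prepared minus unconsumed;
# this derived column is the value the model will learn to predict.
df["quantity_consumed"] = df["quantity_prepared"] - df["quantity_unconsumed"]
print(df)
```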
As the data to be collected is continuous data over a certain period, and there is also a
dependency factor involved between the different data, we will use a REGRESSION model for our
project.
For example, if we have collected continuous data for 30 days, we will train the model on the first
20 days (training data) and then evaluate it on the next 10 days (testing data).
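The sketch below illustrates this 20/10 split with a simple regression model from scikit-learn. The feature (day number) and the target values are made up purely for illustration; the actual features would come from the data acquired for the project.

```python
# A minimal sketch of the 30-day split described above: the first 20 days
# train a regression model and the last 10 days test it. All values are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(1, 31).reshape(-1, 1)                 # 30 consecutive days
quantity_consumed = 40 + 0.5 * days.ravel() + np.random.normal(0, 2, 30)  # hypothetical target

X_train, y_train = days[:20], quantity_consumed[:20]   # first 20 days: training data
X_test, y_test = days[20:], quantity_consumed[20:]     # last 10 days: testing data

model = LinearRegression()
model.fit(X_train, y_train)                            # train on the first 20 days

predictions = model.predict(X_test)                    # evaluate on the next 10 days
print(predictions)
```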
The next stage is to test whether the model is working properly or not. The following are the steps
followed to test our model based on the above scenario.
Step 1: We feed the data to the trained model. In this example, the name of the dish and the quantity
produced are fed to the trained model.
Step 2: We feed the data of the quantity of unconsumed food of the same dish on previous days.
Step 3: The model then works upon the entries based on the training it got in the modelling stage.
Step 4: The model predicts the quantity of food to be prepared for the next day.
Step 5: The predicted quantity is now compared with the testing data. From the testing data, the
quantity of food to be produced for the next day should be the total quantity minus the unconsumed
quantity.
Step 6: The model is tested with different datasets at least 10 times.
Step 7: Now the predicted values and actual values are compared to check the efficiency of the
model.
Step 8: The model is said to be accurate if the predicted values and the actual values are close to
each other. If not, then for better accuracy, either the model selection is changed or the model is
trained on more data. A small sketch of this comparison is given after these steps.
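Below is a minimal sketch of Steps 5 to 8: the actual quantity needed for the next day is taken as the total quantity minus the unconsumed quantity, and the predicted values are compared with the actual values to judge accuracy. All numbers here are hypothetical.

```python
# A minimal sketch of comparing predicted and actual quantities (Steps 5-8).
# All numbers are made-up examples, not real restaurant data.
import numpy as np

total_quantity = np.array([50, 50, 45, 48, 52])   # quantity prepared on the testing days
unconsumed = np.array([8, 12, 5, 6, 10])          # unconsumed food on those days
actual = total_quantity - unconsumed              # quantity that was actually needed

predicted = np.array([43, 40, 41, 40, 44])        # hypothetical model predictions

error = np.abs(predicted - actual)                # difference per day
print("Mean absolute error:", error.mean())

# If the average difference is small, the model is considered accurate;
# otherwise we change the model or train it on more data.
```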