Data Management for Production-Quality Deep Learning Models
Suman Saha
Amity University Kolkata.
Enrolment no. : A 91413 8122 008
Machine Learning – 2
DSC 308
From Slow Progress to Wide Adoption
• 1940s – 1980s, very slow progress due to:
Computation hardware capacity limitations
Too few observations with labeled results in the dataset
Lack of efficient algorithms to estimate the parameters in the model
• 1980s – 2010, a few real-world applications due to:
Moore's Law + GPUs: the development of powerful and sophisticated algorithms
Internet-generated large labeled datasets
Efficient algorithms for optimization (e.g., gradient descent with backpropagation)
Better activation functions (e.g., ReLU)
• 2010 – now, very broad application in various areas:
Near-human-level image classification
Near-human-level speech transcription
Much improved handwriting recognition
Much improved machine translation
• Now we know that working neural network models usually contain many layers (i.e., the network is deep), and that to achieve near-human-level accuracy a deep neural network needs a huge training dataset (for example, millions of labeled pictures for an image classification problem).
• Deep Learning Models:
Tremendous increase in the development of ML algorithms
Large-scale datasets, even when annotated, are time-consuming and complex to build
Complexity: data collection methods, data pruning, and data labelling via crowd-sourcing workflows for such ML algorithms
Implementation of weak supervision models, rather than classical fully supervised models, to build a more nuanced ML algorithm
Weakly supervised learning is used to optimize datasets for production-quality deep learning because it allows efficient creation of large, high-quality training datasets even when fully labeled data is scarce or expensive to obtain. This is crucial in real-world scenarios where getting perfect, manually labeled data is often impractical (see the labeling-function sketch after this list).
• Adaptability:
• Weak supervision is well-suited for situations where data distributions can shift or new information becomes available. It allows for
easy adaptation and refinement of the training data as the model's understanding of the problem evolves.
• Cost-Effectiveness:
• By automating or reducing the need for human labeling, weakly supervised learning can significantly reduce the cost of building
and deploying deep learning models.
• Challenges include poor explainability, poor traceability, data dependencies, data set incompleteness, and data management
problems
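To make the weak-supervision idea concrete, here is a minimal sketch of programmatic labeling with hand-written labeling functions. The rules and the sentiment task are hypothetical; a real pipeline would typically aggregate votes with a learned label model (e.g., Snorkel) rather than the simple majority vote used here.

```python
# Sketch of weak supervision via labeling functions (hypothetical rules
# for a hypothetical review-sentiment task).
from collections import Counter

ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_great(text):
    # Heuristic: the word "great" suggests a positive review.
    return POS if "great" in text.lower() else ABSTAIN

def lf_mentions_refund(text):
    # Heuristic: refund requests suggest a negative review.
    return NEG if "refund" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    # Heuristic: repeated exclamation marks suggest enthusiasm.
    return POS if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_great, lf_mentions_refund, lf_many_exclamations]

def weak_label(text):
    """Majority vote over the labeling functions that do not abstain."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no rule fired; the example stays unlabeled
    return Counter(votes).most_common(1)[0][0]

reviews = ["Great product, works great!!", "Broken on arrival, I want a refund."]
print([weak_label(r) for r in reviews])  # -> [1, 0]
```

Noisy labels produced this way can then train a deep model on far more examples than manual annotation would allow, which is the cost-effectiveness argument above.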
• Data and DL form a symbiotic relationship: DL is useless without data, and modern data management problems are almost impossible to overcome without DL.
• The sheer volume and variety of data ingested by modern analytical pipelines considerably enhances the links between data integration and machine learning
(Dong and Rekatsinas, 2018).
• Data management systems are increasingly using AI models like machine learning to automate parts of data life cycle tasks. Examples include data cataloging and
inferring the schema of raw data (Halevy et al., 2016).
• Established companies and newcomers alike prefer to use data-driven tactics to develop, compete, and gain value from deep and up-to-date information in most
industries (Manyika et al., 2011). However, in the current scenario organizations struggle with collecting, integrating, and managing the data. Instead of solving
these data issues, DL will only make them more noticeable.
• An online recommender system used by an electronics vendor: DL components in the recommender system are trained on user reviews and purchase history. It predicts users' interests and recommends electronic products based on previous customer reviews and purchase history.
• Online recommender services help the company boost sales by leveraging the power of data. Many customers look to the website for recommendations, so personalized recommendations from the system increase customer satisfaction and, in turn, customer retention.
• Tree-based ML algorithm implementations: Decision Tree, Random Forest (a minimal sketch follows).
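A minimal sketch of those tree-based models using scikit-learn; the synthetic feature matrix is a placeholder standing in for purchase-history features.

```python
# Decision tree vs. random forest on a synthetic tabular dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# The ensemble usually generalizes better than a single tree.
print(tree.score(X_test, y_test), forest.score(X_test, y_test))
```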
• Deep learning is used to forecast the weather and thereby predict the wind power that can be generated in the future. Power companies have strict requirements on the amount of power they are going to deliver, and penalties must be paid if they cannot.
• A DL component is incorporated in the system to predict the quality of the resulting product based on all the measurements in the machine and the measurements that go in. Further, there are also images of what happens at the start of the machine, and microscope images. The DL system serves as a foundation for controlling the manufacturing process.
• Phases of fitting a deep learning model to a dataset: (i) Data Collection, (ii) Data Exploration, (iii) Data Preprocessing, (iv) Dataset Preparation, (v) Data Testing, (vi) Deployment, (vii) Post-deployment. Further, codes such as problem description, implications, empirical basis, and examples were formed.
• Data augmentation techniques are task-specific. For example, data augmentation strategies that are appropriate for time series classification may not be appropriate for detecting time series anomalies (Hu et al., 2019). The two manipulation techniques perform differently in different contexts: augmentation outperforms weighting when only a small quantity of data is available, but weighting outperforms augmentation when dealing with class imbalance. As a result, the choice depends on the application parameters (see the sketch below).
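A small sketch contrasting the two manipulations, assuming NumPy. Jittering stands in for one common time-series augmentation, and the inverse-frequency weights stand in for class weighting; all constants are illustrative.

```python
# Augmentation (for small data) vs. class weighting (for imbalance).
import numpy as np

def jitter(series, sigma=0.05, n_copies=4, seed=0):
    """Augment a 1-D series by adding small Gaussian-noise copies."""
    rng = np.random.default_rng(seed)
    return [series + rng.normal(0.0, sigma, size=series.shape)
            for _ in range(n_copies)]

def class_weights(labels):
    """Inverse-frequency weights for an imbalanced integer label array."""
    counts = np.bincount(labels)
    weights = len(labels) / (len(counts) * counts.astype(float))
    return dict(enumerate(weights))

# Small-data case: create more samples via augmentation.
series = np.sin(np.linspace(0, 4 * np.pi, 100))
print(len(jitter(series)), jitter(series)[0].shape)  # 4 copies of shape (100,)

# Imbalanced case: reweight classes instead of augmenting.
labels = np.array([0] * 90 + [1] * 10)
print(class_weights(labels))  # {0: ~0.56, 1: 5.0}
```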
Intermediate layers
ReLU (rectified linear unit) is usually a good choice, with the following good properties: (1) fast computation; (2) non-linearity; (3) reduced likelihood of the gradient vanishing; (4) unconstrained response.
Sigmoid, studied in the past, is not as good as ReLU in deep learning, due to the vanishing gradient problem when there are many layers.
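A quick numeric illustration of this activation choice (plain NumPy; the sample points are arbitrary): ReLU keeps a constant gradient of 1 for positive inputs, while the sigmoid gradient is at most 0.25 and shrinks toward 0 in the tails, which compounds across many layers.

```python
# ReLU vs. sigmoid: values and gradients at a few sample points.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(relu(x))                          # [0. 0. 0. 1. 4.]
print((x > 0).astype(float))            # ReLU gradient: exactly 0 or 1
print(sigmoid(x) * (1.0 - sigmoid(x)))  # sigmoid gradient: <= 0.25, -> 0 in tails
```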
Last layer, which connects to the output:
Binary classification: sigmoid, with binary cross-entropy as the loss function
Multi-class, single-label classification: softmax, with categorical cross-entropy as the loss function
Continuous responses: identity function (i.e., y = x)
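A minimal Keras sketch mapping each output type above to a last layer and loss function (assumes TensorFlow/Keras; input_dim, n_classes, and the optimizer are placeholder choices).

```python
# Last-layer activation and loss per task type.
from tensorflow import keras
from tensorflow.keras import layers

def build_head(task, input_dim=32, n_classes=10):
    inputs = keras.Input(shape=(input_dim,))
    if task == "binary":
        outputs = layers.Dense(1, activation="sigmoid")(inputs)
        loss = "binary_crossentropy"
    elif task == "multiclass":
        outputs = layers.Dense(n_classes, activation="softmax")(inputs)
        loss = "categorical_crossentropy"
    else:  # continuous response: identity (linear) activation
        outputs = layers.Dense(1, activation="linear")(inputs)
        loss = "mse"
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="rmsprop", loss=loss)
    return model

build_head("multiclass").summary()
```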
Data: requires a large, well-labeled dataset.
Word embedding input/output (figure): padded integer sequences with 10 columns, e.g. [23, 55, 5, 78, 9, 0, 0, 0, 0, 0], are mapped to dense real-valued vectors, e.g. [0.3, 0.9, 0.1, 0.2, …, 0.7, 0.8]. The embedding is low-dimensional compared with OHE (one-hot encoding), real-valued, and meaningful, and it can be learned on the specific data or use pre-trained embeddings.
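A minimal sketch of this mapping with a Keras Embedding layer; vocab_size and embed_dim are illustrative, and the layer's weights here are randomly initialized rather than learned or pre-trained.

```python
# Padded integer sequences (10 columns) -> dense 8-dimensional vectors.
import numpy as np
from tensorflow.keras import layers

vocab_size, embed_dim = 256, 8
sequences = np.array([
    [23, 55, 5, 78, 9, 0, 0, 0, 0, 0],
    [78, 55, 8, 17, 12, 234, 33, 9, 14, 78],
])

embedding = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
vectors = embedding(sequences)
print(vectors.shape)  # (2, 10, 8): one real-valued vector per token
```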
RNN Extension – LSTM
A simple RNN layer is a good starting point, but its performance is usually not that good, because long-term dependencies are impossible to learn due to the vanishing gradient problem in the optimization process.
The LSTM (Long Short-Term Memory) RNN model introduces a "carry" mechanism such that useful information from earlier words can be carried to later words like a conveyor belt, without suffering the vanishing gradient problem we see in the simple RNN case.
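A minimal Keras sketch of an Embedding → LSTM stack of the kind described above; the hyperparameters are illustrative, and a binary sentiment-style head is assumed.

```python
# Embedding -> LSTM -> sigmoid head for sequence classification.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, embed_dim = 10000, 100, 32

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, embed_dim),
    layers.LSTM(32),                        # carry state mitigates vanishing gradients
    layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```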