The document discusses the evolution and challenges of data management for production-quality deep learning models, highlighting the transition from slow progress to widespread adoption due to advancements in hardware and algorithms. It emphasizes the importance of weak supervision in optimizing datasets, cost-effectiveness, and adaptability in real-world scenarios, while also addressing the symbiotic relationship between data and deep learning. Additionally, it outlines various phases of deep learning model development, data encoding techniques, and the use of LSTM for better handling of long-term dependencies in data.


Data Management for Production-Quality Deep Learning Models:

Challenges and Solutions

Suman Saha
Amity University Kolkata.
Enrolment no. : A 91413 8122 008
Machine Learning – 2
DSC 308
From Slow Progress to Wide Adoption
•1940s – 1980s, very slow progress due to:
 Computation hardware capacity limitations
 Limited number of observations with labeled results in the dataset
 Lack of efficient algorithms to estimate the parameters in the model
•1980s – 2010, a few applications in the real world due to:
 Moore's Law + GPUs: the development of powerful and sophisticated algorithms
 The Internet generated large labeled datasets
 Efficient algorithms for optimization (e.g., backpropagation)
 Better activation functions (e.g., ReLU)
•2010 – Now, very broad application in various areas:
 Near-human-level image classification
 Near-human-level speech transcription
 Much improved handwriting recognition
 Much improved machine translation
•Now we know that working neural network models usually contain many layers (i.e., the depth of the network is deep), and that to achieve near-human-level accuracy a deep neural network needs a huge training dataset (for example, millions of labeled pictures for an image classification problem).
•Deep Learning Models:
 Tremendous increase in the development of ML algorithms

 Open question: how to leverage production-quality data

 Large-scale datasets, even when annotated, are time-consuming and complex to build

 Complexity: data collection methods, data pruning, and data labelling via crowdsourcing workflows for ML algorithms

 Implementation of weak-supervision models, rather than classical fully supervised models, to build more nuanced ML algorithms

 Applying weak supervision to unlabeled data yields a bias/noise-aware DL model

 Weakly supervised learning is used to optimize datasets for production-quality deep learning because it allows efficient creation of large, high-quality training datasets even when fully labeled data is scarce or expensive to obtain. This is crucial in real-world scenarios where obtaining perfect, manually labeled data is often impractical (a minimal labeling-function sketch follows this list).
• Adaptability:
• Weak supervision is well-suited for situations where data distributions can shift or new information becomes available. It allows for
easy adaptation and refinement of the training data as the model's understanding of the problem evolves.
• Cost-Effectiveness:
• By automating or reducing the need for human labeling, weakly supervised learning can significantly reduce the cost of building
and deploying deep learning models.
• Challenges include poor explainability, poor traceability, data dependencies, data set incompleteness, and data management
problems
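
To make the weak-supervision idea concrete, here is a minimal sketch in R (the language of the implementation functions cited later in this document). The labeling functions, keywords, and majority-vote combiner are hypothetical illustrations, not the author's method; real weak-supervision systems additionally learn to weight and denoise the labeling functions.

# Each labeling function votes positive (1), negative (0), or abstains (NA).
lf_contains_great <- function(text) if (grepl("great|excellent", text)) 1 else NA
lf_contains_awful <- function(text) if (grepl("awful|terrible", text)) 0 else NA
lf_exclamation    <- function(text) if (grepl("!", text, fixed = TRUE)) 1 else NA

# Combine the noisy votes into one weak label by simple majority vote.
weak_label <- function(text) {
  votes <- c(lf_contains_great(text), lf_contains_awful(text), lf_exclamation(text))
  votes <- votes[!is.na(votes)]        # drop abstentions
  if (length(votes) == 0) return(NA)   # no labeling function fired: leave unlabeled
  as.integer(mean(votes) >= 0.5)       # majority vote
}

weak_label("This movie is great !")    # returns 1, a noisy positive label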

• Data and DL form a symbiotic relationship: DL is useless without data, and data management challenges are almost impossible to overcome without DL.
• The sheer volume and variety of data ingested by modern analytical pipelines considerably enhances the links between data integration and machine learning
(Dong and Rekatsinas, 2018).
• Data management systems are increasingly using AI models like machine learning to automate parts of data life cycle tasks. Examples include data cataloging and
inferring the schema of raw data (Halevy et al., 2016).
• Established companies and newcomers alike prefer to use data-driven tactics to develop, compete, and gain value from deep and up-to-date information in most industries (Manyika et al., 2011). However, in the current scenario, organizations struggle with collecting, integrating, and managing data. Instead of solving these data issues, DL will only make them more noticeable.
• Example: an online recommender system used by an electronics vendor. The DL components of the recommender system are trained on user reviews and purchase history; the system predicts users' interests and recommends electronic products based on previous customer reviews and purchase history.
• Online recommender services help the company boost sales by leveraging the power of data. Many customers look to the website for recommendations; personalized recommendations from the system thus increase customer satisfaction and, in turn, customer retention.
• Graph-based ML algorithm implementations: decision trees, random forests.
• Deep learning is used to forecast weather and thereby predict the wind power that can be generated in the future. Power companies have strict requirements on the amount of power they are going to deliver, and penalties must be paid if they cannot meet them.
• A DL component is incorporated in a manufacturing system to predict the quality of the resulting product based on all the measurements in the machine and the measurements that go in. Further, there are also images of what happens at the start of the machine, and microscope images. The DL system serves as a foundation for controlling the manufacturing process.
• Phases of deep learning model development in dataset fitting: (i) Data Collection, (ii) Data Exploration, (iii) Data Preprocessing, (iv) Dataset Preparation, (v) Data Testing, (vi) Deployment, (vii) Post-deployment. Further, codes such as problem description, implications, empirical basis, and examples were formed.
• Data augmentation techniques are task-specific. For example, data augmentation strategies that are appropriate for time series classification may not be appropriate for detecting time series anomalies (Hu et al., 2019). These two manipulation techniques perform differently in diverse contexts: augmentation outperforms weighting when only a small quantity of data is available, but weighting outperforms augmentation when dealing with class-imbalance issues. As a result, the choice depends on the application parameters.
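
As a hedged illustration of the weighting alternative, the sketch below uses the keras R interface (consistent with the layer_* functions cited later). The data, architecture, and 1:9 weight ratio are assumed for the example, not taken from the source.

library(keras)

# Dummy imbalanced data: 1,000 samples, 20 features, roughly 10% positives.
x_train <- matrix(rnorm(1000 * 20), ncol = 20)
y_train <- rbinom(1000, 1, 0.1)

model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(20)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy")

# Class weighting: up-weight the rare positive class instead of augmenting.
model %>% fit(
  x_train, y_train,
  epochs = 5, batch_size = 32,
  class_weight = list("0" = 1.0, "1" = 9.0)
)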
 Intermediate layers
 ReLU (i.e., rectified linear unit) is usually a good choice, with the following good properties: (1) fast computation; (2) non-linearity; (3) reduced likelihood of the gradient vanishing; (4) unconstrained response
 Sigmoid, studied in the past, is not as good as ReLU in deep learning, due to the vanishing-gradient problem when there are many layers
 Last layer, which connects to the output
 Binary classification: sigmoid with binary cross-entropy as the loss function
 Multi-class, single-label classification: softmax with categorical cross-entropy as the loss function
 Continuous responses: identity function (i.e., y = x)
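
For reference, the standard definitions of the two cross-entropy losses paired above (textbook formulas, not specific to this document), for $N$ samples and $C$ classes, where $y$ is the true label and $\hat{y}$ the predicted probability:

\[ L_{\mathrm{binary}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\,\right] \]

\[ L_{\mathrm{categorical}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log \hat{y}_{i,c} \]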
 Data: requires a large, well-labeled dataset

 Computation: intensive matrix-matrix operations

 Structure of a fully connected feedforward NN

 Size of the NN: total number of parameters
 Depth: total number of layers (this is where the "deep" in deep learning comes from)
 Width of a particular layer: number of nodes (i.e., neurons) in that layer
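
A minimal sketch of such a structure in the keras R interface (architecture and sizes assumed for illustration): two hidden layers give the depth, the units argument sets each layer's width, and summary() reports the size.

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(100)) %>%  # hidden layer, width 64
  layer_dense(units = 64, activation = "relu") %>%                        # second hidden layer (depth)
  layer_dense(units = 1, activation = "sigmoid")                          # binary output layer

summary(model)  # per-layer shapes and total parameter count (the size):
                # (100*64+64) + (64*64+64) + (64*1+1) = 10,689 parameters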
 Activation function
  Intermediate layers
  Last layer connecting to outputs
 Loss function
  Classification (i.e., categorical response)
  Regression (i.e., continuous response)
 Optimization methods (SGD)
  Batch size
  Learning rate
  Epochs
 Deal with overfitting
  Dropout
  Regularization (L1 or L2)
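
The sketch below (assumed values throughout, not the author's configuration) shows where each of these knobs appears in the keras R interface: dropout and L2 regularization in the layers, SGD with a learning rate in compile(), and batch size/epochs in fit().

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(100),
              kernel_regularizer = regularizer_l2(0.001)) %>%  # L2 regularization
  layer_dropout(rate = 0.5) %>%                                # dropout against overfitting
  layer_dense(units = 10, activation = "softmax")              # multi-class output

model %>% compile(
  optimizer = optimizer_sgd(learning_rate = 0.01),  # optimization method and learning rate
  loss = "categorical_crossentropy",                # loss for a categorical response
  metrics = "accuracy"
)

# model %>% fit(x_train, y_train, epochs = 20, batch_size = 128)  # epochs and batch size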
Analyzing Text – Encoding/Embedding Production Quality Data
Categorical integers cannot be fed directly to an algorithm, as there is no mathematical relationship among the categories. We have to use either encoding or embedding.

One-Hot Encoding (OHE): each integer becomes a sparse binary indicator vector.

One_Column → One_Column_dummy
23 → 0, 0, 0, …, 1, 0, 0, …, 0, 0

Original data frame (10 columns):
[23, 55, 5, 78, 9, 0, 0, 0, 0, 0]
[78, 55, 8, 17, 12, 234, 33, 9, 14, 78]
[65, 36, 0, 0, 0, 0, 0, 0, 0, 0]
…

OHE data frame (250 × 10 columns):
[0, 0, 0, …, 1, 0, …, 0, 0, 0, 0, 0]
[0, 0, 0, …, 0, 0, …, 0, 0, 0, 0, 0]
[0, 0, 0, …, 0, 0, …, 0, 0, 0, 0, 0]
…

That is already 250 columns per token; OHE will explode the dimension if we have 10,000 unique words in the vocabulary and a few hundred words in each review in the training dataset. OHE is binary and sparse and does not consider the relationships among words, so it is generally not used in text-related models.

Dense Embedding: each integer maps to a low-dimensional real-valued vector; the dimension is configurable (here 4).

One_Column → Embedding_Column
23 → 0.3, 0.9, 0.1, 0.2

Original data frame (10 columns):
[23, 55, 5, 78, 9, 0, 0, 0, 0, 0]
[78, 55, 8, 17, 12, 234, 33, 9, 14, 78]
[65, 36, 0, 0, 0, 0, 0, 0, 0, 0]
…

Embedded data frame:
[0.3, 0.9, 0.1, 0.2, …, 0.7, 0.8]
[0.2, 0.7, 0.3, 0.7, …, 0.4, 0.3]
[0.5, 0.8, 0.4, 0.6, …, 0.5, 0.9]
…

Dense embedding utilizes the inherent relationships among words and dramatically reduces the embedded dimension. The dimension is the size of the vector space and can be configured, such as 4 in this example; word2vec uses 300 for its vectors. Compared with OHE, embeddings are low-dimensional, real-valued, and meaningful, and they can be learned on specific data or taken from pre-trained embeddings.
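
A hedged sketch of the embedding approach in the keras R interface, mirroring the toy example above (vocabulary of 10,000 tokens, reviews padded to length 10, embedding dimension 4; all sizes assumed):

library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000,   # vocabulary size
                  output_dim = 4,      # embedding dimension (configurable)
                  input_length = 10) %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")

# Each integer token (e.g., 23) is now mapped to a learned 4-dimensional real
# vector instead of a 10,000-dimensional sparse one-hot vector.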
RNN Extension – LSTM
A simple RNN layer is a good starting point, but its performance is usually not that good, because long-term dependencies are impossible to learn due to the vanishing-gradient problem in the optimization process.

The LSTM (Long Short-Term Memory) RNN model introduces a "carry" mechanism such that useful information from earlier words can be carried to later words like a conveyor belt, without suffering the vanishing-gradient problem we see in the simple RNN case.
[Diagram] Raw text input "This movie is great !" → per-word embeddings ([0.2, 0.4, 0.1, 0.7], [0.7, 0.1, 0.5, 0.4], [0.4, 0.2, 0.9, 0.3], [0.6, 0.1, 0.8, 0.4], [0.3, 0.2, 0.9, 0.0]) → recurrent layer → final RNN output [0.4, 0.3, 0.7, 0.2] → to FFNN layer.
Implementation: replace layer_simple_rnn() with layer_lstm().
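
A minimal sketch of the swap (sizes assumed, consistent with the embedding example earlier):

library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 4) %>%
  # layer_simple_rnn(units = 32) %>%   # starting point; suffers vanishing gradients
  layer_lstm(units = 32) %>%           # carries information across long sequences
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")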
