This unit provides an overview of Transfer Learning (TL): its definition, advantages, and strategies for applying it in machine learning. It discusses the main categories of TL methods (inductive, unsupervised, and transductive transfer) and the approaches used to transfer knowledge, along with methodologies such as feature extraction, fine-tuning, and the use of pretrained models, and applications in text, computer vision, and speech recognition.
UNIT-I
Transfer Learning Fundamentals
1. Introduction to Transfer Learning (TL)
• Traditional ML trains every model in isolation, based on a specific domain, dataset, and task.
• TL is a method of reusing a model, or the knowledge it has learned, for another related task.
• Definition: a situation where what has been learned in one setting is exploited to improve generalization in another setting.
• Ex: Task T1: identifying objects in images within a restaurant domain. Task T2: identifying objects in images from a park or café.
• TL enables us to utilize knowledge from previously learned tasks and apply it to newer, related ones.
• If we have more data for task T1, we can utilize what was learned there and generalize it for task T2.
• In image classification, certain low-level features, such as edges, shapes, and lighting, can be shared across tasks.

Advantages of TL
• Improved baseline performance: when we augment the knowledge of an isolated learner with knowledge from a source model, the baseline performance may improve due to this knowledge transfer.
• Reduced model development time: development time is lower when the source model helps in learning the target task, compared to a target model that learns from scratch.
• Improved final performance: a higher final performance can be attained by leveraging TL.
Note that any one or more of these gains may be realized; they are typically illustrated as better baseline performance (higher start), efficiency gains (higher slope), and better final performance (higher asymptote).

2. Transfer Learning Strategies
• A domain, D, is defined as a two-element tuple consisting of a feature space, X, and a marginal probability distribution, P(X), where X = {x1, x2, ..., xn} is a set of sample data points. Thus D = {X, P(X)}.
• A task, T, is defined as a two-element tuple consisting of a label space, Y, and an objective function, f. From a probabilistic point of view, the objective function can be denoted as P(Y|X). Thus T = {Y, P(Y|X)}.
• Using this framework, TL can be defined as a process aimed at improving the target objective function, f_t (or target task, T_t), in the target domain, D_t, using knowledge from the source task, T_s, in the source domain, D_s. This leads to the following scenarios:
• Feature space: the feature spaces of the source and target domains are different from each other, such as Xs != Xt. For instance, if our tasks are related to document classification, this scenario refers to source and target tasks in different languages.
• Marginal probability: the marginal probabilities of the source and target domains are different from each other, such as P(Xs) != P(Xt). This setting is also known as domain adaptation.
• Label space: the label spaces of the source and target domains are different from each other, such as Ys != Yt.
• Conditional probability: the conditional probabilities of the source and target domains are different from each other, such as P(Ys|Xs) != P(Yt|Xt).
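The definitions above can be restated compactly in LaTeX notation; this only summarizes the text, with the subscripts s and t denoting source and target.

```latex
% Domain: feature space and marginal distribution over it
\mathcal{D} = \{\mathcal{X}, P(X)\}, \qquad X = \{x_1, x_2, \dots, x_n\}

% Task: label space and objective (predictive) function
\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}, \qquad f(\cdot) \approx P(Y \mid X)

% Transfer learning: improve the target function f_t of (\mathcal{D}_t, \mathcal{T}_t)
% using knowledge from (\mathcal{D}_s, \mathcal{T}_s), where at least one of the
% following mismatches holds:
\mathcal{X}_s \neq \mathcal{X}_t, \quad
P(X_s) \neq P(X_t), \quad
\mathcal{Y}_s \neq \mathcal{Y}_t, \quad
P(Y_s \mid X_s) \neq P(Y_t \mid X_t)
```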
2. Transfer Learning Strategies – Key Questions
During transfer learning, the following three important questions must be answered:
• What to transfer:
• The first important step.
• Seek to identify which part of the knowledge can be transferred from the source to the target in order to improve the performance of the target task.
• Try to identify which part of the knowledge is source-specific and what is common between the source and target.
• When to transfer:
• Transferring knowledge just for the sake of it may make matters worse rather than better (negative transfer).
• Aim at utilizing transfer learning to improve target task performance, not to degrade it.
• Be careful about when to transfer and when not to.
• How to transfer:
• Identify ways of actually transferring the knowledge across domains/tasks.
• This involves changes to existing algorithms and different techniques.

2. Transfer Learning Strategies – Transfer Categories
TL methods can be categorized based on the type of traditional ML setting involved:
Inductive transfer:
▪ The source and target domains are the same, yet the source and target tasks are different from each other.
▪ The algorithms try to utilize the inductive biases of the source domain to help improve the target task.
▪ Depending on whether the source domain contains labeled data or not, this can be further divided into two subcategories: multitask learning and self-taught learning, respectively.
Unsupervised transfer:
▪ Similar to inductive transfer, with a focus on unsupervised tasks in the target domain.
▪ The source and target domains are similar, but the tasks are different. In this scenario, labeled data is unavailable in either of the domains.
Transductive transfer:
▪ There are similarities between the source and target tasks, but the corresponding domains are different.
▪ The source domain has a lot of labeled data while the target domain has none.
▪ Further classified into subcategories, where either the feature spaces or the marginal probabilities are different.

2. Transfer Learning Strategies – Transfer Categories – What to Transfer – Approaches
• Instance transfer: certain instances from the source domain can be reused along with the target data to improve results.
• Feature-representation transfer: aims to minimize domain divergence and reduce error rates by identifying good feature representations that can be utilized from the source to the target domain. Either supervised or unsupervised methods may be applied for feature-representation-based transfer.
• Parameter transfer: works on the assumption that models for related tasks share some parameters or a prior distribution over hyperparameters. We may apply additional weight to the loss of the target domain to improve overall performance.
• Relational-knowledge transfer: unlike the other three approaches, this method attempts to handle non-IID data, that is, data that is not independent and identically distributed, where each data point has a relationship with other data points. Social network data is a typical use case for relational-knowledge transfer techniques.

Transfer Learning and Deep Learning
• Inductive learning:
• The objective of inductive learning algorithms is to infer a mapping from a set of training examples. In the case of classification, the model learns a mapping between input features and class labels.
• To generalize well on unseen data, the algorithm works with a set of assumptions about the distribution of the training data; this set of assumptions is known as the inductive bias.
• The inductive bias can be characterized by multiple factors, such as the hypothesis space the model is restricted to and the search process through that hypothesis space. These biases therefore determine how, and what, the model learns on the given task and domain.
• Inductive transfer:
• Utilizes the inductive biases of the source task to assist the target task. This can be done in different ways, such as adjusting the inductive bias of the target task by limiting the model space, narrowing down the hypothesis space, or making adjustments to the search process itself with the help of knowledge from the source task.

3. Transfer Learning Methodologies
• Training time and the amount of data required for deep learning systems are orders of magnitude greater than for traditional ML systems, which makes reusing learned knowledge especially attractive.
• Domains: Computer Vision, Natural Language Processing.
Transfer Learning Methodologies – Feature Extraction
• DL architectures are layered architectures that learn different features at different layers; these layers are finally connected to a last layer to produce the final output.
• This layered architecture allows us to utilize a pretrained network (such as Inception V3 or VGG) without its final layer as a fixed feature extractor for other tasks.
• For example, if we use AlexNet without its final classification layer, it transforms images from a new domain task into a 4096-dimensional vector, enabling us to extract features for the new domain while utilizing the knowledge from the source-domain task.
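The following is a minimal sketch of this feature-extraction idea, assuming TensorFlow/Keras with its bundled VGG16 ImageNet weights; the array X_new and the 8-image batch are hypothetical placeholders for target-domain data.

```python
# Minimal sketch: a pretrained network without its classification head used as a
# fixed feature extractor. Assumes TensorFlow/Keras; X_new is hypothetical data.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Load the pretrained network without its final classification layers;
# pooling="avg" turns the last convolutional feature map into a single vector.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

# Hypothetical images from the target domain (e.g., café/park photos),
# already resized to the 224x224 input expected by VGG16.
X_new = np.random.rand(8, 224, 224, 3) * 255.0

# Extract fixed features; each image becomes a 512-dimensional vector for VGG16.
features = extractor.predict(preprocess_input(X_new))
print(features.shape)  # (8, 512)

# These vectors can now be fed to any simple classifier (e.g., logistic
# regression) trained only on the small target-domain dataset.
```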
Transfer Learning Methodologies – Fine Tuning
• Here we do not just replace the final layer; we also selectively retrain some of the previous layers.
• Deep neural networks are highly configurable architectures with various hyperparameters.
• Usually, the initial layers capture generic features while the later ones focus on the specific task at hand.
• Using this insight, we may freeze certain layers while training (fix their weights) and fine-tune the rest to suit our needs.
• This helps us achieve better performance with less training time. (A minimal sketch combining a pretrained model with fine-tuning appears after the Applications list below.)

Transfer Learning Methodologies – Pretrained Models
• Various deep learning networks with state-of-the-art performance have been developed and tested across domains such as computer vision and NLP.
• One of the fundamental requirements for TL is the presence of models that perform well on source tasks.
• In most cases, the details of these networks are shared for others to use; these pretrained networks/models form the basis of transfer learning.
• Pretrained models are usually shared in the form of the millions of parameters/weights the model reached when trained to a stable state.
• Common sources for downloading pretrained models: Keras, TensorFlow, Berkeley's Model Zoo.
• Examples of available pretrained networks: Xception, VGG16, InceptionV3.

Applications
• Transfer learning with text data
• Text is transformed or vectorized using different techniques.
• Embeddings such as word2vec have been trained on large source datasets and are then used in target tasks such as sentiment analysis and document classification by transferring the knowledge from the source task.
• Transfer learning with computer vision
• Used in various computer vision tasks, such as object identification, with different CNN architectures.
• Lower layers act as conventional computer vision feature extractors, such as edge detectors, while the final layers work toward task-specific features.
• This has helped in utilizing state-of-the-art models such as VGG, AlexNet, and Inception for target tasks such as style transfer and face detection, which are different from what these models were originally trained for.
• Transfer learning with speech/audio
• Automatic Speech Recognition (ASR) models developed for English have been successfully used to improve speech recognition performance for other languages, such as German.
• Automated speaker identification is another example.
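As referenced in the fine-tuning subsection above, here is a minimal sketch of loading a pretrained model and fine-tuning it on a new task, assuming TensorFlow/Keras; the number of target classes (5), the choice of which layers to freeze, and the training data are all hypothetical.

```python
# Minimal sketch of fine-tuning a pretrained network. Assumes TensorFlow/Keras;
# the target task (5 classes), the freeze point, and the data are hypothetical.
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_TARGET_CLASSES = 5  # hypothetical target label space

# Load the pretrained source model without its final classification layer.
base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Freeze the early layers (generic features: edges, shapes, lighting) and leave
# the last few layers trainable so they can adapt to the target task.
for layer in base.layers[:-30]:
    layer.trainable = False

# Attach a new classification head for the target task.
model = models.Sequential([
    base,
    layers.Dropout(0.3),
    layers.Dense(NUM_TARGET_CLASSES, activation="softmax"),
])

# A small learning rate is typical when fine-tuning, to avoid destroying
# the transferred weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(target_images, target_labels, epochs=5)  # hypothetical target data
```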
4. Types of Deep Transfer Learning
• Domain adaptation
• Usually refers to scenarios where the marginal probabilities of the source and target domains are different, such as P(Xs) != P(Xt).
• There is an inherent shift or drift in the data distribution of the source and target domains that requires tweaks to transfer the learning.
• For example, a corpus of movie reviews labeled as positive or negative would be different from a corpus of product-review sentiments; a classifier trained on movie-review sentiment would therefore see a different distribution if utilized to classify product reviews.
• Domain adaptation techniques are used in these scenarios.
• Domain confusion
• Different layers in a deep learning network capture different sets of features.
• We can utilize this fact to learn domain-invariant features and improve their transferability across domains.
• Instead of allowing the model to learn any representation, we nudge the representations of both domains to be as similar as possible.
• This can be achieved by applying certain preprocessing steps directly to the representations themselves.
• The basic idea behind this technique is to add another objective to the source model that encourages similarity by confusing the model about which domain an input came from.
• Multitask learning
• Slightly different from TL: several tasks are learned simultaneously, without distinction between source and target. The learner receives information about multiple tasks at once, whereas in TL the learner initially has no idea about the target task. (A minimal two-head sketch appears at the end of this unit.)
• One-shot learning
• DL systems are data hungry, needing many training examples to learn their weights.
• One-shot learning instead infers the required output based on just one or a few training examples.
• Helpful for real-world scenarios where it is not possible to have labeled data for every possible class, and where new classes are added often.
• Zero-shot learning
• Relies on no labeled examples of the target classes to learn a task.
• These methods make clever adjustments during the training stage itself to exploit additional information in order to understand unseen data.
• Used, for example, in machine translation.

5. Challenges of Transfer Learning
• Negative transfer
• Symptoms: a drop in performance, or no improvement at all.
• Reason: the source task is not sufficiently related to the target task.
• Possible remedies: Bayesian approaches, clustering-based solutions.
• Transfer bounds
• Quantifying the quality of the transfer and its viability.
• Approaches: Kolmogorov complexity, graph-based approaches.
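As referenced in the multitask learning item above, the following is a minimal sketch of learning two related tasks simultaneously with a shared trunk, assuming TensorFlow/Keras; the input size (32 features), the two hypothetical tasks (a 3-class and a binary classification), and the random data are placeholders.

```python
# Minimal sketch of multitask learning: one shared trunk, two task-specific heads,
# trained simultaneously. Assumes TensorFlow/Keras; tasks and data are hypothetical.
import numpy as np
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(32,))

# Shared representation learned jointly by both tasks.
shared = layers.Dense(64, activation="relu")(inputs)
shared = layers.Dense(64, activation="relu")(shared)

# Task-specific output heads.
task_a = layers.Dense(3, activation="softmax", name="task_a")(shared)
task_b = layers.Dense(1, activation="sigmoid", name="task_b")(shared)

model = Model(inputs=inputs, outputs=[task_a, task_b])
model.compile(
    optimizer="adam",
    loss={"task_a": "sparse_categorical_crossentropy",
          "task_b": "binary_crossentropy"},
)

# Hypothetical data: both label sets describe the same 100 input examples.
X = np.random.rand(100, 32)
y_a = np.random.randint(0, 3, size=(100,))
y_b = np.random.randint(0, 2, size=(100,)).astype("float32")

model.fit(X, {"task_a": y_a, "task_b": y_b}, epochs=2, verbose=0)
```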