Class PPT - Unit2
2. Feature Selection: While developing a machine learning model, only a few variables in the
dataset are useful for building the model; the rest of the features are either redundant or
irrelevant. If we feed all these redundant and irrelevant features into the model, it may
negatively impact the overall performance and accuracy of the model. Hence it is very
important to identify and select the most appropriate features from the data and remove the
irrelevant or less important ones, which is done with the help of feature selection in
machine learning.
Need for Feature Engineering in Machine Learning
•1. Imputation
•Feature engineering deals with inappropriate data, missing values, human errors, insufficient
data sources, etc. Missing values within the dataset highly affect the performance of the
algorithm, and the "Imputation" technique is used to deal with them. Imputation is responsible
for handling irregularities within the dataset.
•For example, rows or columns with a huge percentage of missing values can be removed entirely.
But at the same time, to maintain the data size, it is often required to impute the missing data,
which can be done as follows (a short code sketch follows this list):
o For numerical data imputation, a default value can be used, or missing values can be filled
with the mean or median of the column.
o For categorical data imputation, missing values can be replaced with the most frequently
occurring value in the column.
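A minimal pandas sketch of these two imputation rules, with placeholder column names:

```python
# Minimal imputation sketch; "age" and "city" are placeholder columns.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["Pune", None, "Delhi"]})
df["age"] = df["age"].fillna(df["age"].median())      # numerical: fill with median (or mean)
df["city"] = df["city"].fillna(df["city"].mode()[0])  # categorical: fill with the most frequent value
```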
2. Handling Outliers
•Outliers are deviated values or data points that lie so far away from the other data points that
they badly affect the performance of the model. This feature engineering technique first identifies
the outliers and then removes them.
•Standard deviation can be used to identify outliers. For example, each value lies at some distance
from the mean, and if a value lies farther away than a chosen number of standard deviations, it can
be considered an outlier. The Z-score can also be used to detect outliers, as sketched below.
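A minimal sketch of the Z-score idea, assuming a simple one-dimensional array and an illustrative threshold:

```python
# Flag values whose Z-score exceeds a threshold; 3 is a common convention, not a rule.
import numpy as np

def zscore_outliers(x, threshold=3.0):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > threshold)[0]   # indices of suspected outliers

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 5, 200), [120.0]])  # one injected extreme value
print(zscore_outliers(data))                    # the injected outlier is flagged
```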
3. Log Transform
•Logarithm transformation, or log transform, is one of the commonly used mathematical techniques
in machine learning. Log transform helps in handling skewed data, making the distribution closer
to normal after transformation. It also reduces the effect of outliers, because the magnitude
differences are compressed, which makes the model more robust.
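A short sketch of a log transform with numpy/pandas; log1p is used so zero values are handled safely, and the column name is a placeholder:

```python
# Compress a right-skewed column with a log transform.
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [20_000, 35_000, 50_000, 75_000, 1_200_000]})
df["income_log"] = np.log1p(df["income"])   # log(1 + x) shrinks the long right tail
```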
4. Binning
•In machine learning, overfitting is one of the main issues that degrade the performance of a model;
it occurs due to a greater number of parameters and noisy data.
•One of the popular feature engineering techniques, "binning", can be used to normalize the noisy
data. This process involves segmenting different features into bins.
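A short pandas sketch of binning a continuous feature into coarser categories; the bin edges and labels are illustrative:

```python
# Bin a continuous "age" feature into a small number of categories.
import pandas as pd

ages = pd.Series([3, 17, 25, 34, 48, 61, 79])
bins = pd.cut(ages, bins=[0, 18, 35, 60, 100], labels=["child", "young", "adult", "senior"])
print(bins)
```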
5. Feature Split
•As the name suggests, feature split is the process of splitting a feature into two or more parts
in order to make new features. This technique helps the algorithm better understand and learn the
patterns in the dataset.
•The feature splitting process enables the new features to be clustered and binned, which results in
extracting useful information and improving the performance of the data models.
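A short pandas sketch of feature splitting, using an assumed full_name column and a timestamp column as illustrative examples:

```python
# Split one raw column into several new features.
import pandas as pd

df = pd.DataFrame({"full_name": ["Asha Rao", "Vikram Singh"],
                   "timestamp": pd.to_datetime(["2023-01-05 10:30", "2023-06-21 18:45"])})
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)
df["hour"] = df["timestamp"].dt.hour     # new features derived from a single column
df["month"] = df["timestamp"].dt.month
```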
6. One Hot Encoding
•One hot encoding is a popular encoding technique in machine learning. It converts categorical data
into a form that machine learning algorithms can easily understand and use to make good predictions.
It enables grouping of categorical data without losing any information.
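A short pandas sketch of one-hot encoding a categorical column:

```python
# One-hot encode a categorical "city" column into binary indicator columns.
import pandas as pd

df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Chennai"]})
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
print(encoded.head())
```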
Tools for feature engineering
•Featuretools
•Featuretools is one of the most widely used libraries for feature engineering automation. It supports a wide range of
operations, such as selecting features and constructing new ones from relational data. In addition, it offers
simple aggregations using max, sum, mode, and other primitives. But one of its most important functionalities is the
possibility to build features using deep feature synthesis (DFS), as sketched below.
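A minimal DFS sketch is shown below, assuming the featuretools 1.x-style API (method and argument names such as add_dataframe and target_dataframe_name differ across library versions); the toy customers/orders tables and the chosen primitives are illustrative only.

```python
# Deep feature synthesis sketch (featuretools 1.x-style API; illustrative data).
import featuretools as ft
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "join_year": [2020, 2021]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 1, 2],
                       "amount": [25.0, 40.0, 15.0]})

es = ft.EntitySet(id="shop")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers, index="customer_id")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders, index="order_id")
es = es.add_relationship("customers", "customer_id", "orders", "customer_id")

# Aggregate order-level columns up to the customer level automatically
feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name="customers",
                                      agg_primitives=["sum", "mean", "max"], max_depth=1)
print(feature_matrix.head())
```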
•Feature Selector
•As the name suggests, Feature Selector is a Python library for choosing features. It determines attribute significance
based on missing data, single unique values, collinear or insignificant features. For that, it uses “lightgbm” tree-based
learning methods. The package also includes a set of visualization techniques that can provide more information
about the dataset.
•PyCaret
•PyCaret is a Python-based open-source library. Although it is not a dedicated tool for automated feature engineering, it
does allow for the automatic generation of features before model training. Its advantage is that it lets you replace
hundreds of code lines with just a handful, thus increasing productivity and exponentially speeding up the
experimentation cycle.
Benefits
• Models with engineered features result in faster data processing.
• Less complex models are easier to maintain.
Drawbacks
• Making a proper feature list requires deep analysis and understanding of the business context and processes.
• Feature engineering is often time-consuming.
2. Cross-sectional data: Data of one or more variables, collected at the same point in time.
•These are some of the terms and concepts associated with time series data analysis:
Dependence: Dependence refers to the association of two observations of the same variable at prior time points.
Stationarity: This property concerns the mean (average) value of the series. If the mean does not remain constant over the
given time period, if there are spikes throughout the data, or if the values tend toward infinity, then the series is not
stationary.
Differencing: Differencing is a technique used to make a time series stationary and to control the auto-correlations. That
said, not all time series analyses need differencing, and applying it unnecessarily can produce inaccurate estimates
(see the short example after this list).
Curve fitting: Curve fitting as a regression method is useful for data not in a linear relationship. In such cases, the
mathematical equation for curve fitting ensures that data that falls too much on the fringes to have any real impact is
“regressed” onto a curve with a distinct formula that systems can use and interpret.
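A short illustration of first-order differencing with pandas on a synthetic trended series:

```python
# First-order differencing removes a linear trend, which often helps make a series stationary.
import numpy as np
import pandas as pd

series = pd.Series(np.arange(100, dtype=float) + np.random.randn(100))  # trend + noise
diff1 = series.diff().dropna()        # first difference: y_t - y_{t-1}
print(series.var(), diff1.var())      # the differenced series has far smaller variance
```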
•Identifying Cross Sectional Data vs Time Series Data
•The opposite of time series data is cross-sectional data. This is when various
entities such as individuals and organizations are observed at a single point in
time to draw inferences. Both forms of data analysis have their own value,
and sometimes businesses use both forms of analysis to draw better
conclusions.
•Time series data can be found in nearly every area of business and
organizational application affected by the past. This ranges from economics,
social sciences, and anthropology to climate change, business, finance,
operations, and even epidemiology.
•In a time series, time is often the independent variable, and the goal is to
make a forecast for the future.
•The most prominent advantage of time series analysis is that—because data
points in a time series are collected in a linear manner at adjacent time
periods—it can potentially make correlations between observations. This
feature sets time series data apart from cross-sectional data.
•Time Series Analysis Techniques
•As we have seen above, time series analysis can be an ambitious goal for
organizations. In order to gain accurate results from model-fitting, one of several
mathematical models may be used in time series analysis such as:
Box-Jenkins autoregressive integrated moving average (ARIMA) models
•The Box-Jenkins models of both the ARIMA and multivariate varieties use the past behaviour
of a variable to decide which model is best to analyse it. The assumption is that any time
series data for analysis can be characterized by a linear function of its past values, past
errors, or both. When the model was first developed, the data used was from a gas furnace
and its variable behaviour over time.
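A small illustrative ARIMA fit with statsmodels; the order (1, 1, 1) and the synthetic series are placeholders, not recommendations:

```python
# Fit a Box-Jenkins ARIMA(p, d, q) model and forecast a few steps ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.Series(np.cumsum(np.random.randn(200)))   # stand-in for a real time series
model = ARIMA(series, order=(1, 1, 1))                # AR order, differencing order, MA order
result = model.fit()
print(result.forecast(steps=5))                       # forecast the next five points
```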
•In contrast, the Holt-Winters exponential smoothing model is best suited to analyzing
time series data that exhibits a defining trend and varies by seasons. Such mathematical
models are a combination of several methods of measurement; the Holt-Winters method uses
weighted averages which can seem simple enough, but these values are layered on the
equations for exponential smoothing.
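A small illustrative Holt-Winters fit with statsmodels' ExponentialSmoothing; additive trend and seasonality with a period of 12 are assumptions for monthly-style data:

```python
# Holt-Winters exponential smoothing with additive trend and seasonality.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

series = pd.Series(np.random.rand(48) + np.tile(np.arange(12), 4))  # toy seasonal series
model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12)
result = model.fit()
print(result.forecast(12))                            # forecast one full season ahead
```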
•Applications of Time Series Analysis
•Time series analysis models yield two outcomes:
Obtain an understanding of the underlying forces and structure that produced
the observed data patterns. Complex, real-world scenarios very rarely fall into
set patterns, and time series analysis allows for their study—along with all of
their variables as observed over time. This application is usually meant to
understand processes that happen gradually and over a period of time such
as the impact of climate change on the rise of infection rates.
Fit a mathematical model as accurately as possible so the process can move
into forecasting, monitoring, or even certain feedback loops. This is a use-
case for businesses that look to operate at scale and need all the input
they can get to succeed.
•From a practical standpoint, time series analysis in organizations is mostly used for:
Economic forecasting
Sales forecasting
Utility studies
Budgetary analysis
Yield projections
Census analysis
Inventory studies
Workload projections
•Time Series in the Financial and Business Domain
•Most financial, investment, and business decisions are made on the basis of forecasts of future
changes and demand in the financial domain.
•Time series analysis and forecasting are essential processes for explaining the dynamic and
influential behaviour of financial markets. By examining financial data, an expert can produce the
forecasts required for important financial applications in several areas such as risk evaluation,
option pricing & trading, portfolio construction, etc.
•For example, time series analysis has become the intrinsic part of financial analysis and
can be used in predicting interest rates, foreign currency risk, volatility in stock markets
and many more. Policymakers and business experts use financial forecasting to make
decisions about production, purchases, market sustainability, allocation of resources, etc.
•In investment, this analysis is employed to track the price fluctuations of a security over time.
For instance, the price of a security can be recorded:
•for the short term, such as hourly observations over a business day, and
•for the long term, such as month-end observations over five years.
•Advantages of Time Series Analysis
Cleans Data
Data cleansing filters out noise, removes outliers, and applies various averages to gain a better overall perspective of
the data. It means zoning in on the signal by filtering out the noise. The process of time series analysis removes the
noise and allows businesses to get a truly clearer picture of what is happening day-to-day.
Provides Understanding of Data
The models used in time series analysis do help to interpret the true meaning of the data in a data set, making life
easier for data analysts. Autocorrelation patterns and seasonality measures can be applied to predict when a certain
data point can be expected. Furthermore, stationarity measures can gain an estimate of the value of said data point.
Forecasting Data
Time series analysis can be the basis to forecast data. Time series analysis is inherently equipped to uncover
patterns in data which form the base to predict future data points. It is this forecasting aspect of time series analysis
that makes it extremely popular in the business area. Where most data analytics use past data to retroactively gain
insights, time series analysis helps predict the future. It is this very edge that helps management make better business
decisions.
•Disadvantages of Time Series Analysis
•Time series analysis is not perfect. It can suffer from generalization
from a single study where more data points and models were
warranted. Human error could misidentify the correct data model,
which can have a snowballing effect on the output.
•It could also be difficult to obtain the appropriate data points. A major
point of difference between time-series analysis and most other
statistical problems is that in a time series, observations are not always
independent.
Time series and Trend analysis
A time series consists of a set of observations measured at specified,
usually equal, time intervals.
[Chart: a time series plotted by year, showing both a long-term trend and a seasonal pattern.]
Look out
While trend estimates are often reliable, in some instances the usefulness of the estimates is
reduced, as in the example below.
This graph shows the dramatic movement of the $A vs the $US during an 18-hour period on
8 November 2000.
2. Seasonal Variation
Examples include:
• Air conditioner sales in Summer
• Heater sales in Winter
• Flu cases in Winter
• Airline tickets for flights during school vacations
[Chart: Monthly Retail Sales in NSW Retail Department Stores]
3. Cyclical variation
Examples include:
• Floods
• Wars
• Changes in interest rates
• Economic depressions or recessions
• Changes in consumer spending
Cyclical variation
This chart represents an economic cycle, but we know
it doesn’t always go like this. The timing and length of
each phase is not predictable.
4. Irregular variation
It usually occurs randomly and may be linked to events that also occur
randomly.
•A data stream is a continuous, fast-changing, and ordered chain of data transmitted at very high
speed. It is an ordered sequence of information for a specific interval. Data is transferred from
the sender's side and immediately appears as a stream at the receiver's side. Streaming does not
mean downloading the data or storing the information on storage devices.
•Sources of Data Stream
•There are so many sources of the data stream, and a few widely used sources are listed below:
•Internet traffic
•Sensors data
•Real-time ATM transaction
•Live event data
•Call records
•Satellite data
•Audio listening
•Watching videos
•Real-time surveillance systems
•Online transactions
4.3 Characteristics of Data Stream in Data Mining
Data Stream in Data Mining should have the following characteristics:
Continuous Stream of Data: The data stream is an infinite continuous stream resulting in
big data. In data streaming, multiple data streams are passed simultaneously.
Time Sensitive: Data streams are time-sensitive, and the elements of a data stream carry
timestamps with them. A data stream is relevant only for a certain period; after that time, it
loses its significance.
Data Volatility: No data is stored in data streaming as It is volatile. Once the data
mining and analysis are done, information is summarized or discarded.
Concept Drifting: Data Streams are very unpredictable. The data changes or evolves with
time, as in this dynamic world, nothing is constant.
Data streams are generated through various data stream generators. Data mining techniques are
then applied to extract knowledge and patterns from the streams. Therefore, these techniques
need to handle multi-dimensional, multi-level data in a single pass and in an online fashion.
1 The Grafting Algorithm
Incremental Feature Selection: Grafting incrementally selects features one at a time, taking into account their contributions to
the classifier's performance.
Adaptive Feature Selection: It dynamically adjusts the set of selected features as new data arrives, ensuring that only the most
relevant features are retained.
Efficiency: Grafting is efficient because it avoids exhaustive search over feature subsets and only evaluates the utility of adding
or removing one feature at a time.
Performance Improvement: By selecting informative features during the learning process, Grafting aims to improve
classification accuracy while potentially reducing the computational complexity of the model.
Thresholds: The algorithm relies on a predefined threshold for evaluating whether adding a feature is beneficial. This
threshold can be set based on domain knowledge or through cross-validation.
Grafting is particularly useful in scenarios where you have a large number of features and limited computational resources or
when dealing with data streams where the feature set may evolve over time. It strikes a balance between maintaining model
performance and reducing feature dimensionality, which can be beneficial for both efficiency and interpretability of machine
learning models. Keep in mind that the specific implementation and parameter settings of the Grafting Algorithm may vary
depending on the machine learning framework and problem domain.
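The sketch below is a minimal, illustrative reading of the grafting test for logistic regression, not the reference implementation: a feature is added only while the gradient of the loss with respect to its (still zero) weight exceeds the penalty threshold lam; all names and defaults are assumptions.

```python
# Grafting-style incremental feature selection for L1-penalized logistic regression (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def grafting_select(X, y, lam=0.05):
    """X: (n, d) feature matrix; y: 0/1 labels. Returns indices of selected features."""
    n, d = X.shape
    active, w, b = [], np.zeros(d), 0.0
    while len(active) < d:
        # Gradient of the mean log-loss w.r.t. every weight at the current model
        p = 1.0 / (1.0 + np.exp(-(X[:, active] @ w[active] + b)))
        grad = X.T @ (p - y) / n
        inactive = [j for j in range(d) if j not in active]
        j_best = max(inactive, key=lambda j: abs(grad[j]))
        if abs(grad[j_best]) <= lam:      # grafting test: the gradient must beat the penalty
            break                         # no remaining feature is worth adding
        active.append(j_best)
        # Re-optimize an L1-penalized model over the active feature set only
        clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear").fit(X[:, active], y)
        w[active] = clf.coef_.ravel()
        b = clf.intercept_[0]
    return active
```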
2 The Alpha-Investing algorithm
It is a statistical method used for sequential hypothesis testing, primarily in the context of multiple hypothesis testing or feature selection. It was introduced as an enhancement
to the Sequential Bonferroni method, aiming to control the Family-Wise Error Rate (FWER) while being more powerful and efficient in adaptive and sequential settings.
Here's a high-level overview of the Alpha-Investing algorithm:
Initialization: Start with an empty set of selected hypotheses (features) and set an initial significance level (alpha). This alpha level represents
the desired FWER control and guides the decision-making process.
Sequential Testing: As you encounter new hypotheses (features) or updates to existing ones, perform hypothesis tests (e.g., p-value tests) to
assess their significance. The tests are often related to whether a feature is associated with an outcome of interest.
Alpha Update: After each hypothesis test, update the alpha level dynamically based on the test results and the number of hypotheses tested
so far. Alpha-Investing adjusts the significance level to maintain FWER control while adapting to the increasing number of tests.
Decision Rules: Make decisions on whether to reject or retain each hypothesis based on the adjusted alpha level. Common decision rules include
rejecting a hypothesis if its p-value is less than the current alpha.
Continue or Terminate: Continue the process as long as you encounter new hypotheses or updates to existing ones. You can choose a stopping
criterion, such as reaching a fixed number of hypotheses or achieving a certain level of significance control.
Output: The selected hypotheses at the end of the process are considered statistically significant, and the others are rejected or not selected.
Adaptivity: Alpha-Investing adapts its significance level as more hypotheses are tested. This adaptivity helps maintain better statistical power
compared to fixed significance levels like Bonferroni correction.
FWER Control: It controls the Family-Wise Error Rate, which is the probability of making at least one false discovery among all the hypotheses
tested. This makes it suitable for applications where controlling the overall error rate is critical.
Efficiency: Alpha-Investing is often more efficient than other multiple testing correction methods like Bonferroni correction because it tends to
use higher alpha levels for early tests and lower alpha levels for later tests.
•Selective: It allows for the selection of relevant features or hypotheses from a large pool while controlling the
overall error rate.
•Alpha-Investing is commonly used in fields like bioinformatics, genomics, finance, and any domain where multiple
hypothesis testing or feature selection is necessary and maintaining a strong control over the FWER is important.
It offers a balance between adaptivity and statistical rigor, making it a valuable tool in the data analysis toolkit.
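A minimal sketch of the wealth-based decision rule described above; the initial wealth and the reward earned on each rejection are illustrative values, not prescribed ones:

```python
# Alpha-investing sketch: spend alpha-"wealth" on each test, earn some back on each rejection.
def alpha_investing(p_values, w0=0.05, award=0.05):
    wealth = w0
    selected = []
    for j, p in enumerate(p_values, start=1):
        alpha_j = wealth / (2 * j)             # bet a fraction of the current wealth
        if p <= alpha_j:                       # reject the null: the feature looks significant
            selected.append(j - 1)             # record the 0-based index of the feature
            wealth += award                    # earn back alpha-wealth on a discovery
        else:
            wealth -= alpha_j / (1 - alpha_j)  # pay for the test when we fail to reject
    return selected
```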
3 The Online Streaming Feature Selection Algorithm
The feature selection in the context of data streams and online learning often involves adapting traditional
feature selection methods to handle streaming data.
Here is a conceptual outline of how feature selection can be performed in an online streaming setting:
Data Stream Ingestion: Start by ingesting your streaming data, which arrives continuously over
time. This data can be in the form of individual instances or mini-batches.
Initialization: Initialize your feature selection process by setting up the necessary data structures and variables.
•Termination: Decide on a stopping criterion for the feature selection process. This could be a fixed time duration, a certain number of
data points processed, or a change in model performance.
•Final Feature Set: The selected features at the end of the streaming feature selection process are considered the final set for modelling
or analysis.
It's important to note that the exact algorithm and methodology used for feature selection in a streaming context can vary based on
the specific problem, data, and goals. The choice of feature importance measure, update frequency, and stopping criteria should be tailored
to your particular application.
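A minimal sketch of such a streaming loop, under the assumption that a simple running correlation with the label serves as the importance measure and only the current top-k features are retained:

```python
# Streaming feature selection sketch: update per-feature statistics per batch, keep top-k.
import numpy as np

def streaming_feature_selection(batches, k=10):
    n, sums, selected = 0, None, None
    for X, y in batches:                       # each batch: (n_b, d) features, (n_b,) labels
        if sums is None:
            d = X.shape[1]
            sums = {"xy": np.zeros(d), "x": np.zeros(d), "x2": np.zeros(d), "y": 0.0, "y2": 0.0}
        n += len(y)
        sums["xy"] += X.T @ y
        sums["x"] += X.sum(axis=0)
        sums["x2"] += (X ** 2).sum(axis=0)
        sums["y"] += y.sum()
        sums["y2"] += (y ** 2).sum()
        # Pearson correlation of each feature with the label, from the running sums
        cov = sums["xy"] / n - (sums["x"] / n) * (sums["y"] / n)
        var_x = sums["x2"] / n - (sums["x"] / n) ** 2
        var_y = sums["y2"] / n - (sums["y"] / n) ** 2
        score = np.abs(cov) / np.sqrt(var_x * var_y + 1e-12)
        selected = np.argsort(score)[::-1][:k]  # indices of the current top-k features
    return selected
```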
•Unsupervised streaming feature selection in social media data presents a unique set of challenges and opportunities. Unlike traditional
feature selection in batch data, where you have a fixed dataset, social media data arrives continuously, often with varying topics, trends,
and user behaviour. Here's an approach to unsupervised streaming feature selection in social media:
•Data Ingestion:
•Stream social media data from platforms like Twitter, Facebook, or Instagram.
•Online Clustering:
•Implement an online clustering algorithm like Online K-Means, Mini-Batch K-Means, or DBSCAN.
•Cluster the incoming data based on the extracted features. The number of clusters can be determined using heuristics or adaptively based
on data characteristics.
Feature Ranking and Selection:
•Rank features within each cluster based on their importance scores. Select a fixed number or percentage of top-ranked features from each
cluster. Alternatively, you can use dynamic thresholds to adaptively select features based on their importance scores within each cluster.
Dynamic Updating:
•Continuously update the clustering and feature selection process as new data arrives.
•Periodically recluster the data to adapt to changing trends and topics in social media discussions.
•Use unsupervised evaluation metrics such as silhouette score, Davies-Bouldin index, or within-cluster sum of squares to assess the quality
of clusters and feature importance.
•Incorporate anomaly detection techniques to identify unusual or emerging topics or trends in the data. Anomalies may indicate the need for
adaptive feature selection.
Modeling or Analysis:
•Utilize the selected features for various downstream tasks such as sentiment analysis, topic modeling, recommendation systems, or
anomaly detection.
Regular Maintenance:
•Regularly review and update the feature selection process as the social media landscape evolves. Consider adding new features or
modifying existing ones based on observed changes in the data.
•Unsupervised streaming feature selection in social media data requires a flexible and adaptive approach due to the dynamic nature of social
media content. It aims to extract relevant features that capture the current themes and trends in the data without requiring labeled
training data. Keep in mind that the choice of clustering algorithm and feature importance metric should be tailored to your specific
social media data and objectives.
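A minimal sketch of the clustering-plus-ranking step, assuming scikit-learn's MiniBatchKMeans for online clustering and within-cluster variance as the (illustrative) importance score:

```python
# Online clustering of feature batches, then per-cluster feature ranking (sketch).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

km = MiniBatchKMeans(n_clusters=5, random_state=0)
top_features_per_cluster = {}

def process_batch(X, top_m=20):
    """X: (n_batch, d) matrix of extracted features (e.g., TF-IDF of posts);
    assumes each batch has at least n_clusters rows."""
    km.partial_fit(X)                          # update the cluster centroids online
    labels = km.predict(X)
    for c in np.unique(labels):
        Xc = X[labels == c]
        score = Xc.var(axis=0)                 # within-cluster feature variance as importance
        top_features_per_cluster[c] = np.argsort(score)[::-1][:top_m]
    return top_features_per_cluster
```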
•Non-linear methods for streaming feature construction are essential for extracting meaningful patterns and representations from streaming
data where the relationships among features may not be linear. These methods transform the input data into a new feature space, often with
higher dimensionality, to capture complex and non-linear relationships that may exist in the data. Here are some non-linear methods
commonly used for streaming feature construction:
•Kernel Methods:
•Kernel Trick: Apply the kernel trick to transform data into a higher-dimensional space without explicitly computing the feature vectors.
Common kernels include the Radial Basis Function (RBF) kernel and polynomial kernels.
•Online Kernel Methods: Adapt kernel methods to streaming data by updating kernel matrices incrementally as new data arrives. Online kernel
principal component analysis (KPCA) and online kernel support vector machines (SVM) are examples.
•Neural Networks:
•Deep Learning: Utilize deep neural networks, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for
feature extraction. Deep architectures can capture intricate non-linear relationships in the data.
•Online Learning: Implement online learning techniques to continuously update neural network parameters as new data streams in. This
enables real-time feature construction.
•Autoencoders:
•Variational Autoencoders (VAEs): VAEs can be used to learn non-linear representations and reduce dimensionality. They are useful for capturing
latent variables and complex patterns in streaming data.
Online Autoencoders: Design autoencoders that update their weights as new data arrives, allowing them to adapt to changing
data distributions.
Manifold Learning:
t-Distributed Stochastic Neighbor Embedding (t-SNE): t-SNE is a dimensionality reduction technique that can reveal non-
linear relationships in high-dimensional data. It can be adapted to streaming data by updating the t-SNE embedding.
Isomap:
Isomap is another manifold learning method that can be used for non-linear feature construction in streaming data by
incrementally updating the geodesic distances between data points.
Random Features:
Random Fourier Features: Use random Fourier features to approximate a kernel method's non-linear transformation in a
computationally efficient manner. This can be suitable for streaming data when kernel-based methods are too slow (see the sketch after this list).
Locally Linear Embedding (LLE) and Spectral Embedding: These dimensionality reduction techniques aim to preserve
local relationships, making them suitable for capturing non-linear structures in data streams.
Feature Mapping:
Apply non-linear feature mappings, such as polynomial expansions or trigonometric transformations, to create new features
that capture non-linear relationships among the original features.
Ensemble Techniques:
•Online Clustering and Density Estimation:
•Clustering and density estimation methods, such as DBSCAN and Gaussian Mixture Models (GMM), can be used to create features that represent the
underlying non-linear structures in streaming data.
•When selecting a non-linear feature construction method for streaming data, consider factors such as computational efficiency, scalability, and the adaptability
of the method to evolving data distributions. The choice of method should align with the specific characteristics and requirements of your streaming data
application.
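As referenced in the Random Fourier Features item above, here is a minimal sketch that approximates an RBF kernel map with scikit-learn's RBFSampler and feeds the transformed batches to a linear online learner; the parameters and the two-class setup are assumptions:

```python
# Random Fourier features as an explicit non-linear map for streaming batches (sketch).
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rff = RBFSampler(gamma=0.5, n_components=200, random_state=0)
clf = SGDClassifier()

def train_on_stream(batches):
    first = True
    for X, y in batches:                                          # (n_b, d) features, (n_b,) labels
        Z = rff.fit_transform(X) if first else rff.transform(X)   # explicit non-linear feature map
        if first:
            clf.partial_fit(Z, y, classes=np.array([0, 1]))       # class labels needed on first call
            first = False
        else:
            clf.partial_fit(Z, y)                                 # incremental linear update
    return clf
```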
•Locally Linear Embedding (LLE) is a dimensionality reduction technique commonly used for nonlinear manifold learning and feature extraction. While it was
originally developed for batch data, it can be adapted for data streams with some modifications and considerations. Here's an overview of how LLE can be
applied to data streams:
1. Data Ingestion and Preprocessing:
Ingest the streaming data and preprocess it as it arrives, including cleaning, normalization, and transformation.
2. Sliding Window:
Implement a sliding window mechanism to maintain a fixed-size buffer of the most recent data points. This buffer will be used for performing LLE on
the data stream.
3. Local Neighborhood Identification:
For each incoming data point, determine its local neighborhood by considering a fixed number of nearest neighbors within the sliding window.
4. Local Linear Models:
Construct local linear models for each data point based on its neighbors. This involves finding weights that best reconstruct the data point as a
linear combination of its neighbors.
5. Local Reconstruction Weights:
Calculate the reconstruction weights for each data point in the local neighborhood. These weights represent the contribution of each neighbor to
the reconstruction of the data point.
6. Global Embedding:
Combine the local linear models and reconstruction weights to compute a global embedding for the entire dataset. This embedding represents the
lower-dimensional representation of the data stream.
7. Continuous Update:
Continuously update the sliding window and recompute the LLE embedding as new data points arrive. The old data points are removed
from the window, and the new ones are added.
8. Memory Management:
Manage memory efficiently to ensure that the sliding window remains within a predefined size limit. You may need to adjust the window size
dynamically based on available memory and computational resources.
9. Hyperparameter Tuning:
Tune hyperparameters such as the number of nearest neighbors, the dimensionality of the embedding space, and any regularization terms
based on the specific characteristics of your data stream.
10. Evaluation:
Periodically evaluate the quality of the LLE embedding using appropriate metrics, such as reconstruction error or visual inspection. Monitoring the
quality helps ensure that the embedding captures meaningful patterns in the data stream.
11. Application:
Use the lower-dimensional representation obtained through LLE for downstream tasks such as clustering, visualization, or classification,
depending on your specific objectives.
•Adapting LLE to data streams requires careful management of the sliding window and efficient computation of the local linear models.
Additionally, choosing an appropriate neighborhood size and dimensionality for the embedding is crucial for achieving meaningful results.
Consider the computational resources available and the real-time constraints of your application when implementing LLE for data streams.
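A minimal sketch of the sliding-window approach, assuming scikit-learn's LocallyLinearEmbedding is simply re-fit on the current window; the window size and LLE parameters are illustrative:

```python
# Sliding-window LLE sketch: re-fit the embedding over the most recent points.
from collections import deque
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

window = deque(maxlen=500)                     # fixed-size buffer of recent points

def update_embedding(new_point, n_neighbors=10, n_components=2):
    window.append(np.asarray(new_point))
    if len(window) <= n_neighbors:             # need enough points to define neighborhoods
        return None
    X = np.vstack(window)
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components)
    return lle.fit_transform(X)                # embedding of everything in the current window
```

In practice the embedding would be recomputed every k new points rather than on every arrival, since a full re-fit per point is expensive.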
•Kernel learning for data streams is an area of machine learning that focuses on adapting kernel methods, which are originally
designed for batch data, to the streaming data setting. Kernel methods are powerful techniques for dealing with non- linear relationships and
high-dimensional data. Adapting them to data streams requires efficient processing and storage of data as it arrives in a sequential and
potentially infinite manner. Here are some key considerations and techniques for kernel learning in data streams:
1. Online Kernel Methods:
Traditional kernel methods, such as Support Vector Machines (SVM) and Kernel Principal Component Analysis (KPCA), can be
adapted to data streams using online learning techniques. Online SVM and online KPCA algorithms update model parameters
incrementally as new data arrives.
2. Incremental Kernel Matrix Updates:
A key challenge in kernel methods for data streams is efficiently updating the kernel matrix as new data points arrive. Techniques
like the Nyström approximation and random Fourier features can be employed to approximate kernel matrices and update
them incrementally.
3. Memory Management:
Efficiently manage memory to ensure that the kernel matrix doesn't grow too large as data accumulates. This may involve storing
only a subset of the most recent data points or employing methods like forgetting mechanisms.
4. Streaming Feature Selection:
Apply feature selection techniques to the input data to reduce dimensionality before applying kernel methods. This can help in
maintaining computational efficiency.
5. Hyperparameter Tuning:
Tune kernel hyperparameters (e.g., the kernel width or the regularization parameter in SVM) adaptively based on the streaming data to maintain model
performance.
6. Concept Drift Detection:
Monitor the data stream for concept drift, which occurs when the data distribution changes over time. When drift is detected, consider retraining or
adapting the kernel model.
7. Kernel Approximations:
Use kernel approximations such as Random Kitchen Sinks or Fastfood to approximate kernel operations with linear time complexity, making them
suitable for streaming data.
8. Parallel and Distributed Processing:
Utilize parallel or distributed computing frameworks to handle large-scale streaming data and kernel computations efficiently.
9. Ensemble Methods:
Consider ensemble methods like Online Random Forest or Online Boosting, which combine multiple models with kernels to adapt to
changing data.
10. Model Monitoring:
Continuously monitor the performance of the kernel learning model using appropriate evaluation metrics, such as classification accuracy, mean
squared error, or others relevant to your task.
11. Resource Constraints:
Adapt your kernel learning approach to resource constraints, such as processing power and memory, which may be limited in
streaming environments.
•Kernel learning for data streams is an active area of research, and various algorithms and techniques have been proposed to address the unique
challenges posed by streaming data. The choice of approach should be based on the specific requirements and constraints of your streaming data
application.
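A minimal sketch combining two ideas from the list above: a Nyström approximation fitted once on an initial sample and a linear model updated incrementally with partial_fit; the parameter choices are illustrative:

```python
# Nyström kernel approximation + incremental linear model for streaming batches (sketch).
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier

nys = Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0)
clf = SGDClassifier()

def fit_initial(X0, y0):
    Z0 = nys.fit_transform(X0)                      # landmark points chosen from the initial sample
    clf.partial_fit(Z0, y0, classes=np.unique(y0))  # first incremental fit needs the class labels

def update(X_batch, y_batch):
    Z = nys.transform(X_batch)                      # approximate kernel features for new data
    clf.partial_fit(Z, y_batch)                     # incremental model update
```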
•Using neural networks for data streams, where data arrives continuously and in a potentially infinite sequence, presents unique challenges and
opportunities. Neural networks are powerful models for various machine learning tasks, including classification, regression, and sequence
modelling. Adapting them to data streams requires specialized techniques to handle the dynamic nature of the data. Here's an overview of
considerations when using neural networks for data streams:
1. Online Learning:
Implement online learning techniques, also known as incremental or streaming learning, where the neural network is updated
incrementally as new data arrives. This is crucial for maintaining model performance in a changing data distribution.
2. Sliding Window:
Use a sliding window mechanism to manage memory and computational resources. Maintain a fixed-size window of the most recent data
points for training and updating the model.
3. Model Architecture:
Choose neural network architectures that are amenable to online learning. Feedforward neural networks (multilayer perceptrons),
recurrent neural networks (RNNs), and online versions of convolutional neural networks (CNNs) can be adapted for data streams.
4. Mini-Batch Learning:
Train neural networks in mini-batches as new data points arrive. This helps in utilizing efficient gradient descent algorithms, such as stochastic
gradient descent (SGD) or variants like Adam, RMSprop, and AdaGrad.
5. Concept Drift Detection:
Implement mechanisms to detect concept drift, which occurs when the data distribution changes over time. When drift is detected, consider
retraining or adapting the neural network.
6. Memory-efficient Models:
Explore memory-efficient neural network architectures designed for streaming data, such as online memory networks, which adapt to the
limited memory capacity of the sliding window.
7. Feature Engineering:
Perform feature engineering to extract relevant information from the data stream. Preprocessing steps like text tokenization, feature scaling,
or dimensionality reduction may be necessary.
8. Regularization:
Apply regularization techniques, such as dropout or weight decay, to prevent overfitting, especially when data is limited in the sliding window.
9. Hyperparameter Tuning:
Tune hyperparameters adaptively based on the streaming data, such as learning rates or network architectures.
10. Ensemble Methods:
Consider ensemble techniques that combine multiple neural networks or models to improve robustness and adaptability in the presence of
concept drift.
11. Model Evaluation:
Continuously monitor and evaluate the neural network's performance using appropriate evaluation metrics relevant to your task,
such as accuracy, F1-score, or mean squared error.
12. Anomaly Detection:
Incorporate anomaly detection methods, including neural network-based approaches, to identify unusual or unexpected patterns in
the data stream.
13. Parallel and Distributed Processing:
Utilize parallel or distributed computing frameworks to handle the computational load when processing large-scale data streams.
14. Resource Constraints:
Adapt your neural network approach to resource constraints, such as processing power and memory, which may be limited in
streaming environments.
•Adapting neural networks to data streams is an active area of research, and various approaches, architectures, and libraries are available to
address the challenges of streaming data. The choice of approach should be tailored to the specific requirements of your streaming data
application.
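A minimal sketch of online mini-batch training, assuming scikit-learn's MLPClassifier (which supports partial_fit) as a small feed-forward network; the architecture and class labels are illustrative:

```python
# Incremental mini-batch training of a small feed-forward network (sketch).
import numpy as np
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=0.01)
CLASSES = np.array([0, 1])

def train_on_stream(batches):
    first = True
    for X_batch, y_batch in batches:
        if first:
            net.partial_fit(X_batch, y_batch, classes=CLASSES)  # class labels needed on first call
            first = False
        else:
            net.partial_fit(X_batch, y_batch)                   # one incremental update per mini-batch
    return net
```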
•In this subsection, we review feature selection with streaming instances where the set of features is fixed, while new instances are
consistently and continuously arriving.
•Wrapper Methods
• Forward selection - Forward selection is an iterative process that begins with an empty set of features. After
each iteration, it keeps adding a feature and evaluates the performance to check whether it improves the
performance or not. The process continues until the addition of a new variable/feature no longer improves the
performance of the model.
• Backward elimination - Backward elimination is also an iterative approach, but it is the opposite of forward
selection. This technique begins by considering all the features and removes the least significant feature.
The elimination continues until removing features no longer improves the performance of the model.
• Exhaustive Feature Selection - Exhaustive feature selection is one of the best feature selection methods, as it
evaluates each feature set by brute force. This method tries every possible combination of features and returns
the best-performing feature set.
•Recursive Feature Elimination
•Recursive feature elimination is a recursive greedy optimization approach, where features are selected by
recursively taking a smaller and smaller subset of features. An estimator is trained on each set of
features, and the importance of each feature is determined using the coef_ attribute or the
feature_importances_ attribute, as in the example below.
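A small illustrative use of scikit-learn's RFE with a tree estimator (which exposes feature_importances_); the dataset and the number of features to keep are placeholders:

```python
# Recursive feature elimination: repeatedly drop the least important features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rfe = RFE(estimator=DecisionTreeClassifier(random_state=0), n_features_to_select=10)
rfe.fit(X, y)
print(rfe.support_)       # boolean mask of the selected features
print(rfe.ranking_)       # rank 1 = selected
```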
•Filter Methods
•In the filter method, features are selected on the basis of statistical measures. This method does not
depend on the learning algorithm and chooses the features as a pre-processing step. The filter
method filters out the irrelevant features and redundant columns from the model by ranking them
with different metrics. The advantage of filter methods is that they need little computational time
and do not overfit the data.
•Some common techniques of Filter methods are as follows:
•Information Gain
•Chi-square Test
•Fisher's Score
•Missing Value Ratio
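A small illustrative filter-method example with scikit-learn's SelectKBest, using the chi-square test and mutual information (an information-gain measure); k is a placeholder:

```python
# Filter-method feature selection: score features, keep the top k.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)
X_chi = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)                 # chi-square test (non-negative features)
X_mi = SelectKBest(score_func=mutual_info_classif, k=2).fit_transform(X, y)   # information-gain-style score
print(X_chi.shape, X_mi.shape)
```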