Unit V
CLASS : I M.SC CS
ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
Unit V :
Looking Inside Machine Learning: The Impact of Machine Learning on Applications - Data
Preparation-The Machine Learning Cycle.
Welcome to the exciting world of machine learning! In this digital age, where data is
abundant and technology continues to evolve at a rapid pace, machine learning has
emerged as a game-changing force across various industries. From healthcare and
finance to marketing and transportation, the impact of machine learning can be seen far
and wide.
But what exactly is machine learning? How does it work? And why is it so important in
today’s world? In this unit, we will delve into these questions and explore the
fascinating applications of machine learning in different sectors. So, fasten your
seatbelts as we embark on a journey through the realm of artificial intelligence!
But before we dive deeper into its applications, let’s start by understanding what
machine learning really means. At its core, machine learning is a branch of artificial
intelligence that enables computer systems to learn from data without being explicitly
programmed. It leverages algorithms and statistical models to analyze vast amounts of
information, identify patterns, and make predictions or decisions based on those patterns.
With the exponential growth in data generation over recent years – thanks to
advancements in technology – traditional methods of analysis have become inadequate
for extracting meaningful insights from complex datasets. That’s precisely where
machine learning steps in: by automating analytical model building processes and
continuously improving their performance with experience.
The importance of harnessing the power of machines that can learn cannot be
overstated. Machine learning allows organizations to leverage their existing data
resources more effectively while enabling them to uncover hidden patterns or
correlations that were previously unrecognized. This empowers businesses with
valuable insights that drive informed decision-making leading to increased efficiency,
productivity gains, cost savings, improved customer experiences – ultimately giving
them a competitive edge.
Now that you have an idea of what machine learning entails, let’s move on to
exploring the various types of ML methodologies available!
Machine learning is a field of artificial intelligence that allows computers to learn and
make predictions or decisions without being explicitly programmed. It works by
analyzing data, identifying patterns, and using algorithms to create models that can be
used for future tasks.
Machine learning encompasses various types, each with its unique approach.
Supervised learning utilizes labeled data to make predictions. Unsupervised learning
identifies patterns in unlabeled data. Semi-supervised learning combines both
approaches, while reinforcement learning focuses on training models through trial and
error.
Choosing and building the right machine learning model requires careful consideration
of various factors such as data size, complexity, and desired outcome. It involves
assessing different algorithms and techniques to find the best fit for your specific use
case.
Machine learning algorithms and techniques are the building blocks of intelligent
systems. From supervised learning to reinforcement learning, each approach has its
unique way of extracting insights from data. Let’s explore how these methods drive
innovation across industries!
Supervised machine learning works by using labeled data to train an algorithm. The
algorithm learns patterns and relationships between the input features and their
corresponding output labels, enabling it to make predictions on new, unseen data
accurately.
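The idea above can be sketched with a minimal example: a 1-nearest-neighbour classifier that predicts the label of whichever training example is closest. The labelled data below (hours studied mapped to a pass/fail label) is invented purely for illustration.

```python
# Minimal sketch of supervised learning: a 1-nearest-neighbour classifier.
# The labelled training data below is made up for illustration.

def predict(train, new_point):
    """Return the label of the training example closest to new_point."""
    closest = min(train, key=lambda pair: abs(pair[0] - new_point))
    return closest[1]

# Labelled data: (feature, label) pairs, e.g. hours studied -> pass/fail.
labelled = [(1.0, "fail"), (2.0, "fail"), (7.0, "pass"), (9.0, "pass")]

print(predict(labelled, 1.5))  # near the "fail" examples -> "fail"
print(predict(labelled, 8.0))  # near the "pass" examples -> "pass"
```

The algorithm never sees an explicit rule; it generalizes from the labelled examples alone, which is the essence of supervised learning.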
Semi-supervised learning
How does semi-supervised learning work? It combines elements of both supervised and
unsupervised learning, using a small labeled dataset along with a larger unlabeled
dataset. This allows the algorithm to learn from limited labeled data while leveraging the
vast amount of unlabeled data available.
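One common semi-supervised approach is self-training: fit a model on the small labelled set, pseudo-label the unlabelled points the model is confident about, and refit. The sketch below is a deliberately simplified 1-D illustration with invented numbers, not a production technique.

```python
# Sketch of self-training semi-supervised learning (illustrative, 1-D data).
# A threshold classifier is fit on a few labelled points, then confident
# pseudo-labels from the unlabelled pool are added and the model is refit.
# Assumes both classes are present in the labelled set.

def fit_threshold(points):
    """Fit a 1-D threshold: the midpoint between the two class means."""
    zeros = [x for x, y in points if y == 0]
    ones = [x for x, y in points if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def classify(threshold, x):
    return 1 if x > threshold else 0

labelled = [(1.0, 0), (9.0, 1)]          # small labelled set
unlabelled = [0.5, 1.5, 8.5, 9.5, 5.2]   # larger unlabelled pool

t = fit_threshold(labelled)               # initial model
# Pseudo-label only points far from the boundary (confident predictions).
confident = [(x, classify(t, x)) for x in unlabelled if abs(x - t) > 2.0]
t = fit_threshold(labelled + confident)   # refit on the enlarged set
print(round(t, 2))
```

The point near the boundary (5.2) is left out of the refit because the model cannot label it confidently, which is exactly why self-training filters by confidence.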
Reinforcement learning
Reinforcement learning trains models through trial and error: the system takes actions and learns from the rewards or penalties it receives.
Training and optimizing ML models involves feeding data into algorithms, refining them
through iterations, and fine-tuning parameters to maximize performance. It’s an ongoing
process that requires continuous evaluation and improvement for optimal results.
Machine learning applications for enterprises are vast. From customer segmentation
and personalized marketing to predictive maintenance and fraud detection, ML is
revolutionizing the way businesses operate. It enhances decision-making processes,
optimizes operations, and drives innovation across various industries.
Machine learning has found its way into various industries, revolutionizing processes
and driving innovation. In healthcare, it aids in disease diagnosis and personalized
treatments. In finance, it helps detect fraud and predict market trends. And in
manufacturing, it optimizes production efficiency and quality control. The applications
are endless!
As machine learning becomes more prevalent, ethical considerations arise. Issues such
as data privacy, bias and discrimination, and accountability must be addressed to
ensure responsible use of AI technology.
Technological singularity
AI Impact on Jobs
The rise of AI has sparked concerns about job displacement and automation. However,
it’s important to remember that while some roles may change or be replaced, new
opportunities will also arise as AI technology continues to evolve.
Privacy
Privacy concerns are a significant aspect of machine learning. As more data is collected
and analyzed, questions arise about the protection of personal information.
Safeguarding privacy must be a priority to ensure ethical and responsible use of
machine learning technologies.
Bias and discrimination are significant concerns when it comes to machine learning.
Algorithms can inadvertently perpetuate biases present in the data, leading to unfair
outcomes for certain groups of people. It is crucial to address and mitigate these issues
for a more equitable future.
Accountability
Artificial intelligence makes it easy to learn about people’s behavior, which helps in
building a highly personalized experience for them. If you wonder how you can apply
this kind of new technology in the near future and start achieving long-term benefits
from it, this section will try to help. Here, we focus in particular on machine learning
and its applications. We will discuss some tips on how to use ML (machine learning)
in mobile applications and why it is so important for technologies in different
industries. Let’s start!
Data mining enables big data analysis and helps discover useful patterns and
connections within large data sets. It consists of data storage, maintenance, and
data analysis. Machine learning provides not only a set of tools but also the learning
algorithms needed to find the possible connections within data sets.
Imagine you want to develop a mobile application for the travel industry, or already
have one. If it has reasonable traffic, a large number of people probably use it daily.
Put plainly, it is impossible for humans alone to analyze all the possible variations
and to identify complicated customer behavior patterns.
Therefore, you can gather all the relevant data about your customers.
Once all these data are collected in your database, it is time to apply machine learning.
You can then analyze the data and receive valuable insights about your mobile app
users. For example, you might learn the characteristic features of people under a
specific age who live in a specific geographical location. This helps you organize a
strategy, show users highly personalized offers, and reach better results. You can
build a general user test and figure out the targeted destination that will increase
conversion for users.
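A toy sketch of the segmentation idea described above: group app users by age band and location, then compute a conversion rate per segment. The field names and sample records are invented for illustration.

```python
# Hypothetical sketch: segmenting mobile-app users by age band and location,
# then computing the conversion rate per segment. Records are invented.
from collections import defaultdict

users = [
    {"age": 24, "city": "Berlin", "converted": True},
    {"age": 27, "city": "Berlin", "converted": True},
    {"age": 45, "city": "Berlin", "converted": False},
    {"age": 23, "city": "Paris",  "converted": False},
    {"age": 51, "city": "Paris",  "converted": True},
]

segments = defaultdict(list)
for u in users:
    band = "under-30" if u["age"] < 30 else "30-plus"
    segments[(band, u["city"])].append(u["converted"])

for segment, outcomes in segments.items():
    rate = sum(outcomes) / len(outcomes)
    print(segment, f"conversion={rate:.0%}")
```

In a real app the same grouping would run over millions of rows, which is exactly the scale at which manual analysis breaks down and ML-driven analysis pays off.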
Above we learned that in machine learning we deal with terms like data, model,
training, decision, and experience. ML programs are fed a huge amount of data to
train on. In the training stage, the program learns the rules of the problem and gathers
experience. Thanks to this experience, it becomes easier to make a decision when a
new problem arises. Moreover, when faced with new data and problems, it adapts to
the new situations. We may say that, like humans, it “learns while working”.
A very important aspect of creating an ML system is the process of building and training
the models. These models combine collections of pre-processed data with algorithms
chosen because they work well on that data to produce the output. Creating a model
can sometimes be a complex procedure.
In general, we start with the fundamental rules and chosen algorithms. The system is
fed a large amount of useful data: the previous entries we have collected in the system.
With the help of this data, we first create several “candidate models” that are tried on
new datasets while their performance is observed. Based on their success rates, the
best model is chosen and deployed to serve fresh use cases. With this new incoming
data, the chosen model makes decisions, gathers more experience, and adapts itself to
the ever-expanding use cases.
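The candidate-model selection idea can be sketched as follows: fit two simple candidate models on training data, score each on a held-out validation set, and keep the better one. The data and both models are toy examples chosen for illustration.

```python
# Sketch of choosing between candidate models on held-out data (toy example).
# Two simple regressors are scored by mean squared error on a validation set;
# the better one would be deployed.

train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]   # (x, y) pairs
valid = [(5, 10.1), (6, 11.8)]

def mean_model(data):
    """Candidate 1: always predict the training mean."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def linear_model(data):
    """Candidate 2: least-squares line through the training data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

candidates = {"mean": mean_model(train), "linear": linear_model(train)}
scores = {name: mse(m, valid) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the linear model fits the roughly linear data far better
```

In practice the candidates would be full learning algorithms and the comparison would use cross-validation, but the selection logic is the same.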
This stage is the preparation of a plan for AI/ML integration. First, you need to decide
how much you intend to gain from the integration. Doing things one at a time is
usually better.
However, if you have enough budget, you can implement all the changes at once.
Once you have identified the main inclusions and improvements for your app and
estimated your financial capacity, it is time to prioritize which requirements are most
essential to address first.
This stage offers a feasibility test that helps determine whether the future
implementations will benefit the business, improve the user experience, or increase
engagement. A successful update is one that makes the existing users happy and
attracts more people to the product. If an update is not increasing your business
efficiency, then there is no point in putting money into it.
3. Involve AI/ML Experts
One of the most important things is to choose the resources that will carry out the
development and upgrade process. If you don't work with the right specialists,
accomplishing your expectations becomes more difficult. Accordingly, you should
make your selections wisely.
5. Implementation
The most critical point at this stage is to carefully deploy and test the implementations
before making all the changes live. When adding AI/ML capabilities to your app, an
important suggestion is to put a strong analytics system in place. Such an approach
helps you analyze the impact of the new integration and gain insights for future
decisions.
The technologies and digital solutions that back your application must be chosen
correctly. Your security tools, data storage aids, optimization solutions, backup
software, and other supporting techniques should be strong and robust, which will
help keep your application consistent. Without this, performance may decline sharply.
How to Understand ML
1. Supervised learning — In this case, we feed the machine learning algorithms a
great amount of labeled data (for instance, data marked with the results). Such data
includes the segments or labels attached to the entries. In supervised learning, we
guide the system on how to recognize any new input according to the data we have
already delivered.
2. Unsupervised learning — here the data is neither labeled nor classified. In this case,
the system is not aware of the success or failure of the outcome, as it has no
guidance. Instead, the system itself tries to sort the available data and derive patterns
from the given information, then stores these patterns. When a new input appears,
the system matches it against the stored patterns and assigns the closest pattern to it.
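Unsupervised pattern-finding can be sketched with a tiny 1-D k-means: given unlabeled numbers, the algorithm groups them into two clusters on its own. The data (session lengths in minutes) is invented, and the sketch assumes both clusters stay non-empty.

```python
# Sketch of unsupervised learning: a tiny 1-D k-means with k=2.
# No labels are given; the algorithm groups the values by itself.
# Assumes both clusters remain non-empty during the iterations.

def kmeans_1d(values, iters=10):
    c1, c2 = min(values), max(values)          # initial centroids
    for _ in range(iters):
        # Assign each value to its nearest centroid, then recompute centroids.
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# e.g. daily session lengths in minutes: two natural groups emerge
data = [2, 3, 2.5, 30, 28, 31, 3.5]
short, long_ = kmeans_1d(data)
print(short)   # the low cluster
print(long_)   # the high cluster
```

The two stored "patterns" here are the cluster centroids; a new session length would be assigned to whichever centroid it lies closest to, exactly as the paragraph above describes.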
The main reason data preparation processes are applied to raw data is to ensure that
the information is of good quality, so that business intelligence and other analytical
applications produce reliable output. Raw data is often riddled with missing values,
inaccurate entries, and other mistakes. When multiple data sets are in play, varying
formats can lead to duplicated or ignored values. Therefore, correcting these errors,
validating data quality, and consolidating data sets is the first step in data preparation.
During data preparation, ensuring that data is ready for analytics serves as a starting
point. This is a base to derive actionable insights necessary for intelligent business
decisions.
An organization’s raw data can be enriched to be more useful in several ways. For
example, it can be done by merging internal and external data sets, or data sets can be
balanced. Data preparation is regularly used by business intelligence and data
management teams to streamline the process of analysis and ensure seamless self-
service with business intelligence applications.
For data scientists, the most significant benefit of data preparation is that they can
spend their time doing their job of analyzing and mining instead of cleaning and
structuring data. Prepared data can be fed instantly to multiple users for deployment in
various recurring analytics procedures. There is a range of other benefits of data
preparation:
It ensures that data utilized in any analytical application is clean and capable of
providing reliable output.
Data preparation helps identify potential problem areas with data that would otherwise
go undetected.
It prevents any repetition of data when preparing it for use across applications.
It helps a business ensure a better return on investment from business intelligence and
its analytics initiatives.
Data preparation should be the norm, particularly in big data environments where
information is stored in multiple ways. These storage facilities, such as data lakes, can
store data in structured, unstructured, or semi-structured forms. This means that data
often remains in its natural condition until it is used for a specific analytical purpose.
These can be predictive analytics or machine learning or can even be further advanced
methods that require large swathes of data.
With data and its processes increasingly moving to the cloud, data preparation is
moving with them. The benefits can be huge.
Better scalability
When data preparation moves to the cloud, it scales with the business. A business need
not worry about scaling up its infrastructure or planning for the long-term evolution
requirements of the company.
Future-ready
With cloud-based data preparation, upgrades to capabilities and patch fixes for bugs
are incorporated and functioning as soon as they are released. For businesses, this
means staying ahead of the curve with innovations. It also avoids delays in going to
market and any additional costs that may otherwise be incurred.
When data preparation happens in the cloud, it remains an ongoing process. It does not
need technical installations and facilitates better teamwork for quicker output. With
quality, cloud-native data preparation tools, a business can also benefit from intuitive
graphical user interfaces that ease the process of data preparation.
Here are some key points to ensure that the organization has the ideal start to data
preparation. An organization will likely need to hire data scientists or expert analysts
to help with this step.
It’s best to look at data preparation and analysis as closely related. An organization
cannot prepare its data well unless they know what kind of analytics it is prepped for.
Knowing this from the start sets a suitable base for data preparation.
Set goals for data preparation. When users know what accuracy levels are needed and
what quality metrics are desirable, management can arrive at a projected cost for the
project. This will help create a plan for every use case the organization has.
Prioritize data sources based on the analytical processes that are planned. Ensure that
all differences are resolved when multiple sources bring in all data. This forms an
important starting point for preparing data.
Evaluate the skills and tools on hand for the job of data preparation. Self-service data
preparation tools are often believed to be the only option available, but several other
tools and technologies can work in tandem with existing skill sets and data requirements.
Keep a close watch on data preparation costs. There is much expense involved in
obtaining licenses, processing data, and storage resources an organization will need.
Knowing estimates and providing a workable leeway is essential to keeping the process
within a company’s budget.
There are several steps to produce the best possible outcomes for the data preparation
process.
Access
Every organization has various sources of business data. Some of these sources can
be data from endpoints, customers, the marketing department, and other sources that
are associated with these domains. The first step to data preparation is identifying all
the required data and their related repositories. The identification must include all
necessary sources for the kind of analysis an organization has in mind. The
organization must have a plan that outlines all the questions that require answers from
the planned data analysis.
Ingest
When the data is identified, it is introduced into the analysis tools. The information on
hand will probably be a mix of structured and semi-structured data located across
different repositories. An important step is bringing all of the data from various
repositories into one. Access and ingest are flexible, as steps vary mainly depending on
the requirement. These two data preparation steps require business and technological
expertise and are ideally handled by a small, efficient team.
Cleanse
Cleansing data is done to ensure that the data set being worked with provides
accurate answers when it is analyzed. Small data sets can be analyzed manually, but
larger ones require automation with readily available software tools. Where custom
processing is needed, data engineers use applications coded in Python. Ingested data
can have its share of problems: values can be missing or out of range, and there can
be nulls or whitespace where there should be values.
Format
Cleansing of data is followed by formatting. At this stage, issues such as varying data
formats and abbreviations are addressed. Any data variables deemed unnecessary for
analysis will be deleted from the data set. This is the stage of data preparation that is
best automated. Both cleansing and formatting are ideally saved as a repetitive formula
that data scientists and professionals can apply to similar data sets any time they need.
For example, if a company requires a monthly assessment of marketing and support
data, the sources are most likely to remain the same and require the same kind of
cleansing and formatting each time. Having a saved formula helps move things faster.
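The "saved formula" idea can be sketched as a small reusable formatting function: the same function is applied to each month's export. The field names, abbreviation table, and sample record are all invented for illustration.

```python
# Sketch of a saved, repeatable formatting step: the same function can be
# applied to each month's export. Field names and abbreviations are invented.

ABBREVIATIONS = {"St.": "Street", "Rd.": "Road"}
KEEP_FIELDS = {"customer", "address", "spend"}

def format_records(records):
    """Expand known abbreviations and drop fields not needed for analysis."""
    formatted = []
    for rec in records:
        rec = {k: v for k, v in rec.items() if k in KEEP_FIELDS}
        for short, full in ABBREVIATIONS.items():
            rec["address"] = rec["address"].replace(short, full)
        formatted.append(rec)
    return formatted

march = [{"customer": "A", "address": "1 Main St.", "spend": 40, "tmp_id": 7}]
print(format_records(march))
```

Because the sources and rules stay the same month to month, encoding them once in a function (or an equivalent saved recipe in a data preparation tool) is what makes the step repeatable and fast.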
As a process, data preparation is complicated. Data sets created from several source
systems will have multiple quality, accuracy, and consistency issues that need to be
addressed. The data will also have to be reworked to make it user-friendly and remove
all irrelevant data. This can be a long-drawn process.
Here are the main challenges that are often seen with data preparation:
There can be insufficient or inadequate data profiling. When data is not profiled
correctly, it can lead to several errors, anomalies, and issues that can result in poor
results during analytics.
Without proper data profiling, an organization can have missing or incomplete data.
Missing values is just one form of incomplete data. There are several more that need to
be addressed right from the beginning.
Data sets can also contain invalid values. This can result from spelling errors, typos, or
the wrong numbers input. These invalid entries must be spotted early on and fixed to
ensure analytical accuracy.
When data sets are brought together, name and address standardization is a must.
Often these details are stored in various systems in different formats. If not corrected,
they can affect the way the information is viewed.
There are many other inconsistencies in data that users will find across enterprise
systems. These inconsistencies can happen from any one of the multiple source
systems that are worked with. Differences can be related to terminology, specific
identifiers, and the like, which can challenge data preparation.
While data enrichment is needed, knowing what to add can be complex and requires
solid business and analytics skills.
Always ensure that the organization has a clear understanding of the data consumer.
Who is the end-user of the data, and what information are they looking for with these
sources?
Know where the data is coming from and the sources that generated it.
Never get rid of the raw data. With the raw data, a data engineer can always recreate
data transformations. Also, never move data or delete it after it has been saved.
If possible, always store all data and their raw and processed results. Also, know the
compliance laws of the region the organization operates in.
Document every stage of the data pipeline. Make versions of the data, the analysis
codes, and the application that transforms the information.
Always make sure that there are clear demarcations between online and offline
analyses. This is to prevent the ingest step from impacting any user-related services.
Constantly monitor data pipelines for any inconsistency found in data sets.
Bring in a proactive form of data governance. Because information technology
constantly requires security and compliance, ensure the presence of governance
capabilities such as data masking or retention, lineage, and any role-based
permissions.
Work on creating a data preparation pipeline. The best way to do this is to understand
the data well with consumer needs and then create a workable data preparation
pipeline.
With quality data preparation, organizations can be confident that their data will
support any process or insight-gathering effort. It instills confidence in the process,
the system, and the output, which can be highly beneficial for any business.
Machine learning gives computer systems the ability to learn automatically without
being explicitly programmed. But how does a machine learning system work? This
can be described using the machine learning life cycle, a cyclic process for building
an efficient machine learning project. The main purpose of the life cycle is to find a
solution to the problem or project.
The machine learning life cycle involves seven major steps, which are given below:
Gathering Data
Data Preparation
Data Wrangling
Data Analysis
Train Model
Test Model
Deployment
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step
is to identify and obtain all the data relevant to the problem.
In this step, we need to identify the different data sources, as data can be collected
from various sources such as files, databases, the internet, or mobile devices. It is one
of the most important steps of the life cycle. The quantity and quality of the collected
data determine the efficiency of the output: the more data we have, the more accurate
the prediction will be.
By collecting data from the identified sources, we get a coherent set of data, also
called a dataset, which will be used in further steps.
2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is a
step where we put our data into a suitable place and prepare it to use in our machine
learning training.
In this step, first, we put all data together, and then randomize the ordering of data.
Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format, and quality of data.
Data pre-processing:
3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a useable
format. It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for analysis in the next
step. It is one of the most important steps of the complete process. Cleaning of data is
required to address the quality issues.
The data we have collected is not necessarily all usable, as some of it may not be
useful. In real-world applications, collected data may have various issues, including:
Missing Values
Duplicate data
Invalid data
Noise
It is mandatory to detect and remove these issues because they can negatively affect
the quality of the outcome.
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step
involves:
Building models
The aim of this step is to build a machine learning model to analyze the data using
various analytical techniques and review the outcome. It starts with determining the
type of problem, where we select machine learning techniques such as classification,
regression, cluster analysis, association, etc.; we then build the model using the
prepared data and evaluate it.
Hence, in this step, we take the data and use machine learning algorithms to build the
model.
5. Train Model
The next step is to train the model. In this step, we train our model to improve its
performance and obtain a better outcome for the problem.
We use datasets to train the model with various machine learning algorithms. Training
a model is required so that it can understand the various patterns, rules, and features.
6. Test Model
Once our machine learning model has been trained on a given dataset, we test the
model. In this step, we check the accuracy of our model by providing a test dataset
to it.
Testing the model determines the percentage accuracy of the model as per the
requirements of the project or problem.
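The train/test step can be sketched as follows: hold out part of a labelled dataset, predict on the held-out part with a model fit on the rest, and report accuracy. The data and the 1-nearest-neighbour model are toy examples for illustration.

```python
# Sketch of testing a trained model: hold out part of the labelled data,
# predict on it, and report accuracy. Data and model are toy examples.

def predict(train, x):
    """1-nearest-neighbour prediction from the training examples."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Toy labelled dataset: feature -> class
data = [(x, "low" if x < 5 else "high") for x in range(10)]

test_idx = {1, 6}                            # hold out two examples for testing
train = [p for i, p in enumerate(data) if i not in test_idx]
test = [p for i, p in enumerate(data) if i in test_idx]

correct = sum(predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"accuracy = {accuracy:.0%}")
```

The key discipline is that the test examples are never shown to the model during training; accuracy measured on training data would be misleadingly optimistic.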
7. Deployment
The last step of machine learning life cycle is deployment, where we deploy the model
in the real-world system.
If the prepared model produces accurate results as per our requirements at an
acceptable speed, then we deploy the model in the real system. But before deploying
the project, we check whether it keeps improving its performance on the available
data. The deployment phase is similar to making the final report for a project.