
DEPARTMENT OF COMPUTER SCIENCE

CLASS : I M.SC CS
ARTIFICIAL INTELLIGENCE & MACHINE LEARNING

Unit V :
Looking Inside Machine Learning: The Impact of Machine Learning on Applications - Data
Preparation-The Machine Learning Cycle.

Introduction to Machine Learning

Welcome to the exciting world of machine learning! In this digital age, where data is
abundant and technology continues to evolve at a rapid pace, machine learning has
emerged as a game-changing force across various industries. From healthcare and
finance to marketing and transportation, the impact of machine learning can be seen far
and wide.

But what exactly is machine learning? How does it work? And why is it so important in
today’s world? In this blog post, we will delve into these questions and explore the
fascinating applications of machine learning in different sectors. So, fasten your
seatbelts as we embark on a journey through the realm of artificial intelligence!

But before we dive deeper into its applications, let’s start by understanding what
machine learning really means. At its core, machine learning is a branch of artificial
intelligence that enables computer systems to learn from data without being explicitly
programmed. It leverages algorithms and statistical models to analyze vast amounts of
information, identify patterns, and make predictions or decisions based on those patterns.

With the exponential growth in data generation over recent years – thanks to
advancements in technology – traditional methods of analysis have become inadequate
for extracting meaningful insights from complex datasets. That’s precisely where
machine learning steps in: by automating analytical model building processes and
continuously improving their performance with experience.

The importance of harnessing the power of machines that can learn cannot be
overstated. Machine learning allows organizations to leverage their existing data
resources more effectively while enabling them to uncover hidden patterns or
correlations that were previously unrecognized. This empowers businesses with
valuable insights that drive informed decision-making leading to increased efficiency,
productivity gains, cost savings, improved customer experiences – ultimately giving
them a competitive edge.

Now that you have an idea of what machine learning entails, let’s move on and explore
the various types of ML methodologies available!

What is machine learning and how does it work?

Machine learning is a field of artificial intelligence that allows computers to learn and
make predictions or decisions without being explicitly programmed. It works by
analyzing data, identifying patterns, and using algorithms to create models that can be
used for future tasks.

Why is machine learning important?

Machine learning is important because it enables computers to learn and make
decisions without explicit programming. It has the potential to revolutionize industries by
improving efficiency, accuracy, and decision-making processes.

What are the different types of machine learning?

Machine learning encompasses various types, each with its unique approach.
Supervised learning utilizes labeled data to make predictions. Unsupervised learning
identifies patterns in unlabeled data. Semi-supervised learning combines both
approaches, while reinforcement learning focuses on training models through trial and
error.

How to choose and build the right machine learning model

Choosing and building the right machine learning model requires careful consideration
of various factors such as data size, complexity, and desired outcome. It involves
assessing different algorithms and techniques to find the best fit for your specific use
case.

Machine Learning Algorithms and Techniques

Machine learning algorithms and techniques are the building blocks of intelligent
systems. From supervised learning to reinforcement learning, each approach has its
unique way of extracting insights from data. Let’s explore how these methods drive
innovation across industries!

Supervised machine learning


It’s all about guidance. With labeled data and a clear objective in mind, algorithms are
trained to make predictions or classify new instances. The teacher-student relationship
paves the way for accurate and reliable results.

How does supervised machine learning work?

Supervised machine learning works by using labeled data to train an algorithm. The
algorithm learns patterns and relationships between the input features and their
corresponding output labels, enabling it to make predictions on new, unseen data
accurately.
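As a concrete sketch of this idea, the toy program below trains a one-nearest-neighbour classifier (one of the simplest supervised algorithms, chosen here purely for illustration) on a handful of labelled points and then predicts labels for unseen inputs. All data is invented:

```python
import math

# Labelled training data: (feature vector, label) pairs.
training_data = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def predict(point):
    """Classify a new point by the label of its nearest training example."""
    _, label = min(training_data, key=lambda ex: math.dist(ex[0], point))
    return label

print(predict((1.1, 0.9)))   # falls near the "small" examples
print(predict((8.5, 9.2)))   # falls near the "large" examples
```

The "teacher" here is simply the labelled data: every prediction is grounded in an example whose correct answer was supplied in advance.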

Unsupervised machine learning

Unsupervised machine learning is a powerful technique that allows computers to
identify patterns and relationships in data without any predefined labels. It enables
businesses to uncover valuable insights and make informed decisions based on the
hidden structure within their datasets.

How does unsupervised machine learning work?

Unsupervised machine learning works by analyzing data without any predetermined
labels or targets. It seeks to identify patterns, relationships, and clusters within the
dataset, allowing for valuable insights and discoveries to emerge organically.
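To make this concrete, here is a minimal from-scratch sketch of one classic unsupervised algorithm, k-means clustering. It groups invented, unlabelled points into two clusters purely by distance, never seeing a label:

```python
import math

points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.5, 7.8)]

def kmeans(points, centres, iterations=10):
    """Alternate assignment and update steps for a fixed number of rounds."""
    clusters = []
    for _ in range(iterations):
        # Assignment: attach each point to its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update: move each centre to the mean of its assigned points.
        centres = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
    return centres, clusters

centres, clusters = kmeans(points, centres=[(0.0, 0.0), (10.0, 10.0)])
print(centres)
```

The two centres migrate toward the two natural groups in the data; that grouping is the "hidden structure" the algorithm discovers on its own.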

Semi-supervised learning

Semi-supervised learning combines elements of both supervised and
unsupervised learning, using a small labeled dataset along with a larger unlabeled
dataset. This allows the algorithm to learn from limited labeled data while leveraging the
vast amount of unlabeled data available.

How does semi-supervised learning work?

Semi-supervised learning leverages a combination of labeled and unlabeled data. It
starts with a small amount of labeled data to create initial models, which are then used
to label the remaining unlabeled data. This iterative process helps improve accuracy
and efficiency in training machine learning models.
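The pseudo-labelling loop described above can be sketched in a few lines. The "model" here is just nearest-neighbour distance, and the points, labels, and confidence radius are all invented for illustration:

```python
import math

# A small labelled set plus a larger pool of unlabelled points.
labelled = [((1.0, 1.0), "a"), ((9.0, 9.0), "b")]
unlabelled = [(1.2, 0.9), (8.8, 9.1), (5.0, 5.0)]

RADIUS = 1.0            # how close a point must be to earn a pseudo-label
still_unlabelled = []
for p in unlabelled:
    # Find the nearest labelled example (newly pseudo-labelled ones count too).
    nearest = min(labelled, key=lambda ex: math.dist(ex[0], p))
    if math.dist(nearest[0], p) <= RADIUS:
        labelled.append((p, nearest[1]))   # confident: adopt its label
    else:
        still_unlabelled.append(p)         # ambiguous: keep for a later round

print(len(labelled), still_unlabelled)
```

Two of the three unlabelled points sit close enough to a labelled one to be pseudo-labelled; the ambiguous middle point is held back, which is exactly the cautious, iterative behaviour the text describes.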

Reinforcement learning

Reinforcement learning is a type of machine learning that enables algorithms to learn by
trial and error through interacting with an environment. It focuses on maximizing
rewards, making it ideal for applications like robotics, gaming, and autonomous
vehicles. Exciting possibilities lie ahead!

How does reinforcement learning work?

Reinforcement learning is a fascinating branch of machine learning that involves training
an agent to make decisions based on rewards and punishments. Through trial and
error, the agent learns optimal strategies to maximize its cumulative reward. It’s like
teaching a computer how to play a game by letting it explore different moves and learn
from the consequences. The potential applications for reinforcement learning are vast,
ranging from autonomous vehicles to robotics and even healthcare. It’s an exciting field
with endless possibilities!
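As an illustrative sketch of learning from rewards, the toy program below runs tabular Q-learning on an invented five-cell corridor where only the rightmost cell pays a reward. The learning rate, discount, and exploration rate are arbitrary choices for the demo, not recommendations:

```python
import random

random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                        # move left / move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration

for _ in range(200):                      # episodes of trial and error
    s = 0
    while s != GOAL:
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: reward plus discounted best future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# The greedy policy discovered by trial and error.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

After enough episodes the agent's value table makes "move right" the best action in every cell, purely from the consequences of its own exploration.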

Training and optimizing ML models

Training and optimizing ML models involves feeding data into algorithms, refining them
through iterations, and fine-tuning parameters to maximize performance. It’s an ongoing
process that requires continuous evaluation and improvement for optimal results.
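A tiny sketch of what "fine-tuning parameters" can look like in practice: a grid of candidate values for a single model weight is scored against held-out data, and the value with the lowest error is kept. The one-parameter model y = w·x and the data are invented for illustration:

```python
# Held-out data that roughly follows y = 2x (invented values).
validation = [(4.0, 8.0), (5.0, 10.1)]

def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Iterate over a grid of candidate weights and keep the best performer.
candidates = [w / 10 for w in range(0, 41)]   # 0.0, 0.1, ..., 4.0
best_w = min(candidates, key=lambda w: loss(w, validation))
print(best_w)
```

Real tuning loops evaluate many parameters at once and use cross-validation, but the shape is the same: propose, evaluate, keep the best, repeat.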

Applications of Machine Learning in Various Industries

Machine learning is revolutionizing various industries, with applications ranging from
healthcare to finance. It is used in fraud detection, personalized marketing, predictive
maintenance, and more. The possibilities are endless as businesses harness the power
of machine learning to gain a competitive edge.

Machine learning applications for enterprises

Machine learning applications for enterprises are vast. From customer segmentation
and personalized marketing to predictive maintenance and fraud detection, ML is
revolutionizing the way businesses operate. It enhances decision-making processes,
optimizes operations, and drives innovation across various industries.

Machine learning examples in industry

Machine learning has found its way into various industries, revolutionizing processes
and driving innovation. In healthcare, it aids in disease diagnosis and personalized
treatments. In finance, it helps detect fraud and predict market trends. And in
manufacturing, it optimizes production efficiency and quality control. The applications
are endless!

Advantages, Disadvantages, and Ethics of Machine Learning

Advantages, Disadvantages, and Ethics of Machine Learning: Harnessing the power of
machine learning can lead to increased efficiency and accuracy in decision-making.
However, it is important to address concerns such as data privacy, bias, and
accountability. Let’s dive into the intricacies of this transformative technology!

What are the advantages and disadvantages of machine learning?

Advantages: Machine learning enables automation, improves accuracy and efficiency,
identifies patterns and insights, enhances decision-making, and boosts innovation.

Disadvantages: Potential bias, lack of interpretability, data privacy concerns, high
implementation costs and complexity, and reliance on quality data.

Importance of human-interpretable machine learning

The Need for Human-Interpretable Machine Learning: Understanding the decisions
made by AI algorithms is crucial for building trust and ensuring transparency. Human-interpretable
machine learning helps us comprehend how AI models arrive at their
conclusions, empowering us to make informed decisions based on reliable insights.

Ethics of machine learning

As machine learning becomes more prevalent, ethical considerations arise. Issues such
as data privacy, bias and discrimination, and accountability must be addressed to
ensure responsible use of AI technology.

Future Trends and Impact of Machine Learning

What is the future of machine learning? As technology continues to advance, we can
expect machine learning to play an even greater role in various industries. From
technological singularity to AI’s impact on jobs, there are many factors to consider.
Privacy concerns and issues of bias and discrimination also come into play.
Accountability will be crucial as we navigate this evolving landscape. Machine learning
is shaping our world, and it’s important for us to stay informed about its potential impact.

What is the future of machine learning?

Machine learning is poised to revolutionize countless industries. As technology
continues to advance, the potential for machine learning applications will only grow,
making our lives more efficient and innovative. Stay tuned for exciting developments in
this rapidly evolving field.

Technological singularity

A hypothetical point in the future where artificial intelligence surpasses human
capabilities, leading to exponential growth and profound societal changes. The potential
impact is both exciting and uncertain, raising questions about our role in a world
dominated by machines.

AI Impact on Jobs
The rise of AI has sparked concerns about job displacement and automation. However,
it’s important to remember that while some roles may change or be replaced, new
opportunities will also arise as AI technology continues to evolve.

Privacy

Privacy concerns are a significant aspect of machine learning. As more data is collected
and analyzed, questions arise about the protection of personal information.
Safeguarding privacy must be a priority to ensure ethical and responsible use of
machine learning technologies.

Bias and discrimination

Bias and discrimination are significant concerns when it comes to machine learning.
Algorithms can inadvertently perpetuate biases present in the data, leading to unfair
outcomes for certain groups of people. It is crucial to address and mitigate these issues
for a more equitable future.

Accountability

Accountability is a crucial aspect of machine learning. As AI systems become more
autonomous, it becomes essential to hold them accountable for their actions and
decisions. Transparency and responsibility are key in ensuring the ethical use of
machine learning technology.

The Impact of Machine Learning on Applications

Artificial intelligence makes it easy to learn about people’s behavior, which helps in
building a highly personalized experience for them. If you are wondering how you can
apply this new technology in the near future and start achieving long-term benefits from
it, this section will try to help. Here, I want to focus your attention on machine learning
in particular, as well as machine learning applications. Together, we will discuss some
tips on how to use ML (machine learning) in mobile applications and how important it is
for technologies in different industries. Let’s start!

Data mining allows big data analysis and helps to discover useful patterns and
connections within significant data sets. It consists of data storage, maintenance, and
data analysis. Machine Learning provides not only a set of tools but also the necessary
learning algorithms that help to find all the possible connections within data sets.

Imagine you want to develop a mobile application for the travel industry, or already
have one. If it attracts decent traffic, then probably a ton of people use it daily. Put
plainly, it is impossible for humans alone both to analyze all the possible variations and
to identify complicated customer behavior patterns.
Therefore, you can gather all the data about your customers, including:

their gender and location,

Facebook connected accounts,

how they fill out their profile,

how often they visit your app,

how often they go on vacation, etc.

Once all this data is collected in your database, it is time to apply machine learning. As
a result, you can analyze the data and receive valuable insights about your mobile app
users. For example, you might learn the characteristic features of people under a
specific age who live in a specific geographical location. This helps you shape a
strategy, show users highly personalized offers, and achieve better results. You can
also build a general user test and figure out the targeted destinations that will increase
conversion.
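As a toy illustration of the kind of insight described above, the sketch below groups invented app users by location and age band and computes each segment's booking rate. Every field name and value is hypothetical:

```python
from collections import defaultdict

users = [
    {"location": "Berlin", "age": 24, "booked": True},
    {"location": "Berlin", "age": 27, "booked": True},
    {"location": "Berlin", "age": 41, "booked": False},
    {"location": "Madrid", "age": 23, "booked": False},
]

def segment(user):
    """Bucket each user into a (location, age band) segment."""
    band = "under 30" if user["age"] < 30 else "30 and over"
    return (user["location"], band)

totals, bookings = defaultdict(int), defaultdict(int)
for u in users:
    totals[segment(u)] += 1
    bookings[segment(u)] += u["booked"]   # True counts as 1

rates = {seg: bookings[seg] / totals[seg] for seg in totals}
print(rates)
```

At real scale, an ML model would learn such segment patterns automatically across thousands of features, but the output, "which kinds of users convert", is the same.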

Above we learned that in machine learning we deal with terms like data, model, training,
decision, and experience. ML programs are fed a huge amount of data to train on. In the
training stage, the program learns the rules of the problem and gathers experience.
Thanks to that experience, it becomes easier to make a decision when a new problem
arises. At the same time, while working on new data and problems, it adapts to new
situations. We may say that, like humans, it “learns while working”.

A very important aspect of creating an ML system is the process of building and training
the models. A model pairs a collection of pre-processed data with chosen algorithms
that work well on that data to produce the desired output. Creating a model can
sometimes be a complex procedure.

In general, we start with the fundamental rules and chosen algorithms. The system is
then fed a lot of useful data: the previous entries we have collected in the system. With
the help of this data, we first create several “candidate models” that are tried on new
datasets while their performance is observed. Based on their success rates, the best
model is chosen and deployed to serve fresh use cases. As new data comes in, the
chosen model makes decisions, collects more experience, and adapts itself to the ever-expanding
use cases.
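The candidate-model workflow just described can be sketched as follows: two deliberately simple, hypothetical models are built from past entries, scored on fresh data, and the better one is "deployed" (here, just selected):

```python
history = [(1.0, 1.2), (2.0, 2.1), (3.0, 2.9)]   # past entries (x, y)
new_data = [(4.0, 4.2), (5.0, 4.9)]              # fresh, unseen entries

def mean_model(history):
    """Candidate 1: ignores x, always predicts the historical mean of y."""
    mean_y = sum(y for _, y in history) / len(history)
    return lambda x: mean_y

def identity_model(history):
    """Candidate 2: predicts y = x, a simple structural guess."""
    return lambda x: x

def error(model, data):
    """Mean absolute error of a model on a data set."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

candidates = {"mean": mean_model(history), "identity": identity_model(history)}
scores = {name: error(m, new_data) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The model that tracks the data's structure wins on the new entries; in a production system this comparison would be automated and repeated as more data arrives.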

How to use AI ML in apps


AI and machine learning are very trendy areas of technology, and there are many
directions to choose from. The best way to go may depend on how much power and
flexibility the developers want, or how specific their use case is. We can either choose
from ready-made AI offerings (from the Google Cloud or AWS platforms) or deploy our
own custom models. Below you will find the stages to follow once you understand what
AI/ML can do and have identified the areas where it may improve the application:

1. Estimate the situation and prioritize the additions

This stage is the preparation of a plan for AI/ML integration. First, decide how much you
intend to gain from the integration. Making changes one at a time is often the better
approach.

However, if you have enough budget, you can implement all the changes at once. Once
you have identified the main additions and improvements for your app and estimated
your financial capacity, it’s time to prioritize which requirements are most essential to
address first.

2. Assess feasibility and usefulness

This stage involves a feasibility test that helps you understand whether or not the future
implementations are going to benefit the business, improve the user experience, or
increase engagement. A successful update is one that makes existing users happy and
attracts more people to the product. If an update does not increase your business
efficiency, there is no point in putting money into it.

3. Involve AI ML Experts

One of the most important steps is choosing the resources that will carry out the
development and upgrade process. If you don't work with the right specialists, meeting
your expectations becomes much more difficult. Accordingly, you should make your
selections wisely.

4. Data integration and security

While implementing machine learning, your application needs a better data organization
model. Old data may reduce your ML deployment’s efficiency. So, after you plan which
capabilities and features to add to the app, your next step is to focus on databases.
Accurately organized data and careful integration help keep the application
performance-oriented and high-quality in the long term.
Another critical issue that can’t be ignored is security. To keep your application strong
and robust, you should come up with the right plan to integrate security, following the
standards and needs of your product.

5. Implementation

The most critical point at this stage is to carefully deploy and test the implementations
before making all the changes live. When adding AI/ML capabilities to your app, an
important suggestion is to put a strong analytics system in place. Such an approach
helps you analyze the impact of the new integration and gain insights for future
decisions.

6. Strong technological aids

The technologies and digital solutions that back your application must be chosen
carefully. Your security tools, data storage aids, optimization solutions, backup software,
and other supporting techniques should be strong and robust, which will help keep your
application consistent. Without this, performance may decline sharply.

How to Understand ML

Generally, ML is categorized into supervised and unsupervised learning:

1. Supervised learning — In this case, we feed the machine learning algorithms a great
amount of labeled data (for instance, data marked with the expected results). Such data
includes the segments or labels attached to the entries. In supervised learning, we
guide the system in recognizing any new input according to the data we have already
delivered.

2. Unsupervised learning — Here the data is neither labeled nor classified. In this case,
the system is not aware of the success or failure of the outcome, as it doesn’t have any
guidance. Instead, the system itself tries to sort the available data and derive patterns
from the given information, then stores those patterns. When a new input appears, it is
matched against the already stored patterns and assigned the closest one.

Reinforcement learning is sometimes considered a type of unsupervised learning. In
this case too, the input data is not labeled. However, when the system achieves
success, feedback is delivered back to it to signify that the result was successful, which
improves future outcomes.

What is data preparation?


The process of cleaning data by reformatting it, correcting errors, and combining data
sets is known as data preparation. Ensuring that data is of good quality includes
standardizing data formats, enriching source data, and eliminating outliers. Data
preparation is essential for data professionals because it removes the bias that
poor-quality data introduces and ensures that any insights derived from the data are
accurate and reliable.

Why does an organization need data preparation?

The main reason data preparation processes are applied to raw data is to ensure the
information is of good quality; only then will processing it with business intelligence and
other analytical applications produce quality output. Raw data is often riddled with
missing values, inaccurate entries, and other mistakes, and when multiple data sets are
in play, varying formats can lead to duplicated or overlooked values. Therefore,
correcting all these errors, validating data quality, and consolidating data sets is the first
step of data processing.

During data preparation, ensuring that data is ready for analytics serves as a starting
point. This is a base to derive actionable insights necessary for intelligent business
decisions.

An organization’s raw data can be enriched to be more useful in several ways. For
example, it can be done by merging internal and external data sets, or data sets can be
balanced. Data preparation is regularly used by business intelligence and data
management teams to streamline the process of analysis and ensure seamless self-
service with business intelligence applications.

Benefits of data preparation

For data scientists, the most significant benefit of data preparation is that they can
spend their time doing their job of analyzing and mining instead of cleaning and
structuring data. Prepared data can be fed instantly to multiple users for deployment in
various recurring analytics procedures. There is a range of other benefits of data
preparation:

It ensures that data utilized in any analytical application is clean and capable of
providing reliable output.

Data preparation helps identify potential problem areas with data that would otherwise
go undetected.

It helps management-level employees, executives, and operations professionals make
better business decisions.

It brings down the cost of analytics and data management.

It prevents any repetition of data when preparing it for use across applications.

It helps a business ensure a better return on investment from business intelligence and
its analytics initiatives.

Data preparation in the cloud

Data preparation should be the norm, particularly in big data environments where
information is stored in multiple ways. Storage facilities such as data lakes can hold
data in structured, unstructured, or semi-structured forms, which means data often
remains in its natural condition until it is used for a specific analytical purpose. That
purpose can be predictive analytics, machine learning, or even more advanced
methods that require large swathes of data.

With data and its processes increasingly moving to the cloud, data preparation is
moving with them, and the benefits can be huge.

Better scalability

When data preparation moves to the cloud, it scales with the business. A business need
not worry about scaling up its infrastructure or planning for the long-term evolution
requirements of the company.

Future-ready

With cloud-based data preparation, any upgrades to capabilities and patch fixes for
bugs are incorporated and functioning as soon as they are released. For businesses,
this means staying ahead of the curve with innovations. It also avoids delays in going to
market and any additional costs that may otherwise be incurred.

Faster access and collaborative use of data

When data preparation happens in the cloud, it remains an ongoing process. It does not
need technical installations and facilitates better teamwork for quicker output. With
quality, cloud-native data preparation tools, a business can also benefit from intuitive
graphical user interfaces that ease the process of data preparation.

How to get started implementing data preparation

Here are six points to ensure that the organization has the ideal start to data
preparation. An organization will likely need to hire data scientists or expert analysts to
help them with this step.
It’s best to look at data preparation and analysis as closely related. An organization
cannot prepare its data well unless it knows what kind of analytics the data is being
prepped for. Knowing this from the start sets a suitable base for data preparation.

Set goals for data preparation. When users know what accuracy levels are needed and
what quality metrics are desirable, management can arrive at a projected cost for the
project. This will help create a plan for every use case the organization has.

Prioritize data sources based on the analytical processes that are planned. Ensure that
all differences are resolved when multiple sources bring in all data. This forms an
important starting point for preparing data.

Evaluate the skills and tools on hand for the job of data preparation. Self-service data
preparation tools are often believed to be the only available option, but several other
tools and technologies work in tandem with existing skill sets and data requirements.

Always anticipate failures during data preparation. It is critical to build in the capability
to handle errors when setting up a data preparation process. This will reduce the
chances of things going wrong, and of downtime, should a problem arise.

Keep a close watch on data preparation costs. There is considerable expense involved
in the licenses, data processing, and storage resources an organization will need.
Knowing the estimates and providing workable leeway is essential to keeping the
process within a company’s budget.

Steps of data preparation

There are several steps to produce the best possible outcomes for the data preparation
process.

Access

Every organization has various sources of business data. Some of these sources can
be data from endpoints, customers, the marketing department, and other sources that
are associated with these domains. The first step to data preparation is identifying all
the required data and their related repositories. The identification must include all
necessary sources for the kind of analysis an organization has in mind. The
organization must have a plan that outlines all the questions that require answers from
the planned data analysis.

Ingest
When the data is identified, it is introduced into the analysis tools. The information on
hand will probably be a mix of structured and semi-structured data located across
different repositories. An important step is bringing all of the data from various
repositories into one. Access and ingest are flexible, as steps vary mainly depending on
the requirement. These two data preparation steps require business and technological
expertise and are ideally handled by a small, efficient team.

Cleanse

Cleansing data is done to ensure that the data set being worked with provides accurate
answers when it is analyzed. Small data sets can be analyzed manually, but larger ones
require automation with readily available software tools. Data engineers use
applications coded in the Python language where custom processing is needed.
Ingested data can have its share of problems: values can go missing or fall out of
range, and there can be nulls or whitespace where there should be values.
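Since the text mentions Python for custom processing, here is a minimal, hypothetical cleansing pass over ingested records showing exactly those problems: a missing value, stray whitespace, and an out-of-range entry. The field names and the plausible age range are invented:

```python
raw_records = [
    {"name": "  Alice ", "age": 34},
    {"name": "Bob",      "age": None},    # missing value
    {"name": "Carol",    "age": 230},     # out of plausible range
]

def cleanse(records, age_range=(0, 120)):
    """Drop untrustworthy rows and strip stray whitespace from names."""
    cleaned = []
    for rec in records:
        age = rec["age"]
        if age is None or not (age_range[0] <= age <= age_range[1]):
            continue                       # drop rows we cannot trust
        cleaned.append({"name": rec["name"].strip(), "age": age})
    return cleaned

print(cleanse(raw_records))
```

Whether a bad row is dropped, imputed, or flagged for review is a per-project decision; dropping is just the simplest choice for this sketch.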

Format

Cleansing of data is followed by formatting. At this stage, issues such as varying data
formats and abbreviations are addressed. Any data variables deemed unnecessary for
analysis will be deleted from the data set. This is the stage of data preparation that is
best automated. Both cleansing and formatting are ideally saved as a repetitive formula
that data scientists and professionals can apply to similar data sets any time they need.
For example, if a company requires a monthly assessment of marketing and support
data, the sources are most likely to remain the same and require the same kind of
cleansing and formatting each time. Having a saved formula helps move things faster.
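The "saved formula" idea can be sketched as a single reusable Python function that standardizes date formats, expands abbreviations, and drops variables deemed unnecessary. The field names, date format, and abbreviation table here are all hypothetical:

```python
from datetime import datetime

# Hypothetical lookup table for standardising state abbreviations.
STATE_ABBREVIATIONS = {"calif.": "CA", "california": "CA", "ca": "CA"}

def format_record(rec):
    """Reusable formatting 'formula' applied to each monthly extract."""
    return {
        # Normalise day/month/year strings to ISO 8601 dates.
        "date": datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat(),
        "state": STATE_ABBREVIATIONS.get(rec["state"].lower(), rec["state"]),
        # "internal_id" is deemed unnecessary for analysis, so it is dropped.
    }

monthly_extract = [{"date": "03/01/2024", "state": "Calif.", "internal_id": 7}]
print([format_record(r) for r in monthly_extract])
```

Because the function is saved, next month's extract with the same sources can be pushed through it unchanged, which is exactly the time saving the text describes.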

Challenges of data preparation

As a process, data preparation is complicated. Data sets created from several source
systems will have multiple quality, accuracy, and consistency issues that need to be
addressed. The data will also have to be reworked to make it user-friendly and remove
all irrelevant data. This can be a long-drawn process.

Here are seven main challenges that are often seen with data preparation:

There can be insufficient or inadequate data profiling. When data is not profiled
correctly, it can lead to several errors, anomalies, and issues that can result in poor
results during analytics.

Without proper data profiling, an organization can have missing or incomplete data.
Missing values is just one form of incomplete data. There are several more that need to
be addressed right from the beginning.
Data sets can also contain invalid values. These can result from spelling errors, typos,
or wrong numbers being input. Such invalid entries must be spotted early and fixed to
ensure analytical accuracy.

When data sets are brought together, name and address standardization is a must.
Often these details are stored in various systems in different formats. If not corrected,
they can affect the way the information is viewed.

There are many other inconsistencies in data that users will find across enterprise
systems. These inconsistencies can happen from any one of the multiple source
systems that are worked with. Differences can be related to terminology, specific
identifiers, and the like, which can challenge data preparation.

While data enrichment is needed, knowing what to add can be complex and will require
solid data skills and business analytics knowledge.

Setting up, maintaining, and enhancing data prep processes is necessary to
standardize the process and ensure that it can be used repetitively.

Data preparation principles and best practices

Interestingly, several functional programming principles can be applied to data
preparation. While it is not a rule that programming languages be used to automate
data preparation, such languages are the norm. These are some of the data preparation
principles and best practices to follow:

Always ensure that the organization has a clear understanding of the data consumer.
Who is the end-user of the data, and what information are they looking for with these
sources?

Know where the data is coming from and the sources that generated it.

Never get rid of the raw data. With the raw data, a data engineer can always recreate
data transformations. Also, never move data or delete it after it has been saved.

If possible, store all data in both its raw and processed forms. Also, know the
compliance laws of the region in which the organization operates.

Document every stage of the data pipeline. Make versions of the data, the analysis
codes, and the application that transforms the information.

Always make sure that there are clear demarcations between online and offline
analyses. This is to prevent the ingest step from impacting any user-related services.

Constantly monitor data pipelines for any inconsistency found in data sets.

Bring in a proactive form of data governance. Because information technology
constantly requires security and compliance, ensure the presence of governance
capabilities such as data masking, retention, lineage, and role-based permissions.

Work on creating a data preparation pipeline. The best way to do this is to understand
the data and the consumer's needs well, and then create a workable data preparation
pipeline.
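Two of the principles above — never mutate the raw data, and version every transformation — can be illustrated with a minimal Python sketch (a toy example, not from the source; the record fields and version labels are hypothetical):

```python
import copy

def prepare(raw_records, version):
    """Return a cleaned, versioned copy of the records; never mutate the raw data."""
    cleaned = []
    for record in copy.deepcopy(raw_records):  # work on a copy only
        # Standardize an inconsistent field (terminology differences across sources)
        record["name"] = record["name"].strip().title()
        cleaned.append(record)
    return {"version": version, "records": cleaned}

raw = [{"name": "  alice "}, {"name": "BOB"}]
v1 = prepare(raw, version="v1")

print(v1["records"])  # [{'name': 'Alice'}, {'name': 'Bob'}]
print(raw)            # the raw data is untouched, so any transform can be recreated
```

Because the raw list is never modified, the same transformation (or a corrected one) can always be re-run against it, and each output carries a version tag for documentation.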

With quality data preparation, organizations can be confident that their data will
support any process or insight-gathering effort. It instills confidence in the process,
the system, and the output, which can be highly beneficial for any business.

Machine Learning Life Cycle

Machine learning has given computer systems the ability to learn automatically
without being explicitly programmed. But how does a machine learning system work?
It can be described using the machine learning life cycle. The machine learning life
cycle is a cyclic process for building an efficient machine learning project. The main
purpose of the life cycle is to find a solution to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

Gathering Data

Data preparation

Data Wrangling

Data Analysis

Train the model

Test the model

Deployment
1. Gathering Data:

Data gathering is the first step of the machine learning life cycle. The goal of this step is
to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from
various sources such as files, databases, the internet, or mobile devices. It is one of the most
important steps of the life cycle: the quantity and quality of the collected data determine
the efficiency of the output, and the more data we have, the more accurate the prediction
will be.

This step includes the below tasks:

Identify various data sources

Collect data

Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset,
which will be used in the further steps.
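The integration task above — combining records from different sources into one coherent dataset — can be sketched in plain Python (a toy illustration; the sources, field names, and `id` key are hypothetical):

```python
# Two hypothetical sources: records from a file and records from a database,
# each keyed by a shared "id" field.
file_source = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
db_source = [{"id": 1, "city": "Chennai"}, {"id": 2, "city": "Madurai"}]

def integrate(*sources):
    """Merge records from all sources into one dataset, joining on 'id'."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], {}).update(record)
    return list(merged.values())

dataset = integrate(file_source, db_source)
print(dataset)
# [{'id': 1, 'age': 34, 'city': 'Chennai'}, {'id': 2, 'age': 29, 'city': 'Madurai'}]
```

Each record in the resulting dataset combines the fields from every source that mentioned that identifier, giving the "coherent set of data" the step aims for.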
2. Data preparation

After collecting the data, we need to prepare it for the further steps. Data preparation is
the step where we put our data into a suitable form and prepare it for use in machine
learning training.

In this step, we first put all the data together and then randomize its ordering.

This step can be further divided into two processes:

Data exploration:

It is used to understand the nature of the data we have to work with. We need to
understand the characteristics, format, and quality of the data.

A better understanding of the data leads to an effective outcome. In this step, we look
for correlations, general trends, and outliers.

Data pre-processing:

The next step is to pre-process the data for analysis.
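The randomization and exploration steps above can be sketched with the standard library (a toy single-column example, not from the source; the values and the two-standard-deviation outlier rule are illustrative assumptions):

```python
import random
import statistics

values = [12, 14, 13, 15, 14, 13, 120, 12, 15, 14]  # toy feature column

random.seed(42)          # fixed seed so the shuffle is reproducible
random.shuffle(values)   # randomize the ordering of the data

# Explore the characteristics of the data: central tendency, spread, outliers
mean = statistics.mean(values)
stdev = statistics.stdev(values)
outliers = [v for v in values if abs(v - mean) > 2 * stdev]

print(f"mean={mean:.1f} stdev={stdev:.1f} outliers={outliers}")
```

Even this simple pass reveals the general trend of the column (values clustered in the low teens) and flags the anomalous entry 120 as an outlier worth investigating before training.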

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable
format: cleaning the data, selecting the variables to use, and transforming the data
into a proper format to make it more suitable for analysis in the next step. It is one of
the most important steps of the complete process, as cleaning is required to address
data quality issues.

The data we have collected is not always entirely useful, as some of it may be
irrelevant or flawed. In real-world applications, collected data may have various issues,
including:

Missing Values

Duplicate data

Invalid data

Noise

So, we use various filtering techniques to clean the data.

It is mandatory to detect and remove these issues because they can negatively affect
the quality of the outcome.
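A minimal wrangling pass over the issues listed above — missing values, duplicates, and invalid data — might look like this in Python (a toy sketch; the record layout and the valid age range are assumptions for illustration):

```python
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 1, "age": 34},    # duplicate record
    {"id": 3, "age": -5},    # invalid value
    {"id": 4, "age": 29},
]

def wrangle(records):
    """Filter out records with missing, invalid, or duplicate data."""
    seen, cleaned = set(), []
    for r in records:
        if r["age"] is None:            # drop missing values
            continue
        if not (0 <= r["age"] <= 120):  # drop invalid ages
            continue
        if r["id"] in seen:             # drop duplicates by id
            continue
        seen.add(r["id"])
        cleaned.append(r)
    return cleaned

print(wrangle(raw))  # [{'id': 1, 'age': 34}, {'id': 4, 'age': 29}]
```

Real pipelines would usually impute rather than drop missing values where possible, but the filtering shown here is the simplest form of the cleaning this step describes.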
4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step. This step
involves:

Selection of analytical techniques

Building models

Review the result

The aim of this step is to build a machine learning model that analyzes the data using
various analytical techniques, and then to review the outcome. It starts with determining
the type of problem, where we select a machine learning technique such as
classification, regression, cluster analysis, or association; we then build the model using
the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the
model.
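The first decision in this step — matching the problem type to a technique — can be expressed as a simple heuristic (an illustrative sketch only, not a complete selection procedure; the rule "string labels mean classification" is an assumption):

```python
def choose_technique(target_values):
    """Pick a broad ML technique from the target variable, via a simple heuristic."""
    if target_values is None:
        return "cluster analysis"   # no labels at all: an unsupervised problem
    if all(isinstance(v, str) for v in target_values):
        return "classification"     # categorical labels: predict a class
    return "regression"             # numeric target: predict a quantity

print(choose_technique(["spam", "ham", "spam"]))  # classification
print(choose_technique([3.2, 4.8, 5.1]))          # regression
print(choose_technique(None))                     # cluster analysis
```

In practice this determination also weighs dataset size, feature types, and business goals, but the label-driven split between supervised classification/regression and unsupervised clustering is the usual starting point.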

5. Train Model

The next step is to train the model. In this step, we train our model to improve its
performance and produce a better outcome for the problem.

We use datasets to train the model with various machine learning algorithms. Training
a model is required so that it can learn the various patterns, rules, and features.

6. Test Model

Once our machine learning model has been trained on a given dataset, we test it. In
this step, we check the accuracy of our model by providing it with a test dataset.

Testing the model determines its percentage accuracy against the requirements of the
project or problem.
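The train/test pair of steps can be sketched end to end with a deliberately tiny model (a nearest-centroid classifier on a toy one-feature dataset — an illustrative assumption, not a method named in the source):

```python
import statistics

# Toy dataset: (feature value, class label) pairs
data = [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.2, 1), (2.9, 1), (1.1, 0), (3.1, 1)]
train, test = data[:6], data[6:]  # hold out the last samples as the test dataset

# Train: learn the mean feature value of each class (the class "centroid")
centroids = {
    label: statistics.mean(x for x, y in train if y == label)
    for label in {y for _, y in train}
}

def predict(x):
    """Predict the class whose centroid is nearest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Test: percentage accuracy on the held-out data
correct = sum(predict(x) == y for x, y in test)
accuracy = 100.0 * correct / len(test)
print(f"accuracy = {accuracy:.0f}%")
```

The key point the sketch illustrates is the separation: patterns are learned only from the training split, and the percentage accuracy is measured only on data the model has never seen.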

7. Deployment

The last step of the machine learning life cycle is deployment, where we deploy the
model in a real-world system.

If the prepared model produces accurate results at an acceptable speed as per our
requirements, we deploy it in the real system. But before deploying the project, we
check whether its performance keeps improving on the available data. The deployment
phase is similar to making the final report for a project.
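A common minimal form of deployment is persisting the trained model so a separate real-world system can load it and serve predictions. This sketch uses Python's standard `pickle` module; the centroid parameters and file name are hypothetical:

```python
import os
import pickle
import tempfile

# The "model" here is just its learned parameters (hypothetical centroid values).
model = {"centroids": {0: 1.0, 1: 3.0}}

# Step 1: persist the trained model to disk at the end of training
path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Step 2: the serving system loads the saved model and answers predictions
with open(path, "rb") as f:
    served = pickle.load(f)

def predict(x):
    c = served["centroids"]
    return min(c, key=lambda label: abs(x - c[label]))

print(predict(1.2))  # 0
print(predict(2.8))  # 1
```

Production deployments add monitoring, versioned model files, and an API layer on top, but the save/load handoff between training and serving shown here is the core of the step.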
