
AI and ML

Artificial Intelligence (AI)

Artificial Intelligence, often abbreviated as AI, is a branch of computer science that aims to
create intelligent systems that can reason, learn, and act autonomously. It seeks to mimic human
intelligence processes, such as learning, problem-solving, perception, and language
understanding.

From self-driving cars to medical diagnosis, AI-powered systems are becoming increasingly
sophisticated and impactful. The core goal of AI is to develop machines that can think and act
like humans, enabling them to perform tasks that were once thought to be the exclusive domain
of human intelligence.

Machine Learning (ML)

Machine Learning, a subset of AI, focuses on algorithms that allow computers to learn from data
without explicit programming. It involves training models on large datasets to recognize patterns
and make predictions or decisions. ML algorithms can be broadly categorized into three main
types:

 Supervised Learning: In supervised learning, the algorithm is trained on a labeled
dataset, where each data point is associated with a correct output. The goal is to learn a
mapping function that can accurately predict the output for new, unseen data. Examples
of supervised learning tasks include regression and classification.
 Unsupervised Learning: Unsupervised learning involves training algorithms on
unlabeled data. The goal is to discover hidden patterns and structures within the data.
Common unsupervised learning techniques include clustering and dimensionality
reduction.
 Reinforcement Learning: Reinforcement learning is a type of machine learning where a
system learns to make decisions by interacting with an environment. The system receives
rewards or penalties based on its actions and learns to maximize the cumulative reward
over time.
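To make these paradigms concrete, below is a minimal sketch in Python using scikit-learn (assumed to be available); the toy data, model choices and expected outputs are purely illustrative.

# A minimal sketch of supervised vs. unsupervised learning with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised learning: features X paired with known labels y.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [11.5]]))   # expected: [0 1]

# Unsupervised learning: the same features, but no labels;
# the algorithm discovers the two groups on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)

# Reinforcement learning is not shown here: it requires an interacting
# environment (e.g. a simulator) that returns rewards for actions.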

Importance and Issues of Ethics in AI

AI has the potential to revolutionize various aspects of our lives, but it also raises significant
ethical concerns. Here are the key points to consider:

Importance of Ethics in AI

 Fairness and Bias: AI systems trained on biased data can perpetuate discrimination and
inequality. Ethical AI ensures that algorithms are fair and unbiased.
 Transparency and Explainability: AI models often operate as black boxes, making it
difficult to understand how they arrive at decisions. Ethical AI requires transparency and
explainability to build trust.
 Privacy and Security: AI systems often collect and process large amounts of personal
data. Ethical AI safeguards user privacy and ensures data security.
 Accountability: Determining who is responsible for the actions and consequences of AI
systems is a complex ethical challenge. Ethical AI requires clear guidelines for
accountability.

Key Ethical Issues in AI

 Bias and Discrimination: AI systems trained on biased data can perpetuate
discrimination and inequality, since AI can inadvertently learn and amplify biases present in the
data. This is especially concerning in areas like hiring, lending, and criminal justice.
 Transparency and Explainability: AI models often operate as black boxes, making it
difficult to understand how they arrive at decisions.
 Job Displacement: AI and automation may displace workers in certain
industries, which could lead to widespread unemployment.
 Autonomous Weapons: The development of autonomous weapons raises concerns about
the potential for unintended harm and lack of human control.
 Accountability: Determining who is responsible for the actions and consequences of AI
systems is a complex ethical challenge.
 Deepfakes and Misinformation: The creation and spread of deepfakes and
misinformation can erode trust in information and undermine social discourse.

Addressing Ethical Challenges in AI

 Ethical AI Frameworks: Develop guidelines and frameworks to promote ethical AI
development and usage.
 Data Quality and Diversity: Ensure that training data is diverse and representative to
mitigate bias.
 Transparency and Explainability: Create tools and techniques to make AI models more
transparent and understandable.
 Human Oversight: Implement human oversight to monitor AI systems and intervene
when necessary.
 International Cooperation: Foster international collaboration to address global ethical
challenges in AI.

Supervised machine learning algorithms are broadly categorized into two main tasks based on the
type of prediction they make: regression and classification.

a. Regression

Regression is a supervised learning technique used to predict continuous numerical values. It
involves training a model to learn the relationship between a dependent variable (what you want
to predict) and one or more independent variables (the factors influencing the prediction).

Key aspects of regression:
Key aspects of regression:

 Prediction of continuous values: Examples include predicting house prices, stock
prices, or weather patterns.
 Training on labeled data: The training data needs labels indicating the target value for
each data point.
 Common regression algorithms: Linear regression, polynomial regression, decision
trees, and support vector machines (SVMs) can be used for regression tasks.
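As an illustration of regression, here is a minimal sketch using scikit-learn's LinearRegression on invented house-size and price figures; the numbers are not real market data.

# A minimal regression sketch: predicting a continuous value (price)
# from one independent variable (size). The numbers are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[50], [80], [120], [160], [200]])   # e.g. square metres
prices = np.array([110, 170, 255, 330, 410])          # e.g. thousands

model = LinearRegression().fit(sizes, prices)
print(model.predict([[100]]))   # estimated price for a 100 m^2 house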

b. Classification

Classification is another supervised learning technique. However, unlike regression,
classification predicts discrete categories or classes. Here, the model learns to identify patterns in
data and classify new data points into one of the predefined categories.

Key aspects of classification:

 Prediction of categorical values: Examples include classifying emails as spam or not
spam, identifying different types of images (e.g., cat vs. dog), or diagnosing diseases.
 Training on labeled data: The training data needs labels indicating the category each
data point belongs to.
 Common classification algorithms: Decision Trees, Random Forest, Support Vector
Machines (SVMs), Naive Bayes, and K-Nearest Neighbors (KNN) are popular choices
for classification.
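As an illustration of classification, here is a minimal sketch using K-Nearest Neighbors on an invented two-feature "spam" toy dataset; the features and the values are assumptions made purely for illustration.

# A minimal classification sketch: assigning new points to one of two
# predefined classes. The toy "spam" features are invented.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two features per email: [number of links, number of ALL-CAPS words]
X = np.array([[0, 1], [1, 0], [2, 1], [8, 9], [9, 7], [10, 10]])
y = np.array(["not spam", "not spam", "not spam", "spam", "spam", "spam"])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1, 2], [9, 8]]))   # expected: ['not spam' 'spam']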

9. Applications of Regression

Regression has a wide range of applications across various fields:

 Computer Science: Predicting CPU usage, network traffic, or user behavior in
applications.
 Engineering: Predicting material properties, machine failure, or system performance.
 Data Analytics: Forecasting sales trends, customer churn, or market demand.
 Financial Sector: Risk assessment for loans, predicting stock prices, or customer
creditworthiness.
 Healthcare: Predicting disease progression, treatment outcome, or patient survival rates.

10. Applications of Classification

Classification is another powerful tool with diverse applications:

 Computer Science: Spam filtering, image recognition, fraud detection in online
transactions.
 Engineering: Fault detection in machinery, anomaly detection in sensor data.
 Data Analytics: Customer segmentation, sentiment analysis of social media data, credit
card fraud detection.
 Financial Sector: Customer creditworthiness assessment, identifying fraudulent
transactions.
 Healthcare: Disease diagnosis, image analysis for medical imaging (e.g., X-rays),
identifying drug targets.

Data Loading

Definition: Data loading refers to the process of importing data from various sources into a
system or software for analysis and processing.

Key Points:

 Sources: Data can come from diverse sources like databases, spreadsheets, APIs, or real-
time streams.
 Formats: Data can be in various formats such as CSV, JSON, XML, or binary.
 Tools and Libraries:
o Python: Pandas, NumPy
o R: readr, readxl
o SQL: Database connectors
 Challenges:
o Data Quality: Inconsistent formats, missing values, outliers.
o Data Volume: Large datasets can be computationally expensive to load.
o Data Velocity: Real-time data streams require efficient loading mechanisms.
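The snippet below is a minimal sketch of loading data with pandas; the file names and the SQL connection string are placeholders, and reading Excel or SQL sources additionally requires openpyxl or SQLAlchemy to be installed.

# A minimal data-loading sketch with pandas; file names are placeholders.
import pandas as pd

df_csv = pd.read_csv("sales.csv")        # comma-separated file
df_json = pd.read_json("events.json")    # JSON file
df_excel = pd.read_excel("report.xlsx")  # spreadsheet (needs openpyxl)

# Loading from a SQL database via a connector (connection string is a placeholder):
# import sqlalchemy
# engine = sqlalchemy.create_engine("sqlite:///data.db")
# df_sql = pd.read_sql("SELECT * FROM orders", engine)

print(df_csv.shape)   # quick sanity check: rows x columns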

Data Summarization

Definition: Data summarization involves condensing large datasets into smaller, more
manageable forms while preserving essential information.

Key Points:

 Techniques:
o Descriptive Statistics: Mean, median, mode, standard deviation, quartiles.
o Frequency Distributions: Counting occurrences of values.
o Aggregation: Grouping data and calculating summary statistics for each group.
o Dimensionality Reduction: Techniques like PCA to reduce the number of
variables.
 Tools and Libraries:
o Python: Pandas, NumPy
o R: dplyr
 Purpose:
o Understanding Data: Gain insights into data distribution, trends, and patterns.
o Data Cleaning: Identifying and handling outliers, missing values, and
inconsistencies.
o Feature Engineering: Creating new features from existing ones.
o Model Building: Selecting relevant features and training models.
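Below is a minimal summarization sketch with pandas, using a small invented sales table to show descriptive statistics, a frequency distribution and a per-group aggregation.

# A minimal data-summarization sketch with pandas; the data is invented.
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "sales":  [120, 135, 90, 80, 95],
})

print(df["sales"].describe())                 # descriptive statistics
print(df["region"].value_counts())            # frequency distribution
print(df.groupby("region")["sales"].mean())   # aggregation per group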

Big Data

Definition: Big data refers to extremely large data sets that may be analyzed computationally to
reveal patterns, trends, and associations, especially relating to human behavior and interactions.

Key Characteristics:

 Volume: Large amounts of data.
 Velocity: Rapid generation of data.
 Variety: Diverse data types (structured, unstructured, semi-structured).
 Veracity: Data quality and accuracy.

Challenges:

 Storage: Requires efficient storage solutions.
 Processing: Demands powerful computing resources.
 Analysis: Complex analytical techniques are needed.
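One common way to cope with the volume challenge, sketched below with pandas, is to stream a large file in chunks rather than loading it all into memory at once; the file name and the "amount" column are assumptions made for illustration.

# Processing a large CSV file in chunks to limit memory use.
import pandas as pd

total = 0
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    total += chunk["amount"].sum()   # assumes an 'amount' column exists
print(total)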

PREPROCESSING

DATA: Data is raw, unorganized facts that need to be processed to make them meaningful.
Generally, data comprises facts, observations, numbers, characters, symbols, images, etc.

INFORMATION: Information is a set of data that has been processed, structured or presented in a
meaningful and useful way according to the given instructions/requirements.

WHAT IS DATA PREPROCESSING?

• Data preprocessing, a component of data preparation, describes any type of processing
performed on raw data to prepare it for another data processing procedure.
• Data preprocessing transforms the data into a format that is more easily and effectively
processed in data mining, machine learning and other data science tasks.

WHY IS DATA PREPROCESSING IMPORTANT?

Virtually any type of data analysis, data science or AI development requires some type of data
preprocessing to provide reliable, precise and robust results for enterprise applications.

Machine learning and deep learning algorithms work best when data is presented in a format that
highlights the relevant aspects required to solve a problem.
Feature engineering practices that involve data transformation, data reduction, feature selection
and feature scaling help restructure raw data into a form suited for particular types of algorithms.

This can significantly reduce the processing power and time required to train a new machine
learning or AI algorithm or run an inference against it.

One caution that should be observed in preprocessing data: the potential for re-encoding bias
into the data set. Identifying and correcting bias is critical for applications that help make
decisions that affect people, such as loan approvals.

WHAT IS A FEATURE IN MACHINE LEARNING?

In machine learning, a feature is a characteristic or attribute of a dataset that can be used to
train a model.

KEY STEPS IN DATA PREPROCESSING:

1. Data profiling. Data profiling is the process of examining, analyzing and reviewing data
to collect statistics about its quality.
2. Data cleansing. The aim here is to find the easiest way to rectify quality issues, such as
eliminating bad data, filling in missing data or otherwise ensuring the raw data is suitable
for feature engineering.
3. Data reduction. Raw data sets often include redundant data that arises from characterizing
phenomena in different ways, or data that is not relevant to a particular ML, AI or analytics task.
Data reduction uses techniques to transform the raw data into a simpler form suitable for
particular use cases.
4. Data transformation. Here, data scientists think about how different aspects of the data need
to be organized to make the most sense for the goal.
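The following is a minimal sketch of the cleansing, reduction and transformation steps using pandas and scikit-learn; the DataFrame and its column names are invented for illustration.

# A minimal sketch of cleansing, reduction and transformation.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age":        [25, 32, None, 41, 29],
    "income":     [40_000, 52_000, 61_000, None, 45_000],
    "income_eur": [37_000, 48_000, 56_000, None, 41_500],  # redundant copy
})

# Data cleansing: fill missing values with each column's median.
df = df.fillna(df.median(numeric_only=True))

# Data reduction: drop a column that duplicates existing information.
df = df.drop(columns=["income_eur"])

# Data transformation: scale features so they are comparable in magnitude.
scaled = StandardScaler().fit_transform(df[["age", "income"]])
print(scaled)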

HOW IS DATA PREPROCESSING USED?


• Data preprocessing plays a key role in earlier stages of machine learning and AI
application development. In an AI context, data preprocessing is used to improve the way
data is cleansed, transformed and structured to improve the accuracy of a new model,
while reducing the amount of compute required.
• For example, user sessions may be tracked to identify the user, the websites requested and their
order, and the length of time spent on each one. Once these have been pulled out of the raw
data, they yield more useful information that can be applied to consumer research, marketing
or personalization.

DATA VISUALIZATION:

Data visualization is the practice of translating information into a visual context, such as a map
or graph, to make data easier for the human brain to understand and pull insights from.

The main goal of data visualization is to make it easier to identify patterns, trends and outliers in
large data sets. The term is often used interchangeably with information graphics, information
visualization and statistical graphics.

Data visualization is one of the steps of the data science process, which states that after data has
been collected, processed and modeled, it must be visualized for conclusions to be made.

Visual representations of a complex algorithm's output are generally easier to interpret than raw numerical results.

Types of Visualizations:

o Histograms: Distribution of numerical data
o Bar Charts: Comparisons between categories
o Line Charts: Trends over time
o Scatter Plots: Relationships between variables
o Box Plots: Distribution of data, including outliers
o Heatmaps: Visualizing matrices of data
 Tools and Libraries:
o Python: Matplotlib, Seaborn, Plotly
o R: ggplot2
 Principles:
o Clarity: Clear and concise visualizations.
o Relevance: Visualizations should highlight key insights.
o Aesthetic: Visually appealing and easy to understand.
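As a minimal visualization sketch, the Matplotlib snippet below draws a histogram and a scatter plot from randomly generated data; the figure layout is only one of many reasonable choices.

# A minimal visualization sketch with Matplotlib; the data is invented.
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).normal(size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=30)                  # histogram: distribution of numerical data
ax1.set_title("Histogram")
ax2.scatter(values[:-1], values[1:], s=8)  # scatter plot: relationship between variables
ax2.set_title("Scatter plot")
plt.tight_layout()
plt.show()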
