Notes XII AI
An analytical approach in AI means carefully examining data and algorithms to understand and
solve problems. It starts with preparing and cleaning data, choosing the right algorithms, and
building models. Then, it involves testing these models to see how well they work. By
interpreting the results and making adjustments as needed, this approach helps in gaining
useful insights and making informed decisions based on data.
1. Data Collection: This refers to the process of acquiring data from different sources to use in
analysis or research. It involves identifying what data is needed, choosing appropriate methods
and tools for gathering it, and ensuring the data is accurate and relevant. Data can be collected
through various means, such as surveys, experiments, sensors, or accessing existing
databases.
2. Data Requirement: This outlines the specific types, quantities, and quality of data needed to
meet particular goals or answer specific questions. It defines what data is necessary to address
a problem or conduct analysis effectively. Data requirements include considerations of data
format, granularity, and the scope of information needed to ensure that the collected data will be
sufficient and appropriate for the intended use.
Data understanding:- involves the initial exploration and examination of the data to gain insights
into its structure, quality, and patterns. This phase helps to comprehend the data and identify
any issues that might affect the analysis or modeling process.
Data preparation:- is the process of cleaning, transforming, and organizing data to make it
suitable for analysis or modeling. This includes handling missing values, correcting errors, and
converting data into appropriate formats.
Modeling:- is the phase where machine learning algorithms are applied to the prepared data to
create a model that can make predictions or classifications. This involves selecting appropriate
algorithms, training the model on the dataset, and fine-tuning it to optimize performance.
Evaluation:- is the process of assessing the performance of a machine learning model using
various metrics and techniques. This phase helps determine how well the model generalizes to
new, unseen data and identifies any potential issues such as overfitting or underfitting.
Deployment is the process of setting up and making a product, service, or software available for
people to use. This includes preparing, adjusting settings, testing, and monitoring to ensure it
works correctly.
Feedback is the information or opinions from users about how something works, which helps in
identifying areas for improvement and enhancing overall quality.
To check a model’s quality, start by dividing your data into training, validation, and testing sets.
Use measures like accuracy or error rates to see how well the model performs. Try
cross-validation to confirm it works well with different data parts. Look at tools like confusion
matrices and ROC curves to understand results. Watch for overfitting (too tailored to training
data) and underfitting (not learning enough). Adjust settings for the best outcomes, and have
experts review it. Test the model under different conditions to make sure it stays reliable.
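As a small illustration of the evaluation tools mentioned above, here is a minimal sketch that computes a confusion matrix and an ROC-AUC score with scikit-learn; the labels and scores are made-up values, not taken from these notes.
```python
# A minimal sketch of two evaluation tools mentioned above:
# a confusion matrix and an ROC-AUC score (scikit-learn).
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true labels and model outputs for a binary classifier.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]                     # hard class predictions
y_scores = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
print(roc_auc_score(y_true, y_scores))    # area under the ROC curve
```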
Train Dataset: A train dataset is the portion of your data used to build and fit the model. It helps
the model learn patterns, relationships, and features by adjusting its parameters to minimize
errors. The model is repeatedly exposed to this data during the training process to improve its
performance and accuracy.
Test Dataset: A test dataset is a separate portion of your data that is used to evaluate the
model's performance after training. It provides an unbiased assessment of how well the model
generalizes to new, unseen data. The test dataset helps determine the model’s accuracy,
robustness, and effectiveness in making predictions.
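A minimal sketch of the train/test idea using scikit-learn; the Iris dataset and the decision-tree classifier are illustrative assumptions, not prescribed by these notes.
```python
# A minimal sketch: split data into train and test sets, fit a model,
# and check accuracy on the unseen test portion (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold back 20% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)             # learn from the train dataset

predictions = model.predict(X_test)     # evaluate on unseen data
print("Test accuracy:", accuracy_score(y_test, predictions))
```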
Cross-validation is a method to check how well a model works by using different parts of the
data for training and testing. Here’s how it works in simple steps:
1. Divide Data: Split your data into several equal parts, called folds.
2. Train and Test: Train the model on some folds and test it on the remaining fold. Repeat this
process so each fold gets a chance to be the test set.
3. Evaluate: Calculate the model’s performance for each fold and then average the results. This
gives a better idea of how the model will perform on new, unseen data.
4. Advantages: It helps ensure the model is tested thoroughly and makes the most of the
available data, leading to a more reliable performance estimate.
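A minimal sketch of these steps using scikit-learn's cross_val_score; the dataset and the model choice are illustrative assumptions.
```python
# A minimal sketch of k-fold cross-validation (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into 5 folds; train on 4 folds and test on the 5th,
# repeating so every fold is used once as the test set.
scores = cross_val_score(model, X, y, cv=5)

print(scores)          # performance on each fold
print(scores.mean())   # averaged performance estimate
```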
- Objective Function: A formula that the model tries to optimize, combining metrics and possibly
regularization to achieve the best performance.
- Loss Functions: Measures how far the model’s predictions are from the actual outcomes,
guiding how to improve the model. Examples include Mean Squared Error for regression and
Cross-Entropy Loss for classification.
- Gradient Descent: An algorithm that adjusts model parameters to minimize the loss function by
moving in the direction that reduces errors.
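A minimal sketch of gradient descent minimizing a Mean Squared Error loss for a one-parameter linear model; the data points and learning rate are made-up values.
```python
# A minimal sketch: gradient descent minimizing the MSE loss
# for a one-parameter model y = w * x (made-up data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]    # roughly y = 2x

w = 0.0                       # initial parameter
learning_rate = 0.01

for step in range(200):
    # MSE loss: average of (prediction - actual)^2.
    # Its gradient with respect to w: average of 2 * x * (w*x - y).
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad   # move in the direction that reduces the loss

print("Learned weight:", round(w, 3))   # ends up close to 2
```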
Q13. Loss functions are categorized into two types: Classification Loss and Regression Loss.
Q14. Define the following:
- Mean Squared Error (MSE): MSE measures the average squared difference between the
predicted values and the actual values. It quantifies how far the predictions are from the actual
outcomes, with larger errors having a greater impact due to squaring.
- Root Mean Squared Error (RMSE): RMSE is the square root of MSE. It provides a measure of
the standard deviation of the prediction errors and represents the average distance between
predicted and actual values in the same units as the data, making it easier to interpret.
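A small worked sketch of MSE and RMSE on made-up actual and predicted values:
```python
# A minimal sketch: computing MSE and RMSE for made-up predictions.
import math

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 6.0]

# MSE: average of the squared differences between predicted and actual.
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# RMSE: square root of MSE, in the same units as the data.
rmse = math.sqrt(mse)

print("MSE:", mse)     # 0.5625
print("RMSE:", rmse)   # 0.75
```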
Q15. How can you calculate the mean, median, and mode using Python?
1. Mean: The mean is calculated using `statistics.mean(data)`, which adds up all the numbers in
the list and divides by the total count.
2. Median: The median is calculated with `statistics.median(data)`, which sorts the list and finds
the middle value. If there’s an even number of elements, it averages the two middle values.
3. Mode: The mode is calculated with `statistics.mode(data)`, which finds the number that
appears most frequently in the list. Note that if there are multiple modes, it returns the first one it
finds.
Example:
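Below is a minimal sketch using Python's built-in statistics module described above; the list of numbers is a hypothetical example.
```python
# A minimal sketch: mean, median, and mode with the statistics module.
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5, 8]

print("Mean:", statistics.mean(data))      # sum of values divided by the count
print("Median:", statistics.median(data))  # middle value of the sorted list
print("Mode:", statistics.mode(data))      # most frequently occurring value
```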
Unit 2:- AI Model Life Cycle
1. Define AI model life cycle.
The AI model lifecycle is a process that starts with defining the problem and gathering the
necessary data. Next, the data is cleaned and prepared for use. Various models are then trained
and tested to find the best fit for the problem. After training, the model is evaluated to ensure it
performs well. Once it passes these checks, the model is deployed to do its job in real-world
settings. To keep it effective, the model is continuously monitored and updated based on new
data or changing needs. This ongoing process helps the AI model stay useful and accurate.
2. Describe the various types of AI models.
a. Supervised Learning Models: These models learn from labeled datasets, where the correct answers are provided. The model makes predictions or classifications based on this training. Examples include linear regression, decision trees, support vector machines, and neural networks.
b. Unsupervised Learning Models: These models work with unlabeled data and aim to find patterns or groupings within the data. Common unsupervised learning models include clustering algorithms like k-means, hierarchical clustering, and dimensionality reduction techniques like principal component analysis (PCA); a minimal k-means sketch is shown after this list.
c. Reinforcement Learning Models: These models learn by interacting with an
environment, taking actions, and receiving feedback in the form of rewards or penalties.
The model aims to maximize cumulative rewards over time. Reinforcement learning is
often used in robotics, game playing, and autonomous systems.
d. Deep Learning Models: A subset of machine learning models, deep learning models
use neural networks with multiple layers (deep neural networks) to process complex
data. These models are particularly effective for tasks such as image and speech
recognition, natural language processing, and more. Examples include convolutional
neural networks (CNNs), recurrent neural networks (RNNs), and transformers.
e. Generative Models: These models generate new data samples that are similar to the
training data. They are used in applications such as image generation, text synthesis,
and music composition. Examples include Generative Adversarial Networks (GANs) and
Variational Autoencoders (VAEs).
f. Transfer Learning Models: These models leverage pre-trained models on a large
dataset and fine-tune them for a specific task with a smaller dataset. This approach is
often used when there is limited labeled data available for the specific task at hand.
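As an illustration of the unsupervised models in item (b) above, here is a minimal k-means clustering sketch with scikit-learn; the 2-D points are made-up values.
```python
# A minimal sketch of unsupervised learning with k-means (scikit-learn).
from sklearn.cluster import KMeans

# A tiny hypothetical set of 2-D points.
points = [[1, 1], [1, 2], [8, 8], [9, 8], [0, 1], [9, 9]]

# Group the points into 2 clusters; the algorithm finds the groupings
# without any labels being provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(points)

print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # centre of each cluster
```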
3. Explain the key steps involved in planning and developing an AI solution.
● Problem Definition: Clearly define the problem the AI solution aims to solve. This
involves understanding the business objectives and identifying how AI can add value.
● Feasibility Study: Assess whether the problem can be solved with AI, considering data
availability, technical requirements, and potential risks.
● Resource Planning: Determine the resources needed, including data, tools,
infrastructure, and personnel. Establish timelines and milestones for the project.
● Stakeholder Engagement: Engage with all relevant stakeholders to ensure their needs
are understood and incorporated into the project plan.
● Data Collection and Preparation: Gather the necessary data and prepare it for
analysis. This includes cleaning, transforming, and possibly augmenting the data to
ensure it is suitable for modeling.
● Model Selection and Development: Choose the appropriate AI models and techniques
based on the problem requirements and the nature of the data. Develop and train the
models using the prepared data.
● Testing and Evaluation: Rigorously test the model to evaluate its performance and
accuracy. This may involve using a separate validation dataset or applying
cross-validation techniques.
● Iteration: Based on the testing results, refine and improve the model iteratively. This
may involve tweaking the model parameters, trying different algorithms, or further data
preprocessing.
● Integration: Integrate the AI model into the existing systems or workflows. This may
involve developing APIs or user interfaces to interact with the model.
● Monitoring and Maintenance: Continuously monitor the model’s performance in the
production environment. Implement mechanisms to collect feedback and data on the
model’s predictions or outputs.
● Updating and Retraining: As new data becomes available or if the model’s
performance degrades, update and retrain the model to ensure it remains effective and
accurate.
● Documentation and Communication: Document the model, its usage, and its
performance metrics. Communicate with stakeholders about the model’s impact and any
necessary updates.
4. Design/Building the Model. During this phase, you need to evaluate the various AI
development platforms. Explain.
Open Languages
● Python: The most popular programming language for AI and machine learning due to its
simplicity, readability, and the vast ecosystem of libraries and frameworks.
● R: Preferred for statistical analysis and data visualization, with extensive packages for
data manipulation, analysis, and machine learning.
● Scala: Often used in big data contexts, Scala is known for its compatibility with Apache
Spark, a popular framework for large-scale data processing.
Open Frameworks
● Scikit-learn: A Python library that provides simple and efficient tools for data mining and
data analysis, covering a range of machine learning algorithms.
● XGBoost: An optimized gradient boosting library designed to be highly efficient, flexible,
and portable, widely used for structured or tabular data.
● TensorFlow: An end-to-end open-source platform for machine learning, providing a
comprehensive ecosystem of tools, libraries, and community resources.
Productivity-Enhancing Capabilities
● Visual Modeling: Tools that provide a graphical interface to build and visualize models,
reducing the need for extensive coding.
● AutoAI: Automated machine learning techniques that help in automating the processes
of feature engineering, algorithm selection, and hyperparameter optimization, making it
easier and faster to build models.
Development Tools
Q1. Why is storytelling so powerful and cross-cultural, and what does this mean for data storytelling?
Storytelling is powerful because it draws people in and makes them feel connected. It helps
people understand different cultures and builds community. When we use stories to explain
data, the information becomes clearer and more engaging. This way, important messages are
easier to understand and more likely to inspire action.
For Example :- A company reports a 30% waste reduction. Instead of just numbers, they tell a
story about Alex, who started a recycling program and inspired others. This personal story
makes the data relatable and motivates action.
The steps involved in telling an effective data story are given below; to tell a great story with your data, focus on these key points:
1. Define the Purpose: Know the key message you want to convey.
2. Know Your Audience: Tailor the story to their interests and understanding.
3. Create a Narrative: Structure the data with a clear beginning, middle, and end.
4. Use Visuals Effectively: Enhance the story with clear, relevant visuals.
Data storytelling is important because it simplifies complex information, makes data more
meaningful and engaging, and helps drive change by providing context and insight. Stories with
data are more persuasive, standardize communication, and make information more memorable.
Q6. Identify the elements that make a compelling data story and name them.
- Input: This is the data or information that you provide to a system (like a computer or a
process) for it to work with. For example, typing on a keyboard is input for a computer.
- Narrative: This is the way a story or series of events is told or structured. It includes the plot,
characters, and setting. For example, a novel or a film has a narrative that guides how the story
unfolds.
- Representation: This refers to how something is shown or portrayed. It can be visual, like a
painting, or conceptual, like how data is presented in charts. It’s about how an idea or thing is
depicted to others.