Gen AI / ML and Python
Introduction to Programming & Python Basics
What is Programming?
Programming is the process of writing instructions (code) that a computer can
execute to perform specific tasks.
It allows you to automate calculations, process data, build applications, and
solve real-world problems.
Why Python for AI/ML?
Python is a popular, beginner-friendly programming language known for its
simple syntax and readability
It has a vast ecosystem of libraries for AI and machine learning (e.g., NumPy,
Pandas, scikit-learn, TensorFlow, PyTorch).
Python is widely used in industry and academia for data science, AI, and
automation due to its flexibility and strong community support
Installing Python
Python may not be pre-installed on your computer (or only an older version
may be present), but installation is straightforward:
Download Python from the official website: python.org
Follow the installation instructions for your operating system (Windows,
macOS, Linux).
During installation, ensure you check the box to "Add Python to PATH" for
easy access from the command line
Using IDEs: Colab and Jupyter
IDEs (Integrated Development Environments) make coding easier by
providing features like syntax highlighting, code completion, and error
checking.
Jupyter Notebook:
A web-based tool for writing and running Python code in “cells.” Great for
experimentation and data analysis.
Install via pip install notebook or use JupyterLab.
Google Colab:
Free, cloud-based Jupyter notebooks. No installation needed—just sign in
with your Google account at colab.research.google.com.
Excellent for beginners and for running code on any device
Running Your First Python Script
Open your IDE or a terminal/command prompt.
Type the following code and run it:
print("Hello, World!")
This will display:
Hello, World!
In Jupyter or Colab, enter the code in a cell and press Shift+Enter to run it
Python Syntax
Python uses indentation (spaces or tabs) to define code blocks.
Statements end with a newline, not a semicolon.
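For example, the indented lines below form the body of the if block:

x = 3
if x > 0:
    print("positive")  # indented: inside the if block
print("done")          # not indented: always runs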
Variables
Variables store data values. You do not need to declare their type.
name = "Alice"
age = 20
Data Types
String: Text, e.g., "Hello"
Integer: Whole numbers, e.g., 5
Float: Decimal numbers, e.g., 3.14
Boolean: True or False
greeting = "Hi"
year = 2025
pi = 3.1415
is_student = True
Input and Output
Output: Use print() to display information.
print("Welcome to Python!")
Input: Use input() to get user input (always returns a string).
user_name = input("Enter your name: ")
print("Hello,", user_name)
Simple Arithmetic Operations
a = 10
b = 3
print(a + b) # Addition: 13
print(a - b) # Subtraction: 7
print(a * b) # Multiplication: 30
print(a / b) # Division: 3.333...
print(a // b) # Integer division: 3
print(a % b) # Modulus (remainder): 1
print(a ** b) # Exponentiation: 1000
Practice Exercise
Write a script that:
Asks for two numbers from the user,
Adds them,
Prints the result.
num1 = input("Enter first number: ")
num2 = input("Enter second number: ")
total = int(num1) + int(num2)  # avoid the name `sum`, which shadows the built-in
print("The sum is:", total)
Control Flow & Data Structures
1. Conditional Statements ( if , elif , else )
Conditional statements control the flow of execution based on conditions.
Basic if Statement
Executes a block if the condition is True :
x = 10
if x > 5:
print("x is greater than 5")
if-else Statement
Executes one block if the condition is True , another if False :
number = int(input("Enter a number: "))
if number > 0:
print("Positive number")
else:
print("Not a positive number")
If the user enters 10, the output is Positive number.
If the user enters 0, the output is Not a positive number.
if-elif-else Statement
Checks multiple conditions in sequence:
score = 85
if score >= 90:
print("Grade: A")
elif score >= 80:
print("Grade: B")
else:
print("Grade: C")
Only the first True condition's block executes.
Nested if Statements
You can nest if statements inside each other for complex logic.
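For example:

age = 20
has_id = True
if age >= 18:
    if has_id:
        print("Entry allowed")
    else:
        print("ID required")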
2. Loops ( for , while )
Loops are used to execute a block of code repeatedly.
For Loop
Used for iterating over a sequence (list, tuple, string, etc.):
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
print(fruit)
Iterates over each element in the sequence
Use Cases:
Processing items in a list or tuple
Repeating actions a fixed number of times
While Loop
Executes as long as a condition is True :
i = 1
while i < 6:
print(i)
i += 1
Useful when the number of iterations is not known in advance
Use Cases:
Waiting for user input
Running until a specific event occurs (e.g., guessing games, event loops)
Loop Control Statements
break : Exit the loop immediately.
continue : Skip the current iteration and continue with the next.
else : Optional; runs if the loop completes normally (not via break ).
Example:
for i in range(5):
if i == 3:
break
print(i)
else:
print("Loop finished")
The else block will not execute if the loop is exited with break
Session 2: Lists, Tuples, Sets, Dictionaries
1. Lists
Definition: Ordered, mutable collection. Allows duplicates.
Syntax: my_list = [1, 2, 3]
Indexing: Zero-based ( my_list[0] is 1 )
Basic Operations:
Add: my_list.append(4)
Remove: my_list.remove(2)
Access: my_list[1] (returns 2 )
Slice: my_list[1:3] (returns [2, 3] )
Iterate:
for item in my_list:
print(item)
Use Cases: Storing ordered data, dynamic collections
2. Tuples
Definition: Ordered, immutable collection. Allows duplicates.
Syntax: my_tuple = (1, 2, 3)
Indexing: Zero-based ( my_tuple[0] is 1 )
Basic Operations:
Access: my_tuple[1]
Count: my_tuple.count(2)
Index: my_tuple.index(3)
Iterate:
for item in my_tuple:
print(item)
Use Cases: Fixed data, function returns, keys in dictionaries
3. Sets
Definition: Unordered, mutable collection of unique elements.
Syntax: my_set = {1, 2, 3}
Basic Operations:
Add: my_set.add(4)
Remove: my_set.remove(2)
Membership: 2 in my_set
Iterate:
for item in my_set:
print(item)
Use Cases: Removing duplicates, set operations (union, intersection)
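For example:

a = {1, 2, 3}
b = {3, 4, 5}
print(a | b)           # Union: {1, 2, 3, 4, 5}
print(a & b)           # Intersection: {3}
print(set([1, 1, 2]))  # Duplicates removed: {1, 2}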
4. Dictionaries
Definition: Mutable collection of key-value pairs (unordered historically;
insertion-ordered as of Python 3.7).
Syntax: my_dict = {'a': 1, 'b': 2}
Indexing: By key ( my_dict['a'] returns 1 )
Basic Operations:
Add/Update: my_dict['c'] = 3
Remove: my_dict.pop('b')
Access: my_dict['a']
Iterate:
for key, value in my_dict.items():
print(key, value)
Use Cases: Fast lookups, mapping relationships
Comparison Table
| Feature | List | Tuple | Set | Dictionary |
| --- | --- | --- | --- | --- |
| Mutable | Yes | No | Yes | Yes |
| Ordered | Yes | Yes | No | Yes (3.7+) |
| Duplicates | Yes | Yes | No | Keys: No, Values: Yes |
| Indexing | Integer | Integer | No | Key-based |
| Syntax | [1, 2, 3] | (1, 2, 3) | {1, 2, 3} | {'a': 1, 'b': 2} |
Indexing and Iteration
Lists/Tuples: Use integer indices and slicing.
my_list = [10, 20, 30]
print(my_list[1]) # 20
for i, val in enumerate(my_list):
print(i, val)
Sets: No indexing; iterate directly.
Dictionaries: Iterate over keys, values, or items.
for key in my_dict:
print(key, my_dict[key])
for key, value in my_dict.items():
print(key, value)
Advanced Iteration: Use range() , enumerate() , or custom logic for cyclic or
indexed iteration
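For example, range() with len() gives indexed access, and the modulo operator gives a simple cyclic pass:

my_list = [10, 20, 30]
for i in range(len(my_list)):
    print(i, my_list[i])              # indexed iteration
for i in range(6):
    print(my_list[i % len(my_list)])  # cyclic iteration: 10, 20, 30, 10, 20, 30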
Functions, Modules, and File Handling
1. Defining and Using Functions
Function Definition:
Use the def keyword, followed by the function name, parentheses (with
optional parameters), and a colon. The function body is indented.
def greet():
print("Hello, World!")
greet() # Output: Hello, World!
Purpose:
Functions help organize code, promote reuse, and improve readability
2. Parameters and Arguments
Parameters:
Variables listed inside the parentheses in the function definition.
Arguments:
Values passed to the function when it is called.
Types of Parameters:
Positional: Order matters.
Keyword: Specify by name, order doesn't matter.
Default: Provide a default value.
Variable-length: Use *args to gather extra positional arguments into a tuple
and **kwargs to gather extra keyword arguments into a dictionary.
def add(a, b=5):
return a + b
print(add(3)) # Output: 8
print(add(3, 7)) # Output: 10
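Variable-length parameters in action:

def summarize(*args, **kwargs):
    print("Positional:", args)  # args is a tuple
    print("Keyword:", kwargs)   # kwargs is a dictionary

summarize(1, 2, mode="fast")
# Positional: (1, 2)
# Keyword: {'mode': 'fast'}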
3. Return Values
Returning Values:
Use the return statement to send a result back to the caller. If no return is
specified, the function returns None by default.
def square(x):
return x * x
result = square(4) # result is 16
Multiple Return Values:
Python allows returning multiple values as a tuple.
def stats(numbers):
return min(numbers), max(numbers)
mn, mx = stats([1, 2, 3])
# mn = 1, mx = 3
Returning Lists, Dictionaries, or Functions:
Functions can return any object, including lists, dictionaries, or even other
functions.
4. Scope in Python
Local Scope:
Variables defined inside a function are local and accessible only within that
function.
def foo():
x = 10 # local to foo
print(x)
foo()
# print(x) # Error: x is not defined
Global Scope:
Variables defined outside any function are global and accessible throughout
the file.
x = 20
def bar():
print(x) # accesses global x
bar()
Nonlocal Scope:
Used in nested functions to refer to variables in the enclosing function.
def outer():
x = "outer"
def inner():
nonlocal x
x = "inner"
inner()
print(x) # Output: inner
outer()
Variable Shadowing:
If a variable with the same name exists in both local and global scope, the
local variable takes precedence inside the function
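For example:

x = 5          # global
def show():
    x = 99     # local x shadows the global x
    print(x)   # Output: 99
show()
print(x)       # Output: 5 (the global is unchanged)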
5. Importing Modules and Using Standard Libraries
Importing Modules:
Use the import statement to bring in external code (modules).
import math
print(math.sqrt(16)) # Output: 4.0
Import Specific Functions:
from math import pi
print(pi) # Output: 3.141592653589793
Standard Libraries:
Python comes with a rich set of standard modules, such as math , random ,
datetime , os , and sys .
Custom Modules:
You can create your own modules by saving functions in a .py file and
importing them
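A minimal sketch, assuming a file named mymodule.py sits in the same folder as your script:

# mymodule.py
def greet(name):
    return f"Hello, {name}!"

# main.py
import mymodule
print(mymodule.greet("Alice"))  # Output: Hello, Alice!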
Reading from and Writing to Files
Opening Files:
Use the open() function with a filename and mode ( 'r' , 'w' , 'a' , etc.).
f = open('data.txt', 'r') # Open for reading
Reading Files:
read() : Reads the entire file.
readline() : Reads one line at a time.
readlines() : Reads all lines into a list.
with open('data.txt', 'r') as f:
content = f.read()
Writing Files:
'w' : Write (overwrites existing file or creates new).
'a' : Append to the end of the file.
with open('output.txt', 'w') as f:
f.write("Hello, file!")
Best Practice:
Use the with statement to automatically close files.
| Mode | Description |
| --- | --- |
| 'r' | Read (default) |
| 'w' | Write (overwrite/create) |
| 'a' | Append (end of file) |
| 'r+' | Read and write |
| 'w+' | Write and read (overwrite) |
| 'a+' | Append and read |
String Operations
Common String Methods:
len(s) : Length of string
s.upper() , s.lower() : Change case
s.strip() : Remove whitespace
s.replace('a', 'b') : Replace substring
s.split(',') : Split into list
','.join(list) : Join list into string
s.find('sub') : Find substring index
s.isdigit() , s.isalpha() : Check content type
text = " Hello, World! "
print(text.strip().upper())
# Output: HELLO, WORLD!
Formatting Strings:
f-strings: f"Value: {x}"
str.format() : "Value: {}".format(x)
Simple File-Based Mini Project: Word Counter
Objective:
Read a text file, count the frequency of each word, and write the results to a new
file.
Steps:
1. Read the file content.
2. Clean and split the text into words.
3. Count word occurrences.
4. Write the results to an output file.
Sample Code:
def count_words(input_file, output_file):
with open(input_file, 'r') as f:
text = f.read().lower()
words = text.split()
word_count = {}
for word in words:
word = word.strip('.,!?";:')
word_count[word] = word_count.get(word, 0) + 1
with open(output_file, 'w') as f:
for word, count in sorted(word_count.items()):
f.write(f"{word}: {count}\n")
count_words('input.txt', 'word_count.txt')
This project demonstrates file reading/writing, string manipulation, and
dictionary usage.
Python for Data Science
What is NumPy?
NumPy stands for Numerical Python and is a foundational library for
numerical and scientific computing in Python.
It provides a powerful n-dimensional array object and useful functions for
performing mathematical operations efficiently.
Commonly used for: data analysis, scientific computing, and as the base for
other libraries like Pandas and SciPy.
Creating and Working with NumPy Arrays
a. Creating Arrays
From lists:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
Multi-dimensional arrays:
b = np.array([[1, 2, 3], [4, 5, 6]])
Arrays filled with zeros or ones:
zeros = np.zeros(3) # array([0., 0., 0.])
ones = np.ones(3) # array([1., 1., 1.])
Arrays with a range of numbers:
arr = np.arange(0, 10, 2) # array([0, 2, 4, 6, 8])
linspace = np.linspace(0, 1, 5) # array([0. , 0.25, 0.5 , 0.75, 1. ])
Specify data type:
arr = np.ones(3, dtype=np.int64)
b. Indexing and Slicing
Access elements by index (0-based):
print(a[0]) # 1
Slicing:
print(a[:3]) # array([1, 2, 3])
c. Basic Operations
Element-wise operations:
data = np.array([1, 2])
ones = np.ones(2, dtype=int)
print(data + ones) # array([2, 3])
print(data * data) # array([1, 4])
Aggregations:
a = np.array([1, 2, 3, 4])
print(a.sum()) # 10
b = np.array([[1, 1], [2, 2]])
print(b.sum(axis=0)) # array([3, 3]) (column sums)
print(b.sum(axis=1)) # array([2, 4]) (row sums)
Reshaping:
c = np.arange(12).reshape(3, 4)
print(c)
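# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]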
What is Pandas?
Pandas is a Python library built on top of NumPy, designed for data
manipulation and analysis.
It introduces two main data structures:
Series: 1D labeled array
DataFrame: 2D labeled, tabular data structure (like an Excel spreadsheet)
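A quick Series example:

import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])  # 20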
Creating and Working with Pandas DataFrames
a. Creating DataFrames
From a dictionary:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
From a NumPy array:
arr = np.array([[1, 2], [3, 4]])
df2 = pd.DataFrame(arr, columns=['A', 'B'])
b. Accessing Data
Select a column:
print(df['Name'])
Select rows by index:
print(df.iloc[0]) # first row
print(df.loc[0]) # by label/index
Slicing rows:
print(df[0:2]) # first two rows
c. Basic Data Manipulation
Add a new column:
df['Salary'] = [50000, 60000, 70000]
Filter rows:
adults = df[df['Age'] > 28]
Drop a column:
df = df.drop('Salary', axis=1)
Handle missing values:
df.isnull()
df.fillna(0)
df.dropna()
Aggregation:
df['Age'].mean()
df.groupby('Name').sum()
Sorting:
df.sort_values('Age')
Practice Exercise Examples
Create a NumPy array of numbers from 10 to 19.
Add two NumPy arrays element-wise.
Create a Pandas DataFrame from a list of dictionaries.
Select all rows in the DataFrame where Age > 30.
Calculate the sum and mean of a DataFrame column.
Replace missing values in a DataFrame with the column mean.
Sort the DataFrame by a specific column.
Handle the missing data {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan]}
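A possible solution for the last exercise, filling each NaN with its column's mean:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan]})
print(df.fillna(df.mean()))  # A's NaN becomes 1.5, B's NaNs become 5.0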
Introduction to Data Cleaning
Definition: Data cleaning is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset
Why it matters: Clean data leads to more accurate analysis and insights.
Messy data can cause errors, misleading results, or make analysis impossible.
Common Data Cleaning Steps:
Remove duplicate or irrelevant data (e.g., repeated rows, out-of-scope
entries).
Fix structural errors (e.g., typos, inconsistent capitalization, mixed formats).
Handle missing values (e.g., fill with mean/median, remove rows/columns).
Standardize data (e.g., consistent date formats, units, text casing).
Example:
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'bob'],
'Score': [85, 90, 85, None, 90]
})
df = df.drop_duplicates()
df['Name'] = df['Name'].str.capitalize()
df['Score'] = df['Score'].fillna(df['Score'].mean())
print(df)
Data Filtering
Definition: Filtering means selecting rows that meet certain conditions,
helping you focus on relevant data
How to filter: Use boolean indexing in Pandas.
Example:
# Filter students with Score above 85
filtered_df = df[df['Score'] > 85]
print(filtered_df)
Combine multiple conditions using & (and) or | (or):
# Students named 'Bob' with Score above 85
filtered_df = df[(df['Name'] == 'Bob') & (df['Score'] > 85)]
print(filtered_df)
Data Sorting
Definition: Sorting arranges your data by the values in one or more columns,
making it easier to spot patterns or outliers.
How to sort: Use .sort_values() in Pandas.
Example:
# Sort by Score in descending order
sorted_df = df.sort_values('Score', ascending=False)
print(sorted_df)
Sort by multiple columns:
# Sort by Name (A-Z), then by Score (high to low)
sorted_df = df.sort_values(['Name', 'Score'], ascending=[True, False])
print(sorted_df)
Simple Data Analysis Project
Project Idea:
Analyze a small dataset (e.g., students and scores, product sales, or Titanic
dataset) using the above techniques.
Project Steps:
1. Load the data (from a CSV or dictionary).
2. Clean the data (remove duplicates, fix names, handle missing values).
3. Filter the data (e.g., select students with high scores, products with sales
above a threshold).
4. Sort the data (e.g., by score, by product price).
5. Draw simple conclusions (e.g., who has the highest score? How many
products sold more than 10 units?).
Example:
import pandas as pd
# 1. Load data
data = {
'Student': ['Alice', 'Bob', 'Alice', 'Charlie', 'David'],
'Score': [85, 90, 85, None, 75]
}
df = pd.DataFrame(data)
# 2. Clean data
df = df.drop_duplicates()
df['Score'] = df['Score'].fillna(df['Score'].mean())
# 3. Filter: Scores above 80
high_scores = df[df['Score'] > 80]
# 4. Sort: By Score descending
sorted_scores = high_scores.sort_values('Score', ascending=False)
# 5. Analyze: Highest scorer
top_student = sorted_scores.iloc[0]
print("Cleaned Data:\n", df)
print("High Scores:\n", high_scores)
print("Sorted High Scores:\n", sorted_scores)
print("Top Student:\n", top_student)
In-Class Activity
Give students a small CSV or dictionary-based dataset.
Ask them to:
Remove duplicates
Fill missing values
Filter for a specific condition (e.g., scores above a threshold)
Sort the results
Print the top result
Data Visualization & Project
Why Data Visualization?
Data visualization helps you understand data, spot trends, and communicate
insights effectively.
Python’s most popular libraries for visualization are Matplotlib and Seaborn
Introduction to Matplotlib
Matplotlib is a foundational plotting library in Python, offering flexibility to
create a wide variety of static, animated, and interactive plots
It’s often imported as import matplotlib.pyplot as plt .
Basic Line Plot Example
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
You can customize colors, line styles, and add titles/labels easily
Other Basic Plots in Matplotlib
Bar Chart:
students = ["Alice", "Bob", "Charlie"]
scores = [85, 90, 78]
plt.bar(students, scores, color='skyblue')
plt.title("Student Scores")
plt.xlabel("Student")
plt.ylabel("Score")
plt.show()
Pie Chart:
labels = ["Python", "Java", "C++"]
sizes = [50, 30, 20]
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title("Programming Language Popularity")
plt.show()
Scatter Plot:
x = [1, 2, 3, 4, 5]
y = [5, 7, 4, 6, 5]
plt.scatter(x, y)
plt.title("Scatter Plot Example")
plt.xlabel("X Value")
plt.ylabel("Y Value")
plt.show()
Introduction to Seaborn
Seaborn is built on top of Matplotlib and provides a higher-level, more user-
friendly interface for creating attractive statistical graphics
It works seamlessly with Pandas DataFrames and comes with better default
styles and color palettes
Getting Started with Seaborn
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme() # Apply Seaborn's default styling
Basic Seaborn Plots
Histogram:
import seaborn as sns
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
sns.histplot(data)
plt.title("Histogram Example")
plt.show()
Scatter Plot:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.title("Total Bill vs Tip")
plt.show()
Line Plot:
import seaborn as sns
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
sns.lineplot(x="timepoint", y="signal", data=fmri)
plt.title("FMRI Signal Over Time")
plt.show()
Matplotlib vs. Seaborn: When to Use Which?
Matplotlib: More control and customization, suitable for publication-quality
graphics and unique plot types
Seaborn: Simpler code for statistical plots, better default styles, and works
well for quick exploratory analysis
Hands-On Practice
Exercise Ideas:
Plot a bar chart of your favorite fruits and their quantities.
Visualize random numbers as a histogram using both Matplotlib and Seaborn (see the sketch below).
Use Seaborn to plot a scatter plot from the built-in "tips" dataset.
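A sketch for the histogram exercise:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = np.random.randn(1000)  # 1000 samples from a standard normal distribution

plt.hist(data, bins=30)  # Matplotlib version
plt.title("Matplotlib Histogram")
plt.show()

sns.histplot(data, bins=30)  # Seaborn version
plt.title("Seaborn Histogram")
plt.show()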
Introduction to Machine Learning
What is Machine Learning (ML)?
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables
computers to learn from data and make predictions or decisions without being
explicitly programmed for each task.
ML algorithms identify patterns in data, learn from past experiences, and
improve their performance over time with minimal human intervention
The process involves feeding large amounts of data to algorithms, which then
optimize their internal parameters to minimize errors and make accurate
predictions or classifications
Types of Machine Learning
Supervised Learning
Definition: The algorithm is trained on labeled data, meaning each input
comes with a known output.
Goal: Learn a mapping from inputs to outputs so it can predict the output for
new, unseen data.
Common Algorithms: Linear regression, logistic regression, decision trees,
support vector machines, neural networks.
Use Cases: Email spam detection, image classification, credit scoring, medical
diagnosis
Unsupervised Learning
Definition: The algorithm is given data without explicit labels and must find
patterns or groupings on its own.
Goal: Discover hidden structures or relationships in the data.
Common Algorithms: K-means clustering, hierarchical clustering, principal
component analysis (PCA), association rule learning.
Use Cases: Customer segmentation, anomaly detection, market basket
analysis, dimensionality reduction
Reinforcement Learning
Definition: An agent learns to make decisions by interacting with an
environment, receiving rewards or penalties for actions.
Goal: Maximize cumulative reward over time.
Use Cases: Robotics, game playing, recommendation systems, autonomous
vehicles
Machine Learning Workflow
A typical ML workflow consists of several key stages:
1. Problem Definition: Clearly define the business or research problem and
success criteria.
2. Data Collection: Gather relevant and high-quality data from various sources.
3. Data Preparation: Clean, preprocess, and transform data (handle missing
values, encode categories, normalize features).
4. Exploratory Data Analysis (EDA): Analyze data to understand distributions,
relationships, and potential issues.
5. Model Selection: Choose appropriate algorithms based on the problem
(classification, regression, clustering, etc.).
6. Model Training: Fit the model to the training data, adjusting parameters to
minimize errors.
7. Model Evaluation: Assess model performance using metrics (accuracy,
precision, recall, RMSE, etc.) on validation/test data.
8. Model Tuning: Optimize hyperparameters to improve performance.
9. Deployment: Integrate the trained model into production systems for real-
world use.
10. Monitoring and Maintenance: Continuously monitor model performance and
retrain as needed.
Real-World Applications of Machine Learning
Healthcare: Disease prediction, medical image analysis, drug discovery.
Finance: Fraud detection, credit scoring, algorithmic trading.
Retail: Recommendation systems, customer segmentation, inventory
management.
Transportation: Self-driving cars, route optimization, demand forecasting.
Natural Language Processing: Chatbots, sentiment analysis, language
translation.
Manufacturing: Predictive maintenance, quality control, supply chain
optimization.
Introduction to Scikit-learn and Building Your First ML Model
What is Scikit-learn?
Scikit-learn (sklearn) is a popular open-source Python library for machine
learning.
It provides simple and efficient tools for data mining and data analysis,
supporting both supervised and unsupervised learning.
Built on top of NumPy, SciPy, and Matplotlib, it offers a consistent interface for
a wide range of algorithms, including classification, regression, clustering, and
dimensionality reduction.
Scikit-learn is widely used in industry and academia due to its ease of use,
extensive documentation, and active community support.
Key Features
Ready-to-use algorithms for classification, regression, clustering, and more.
Tools for data preprocessing, model selection, and evaluation.
Built-in datasets for practice (e.g., Iris, Digits, California Housing).
Integration with other Python libraries for data science workflows.
Building Your First ML Model with Scikit-learn
Example: Classification with the Iris Dataset
Step-by-Step Process:
1. Import Libraries and Load Data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
2. Split Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
3. Initialize and Train the Model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
4. Make Predictions
y_pred = model.predict(X_test)
5. Evaluate the Model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Interpretation: The model predicts the species of iris flowers based on
features like petal and sepal length/width. The accuracy score indicates how
well the model performs on unseen data.
Example: Simple Regression
Using the California Housing dataset (for regression tasks). Note: the older
Boston Housing dataset was removed from scikit-learn in version 1.2, so we use
fetch_california_housing instead:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
reg = LinearRegression()
reg.fit(X_train, y_train)
# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
Interpretation: The model predicts median house values from features like
median income, average rooms, and location. The mean squared error (MSE)
measures the average squared prediction error for regression tasks.
Key Takeaways:
Machine learning enables systems to learn from data and make predictions.
Supervised and unsupervised learning are the two main types, each with
distinct use cases.
The ML workflow involves data preparation, model training, evaluation, and
deployment.
Scikit-learn simplifies building, evaluating, and deploying ML models in
Python, making it accessible for beginners and professionals alike.
Supervised Learning in Practice
Linear Regression — Theory, Implementation, Evaluation Metrics
1. Theory of Linear Regression
Definition:
Linear regression models the relationship between one or more independent
variables (predictors) and a continuous dependent variable by fitting a linear
equation to observed data.
Mathematical Model:
For simple linear regression (one predictor), the model is:
$y = \beta_0 + \beta_1 x + \epsilon$
where:
$y$ = dependent variable
$x$ = independent variable
$\beta_0$ = intercept (value of $y$ when $x = 0$)
$\beta_1$ = slope (change in $y$ per unit change in $x$)
$\epsilon$ = error term (difference between observed and predicted values)
Goal:
Find $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals
(differences between observed and predicted $y$); this is called the
Ordinary Least Squares (OLS) method.
Multiple Linear Regression:
Extends to multiple predictors:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$
2. Implementation of Linear Regression in Python
a) Manual Calculation (Using NumPy)
Calculate the means of $x$ and $y$.
Compute the slope ($\beta_1$) and intercept ($\beta_0$) using the formulas:
$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$\beta_0 = \bar{y} - \beta_1 \bar{x}$
Predict values using:
$\hat{y} = \beta_0 + \beta_1 x$
Example:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
x_mean = np.mean(x)
y_mean = np.mean(y)
B1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
B0 = y_mean - B1 * x_mean
y_pred = B0 + B1 * x
print(f"Slope: {B1}, Intercept: {B0}")
print("Predicted values:", y_pred)
b) Using Scikit-learn
Import LinearRegression from sklearn.linear_model .
Fit the model on training data.
Predict and evaluate.
Example:
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4], [5]]) # 2D array for sklearn
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression()
model.fit(X, y)
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
y_pred = model.predict(X)
print("Predictions:", y_pred)
c) Visualization with Matplotlib and SciPy
Use scipy.stats.linregress to get slope, intercept, and statistical measures.
Plot scatter and regression line.
Example:
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
def predict(x):
return slope * x + intercept
y_pred = list(map(predict, x))
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.show()
3. Evaluation Metrics for Linear Regression
Mean Squared Error (MSE): Average squared difference between actual and
predicted values.
Root Mean Squared Error (RMSE): Square root of MSE; interpretable in
original units.
Mean Absolute Error (MAE): Average absolute difference.
R-squared ($R^2$): Proportion of variance in the dependent variable explained
by the model; ranges from 0 to 1.
Using Scikit-learn Metrics:
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"MSE: {mse}")
print(f"R-squared: {r2}")
Logistic Regression
Purpose:
Used for binary classification problems (output is categorical: 0 or 1).
Theory:
Instead of predicting continuous values, logistic regression predicts the
probability that an input belongs to a class using the logistic (sigmoid)
function:
$\sigma(z) = \frac{1}{1 + e^{-z}}$
where $z = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.
Output:
A probability between 0 and 1, which is thresholded (commonly at 0.5) to
assign class labels.
Use Cases:
Spam detection, disease diagnosis, customer churn prediction.
Decision Trees
Definition:
A tree-like model of decisions that splits data based on feature values to
classify or predict outcomes.
How it Works:
The tree splits nodes by selecting the feature and threshold that best separate
classes (using criteria like Gini impurity or entropy).
Advantages:
Easy to interpret, handles both numerical and categorical data, non-linear
relationships.
Limitations:
Can overfit, sensitive to small data changes.
Hands-on with Scikit-learn: Logistic Regression and Decision
Trees
Logistic Regression Example (Iris Dataset)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# For binary classification, select two classes
X = X[y != 2]
y = y[y != 2]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Decision Tree Example (Iris Dataset)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
# Using same train/test split as above
# Train decision tree
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
# Predict and evaluate
y_pred_tree = tree.predict(X_test)
print(classification_report(y_test, y_pred_tree))
Clustering (K-Means), Dimensionality Reduction (PCA), and
Hands-on Examples
K-Means Clustering: Theory and Algorithm
What is K-Means?
K-Means is an unsupervised learning algorithm used to partition data into K
distinct clusters based on feature similarity.
How It Works:
1. Initialization: Randomly select K centroids (cluster centers).
2. Expectation Step: Assign each data point to the nearest centroid based on
Euclidean distance.
3. Maximization Step: Update centroids by calculating the mean of all points
assigned to each cluster.
4. Repeat steps 2 and 3 until centroids stabilize (no change in assignments).
Objective:
Minimize the sum of squared errors (SSE) — the sum of squared distances
between points and their cluster centroids.
Challenges:
Choosing the right K (number of clusters).
Sensitivity to centroid initialization (can lead to different results).
Non-deterministic; often run multiple times with different initializations.
Elbow Method:
Plot SSE against different values of K to find the "elbow" point where adding
more clusters yields diminishing returns.
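A sketch of the elbow method on the Iris features:

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X = load_iris().data
sse = []
ks = range(1, 10)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sse.append(km.inertia_)  # inertia_ is the SSE for this value of K

plt.plot(ks, sse, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('SSE (inertia)')
plt.title('Elbow Method')
plt.show()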
Dimensionality Reduction: Principal Component Analysis (PCA)
Purpose:
Reduce the number of features (dimensions) in a dataset while preserving as
much variance (information) as possible.
How PCA Works:
Computes new orthogonal axes (principal components) that capture
maximum variance.
First principal component captures the most variance, second is
orthogonal and captures the next most, and so forth.
Data is projected onto these components, reducing dimensionality.
Benefits:
Simplifies visualization (e.g., 2D or 3D plots).
Reduces noise and computational cost.
Helps avoid the “curse of dimensionality” in machine learning.
Hands-on Example: K-Means Clustering with PCA in Python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load dataset
data = load_iris()
X = data.data
# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Reduce dimensions to 2 for visualization
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_scaled)
# Apply K-Means clustering
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=50, max_iter=500, random_state=42)
clusters = kmeans.fit_predict(X_pca)
# Plot clusters
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('K-Means Clustering with PCA')
plt.show()
Model Evaluation Metrics
1. Accuracy
Proportion of correct predictions (both true positives and true negatives) over
total predictions.
Best for balanced datasets.
2. Precision
Measures how many predicted positives are actually positive.
$\text{Precision} = \frac{TP}{TP + FP}$
3. Recall (Sensitivity)
Measures how many actual positives were correctly identified.
$\text{Recall} = \frac{TP}{TP + FN}$
4. F1-Score
Harmonic mean of precision and recall, balancing both.
$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
5. Confusion Matrix
| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Shows counts of true/false positives and negatives.
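In scikit-learn (rows are actual classes, columns are predicted, with class 0 first):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))
# [[2 0]
#  [1 3]]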
6. Cross-Validation
Technique to assess model generalization by splitting data into multiple
train/test folds.
Common method: k-fold cross-validation, where data is divided into k subsets;
each subset is used once as test data while the others train the model.
Helps avoid overfitting and provides robust performance estimates.
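A minimal k-fold example with scikit-learn:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)  # 5-fold CV
print(scores.mean())  # average accuracy across the 5 folds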
Introduction to Deep Learning and Neural Networks
What is Deep Learning?
A subset of machine learning that uses artificial neural networks with many
layers (deep architectures) to model complex patterns in data.
Excels at tasks like image recognition, natural language processing, and
speech recognition.
Learns hierarchical feature representations automatically.
Neural Network Basics
Neuron:
Basic computational unit that receives inputs, applies weights, adds bias, and
passes the result through an activation function.
Weights and Biases:
Parameters learned during training that determine the importance of inputs.
Activation Functions:
Introduce non-linearity; common types include:
Sigmoid: Outputs values between 0 and 1.
ReLU (Rectified Linear Unit): Outputs zero for negative inputs, linear for
positive.
Tanh: Outputs values between -1 and 1.
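These activation functions take only a few lines of NumPy (a sketch for intuition):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes to (0, 1)

def relu(z):
    return np.maximum(0, z)      # zero for negatives, identity for positives

def tanh(z):
    return np.tanh(z)            # squashes to (-1, 1)

print(sigmoid(0), relu(-2.0), tanh(1.0))  # 0.5 0.0 0.7615...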
Layers:
Input layer: Receives raw data.
Hidden layers: Perform transformations.
Output layer: Produces final prediction.
Building a Simple Neural Network with TensorFlow/Keras: Digit
Recognition (MNIST)
Step-by-step example:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Preprocess data
X_train = X_train.reshape(-1, 28*28).astype('float32') / 255
X_test = X_test.reshape(-1, 28*28).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
# Build model
model = models.Sequential([
layers.Dense(128, activation='relu', input_shape=(784,)),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
# Compile model
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Train model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
# Evaluate model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
Explanation:
Flatten 28x28 images into 784-dimensional vectors.
Two hidden layers with ReLU activation.
Output layer with softmax for multi-class classification (digits 0-9).
Use categorical_crossentropy loss and adam optimizer.
Introduction to Generative AI
What is Generative AI?
Generative AI is a branch of artificial intelligence focused on creating new, original
content—such as text, images, music, or code—by learning patterns from large
datasets. Unlike traditional AI, which is designed to analyze data and make
predictions or decisions based on predefined rules, generative AI can produce
outputs that did not previously exist, mimicking creativity and innovation.
Generative AI vs. Traditional AI
| Aspect | Traditional AI | Generative AI |
| --- | --- | --- |
| Core Function | Analyzes data, makes predictions, classifies | Creates new content (text, images, code, music, etc.) |
| Approach | Rule-based, pattern recognition | Pattern creation, self-learning from data |
| Output | Predictions, classifications, recommendations | Original content (stories, images, code, etc.) |
| Examples | Spam filters, recommendation systems, chatbots | ChatGPT, DALL·E, GitHub Copilot, music generators |
| Data Type | Structured data | Structured & unstructured data |
| Creativity | Limited to defined tasks | Capable of producing creative, novel outputs |
Key Difference:
Traditional AI is reactive and task-oriented, excelling at analyzing and predicting
within set boundaries. Generative AI is proactive, capable of producing new,
creative content by learning from existing data.
Key Applications of Generative AI
Text Generation: Chatbots (ChatGPT), content writing, translation,
summarization.
Image Generation: Creating artwork (DALL·E, Midjourney), photo editing, style
transfer.
Code Generation: Writing and completing code (GitHub Copilot, OpenAI
Codex).
Audio & Music: Composing music, generating synthetic voices.
Video & Animation: Generating video content, deepfakes, animation.
Other Areas: Drug discovery, synthetic data creation, personalized
recommendations.
Summary
Generative AI represents a shift from AI systems that simply analyze or classify
data to those that can create entirely new content, opening up new possibilities in
creativity, productivity, and problem-solving across industries.
Hands-on—Using OpenAI API or HuggingFace Transformers to
Generate Text; Prompt Engineering Basics
1. Introduction to Text Generation Tools
OpenAI API: Provides access to powerful language models (like GPT-3/4) that
can generate human-like text.
HuggingFace Transformers: An open-source library with many pre-trained
generative models (e.g., GPT-2, GPT-3, T5, BERT).
2. Hands-on: Generating Text
Using OpenAI API (Python Example)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)
response = client.responses.create(
model="o4-mini",
instructions="You are a concise assistant.",
input="Explain the difference between a list and a tuple in Python.",
)
print(response.output_text)
Replace "YOUR_API_KEY" with your actual OpenAI API key.
Using HuggingFace Transformers (Python Example)
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
prompt = "Once upon a time in a distant galaxy,"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
3. Prompt Engineering Basics
What is Prompt Engineering?
The art of crafting effective input prompts to guide generative AI models
toward desired outputs.
Tips for Good Prompts:
Be clear and specific: "Summarize this article in three sentences."
Provide context: "Act as a travel guide and recommend places to visit in
Paris."
Use examples: "Translate the following English sentence to French: 'Hello,
how are you?'"
Experiment and iterate: Try different phrasings to get the best results.
4. Practice Exercise
Try generating:
A poem about summer.
A Python function that calculates factorial.
A product description for a new smartphone.
Experiment with prompt variations and observe how outputs change.
Summary
Generative AI enables machines to create new content, setting it apart from
traditional, rule-based AI.
Text generation is accessible using tools like OpenAI API or HuggingFace
Transformers.
Prompt engineering is key to getting high-quality, relevant outputs from
generative models.
Generative AI for Images and Code
Image Generation Basics — DALL-E, Stable Diffusion, Using Web
APIs and Demo Tools
Overview of Image Generation Models
DALL-E:
Developed by OpenAI, DALL-E (and its successors like DALL-E 2 and DALL-E
3) are powerful text-to-image models that generate high-quality, detailed
images from natural language prompts. DALL-E 3 improves over previous
versions by better understanding and rewriting prompts internally to produce
more compelling and accurate images.
Stable Diffusion:
An open-source text-to-image diffusion model that generates photorealistic
images by iteratively denoising random noise guided by a text prompt. It is
notable for being accessible for local installation and customization, unlike
some proprietary models.
How These Models Work (Briefly)
Diffusion Models:
Both DALL-E 2/3 and Stable Diffusion use diffusion techniques that start with
random noise and progressively refine it into a coherent image matching the
prompt. The process involves learning to reverse a noising process, guided by
text embeddings.
CLIP Model (DALL-E):
DALL-E uses a CLIP model to map text and images into a shared semantic
space, enabling the generation of images that semantically match the input
text.
Using DALL-E via Web API
OpenAI API:
You can generate images by sending a text prompt to the OpenAI API
specifying the DALL-E model version (e.g., DALL-E 3). The API returns URLs to
generated images.
Example Workflow:
Define a detailed text prompt describing the desired image.
Call the .images.generate() method with parameters like model , prompt , n
(number of images), and size .
Retrieve the image URL from the response and display or download it, as in the sketch below.
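A minimal sketch of this workflow (the model name and size shown are illustrative; check the current API documentation for available options):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a fox in a snowy forest at dawn",
    n=1,                # number of images
    size="1024x1024",
)
print(response.data[0].url)  # URL of the generated image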
Tips for Better Results:
More detailed prompts yield higher quality images. DALL-E 3 internally
rewrites prompts to optimize generation.
Using Stable Diffusion Locally or via Colab
Local Setup:
Clone the Stable Diffusion repository.
Create a Conda environment with required dependencies.
Download the model weights (e.g., checkpoint v1.4).
Run commands like:
python scripts/txt2img.py --prompt "your prompt here" --ckpt sd-v1-4.ckpt
GPU usage is recommended for faster generation; CPU is possible but
slower (8-12 minutes per image).
Colab Notebooks:
Google Colab notebooks allow running Stable Diffusion without local setup,
with GPU acceleration available on Colab Pro.
Demo Tools and Platforms
Platforms like Midjourney, RunwayML, and Hugging Face Spaces provide web
interfaces to generate images using Stable Diffusion or DALL-E models.
These tools often allow prompt refinement, image upscaling, outpainting, and
blending modes for creative control.
Code Generation with Large Language Models (LLMs) — GitHub
Copilot, OpenAI Codex; Practical Exercises
What is Code Generation with LLMs?
LLMs like OpenAI Codex and GitHub Copilot are trained on vast amounts of
code and natural language, enabling them to generate code snippets,
complete functions, or even entire programs from natural language prompts or
partial code.
GitHub Copilot
An AI-powered code completion tool integrated into code editors (e.g., VS
Code).
Suggests code lines or blocks as you type, based on context.
Supports many languages and frameworks.
Helps accelerate development, reduce boilerplate, and learn new APIs.
OpenAI Codex
The underlying model powering Copilot.
Accessible via API for custom code generation tasks.
Can generate code from natural language prompts, translate between
languages, or explain code.
Practical Exercises
Exercise 1: Generate a function from a docstring prompt.
Prompt: "Write a Python function to check if a number is prime."
Expected output: Function code implementing prime check.
Exercise 2: Complete partial code snippets.
Provide a partial function and ask the model to complete it.
Exercise 3: Generate unit tests for existing functions.
Prompt the model to write test cases based on function definitions.
Exercise 4: Translate code from one language to another.
E.g., Python to JavaScript.
Exercise 5: Debugging assistance.
Provide buggy code and ask for corrections or explanations.
Best Practices
Always review generated code for correctness and security.
Use generated code as a starting point or assistant, not a final solution.
Combine with human expertise for best results.
Retrieval Augmented Generation & LLM Frameworks
The Limits of LLMs and the Need for RAG
Explanation:
LLMs (like GPT, Gemini) are trained on vast but static datasets. Their
knowledge is frozen at training time and may be outdated or incomplete.
LLMs can “hallucinate” (make up facts) and struggle with domain-specific
or up-to-date information.
Example:
Ask ChatGPT: “Who won the 2024 Olympics?” (It can’t answer accurately
if trained before 2024.)
Discussion:
Why is this a problem for real-world applications (e.g., customer support,
research, enterprise tools)?
What is RAG? (Retrieval Augmented Generation)
Definition:
RAG combines information retrieval (searching relevant documents/data)
with generative AI (LLMs) to produce grounded, accurate, and context-
aware outputs.
How it Works:
1. Retrieve: Search for relevant documents/passages from a knowledge
base (using keyword or semantic search, often with vector databases).
2. Augment: Provide the retrieved content as context to the LLM.
3. Generate: The LLM uses both its training and the fetched context to
answer the user’s query.
Diagram:
User Query → Retriever (search) → Relevant Docs → LLM (with docs as context) → Answer
Key Benefits:
Enhanced accuracy: Reduces hallucination by grounding answers in real
data.
Up-to-date information: Can access current knowledge without
retraining.
Domain adaptation: Easily apply LLMs to private or niche datasets.
Scalability: Add new knowledge without retraining the model.
RAG in Practice—Real-World Applications
Examples:
Enterprise chatbots answering questions from internal documentation.
Research assistants summarizing the latest scientific papers.
Customer support bots accessing product manuals and support tickets.
Demo:
Show a live RAG-powered chatbot (e.g., Bing Copilot, Gemini Advanced, or
a simple open-source demo).
How Retrieval Works (Under the Hood)
Retrieval Methods:
Keyword Search: Classic search (e.g., Elasticsearch).
Semantic Search: Uses embeddings/vectors to find similar meaning, not
just keywords.
Hybrid Search: Combines both, often with a re-ranker for best results.
Vector Databases:
Store documents as embeddings for fast, semantic retrieval (e.g.,
Pinecone, ChromaDB).
Multi-modal Retrieval:
Not just text—can retrieve images, audio, etc. using the same principles.
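A toy sketch of semantic search with cosine similarity (the embeddings here are made-up 2-D vectors; real systems get them from an embedding model):

import numpy as np

# Hypothetical document embeddings
docs = {
    "refund policy": np.array([0.9, 0.1]),
    "shipping times": np.array([0.2, 0.8]),
}
query = np.array([0.85, 0.2])  # embedding of "how do I get my money back?"

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank documents by similarity to the query
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # refund policy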
Intro to Langchain, LlamaIndex, Building a Simple Q&A
System
Introduction to Langchain & LlamaIndex
Langchain:
A Python framework for building LLM-powered applications, with tools for
chaining together retrieval, generation, and more.
LlamaIndex:
A toolkit for connecting LLMs to custom data sources. Makes it easy to
ingest, index, and query your own documents.
Why these tools?
They simplify RAG workflows and speed up prototyping.
Setting Up Your Environment
Install the libraries:
pip install langchain llama-index chromadb streamlit
Obtain API keys for your LLM provider (OpenAI, Gemini, etc.).
Building a Simple Q&A System with Langchain
Step 1: Prepare Your Data
Use a few sample text files, PDFs, or URLs as your knowledge base.
Step 2: Index the Data
Example (Langchain with ChromaDB):
from langchain.document_loaders import DirectoryLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
loader = DirectoryLoader("my_docs/")  # loads every document in the folder
documents = loader.load()
db = Chroma.from_documents(documents, OpenAIEmbeddings())
Step 3: Connect to an LLM
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(api_key="YOUR_API_KEY")
Step 4: Create the RetrievalQA Chain
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=db.as_retriever()
)
Step 5: Ask Questions!
answer = qa.run("What is the main topic of document X?")
print(answer)
Building with LlamaIndex
Step 1: Ingest Data
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
Step 2: Create an Index
from llama_index import GPTVectorStoreIndex
index = GPTVectorStoreIndex.from_documents(documents)
Step 3: Query the Index
response = index.query("What is this document about?")
print(response)
Optional:
Use LlamaIndex’s Data Connectors to pull in data from PDFs, SQL, APIs, and more.
Hands-On Activity—Build and Test Your Own Q&A Bot
Task:
Use Langchain or LlamaIndex to build a simple Q&A system over a small
document set (e.g., Wikipedia articles, class notes, or company docs).
Suggested Steps:
1. Load your documents.
2. Index them with embeddings.
3. Connect to an LLM.
4. Run queries and observe the answers.
Challenge:
Try modifying the retriever (e.g., switch from keyword to semantic search).
Add a Streamlit UI for interactive Q&A.
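A minimal Streamlit wrapper, assuming the qa chain from the Langchain steps above (save as app.py and run with streamlit run app.py):

import streamlit as st

st.title("Q&A Bot")
question = st.text_input("Ask a question about your documents:")
if question:
    answer = qa.run(question)  # assumes the `qa` RetrievalQA chain built earlier
    st.write(answer)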
Project Ideas:
1. AI Art Generator for Game Assets (Using Stable Diffusion)
Description: Build a simple app that generates game art assets (characters,
backgrounds, items) from text prompts using Stable Diffusion.
Skills: Prompt engineering, API integration or local model usage, image saving
and display.
Tools: Stable Diffusion (local install with Automatic1111 WebUI or via
DreamStudio API), Python for scripting.
Why: Great for learning image generation basics and creating usable assets
for your own game projects.
Reference: Many beginners start with Stable Diffusion v1.5 or SDXL models
and user-friendly UIs like Fooocus or InvokeAI
2. Text-to-Image Web App with DALL-E API
Description: Create a web interface where users enter a text prompt and get
AI-generated images back using OpenAI’s DALL-E API.
Skills: REST API calls, frontend/backend integration, handling image URLs.
Tools: OpenAI API, Flask/Django or Node.js backend, React or plain HTML/JS
frontend.
Why: Learn how to integrate powerful generative AI models into web apps and
handle asynchronous API responses.
Reference: DALL-E API usage and prompt design tips
3. AI-Powered Image Style Transfer or Editing Tool
Description: Build a tool that applies different artistic styles or edits images
using AI models (e.g., Stable Diffusion inpainting or style transfer).
Skills: Image processing, model inference, UI for uploading and editing
images.
Tools: Stable Diffusion inpainting models, Python, OpenCV, Gradio or Streamlit
for UI.
Why: Hands-on experience with advanced generative AI features beyond
simple text-to-image generation.
Reference: Stable Diffusion’s editing capabilities and UI options
4. Code Generation Assistant Using OpenAI Codex or GitHub
Copilot API
Description: Build a simple code assistant chatbot that generates code
snippets based on user prompts or completes partial code.
Skills: NLP prompt engineering, API integration, conversational UI design.
Tools: OpenAI Codex API, Flask or FastAPI backend, React or plain JS
frontend.
Why: Learn code generation with LLMs and practical API usage for developer
tools.
Reference: Code generation with LLMs and practical exercises (see the code
generation session above).
5. AI-Powered Writing Assistant with Text Generation
Description: Develop an app that generates creative writing, summaries, or
paraphrases using GPT models.
Skills: Text generation, prompt tuning, UI/UX design.
Tools: OpenAI GPT API, Streamlit or Flask, basic frontend.
Why: Explore generative AI for NLP and content creation, useful for blogs,
marketing, or education.
6. Interactive Image Generation Playground with Multiple Models
Description: Build a playground app where users can generate images using
different models (DALL-E, Stable Diffusion, Midjourney API if available) and
compare results.
Skills: Multi-API integration, UI design, user input handling.
Tools: APIs for each model, React or Vue frontend, Node.js or Python
backend.
Why: Understand differences between generative models and provide users
with flexible creative tools.
Reference: Comparison of DALL-E, Midjourney, Stable Diffusion capabilities
7. AI-Powered Meme Generator
Description: Combine image generation with text overlay to create humorous
or themed memes from prompts.
Skills: Image generation, text rendering on images, web app development.
Tools: Stable Diffusion or DALL-E API, Pillow (Python imaging), Flask or React.
Why: Fun project to practice image generation and simple graphics
manipulation.
8. Personalized Avatar Creator Using Generative AI
Description: Generate custom avatars based on user descriptions or style
preferences. Include options for hair, clothes, background.
Skills: Prompt engineering, conditional generation, UI/UX design.
Tools: Stable Diffusion with control models, web frontend, backend API
integration.
Why: Practical use case for social apps or games, combining AI with user
inputs.