Shubhash

The document is an internship report by Shubhash N, detailing his experience creating a calculator using Python during his internship at YBI Foundation from April 10 to May 10, 2025. It outlines the purpose, scope, and outcomes of the internship, including the development process and the skills acquired. The report also includes acknowledgments, organizational profiles, and a summary of the software development lifecycle followed during the project.


An Internship Report On

Calculator using Python


Submitted in the partial fulfillment of requirement
For the 6th Semester of

BACHELOR OF COMPUTER APPLICATIONS

Of

Submitted by

Shubhash .N
U18EX22S0065

Under the guidance of

Mrs. Shilpa Nayak


Assistant Professor
Department of BCA

Nagarjuna Degree College


(Affiliated to Bengaluru City University)
# 38/36, Ramagondanahalli, Yelahanka, Bengaluru -560 064.
2024-2025
[Certificate of Internship from company letter head]

Certificate of Internship

This is to certify that Shubhash.N, bearing register number U18EX22S0065, is a student of
Nagarjuna Degree College. He has successfully completed the internship course from 10-04-2025 to
10-05-2025 at our organization. His conduct during his stay with us was satisfactory.
I wish him all the best in his future endeavors.

Signature of authority
Seal of Organization
CERTIFICATE
This is to certify that the internship entitled

Calculator using python

Submitted in the partial fulfillment of requirement of the 6th Semester of


Bachelor of Computer Applications

is a result of the bonafide work carried out by

Shubhash .N
U18EX22S0065
BCA C1

From 10th April 2025 to 10th May 2025 at YBI Foundation.

Internship Guide
Mrs. Shilpa Nayak
Assistant Professor

Head of the Department
Mrs. Uma S

Principal
Dr. Harish Babu
NAGARJUNA DEGREE COLLEGE
(Affiliated to Bengaluru City University)
# 38/36, Ramagondanahalli, Yelahanka, Bengaluru -560 064.
Phone: 6364231112
E Mail: [email protected] Website: www.nagarjunadegreecollege.co.in

Student Declaration

I, SHUBHASH .N, hereby declare that this report entitled “Calculator using Python” was prepared
solely by me during my internship at YBI Foundation from 10th April 2025 to 10th May 2025.
This report is submitted in partial fulfillment of the requirements for the VI Semester BCA
Internship Course.

Signature
Shubhash.N

ACKNOWLEDGEMENT

Firstly, I would like to thank Shri. Dr Alok Yadav of YBI Foundation for giving me the
opportunity to do an internship within the organization.

We are indebted to our cherished Principal, Dr. Harish Babu S, whose support and permission
were instrumental in undertaking this dissertation. Their guidance throughout this challenging
endeavor has been invaluable.

We are deeply grateful to our respected Head of Department, Mrs. Uma, whose support and
permission were instrumental in enabling us to embark on this dissertation journey and
guiding us through its completion.

Our sincere appreciation goes to our dedicated guide, Mrs. Shilpa Nayak, for their invaluable
mentorship and timely assistance in bringing our project to fruition.

We are also indebted to the members of the Computer Department for their guidance and
support throughout this endeavor.

Last but not least, we express our heartfelt gratitude to our beloved parents and dear friends
for their unwavering encouragement and inspiration.

Shubhash .N
U18EX22S0065

CONTENTS

1 Executive Summary
1.1 Purpose
1.2 Scope of the internship
1.3 Outcome of the internship
2 Introduction and Organizational Profile
2.1 Scope of work in company
2.2 Domain description
2.3 Organizational Profile
2.3.1 Products & Clients
3 Work Description
3.1 Organizational Chart
3.2 Intern’s job role description
3.3 Programming language/concept/technology
3.4 Software/hardware used
4 Learning Outcome
4.1 Abstract of work experience
4.2 Any application development/technology learnt
5 Bibliography

Annexures
Letter of application to the employer for internship [from college]
Letter of Acceptance by Employer [internship confirmation letter by Employer]
Log sheet provided by Company
Questionnaires

1. Executive Summary
1.1 Purpose
The purpose of this internship was to provide hands-on experience in software development
through a practical project. I was assigned the task of creating a calculator using Python. This task
aimed to reinforce my foundational understanding of Python programming, GUI development,
logic building, and application deployment. The project enabled me to bridge the gap between
theoretical knowledge and practical implementation, thereby enhancing my problem-solving
capabilities, algorithmic thinking, and user interface design skills.
1.2 Scope of the Internship
The internship, conducted under the guidance of experienced mentors at the YBI Foundation,
focused on software development and digital skills enhancement. As part of the broader objective
to empower youth through technology, I was given the opportunity to develop a functional
calculator application. The project was designed to include key programming concepts such as
functions, event-driven programming, data validation, and GUI creation using libraries like
Tkinter. It also included phases of design, implementation, testing, and debugging.
1.3 Outcome of the Internship
The final outcome of the internship was a fully functional, user-friendly calculator built using
Python. This calculator could perform fundamental arithmetic operations (addition, subtraction,
multiplication, division) and some advanced operations such as square root, percentage, and
modulus. Additionally, I gained proficiency in Python syntax, GUI design, event handling,
exception handling, and debugging techniques. The project improved my logical reasoning and
provided real-world experience of building a complete software application from scratch.

2. Introduction and Organizational Profile


2.1 Scope of Work in Company
At YBI Foundation, the internship program was designed to nurture technical capabilities in
young learners by allowing them to develop small-scale but meaningful projects. My role was
centered on developing a desktop calculator application. The scope of work included researching
similar existing applications, designing a layout, implementing logic using Python, integrating a
GUI, handling user inputs, testing the application, and preparing final documentation. The entire
lifecycle of software development was covered during this internship.
2.2 Domain Description
The domain of this project lies in Software Development, more specifically Desktop Application
Development using Python. This field involves writing programs that run on personal computers
and are designed to solve specific user problems. Creating a calculator is a foundational exercise
in this domain as it involves handling arithmetic operations, user interfaces, and program logic. It
serves as a stepping stone to more advanced applications and builds strong fundamentals for any
aspiring developer.
2.3 Organizational Profile
YBI Foundation (Youth Empowerment and Digital Skills Training Organization) is a non-profit,
tech-oriented institution focused on empowering youth with essential digital and technological
skills. The foundation organizes multiple training programs and internships to enhance
employability and foster innovation among youth. YBI emphasizes hands-on learning and project-
based internships to encourage students to build real applications.
2.3.1 Products & Clients
YBI Foundation offers training programs in:
• Web Development
• Python Programming
• Data Science
• Android App Development
• Machine Learning
• Internet of Things (IoT)
Clients and Beneficiaries:
• Students from various academic backgrounds
• Young job seekers
• Educational institutions
• NGOs focused on digital literacy
• Government youth development initiatives

3. Work Description
3.1 Organizational Chart
Chairman

Program Director

Technical Trainers

Internship Coordinators

Interns (Students, including me)
I worked under the guidance of a technical trainer and coordinator who evaluated weekly progress
and provided feedback and support.

3.2 Intern’s Job Role Description
As an intern, my primary responsibilities were to:
• Understand the requirements of the calculator project
• Learn and apply Python programming to implement the logic
• Design a simple but functional user interface
• Handle user inputs and events
• Debug and test the application thoroughly
• Document the development process
Additional tasks included participating in review meetings, preparing presentations, and
collaborating with fellow interns to learn and share feedback.
3.3 Programming Language/Concept/Technology
Language Used: Python
Key Concepts Applied:
• Variables and Data Types
• Control Statements (if-else, loops)
• Functions and Modular Programming
• Exception Handling
• GUI Programming using Tkinter
• Event Handling
• Data Binding and User Input Validation
This project helped me understand how programming logic is mapped to real-world functionality
and how to interact with users via graphical components.
3.4 Software/Hardware Used
Software:
• Python 3.x
• Tkinter Library (built-in)
• Visual Studio Code / PyCharm IDE
• GitHub (for version control and submission)
Hardware:
• Laptop with minimum configuration: i5 Processor, 8GB RAM, Windows 10 OS
The calculator app was designed to be lightweight, so it could run efficiently on any standard PC
or laptop.

SOURCE CODE:

# -*- coding: utf-8 -*-
from tkinter import Tk, Toplevel, END, Entry, N, E, S, W, Button
from tkinter import font
from tkinter import Label
from functools import partial


def get_input(entry, argu):
    # Append the pressed key's text to the end of the display
    entry.insert(END, argu)


def backspace(entry):
    # Remove the last character from the display, if there is one
    input_len = len(entry.get())
    if input_len > 0:
        entry.delete(input_len - 1)


def clear(entry):
    # Empty the display
    entry.delete(0, END)


def popupmsg(message):
    # Show a small alert window with the given message and an "Okay" button
    popup = Toplevel()
    popup.resizable(0, 0)
    popup.geometry("150x100")
    popup.title("Alert")
    label = Label(popup, text=message)
    label.pack(side="top", fill="x", pady=10)
    B1 = Button(popup, text="Okay", bg="#DDDDDD", command=popup.destroy)
    B1.pack()


def calc(entry):
    # Evaluate the expression in the display and show the result.
    # Note: eval() runs whatever Python expression is in the entry box,
    # so it is only suitable for trusted input.
    input_info = entry.get().strip()
    if input_info == "":
        popupmsg("Enter valid values")
        return
    try:
        output = str(eval(input_info))
    except ZeroDivisionError:
        popupmsg("Cannot divide by 0 ! \n Enter valid values")
        output = ""
    except (SyntaxError, NameError, TypeError):
        # Incomplete or malformed expression, for example "5+"
        popupmsg("Enter valid values")
        return
    clear(entry)
    entry.insert(END, output)


def cal():
    root = Tk()
    root.title("Calc")
    root.resizable(0, 0)

    # Display
    entry_font = font.Font(size=15)
    entry = Entry(root, justify="right", font=entry_font)
    entry.grid(row=0, column=0, columnspan=4,
               sticky=N + W + S + E, padx=5, pady=5)

    # Colours
    cal_button_bg = '#FF6600'
    num_button_bg = '#4B4B4B'
    other_button_bg = '#DDDDDD'
    text_fg = '#FFFFFF'
    button_active_bg = '#C0C0C0'

    # Button factories that share the common styling
    num_button = partial(Button, root, fg=text_fg, bg=num_button_bg,
                         padx=10, pady=3, activebackground=button_active_bg)
    cal_button = partial(Button, root, fg=text_fg, bg=cal_button_bg,
                         padx=10, pady=3, activebackground=button_active_bg)

    # Row 1: Del, C, /
    button15 = Button(root, text='Del', bg=other_button_bg, padx=10, pady=3,
                      command=lambda: backspace(entry), activebackground=button_active_bg)
    button15.grid(row=1, column=0, columnspan=2,
                  padx=3, pady=5, sticky=N + S + E + W)

    button16 = Button(root, text='C', bg=other_button_bg, padx=10, pady=3,
                      command=lambda: clear(entry), activebackground=button_active_bg)
    button16.grid(row=1, column=2, pady=5)

    button14 = cal_button(text='/', command=lambda: get_input(entry, '/'))
    button14.grid(row=1, column=3, pady=5)

    # Row 2: 7 8 9 *
    button7 = num_button(text='7', command=lambda: get_input(entry, '7'))
    button7.grid(row=2, column=0, pady=5)

    button8 = num_button(text='8', command=lambda: get_input(entry, '8'))
    button8.grid(row=2, column=1, pady=5)

    button9 = num_button(text='9', command=lambda: get_input(entry, '9'))
    button9.grid(row=2, column=2, pady=5)

    button12 = cal_button(text='*', command=lambda: get_input(entry, '*'))
    button12.grid(row=2, column=3, pady=5)

    # Row 3: 4 5 6 -
    button4 = num_button(text='4', command=lambda: get_input(entry, '4'))
    button4.grid(row=3, column=0, pady=5)

    button5 = num_button(text='5', command=lambda: get_input(entry, '5'))
    button5.grid(row=3, column=1, pady=5)

    button6 = num_button(text='6', command=lambda: get_input(entry, '6'))
    button6.grid(row=3, column=2, pady=5)

    button11 = cal_button(text='-', command=lambda: get_input(entry, '-'))
    button11.grid(row=3, column=3, pady=5)

    # Row 4: 1 2 3 +
    button1 = num_button(text='1', command=lambda: get_input(entry, '1'))
    button1.grid(row=4, column=0, pady=5)

    button2 = num_button(text='2', command=lambda: get_input(entry, '2'))
    button2.grid(row=4, column=1, pady=5)

    button3 = num_button(text='3', command=lambda: get_input(entry, '3'))
    button3.grid(row=4, column=2, pady=5)

    button10 = cal_button(text='+', command=lambda: get_input(entry, '+'))
    button10.grid(row=4, column=3, pady=5)

    # Row 5: 0 . ^ =
    button0 = num_button(text='0', command=lambda: get_input(entry, '0'))
    button0.grid(row=5, column=0, pady=5)

    button13 = num_button(text='.', command=lambda: get_input(entry, '.'))
    button13.grid(row=5, column=1, pady=5)

    # '^' inserts Python's power operator '**'
    button18 = cal_button(text='^', command=lambda: get_input(entry, '**'))
    button18.grid(row=5, column=2, pady=5)

    button17 = cal_button(text='=', command=lambda: calc(entry))
    button17.grid(row=5, column=3, pady=5)

    # Quit closes the main window, which ends the main loop
    quit_button = Button(root, text='Quit', fg='white', bg='black',
                         command=root.destroy, height=1, width=7)
    quit_button.grid(row=8, column=2)

    root.mainloop()


if __name__ == '__main__':
    cal()
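
The listing above evaluates the display text with eval(), which will run any Python expression typed into the entry box. As a hedged sketch only, and not part of the submitted project, the same arithmetic could be evaluated by walking the parsed expression with the standard ast module and allowing only numbers and the operators the calculator offers. The name safe_eval and the operator tables are hypothetical choices made for this illustration.

```python
import ast
import operator

# Operators the calculator exposes, mapped to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate an expression built only from numbers and + - * / ** (hypothetical helper)."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain int/float literals only (bool and str are rejected)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Anything else (names, calls, attribute access) is refused
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression.strip(), mode="eval"))


if __name__ == "__main__":
    print(safe_eval("2+3*4"))
    print(safe_eval("-2**2"))
```

In calc(), a call such as safe_eval(input_info) could stand in for eval(input_info); malformed input still raises SyntaxError and division by zero still raises ZeroDivisionError, so the existing handlers would continue to apply, with ValueError added for refused expressions.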

4. Learning Outcome
4.1 Abstract of Work Experience
During the internship at YBI Foundation, I worked on a calculator project that allowed me to
experience the complete cycle of software development—from concept to deployment. Initially, I
analyzed existing calculator models and identified key functionalities. I then planned my app's
layout using wireframes and began coding the core functionality. Using Python’s Tkinter library,
I created buttons, display screens, and event listeners.
Throughout the development process, I learned:
• How to structure a Python application
• How to handle errors and prevent application crashes
• How to connect backend logic with frontend design
Daily interactions with mentors and peers helped improve my collaboration and communication
skills. By the end of the internship, I had not only developed a fully operational calculator but also
gained confidence in tackling future development projects.
4.2 Application Development / Technology Learned
Application Developed:
Name: Python Calculator
Functionality:
• Basic Arithmetic: Addition, Subtraction, Multiplication, Division
• Advanced Operations: Modulus, Square root, Percentage
• Clear and Reset Functionality
• Error Display and Handling
• GUI Interface using Tkinter
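
The source listing in this report shows the basic arithmetic and power buttons only. As a minimal sketch of how the advanced operations named above might be computed, assuming helper functions that are hypothetical and not taken from the project code:

```python
import math


def modulus(a, b):
    # Remainder of a divided by b; raises ZeroDivisionError when b is 0
    return a % b


def square_root(x):
    # math.sqrt rejects negative numbers, so report that in calculator terms
    if x < 0:
        raise ValueError("Cannot take the square root of a negative number")
    return math.sqrt(x)


def percentage(value, percent):
    # percent % of value, e.g. 15% of 200
    return value * percent / 100


if __name__ == "__main__":
    print(modulus(10, 3))
    print(square_root(16))
    print(percentage(200, 15))
```

Each function could be bound to its own button in the same way the listing binds calc() to the "=" button.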
Technologies Learned:
1. Python Programming:
Enhanced understanding of:
o Syntax
o Loops and conditionals
o Function-based programming
o Exception handling
2. Tkinter GUI Library:
Gained knowledge of:
o Widgets (Button, Label, Entry)
o Grid and Pack Layout Managers
o Event handling via command binding
o Aesthetic improvement (colors, fonts, padding)
3. Software Development Lifecycle:
o Requirement gathering
o Design and prototyping
o Development and testing
o Documentation and review
4. Debugging and Testing:
o Identifying runtime errors
o Ensuring input validation
o Logical correctness of arithmetic operations
5. Version Control (GitHub):
o Learned basic Git commands
o Created repositories
o Uploaded and shared source code
6. Time and Project Management:
o Creating development timelines
o Following weekly milestones
o Meeting submission deadlines
This internship helped me develop a foundational understanding of building real-world
applications and introduced me to core software engineering practices.

5. Bibliography
1. Books & Online Resources:
o Python Crash Course by Eric Matthes
o Automate the Boring Stuff with Python by Al Sweigart
o Tkinter Documentation – https://docs.python.org/3/library/tkinter.html
o GeeksforGeeks Python Programming Tutorials – https://www.geeksforgeeks.org
o W3Schools Python GUI – https://www.w3schools.com/python/python_gui_tkinter.asp
2. Software & Tools:
o Python.org for official Python software downloads
o GitHub for repository management
o Visual Studio Code for code editing
3. Mentorship and Guidance:
o Trainers and coordinators at YBI Foundation for hands-on support and
feedback

Log Sheet

Name of Student: Mithun BL


Name of Organization: YBI Foundation

Sn   Date         Concept learnt                             Duration (in Hours)
1    10/04/2025   Introduction to Python                     2 Hours
2    11/04/2025   Anaconda                                   3 Hours
3    12/04/2025   Google Colab                               3 Hours
4    13/04/2025   Python Data Types                          4 Hours
5    14/04/2025   Types of Data Types                        5 Hours
6    15/04/2025   Python Variables                           5 Hours
7    16/04/2025   Python Operators                           5 Hours
8    17/04/2025   Python Arithmetic Operators                5 Hours
9    18/04/2025   Introduction to Pandas                     5 Hours
10   19/04/2025   Read External File                         5 Hours
11   20/04/2025   DataFrame Attributes                       5 Hours
12   21/04/2025   Explore DataFrame                          5 Hours
13   22/04/2025   DataFrame Modes in Pandas                  5 Hours
14   23/04/2025   Manipulate DataFrame                       5 Hours
15   24/04/2025   Practice quiz                              1 Hour
16   25/04/2025   Introduction to AIML and Data Science      5 Hours
17   26/04/2025   Supervised Machine Learning                5 Hours
18   27/04/2025   Unsupervised Machine Learning              5 Hours
19   28/04/2025   Reinforcement Learning                     5 Hours
20   29/04/2025   Deep Learning (AI)                         5 Hours
21   30/04/2025   Generative AI                              5 Hours
22   01/05/2025   Introduction to Linear Regression          5 Hours
23   02/05/2025   Simple Linear Regression                   5 Hours
24   03/05/2025   Step by Step Simple Linear Regression      5 Hours
25   04/05/2025   Step by Step Multiple Linear Regression    5 Hours

Total Number of Days: 25    Hours: 116

Declaration:
It is declared that the student has completed his/her internship in our organization as per
the above schedule.

Signature of Authority

Seal of Organization

Introduction to Python

Python is a high-level, general-purpose programming language that has gained immense popularity
due to its simplicity, readability, and versatility. It was created by Guido van Rossum and first
released in 1991. Python emphasizes code readability and allows developers to express concepts in
fewer lines of code compared to many other languages.
The language supports multiple programming paradigms, including procedural, object-oriented, and
functional programming. Python's vast standard library and active community support make it
suitable for a wide range of applications, such as web development, data analysis, machine learning,
automation, and software development.
During the course of the internship, Python was used extensively due to its flexibility and efficiency
in solving real-world problems, writing scripts, and developing scalable applications.
Python is a powerful, interpreted, high-level programming language that supports multiple
programming paradigms including procedural, object-oriented, and functional programming.
Designed with code readability in mind, Python uses clear and concise syntax that enables developers
to write logical and maintainable code for both small-scale scripts and large-scale applications.
Originally developed by Guido van Rossum and released in 1991, Python has grown to become one
of the most widely used programming languages in the world. It is open-source and supported by a
large global community, which contributes to its rich ecosystem of libraries and frameworks. These
libraries, such as NumPy, Pandas, TensorFlow, Flask, and Django, extend Python’s capabilities and
make it a go-to language for fields like data science, web development, artificial intelligence,
machine learning, automation, and software testing.
Python’s platform independence allows code to run seamlessly across various operating systems,
such as Windows, Linux, and macOS. Furthermore, its compatibility with modern development tools
and integration capabilities with other languages (like C, C++, and Java) enhance its utility in
enterprise and academic environments.
🔹 Key Features of Python

1. Simple and Easy to Learn
Python has clean and readable syntax, similar to English, which makes it ideal for beginners and
reduces the learning curve.
2. Interpreted Language
Python is an interpreted language: the interpreter runs the source directly, with no separate
compile-and-link step, so an error surfaces as soon as the offending statement runs and debugging
is quicker.
3. Dynamically Typed
You don’t need to declare the data type of a variable. Python determines it at run time from the
value assigned.
4. Cross-Platform Compatibility
Python works on all major operating systems (Windows, macOS, Linux), so programs can run almost
anywhere without modification.
5. Large Standard Library
Python comes with a large standard library of built-in modules, and many third-party packages can
be installed on top of it, which makes development faster and easier.
6. Object-Oriented and Functional
Python supports both object-oriented and functional programming paradigms, giving developers
flexibility in how they structure their code.
7. Extensible and Embeddable
Python can integrate with other languages like C, C++, and Java, making it suitable for complex
applications and systems programming.
8. High-Level Language
Python handles low-level details like memory management automatically, so developers can focus on
problem-solving rather than technical complexity.
BASIC PYTHON PROGRAMS

Write a python program to check whether a number is odd or even.

# Check if a number is even or odd
num = int(input("Enter a number: "))

if num % 2 == 0:
    print(f"{num} is even.")
else:
    print(f"{num} is odd.")

Output:

Enter a number: 4
4 is even.

Anaconda

Anaconda is a free and open-source distribution of Python and R programming languages, designed
for scientific computing, data science, machine learning, and large-scale data processing.
It simplifies package management and deployment using tools like Conda, and comes pre-installed
with many popular data science libraries and tools.
Key Features of Anaconda:

• Pre-installed packages: Comes with 250+ scientific and data packages such as NumPy, pandas,
Matplotlib, and scikit-learn. Others, such as TensorFlow, can be installed with conda.
• Environment management: Easily create isolated environments to avoid dependency conflicts.
• Package manager: Uses conda to install packages from Conda channels; pip can also be used inside
a Conda environment for packages from PyPI.
• User-friendly tools:
  o Jupyter Notebook – for interactive data analysis.
  o Spyder – an IDE for scientific programming.
  o Anaconda Navigator – a graphical interface for managing environments and launching tools.
Why do we use Anaconda?
1. All-in-One Data Science Toolkit
Anaconda comes pre-installed with:
• Python (or R)
• 250+ popular libraries (e.g., NumPy, pandas, scikit-learn)
• Development tools like Jupyter Notebook and Spyder
➡️ Saves time: you don’t have to install each tool separately.
2. Easy Package & Dependency Management
With the Conda tool:
• You can install, update, or remove packages easily.
• It handles dependencies automatically.
• pip can still be used inside a Conda environment when a package is not available from Conda.
➡️ Far fewer dependency and version conflicts.
3. Environment Isolation
Create separate virtual environments for each project:

conda create --name myproject python=3.10

➡️ Keeps your projects organized and prevents interference between different package versions.
4. Cross-Platform Support
Works seamlessly on Windows, macOS, and Linux.
➡️ Ideal for teams working across different operating systems.
5. Ideal for Beginners
• Graphical interface (Anaconda Navigator) for managing tools and environments without
using the command line.
• Pre-configured setup removes complexity from the development process.
➡️ Great starting point for people new to Python or data science.
6. Trusted by Professionals
Used by researchers, data scientists, and companies globally due to its stability and rich ecosystem.
Applications You Can Launch from Navigator:
• Jupyter Notebook – Interactive notebooks for code, plots, and text.
• Spyder – IDE for scientific development.
• VS Code – (if installed) general-purpose code editor.
• RStudio – For R language development.
• Orange – Visual programming for machine learning.
Google Colab

What is Google Colab?

Google Colab (Colaboratory) is a free, cloud-based Jupyter notebook environment provided by
Google. It allows you to write and run Python code in your browser, with no installation required.
It’s widely used for machine learning, data analysis, and education, especially when you need
access to free GPUs or TPUs.
Key features of Google Colab:
1. Cloud-Based Environment
• Runs entirely in your web browser.
• No software installation needed.
• Accessible from anywhere with an internet connection.
2. Jupyter Notebook Interface
• Supports Python code, Markdown, LaTeX, and inline visualizations.
• Easy to mix code, charts, equations, and explanations in one document.
3. Free Access to GPUs and TPUs
• Offers GPUs such as the NVIDIA Tesla K80, T4, or A100, as well as Google TPUs, for accelerating ML
tasks; the hardware actually assigned depends on availability and plan.
• Great for training deep learning models at no cost.
4. Pre-installed Libraries
• Comes with many popular libraries:
  o Machine Learning: TensorFlow, PyTorch, Keras, Scikit-learn
  o Data Analysis: pandas, NumPy
  o Visualization: Matplotlib, Seaborn
• You can also install any other Python library using !pip install.
5. Google Drive Integration
• Save notebooks directly to Google Drive.
• Load and process files stored in Drive.
• Useful for long-term storage and collaboration.
6. Real-Time Collaboration
• Share notebooks like Google Docs.
• Collaborate with others in real time and see live changes.

Python Data Types

What Are Data Types in Python?

Data types define the kind of data a variable can hold in a programming language like Python.
They help Python understand how to store and manipulate values correctly, whether it’s a number,
text, list, or something else.
Why Are Data Types Important?
• Ensure that operations are valid (e.g., you can't add a string to a number).
• Allow Python to manage memory efficiently.
• Help you organize and structure your program’s data logically.
Types of Data Types in Python:

1. Numeric Types
int (Integer)
• Whole numbers, positive or negative, without a decimal point.
• Example: x = 42
float (Floating Point)
• Numbers with a decimal point.
• Example: y = 3.14159
complex (Complex)
• Numbers with a real and an imaginary part (e.g., 3 + 4j).
• Example: z = 2 + 3j

2. Text Type
str (String)
• A sequence of characters, enclosed by quotes (single, double, or triple quotes).
• Example: name = "Alice"

3. Sequence Types
🔹 list
• Ordered, mutable (can be changed) collection of items.
• Can hold elements of different types.
• Example: fruits = ["apple", "banana", "cherry"]
🔹 tuple
• Ordered, immutable (cannot be changed) collection of items.
• Example: coordinates = (10, 20)
🔹 range
• Represents a sequence of numbers, commonly used in loops.
• Example: range(5) generates numbers from 0 to 4.

4. Mapping Type
🔸 dict (Dictionary)
• A collection of key-value pairs.
• Keys are unique and used to retrieve values.
• Example: person = {"name": "Bob", "age": 30}

5. Set Types
🔹 set
• Unordered collection of unique elements.
• Example: unique_numbers = {1, 2, 3}
🔹 frozenset
• Immutable version of a set.
• Example: immutable_set = frozenset([1, 2, 3])

6. Boolean Type
bool
• Represents True or False values, used for logic and conditionals.
• Example: is_active = True

7. Binary Types
🔹 bytes
• Immutable sequence of bytes (used for binary data).
• Example: data = b"hello"
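
The types listed in this section can be checked interactively with the built-in type() function. The short sketch below reuses the example values from the text; the dictionary name samples is only an illustrative choice.

```python
# One sample value per built-in type named in this section.
samples = {
    "int": 42,
    "float": 3.14159,
    "complex": 2 + 3j,
    "str": "Alice",
    "list": ["apple", "banana", "cherry"],
    "tuple": (10, 20),
    "range": range(5),
    "dict": {"name": "Bob", "age": 30},
    "set": {1, 2, 3},
    "frozenset": frozenset([1, 2, 3]),
    "bool": True,
    "bytes": b"hello",
}

for expected_name, value in samples.items():
    # type(value).__name__ is the name Python reports for the value's type
    print(f"{expected_name:<10} -> {type(value).__name__:<10} {value!r}")

# Lists are mutable, tuples are not
fruits = samples["list"]
fruits.append("mango")            # allowed: the list grows in place
try:
    samples["tuple"][0] = 99      # not allowed: tuples cannot be changed
except TypeError as error:
    tuple_error = str(error)
    print("tuple error:", tuple_error)
```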

Python Variables

What is a Variable in Python?

A variable in Python is essentially a name that refers to a value. You can think of it as a container
that stores data (such as a number, string, list, etc.). Once a variable is assigned a value, you can use
that variable in your program to manipulate or display the stored data.
Features of Variables in Python

1. Dynamically Typed
• Python determines the type of a variable at run time from the value assigned to it, so you
don’t have to declare a type when you create it.
2. No Explicit Declaration
• Python does not require explicit declaration of variables. You simply assign a value to a
variable name, and Python handles the rest.
• This keeps code concise, with fewer lines needed to manage variables.
3. Reassignable
• A variable can be reassigned at any time. You can change the value, and even the type of value
it refers to, during the program’s execution.
4. No Memory Allocation Issues
• Python manages memory allocation automatically; you never allocate or free memory by hand.
5. Supports Multiple Assignments
• Python allows you to assign multiple variables in a single line.
• This is especially useful for swapping values or grouping variables together.
6. Variables Can Hold Any Data Type
• Variables in Python are not restricted to any specific data type. A variable can hold any type
of data, such as integers, strings, lists, tuples, dictionaries, and more.
7. Case Sensitivity
• Variable names are case-sensitive. This means that myVariable and myvariable are
considered different variables in Python.
8. Naming Rules
• Variables can be named using letters, digits, and underscores, but they cannot start with a
digit. Additionally, Python keywords (reserved words) like if, for, and while cannot be used
as variable names.
9. Garbage Collection
• Python has an automatic garbage collection system. When an object is no longer referenced by
any variable, its memory is reclaimed automatically.

10. Global and Local Variables
• Variables can be local (inside a function) or global (available throughout the program).
• You can use the global keyword to modify global variables inside a function.
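
Several of the features above can be seen in a few lines. The following is a minimal sketch; all variable and function names are chosen for illustration only.

```python
# Dynamic typing and reassignment: the same name can refer to values of different types
x = 10
first_type = type(x).__name__      # "int"
x = "ten"
second_type = type(x).__name__     # "str"

# Multiple assignment, and swapping two values without a temporary variable
a, b = 1, 2
a, b = b, a

# Case sensitivity: these are two different variables
myVariable = "upper V"
myvariable = "lower v"

# Global vs local: the global keyword lets a function rebind a module-level name
counter = 0


def increment():
    global counter
    counter += 1


increment()
print(first_type, second_type, a, b, counter)
```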

Python operators

What are Operators in Python?

Operators are symbols or keywords that let you perform operations on values or variables. Think of
them like instructions for doing things like math, comparisons, or combining logic.

1. Arithmetic Operators

These do basic math:

Operator   Meaning               Example   Result
+          Add                   5 + 3     8
-          Subtract              5 - 3     2
*          Multiply              5 * 3     15
/          Divide                5 / 2     2.5
//         Floor Division        5 // 2    2
%          Modulus (Remainder)   5 % 2     1
**         Power                 2 ** 3    8

2. Comparison (Relational) Operators

These compare two values and return True or False.


Operator Meaning Example Result
== Equal to 5 == 5 True
!= Not equal to 5 != 3 True
> Greater than 5>3 True
< Less than 5<3 False
>= Greater or equal 5 >= 5 True
<= Less or equal 3 <= 5 True

3. Logical Operators

Used to combine conditions:


Operator  Meaning              Example         Result
and       Both must be True    True and False  False
or        Either can be True   True or False   True
not       Reverses the result  not True        False

4. Assignment Operators

Used to assign or update values in a variable.


Operator Example Same As
= a=5 Assign 5 to a
+= a += 2 a=a+2
-= a -= 1 a=a-1
*= a *= 3 a=a*3
/= a /= 2 a=a/2

And others like //=, %=, **=

5. Bitwise Operators

Work at the bit level (used less often for beginners):


Operator Meaning Example (a=5, b=3) Result
& AND a&b 1
| OR a|b 7
^ XOR a^b 6
~ NOT ~a -6
<< Shift left a << 1 10
>> Shift right a >> 1 2
6. Membership Operators

Check if a value is in a sequence (like a list or string):


Operator Meaning Example Result
in Exists 'a' in 'apple' True
not in Doesn’t exist 'x' not in 'apple' True

7. Identity Operators

Check if two variables point to the same object in memory.


Operator Meaning Example Result
is Same object a is b True/False
is not Not same object a is not b True/False
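
Program for logical, bitwise, membership and identity operators (an illustrative sketch; all variable names are chosen for this example only):

```python
a, b = 5, 3  # binary 101 and 011

# Logical operators combine boolean conditions
both_positive = a > 0 and b > 0
either_large = a > 10 or b > 10

# Bitwise operators act on the individual binary digits
bit_and = a & b  # 101 & 011 = 001
bit_or = a | b   # 101 | 011 = 111
bit_xor = a ^ b  # 101 ^ 011 = 110

# Membership operators test whether a value occurs in a sequence
has_a = 'a' in 'apple'

# Identity operators compare object identity, not equality
list_one = [1, 2]
list_two = list_one    # second name for the same list object
list_three = [1, 2]    # equal contents, but a separate object
same_object = list_one is list_two
equal_but_distinct = list_one == list_three and list_one is not list_three

print(both_positive, either_large)
print(bit_and, bit_or, bit_xor)
print(has_a, same_object, equal_but_distinct)
```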

Program for Arithmetic operation

# Simple program to demonstrate arithmetic operators

# Input two numbers from the user


a = int(input("Enter first number: "))
b = int(input("Enter second number: "))

# Perform arithmetic operations


print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
print("Floor Division:", a // b)
print("Modulus:", a % b)
print("Exponentiation:", a ** b)

output:

Enter first number: 10


Enter second number: 3
Addition: 13
Subtraction: 7
Multiplication: 30
Division: 3.3333333333333335
Floor Division: 3
Modulus: 1
Exponentiation: 1000

Program for Comparison (Relational) Operators:

# Simple program to demonstrate comparison (relational) operators

# Input two numbers from the user


a = int(input("Enter first number: "))
b = int(input("Enter second number: "))

# Perform comparison operations


print("a == b:", a == b) # Equal to
print("a != b:", a != b) # Not equal to
print("a > b:", a > b) # Greater than
print("a < b:", a < b) # Less than
print("a >= b:", a >= b) # Greater than or equal to
print("a <= b:", a <= b) # Less than or equal to

output:

Enter first number: 10


Enter second number: 5
a == b: False
a != b: True
a > b: True
a < b: False
a >= b: True
a <= b: False

Program for assignment Operators:

# Simple program to demonstrate assignment operators

# Start with an initial value


a = 10
print("Initial value of a:", a)

# Use assignment operators


a += 5

print("After a += 5, a =", a)

a -= 3
print("After a -= 3, a =", a)

a *= 2
print("After a *= 2, a =", a)

a /= 4
print("After a /= 4, a =", a)

a //= 2
print("After a //= 2, a =", a)

a %= 3
print("After a %= 3, a =", a)

a **= 2
print("After a **= 2, a =", a)

output:

Initial value of a: 10
After a += 5, a = 15
After a -= 3, a = 12
After a *= 2, a = 24
After a /= 4, a = 6.0
After a //= 2, a = 3.0
After a %= 3, a = 0.0
After a **= 2, a = 0.0

QUIX

Introduction To Pandas

Pandas (styled as pandas) is a software library written for the Python programming language for
data manipulation and analysis. In particular, it offers data structures and operations for manipulating
numerical tables and time series. It is free software released under the three-clause BSD license.[2]
The name is derived from the term "panel data", an econometrics term for data sets that include
observations over multiple time periods for the same individuals, [3] as well as a play on the phrase
"Python data analysis".[4]: 5 Wes McKinney started building what would become Pandas at AQR
Capital while he was a researcher there from 2007 to 2010.[5]
The development of Pandas introduced into Python many comparable features of working with
DataFrames that were established in the R programming language.[6] The library is built upon another
library, NumPy.

Key Features of Pandas:

1. Data Structures:
o Series: A one-dimensional labeled array.
o DataFrame: A two-dimensional labeled data structure, like a table in a database or Excel
spreadsheet.
2. Data Handling:
o Supports reading and writing data from various formats: CSV, Excel, SQL, JSON, and
more.
o Powerful tools for filtering, transforming, grouping, and cleaning data.
3. Missing Data Handling:
o Provides tools like isna(), fillna(), and dropna() for dealing with missing data.
4. Label-based Indexing:
o Use of labels for rows and columns to access data instead of relying solely on integer
indices.
5. Data Aggregation and Grouping:
o Easily group data using groupby() and apply functions like sum, mean, or custom
aggregations.
6. Time Series Support:
o Built-in support for time series data, including date range generation and frequency
conversion.

1. Core Data Structures


a. Series

 A one-dimensional labeled array capable of holding any data type.

import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
b. DataFrame

 A two-dimensional table (like a spreadsheet or SQL table).

data = {'Name': ['Anna', 'Ben'], 'Age': [28, 34]}
df = pd.DataFrame(data)
print(df)

2. Reading and Writing Data

 From CSV: pd.read_csv('file.csv')


 To CSV: df.to_csv('output.csv', index=False)
 Other formats: Excel (read_excel()), JSON (read_json()), SQL (read_sql()), etc.

3. Data Selection and Filtering


# Selecting a column
df['Age']

# Selecting multiple columns


df[['Name', 'Age']]

# Filtering rows
df[df['Age'] > 30]

# Using loc (label-based) and iloc (position-based)


df.loc[0] # Row whose index label is 0
df.iloc[0] # First row by position

4. Data Cleaning
df.dropna() # Remove missing data
df.fillna(0) # Replace missing values with 0
df.rename(columns={'Name': 'FullName'}, inplace=True)
df['Age'] = df['Age'].astype(int) # Change data type

5. GroupBy and Aggregation
df.groupby('City')['Age'].mean() # Average age by city
df['Age'].sum() # Sum of age
df['Age'].agg(['min', 'max']) # Multiple aggregations

6. Date and Time


# Convert string to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Extracting parts
df['Year'] = df['Date'].dt.year

7. Data Visualization (with Matplotlib or Seaborn)


import matplotlib.pyplot as plt

df['Age'].plot(kind='bar')
plt.show()

8. Common Operations

 Sorting: df.sort_values(by='Age')
 Merging: pd.merge(df1, df2, on='id')
 Concatenating: pd.concat([df1, df2])
 Pivot tables: df.pivot_table(values='Sales', index='Region', columns='Month')

Python program for Student Marks Analysis

import pandas as pd

# Step 1: Create a DataFrame


data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Math': [85, 78, 92, 60, 88],
'Science': [90, 80, 95, 70, 85],

'English': [88, 76, 89, 65, 90]
}

df = pd.DataFrame(data)

# Step 2: Display the data


print("Student Marks:")
print(df)

# Step 3: Calculate average score


df['Average'] = df[['Math', 'Science', 'English']].mean(axis=1)
print("\nWith Average Scores:")
print(df)

# Step 4: Filter students who scored above 85 average


top_students = df[df['Average'] > 85]
print("\nTop Students (Average > 85):")
print(top_students)

# Step 5: Sort students by average score


sorted_df = df.sort_values(by='Average', ascending=False)
print("\nStudents Sorted by Average:")
print(sorted_df)

output:
Student Marks:
Name Math Science English
0 Alice 85 90 88
1 Bob 78 80 76
2 Charlie 92 95 89
3 David 60 70 65
4 Eva 88 85 90

With Average Scores:


Name Math Science English Average
0 Alice 85 90 88 87.666667
1 Bob 78 80 76 78.000000
2 Charlie 92 95 89 92.000000
3 David 60 70 65 65.000000
4 Eva 88 85 90 87.666667

Top Students (Average > 85):
Name Math Science English Average
0 Alice 85 90 88 87.666667
2 Charlie 92 95 89 92.000000
4 Eva 88 85 90 87.666667

Read External File


To read an external file using Pandas, you typically use functions like pd.read_csv(),
pd.read_excel(), or pd.read_json() depending on the file type. Here's a brief overview:
🔹 Common File Reading Methods in Pandas
File Type Function Example Usage
CSV pd.read_csv() pd.read_csv("data.csv")
Excel pd.read_excel() pd.read_excel("data.xlsx", sheet_name=0)
JSON pd.read_json() pd.read_json("data.json")
Text pd.read_table() pd.read_table("data.txt", delimiter="\t")
HTML pd.read_html() pd.read_html("data.html") (returns list)
Parquet pd.read_parquet() pd.read_parquet("data.parquet")
SQL pd.read_sql() pd.read_sql("SELECT * FROM table", conn)

🔹 Basic Example: Read a CSV


import pandas as pd

# Read a CSV file


df = pd.read_csv('path/to/your/file.csv')

# View the first few rows


print(df.head())

🔹 Additional Parameters

 sep (alias: delimiter): For specifying custom separators (e.g., sep=';')


 header: Row number(s) to use as column names
 index_col: Column(s) to set as index
 usecols: Return a subset of the columns
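
The parameters listed above can be combined in one call. The sketch below is only an illustration: it reads hypothetical semicolon-separated data from an in-memory string (via io.StringIO) instead of a real file, and the column names id, name, age, and country are invented for the example.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data held in memory instead of a file
raw = io.StringIO(
    "id;name;age;country\n"
    "1;Alice;30;USA\n"
    "2;Bob;25;Canada\n"
)

df = pd.read_csv(
    raw,
    sep=";",                        # custom separator
    header=0,                       # first row holds the column names
    index_col="id",                 # use the id column as the row index
    usecols=["id", "name", "age"],  # load only these columns
)
print(df)
```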
What Is a CSV File

A CSV file (short for Comma-Separated Values) is a simple text file used to store tabular data
(like a spreadsheet or database table).

🔹 Key Features of CSV Files:

 Plain text format
 Each line in the file is a data record
 Each record consists of fields, separated by commas
 Often used for data exchange between systems (e.g., Excel, databases, web apps)
🔹 Example CSV Content:
Name,Age,Country
Alice,30,USA
Bob,25,Canada
Charlie,35,UK

This represents a table with three columns (Name, Age, Country) and three rows of data.
🔹 Common Uses:

 Exporting data from spreadsheets (like Microsoft Excel)


 Sharing data between software applications
 Importing data into databases or programming languages like Python.
Properties of CSV File:
🔹 1. Plain Text Format

 CSV files are human-readable.


 They can be opened and edited with any text editor (like Notepad, VS Code) or spreadsheet
software (like Excel).

🔹 2. Comma as Default Delimiter

 Fields are separated by commas (,), but other delimiters like tabs (\t), semicolons (;), or pipes (|)
can also be used.

🔹 3. Rows and Columns

 Each line represents a row.


 Each field in the row (separated by commas) represents a column value.
🔹 4. No Standard Metadata

 CSV files usually do not store data types, formatting, or formulas.


 It’s just raw data — no styling or advanced structure like in Excel or JSON.

🔹 5. Optional Header Row

 The first row often contains column names (headers), but it's not required.

🔹 6. Lightweight and Portable


 Very small in size, easy to share or transfer.
 Widely supported across platforms, databases, and programming languages.

🔹 7. No Hierarchical Structure

 CSV is flat — not suitable for nested or complex data like JSON or XML.

🔹 8. Encoding

 Usually encoded in UTF-8 or ASCII.


 Encoding issues can occur if special characters are used.
Separators in CSV File:

In a CSV file, the separator (or delimiter) is the character used to divide fields (columns) in each
row. The default separator is a comma (,), but other separators can be used depending on the region,
software, or data structure.
🔹 Common Separators in CSV-like Files
Separator   Character   Example Line        Common Use
Comma       ,           Name,Age,Country    Standard CSV (US, international)
Semicolon   ;           Name;Age;Country    Used in Europe (due to comma used as decimal separator)
Tab         \t          Name\tAge\tCountry  TSV (Tab-Separated Values)
Pipe        |           Name|Age|Country
Space       ' '         Name Age Country    Rare, not recommended
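
As a rough illustration of how the separator affects parsing, the sketch below reads the same record with four different delimiters using Python's built-in csv module. The sample strings are made up for this example.

```python
import csv
import io

# The same record written with four different separators
samples = {
    ",": "Name,Age,Country\nAlice,30,USA\n",
    ";": "Name;Age;Country\nAlice;30;USA\n",
    "\t": "Name\tAge\tCountry\nAlice\t30\tUSA\n",
    "|": "Name|Age|Country\nAlice|30|USA\n",
}

parsed = {}
for delimiter, text in samples.items():
    # Tell the reader which character divides the fields
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    parsed[delimiter] = list(reader)

for delimiter, rows in parsed.items():
    print(repr(delimiter), rows)
```

Once the correct delimiter is supplied, every variant yields the same rows; with the wrong delimiter each line would come back as a single field.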
DataFrame Attributes in Pandas

A DataFrame in Pandas has several attributes that help you inspect and understand the structure and
metadata of your data — not to be confused with methods, which perform operations.

Key DataFrame Attributes

Attribute Description Example Output


df.shape Tuple of (rows, columns) (100, 5)
df.size Total number of elements (rows × columns) 500
df.ndim Number of dimensions (usually 2 for DataFrame) 2
df.columns Column labels as an Index object Index(['A', 'B', 'C'], dtype='object')
df.index Row labels as an Index object RangeIndex(start=0, stop=100, step=1)
df.dtypes Data types of each column A: int64, B: object, ...
df.values Underlying Numpy array of data array([...])
df.axes List of row and column axis labels [df.index, df.columns]
df.empty Returns True if DataFrame is empty True or False
df.T Transposed version (rows ↔ columns) DataFrame

1. df.shape

 Returns a tuple: (rows, columns)


 Example:

df.shape # (100, 5)

🔹 2. df.columns

 Lists all column labels.


 Example:

df.columns # Index(['Name', 'Age', 'Country'], dtype='object')

🔹 3. df.index

 Shows the index (row labels).


 Example:

df.index # RangeIndex(start=0, stop=100, step=1)

🔹 4. df.dtypes

 Displays data types of each column.


 Example:

df.dtypes
# Name object
# Age int64
# Country object

🔹 5. df.size

 Total number of elements (rows × columns).


 Example:

df.size # 500

🔹 6. df.ndim

 Number of dimensions (usually 2 for DataFrames).


 Example:

df.ndim # 2

🔹 7. df.head(n)

 A method rather than an attribute (note the parentheses): displays the first n rows (default is 5).


 Example:

df.head() # First 5 rows

🔹 8. df.tail(n)

 Also a method: displays the last n rows (default is 5).


 Example:

df.tail(3) # Last 3 rows

🔹 9. df.info()

 Also a method: prints a summary of the DataFrame (index, column data types, non-null counts, memory usage).
 Example:

df.info()
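
The items above can be tried on a small table. The sketch below reuses the three-row Name/Age/Country data from the CSV example earlier in this report; it is only an illustration of reading attributes (no parentheses) versus calling methods (with parentheses).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [30, 25, 35],
    "Country": ["USA", "Canada", "UK"],
})

# Attributes: read without parentheses
print(df.shape)          # tuple of (rows, columns)
print(df.size)           # rows x columns
print(df.ndim)           # number of dimensions
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column

# Methods: called with parentheses
print(df.head(2))        # first two rows
print(df.tail(1))        # last row
df.info()                # prints a summary
```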

Explore Data Frame

Exploring a Pandas DataFrame means getting a quick understanding of its structure, content,
summary statistics, and potential data quality issues. Here’s a step-by-step breakdown of how to
explore a DataFrame:

1. Basic Structure and Metadata


df.shape # Number of rows and columns
df.columns # List of column names
df.index # Row labels
df.dtypes # Data types of each column
df.info() # Summary: non-null counts, dtypes, memory usage

2. Preview the Data


df.head() # First 5 rows
df.tail() # Last 5 rows
df.sample(3) # Random 3 rows

3. Summary Statistics
df.describe() # Descriptive stats for numeric columns
df.describe(include='object') # Stats for categorical (object) columns

4. Check for Missing Data


df.isnull().sum() # Number of missing values per column
df.isna().any() # Whether each column contains any NaN value
df.notnull().sum() # Count of non-null values

5. Value Counts and Uniqueness


df['column_name'].value_counts() # Frequency of each value
df['column_name'].unique() # Unique values
df['column_name'].nunique() # Number of unique values

6. Data Types and Conversion


df['col'] = df['col'].astype(str) # Convert column to string
df.select_dtypes(include='number') # Select only numeric columns

7. Correlation and Relationships


df.corr(numeric_only=True) # Correlation between numeric columns

8. Sorting and Filtering


df.sort_values('Age', ascending=False) # Sort by column
df[df['Age'] > 30] # Filter rows

9. Group and Aggregate


df.groupby('Country').mean(numeric_only=True) # Mean of numeric columns per group
df.groupby('Country').size() # Count per group

Example Workflow for Exploring a New Dataset

print(df.shape)
print(df.columns)
print(df.dtypes)
print(df.head())
print(df.describe(include='all'))
print(df.isnull().sum())
print(df['target'].value_counts(normalize=True))

Data Frame Mode in Pandas


In Pandas, the .mode() function is used to find the mode — the most frequently occurring value(s)
— in each column of a DataFrame or Series.

Syntax
df.mode(axis=0, numeric_only=False, dropna=True)

Parameters
Parameter Description
axis 0 for columns (default), 1 for rows
numeric_only If True, only includes numeric data
dropna If True, excludes NA/null when determining mode

Returns

 A DataFrame containing mode value(s).


 If multiple modes exist, all are returned as rows.

Example 1: Column-wise Mode


import pandas as pd

data = {
'A': [1, 2, 2, 3],
'B': ['x', 'x', 'y', 'y'],
'C': [10, 20, 20, 10]
}
df = pd.DataFrame(data)

print(df.mode())

Output:

A B C
0 2.0 x 10
1 NaN y 20

 Column A has one mode: 2


 Column B has two modes: x, y
 Column C has two modes: 10, 20

Example 2: Mode of a Single Column


df['A'].mode()
# Returns a Series holding the single mode (index 0, value 2)

Use Cases

 Finding the most common category in a column
 Checking data skew in distributions
 Preprocessing for replacing missing values
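
The last use case, replacing missing values with the most common value, might look like the sketch below. The Series colors and its values are invented for this example.

```python
import pandas as pd

# Hypothetical survey answers with two missing entries
colors = pd.Series(["Red", "Blue", None, "Red", None])

# mode() ignores missing values by default; take the first mode
most_common = colors.mode().iloc[0]

# Fill the gaps with the most common value
filled = colors.fillna(most_common)
print(filled.tolist())
```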

1. Multi-Mode Behavior

If a column has more than one most frequent value, .mode() returns multiple rows.

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3]})

print(df.mode())

Output:

   A
0  1
1  2

🔸 Both 1 and 2 appear twice, so both are returned.

2. Mode Across Rows (Axis=1)

This finds the most frequent value within each row:

df = pd.DataFrame({
'A': [1, 2, 3],
'B': [1, 2, 3],
'C': [1, 3, 3]
})

df.mode(axis=1)

Output:

   0
0  1
1  2
2  3

🔹 For each row, it finds the mode horizontally.

3. Mode on Categorical Data


df = pd.DataFrame({
'Color': ['Red', 'Blue', 'Red', 'Green', 'Red']
})

df.mode()

Output:

Color
0 Red

🎯 Useful for finding the most common category in survey or label data.

4. Groupby with Mode

You can combine .groupby() and .mode() to get the most common value per group.

df = pd.DataFrame({
'Group': ['A', 'A', 'B', 'B', 'B'],
'Score': [1, 2, 2, 3, 2]
})

# Group-wise mode
df.groupby('Group')['Score'].agg(lambda x: x.mode().iloc[0])

Output:

Group
A 1
B 2
Name: Score, dtype: int64

📌 Use .iloc[0] to select the first mode when multiple exist.

5. Handling NaNs in Mode

By default, .mode() ignores NaN values. To include them, set dropna=False.

df = pd.DataFrame({'A': [1, 1, None, None, 2]})
df.mode(dropna=False)

Output:

A
0 1.0
1 NaN
Manipulate Data Frame
DataFrame manipulation refers to the process of modifying, transforming, or restructuring the
contents of a Pandas DataFrame to make the data more suitable for analysis, visualization, or
modeling.
Common DataFrame Manipulation Tasks Include:

 Renaming columns or rows


 Adding or removing columns/rows
 Filtering or sorting data
 Replacing or filling missing values
 Changing data types (e.g., to datetime or numeric)
 Grouping and aggregating data
 Merging or concatenating DataFrames
1. Renaming Columns and Index
df.rename(columns={'old_name': 'new_name'}, inplace=True)
df.rename(index={0: 'first_row'}, inplace=True)

2. Adding New Columns


df['new_col'] = df['col1'] + df['col2'] # Based on other columns
df['constant'] = 1 # Constant value

3. Dropping Columns or Rows


df.drop(columns=['col1', 'col2'], inplace=True)
df.drop(index=[0, 2], inplace=True)

4. Filtering Rows (Condition-Based Selection)


df[df['Age'] > 30] # Rows where Age > 30
df[df['Gender'] == 'Male'] # Rows where Gender is Male

5. Sorting Data
df.sort_values(by='Age', ascending=True, inplace=True)
df.sort_index(inplace=True) # Sort by index

6. Replacing Values
import numpy as np

df['Gender'] = df['Gender'].replace({'M': 'Male', 'F': 'Female'}) # Assign back instead of inplace on a column
df.replace(0, np.nan, inplace=True) # Replace 0s with NaN

7. Handling Missing Data


df.fillna(0) # Replace NaNs with 0
df.dropna() # Drop rows with any NaNs
df['col'] = df['col'].fillna(df['col'].mean()) # Replace NaNs in one column with that column's mean

8. Changing Data Types


df['col'] = df['col'].astype(str) # Convert to string
df['col'] = pd.to_datetime(df['col']) # Convert to datetime

9. Creating New Data from Existing


df['Name_Length'] = df['Name'].apply(len)
df['Is_Adult'] = df['Age'].apply(lambda x: x >= 18)

10. Combining DataFrames


pd.concat([df1, df2], axis=0) # Stack vertically
pd.concat([df1, df2], axis=1) # Stack horizontally
df.merge(other_df, on='key') # SQL-like merge

11. Grouping and Aggregating


df.groupby('Gender').mean(numeric_only=True)
df.groupby('Department')['Salary'].sum()

12. Pivot Tables


df.pivot_table(index='Gender', columns='Department', values='Salary', aggfunc='mean')

13. Resetting or Setting Index


df.reset_index(inplace=True)
df.set_index('ID', inplace=True)

Why Data Manipulation Is Important

1. Raw Data Is Rarely Ready for Analysis

 Real-world data is often incomplete, inconsistent, or poorly structured.


 Example: Missing values, wrong formats, or irrelevant columns.
🔍 You can't get accurate results from dirty or misaligned data.
2. Enables Meaningful Insights

 Data manipulation helps reveal patterns, trends, and relationships by organizing and
aggregating information properly.
 Example: Grouping sales by region, pivoting customer behavior, or filtering key segments.
3. Essential for Machine Learning & Modeling
 Most models require numerical, well-structured data, and many work better when it is normalized.
 You often need to encode categories, scale values, or impute missing entries.
4. Supports Decision Making
 Clean, transformed data feeds into dashboards, KPIs, and reports that guide business
strategy.
 Without manipulation, stakeholders would be overwhelmed with noise.
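
The preparation steps mentioned under point 3 (imputing missing entries, scaling values, encoding categories) can be sketched with Pandas alone. The table below, with its City and Age columns, is hypothetical, and min-max scaling is just one of several possible scaling choices.

```python
import pandas as pd

# Hypothetical raw data with one missing age
df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi"],
    "Age": [20.0, None, 40.0],
})

# Impute: replace the missing age with the column mean (30.0 here)
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Scale: min-max scaling maps the ages onto the range 0..1
age_min, age_max = df["Age"].min(), df["Age"].max()
df["Age_scaled"] = (df["Age"] - age_min) / (age_max - age_min)

# Encode: turn the City category into one indicator column per city
encoded = pd.get_dummies(df, columns=["City"])
print(encoded)
```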

Introduction To Data Science AIML

Introduction to Data Science


Definition:

Data Science is an interdisciplinary field that uses statistics, programming, and domain
knowledge to extract insights and knowledge from structured and unstructured data.
Core Components of Data Science:

1. Data Collection – Gathering data from sources like databases, sensors, web APIs.
2. Data Cleaning – Handling missing, incorrect, or inconsistent data.
3. Data Exploration & Visualization – Using tools like Pandas, Matplotlib, and Seaborn.
4. Statistical Analysis – Understanding relationships and trends in data.
5. Machine Learning – Building predictive models using algorithms.
6. Data Communication – Reporting results via dashboards, charts, and reports.
Applications:

 Business analytics
 Fraud detection
 Recommendation systems
 Healthcare diagnostics
 Social media analysis
🤖 Introduction to AI and Machine Learning (AIML)
Artificial Intelligence (AI):

AI is the broader concept of machines simulating human intelligence, including reasoning,
learning, and problem-solving.
Machine Learning (ML):

ML is a subset of AI that enables systems to learn from data and improve over time without being
explicitly programmed.

Types of Machine Learning:

Type Description Example

Supervised Learn from labeled data Spam detection, price prediction

Unsupervised Find patterns in unlabeled data Customer segmentation, clustering

Reinforcement Learn via rewards & penalties Game-playing AI, robotics

Common AIML Tools & Libraries:


Python – The most widely used language
Pandas, NumPy – Data manipulation
Scikit-learn – ML models
TensorFlow, PyTorch – Deep learning
Matplotlib, Seaborn – Visualization

Why Learn Data Science and AIML?


High demand in tech, finance, healthcare, and more
Enables smarter decisions through data
Drives innovation with automation and intelligence

Skills You Need for Data Science & ML


Technical Skills
Python, R
Pandas, NumPy, Scikit-learn, TensorFlow
SQL for databases
Data visualization (Matplotlib, Seaborn, Power BI, Tableau)
Soft Skills
Critical thinking
Communication
Problem-solving
Domain expertise (e.g. finance, healthcare)

Reinforcement Machine Learning


Definition
Reinforcement Learning is a type of machine learning where an agent learns to make decisions
by interacting with an environment, receiving rewards or penalties based on its actions.
Think of it like training a dog: if it performs the correct action, it gets a treat (reward);
otherwise, it gets nothing (or a penalty).

Core Components of Reinforcement Learning


Term Description
Agent The learner or decision maker (e.g., a robot, game bot)
Environment Everything the agent interacts with
State (S) Current situation of the agent
Action (A) Possible choices the agent can make
Reward (R) Feedback from the environment after an action
Policy (π) Strategy the agent follows to choose actions based on states
Value Function (V) Expected long-term reward for each state

How It Works (Learning Cycle)


Agent observes the current state of the environment.
It takes an action based on a policy.
It receives a reward (positive or negative).
The environment changes to a new state.
The agent learns to improve its actions to maximize total rewards over time.
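
The learning cycle above can be sketched with tabular Q-learning on a toy problem. Everything in this example is an assumption made for illustration: the environment is a hypothetical row of five cells in which the agent starts at the left end and earns a reward of +1 on reaching the right end, and the function name train_q_table and its parameter values are invented here.

```python
import random


def train_q_table(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a row of cells; action 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0                              # observe the starting state
        for _ in range(100):                   # cap the episode length
            if state == goal:
                break
            # Choose an action: explore sometimes, otherwise exploit
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # The environment moves to a new state and returns a reward
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Learn: move the estimate toward reward + discounted future value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table should point toward the goal in every non-terminal cell.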

Real-World Example
Application                     Agent                  Environment       Reward
Game playing (e.g. Chess, Go)   Game AI                The game          Win = +1, Lose = -1
Self-driving cars               Car control algorithm  Road conditions   Safe driving = positive
Robotics                        Robot arm              Assembly task     Task completion = reward
Ad placement                    Ad selector            User interaction  Click = +1, No click = 0

Popular Reinforcement Learning Algorithms


Algorithm                           Type          Description
Q-Learning                          Value-based   Learns the value of actions in states (Q-values)
SARSA                               Value-based   On-policy variant of Q-learning; updates use the action the agent actually takes next
Deep Q-Network (DQN)                Deep RL       Uses neural networks to approximate Q-values
Policy Gradient                     Policy-based  Directly learns the best policy
Actor-Critic                        Hybrid        Combines policy and value functions
Proximal Policy Optimization (PPO)  Deep RL       Stable policy learning (used in robotics/games)

Types of Reinforcement Learning


Type Description
Positive RL Agent gets rewards for correct actions
Negative RL Agent gets penalties for incorrect actions
Model-free Learns directly from interaction (e.g. Q-Learning)

Model-based Learns a model of the environment

Challenges in RL
Exploration vs Exploitation: Should the agent try new actions (explore) or stick with the known
best ones (exploit)?
Delayed rewards: The agent may need to make several steps before knowing if its action was
good.
Large state spaces: Real-world problems often have millions of states or actions.
Training time: Can be very slow compared to supervised learning.

Tools & Libraries


OpenAI Gym (now maintained as Gymnasium) – Simulated environments
Stable-Baselines3 – RL algorithms in Python
TensorFlow Agents / PyTorch RL – For custom RL models
Unity ML-Agents – For training agents in simulated environments

Deep Learning

Definition
Deep Learning is a subfield of Machine Learning that uses artificial neural networks with many
layers (hence “deep”) to model and solve complex problems — especially those involving
unstructured data like images, text, and audio.
It mimics the way the human brain processes information — learning from raw data through
layers of neurons.

What Makes It Different from Traditional ML?


Traditional ML Deep Learning
Needs feature engineering Learns features automatically
Struggles with raw data Excels at unstructured data
Shallow models Deep multi-layered neural networks
Simple decision boundaries Highly flexible and expressive

Basic Building Blocks of Deep Learning


Component Description
Neuron (Node) Basic computation unit (like a logic gate)
Layer Group of neurons; types include input, hidden, and output
Weights & Biases Parameters that are learned during training
Activation Functions Introduce non-linearity (e.g., ReLU, Sigmoid)
Loss Function Measures prediction error (e.g., Cross-Entropy)
Optimizer Updates weights to reduce error (e.g., SGD, Adam)

Real-World Applications
Field Deep Learning Use Case
Vision Image recognition, facial recognition, object detection
NLP (Text) Chatbots, translation, sentiment analysis
Audio Speech recognition, music generation
Healthcare Disease detection, medical imaging
Self-driving cars Lane detection, pedestrian recognition
Robotics Visual perception and control

Popular Deep Learning Architectures


Model Type Use Case
Feedforward Neural Network (FNN) Basic tabular data prediction
Convolutional Neural Network (CNN) Image classification, object detection
Recurrent Neural Network (RNN) Time series, speech, text sequences
Long Short-Term Memory (LSTM) Better RNN for long sequences
Transformer (e.g., BERT, GPT) State-of-the-art for text and language tasks
Autoencoders Data compression, anomaly detection
GANs (Generative Adversarial Networks) Image generation, deepfakes

Deep Learning Frameworks


TensorFlow (by Google)
PyTorch (by Meta, most popular for research)
Keras (high-level API on TensorFlow)
MXNet, JAX, ONNX – others for specific use cases

Challenges in Deep Learning


Requires large datasets
Needs high computing power (GPUs/TPUs)
Risk of overfitting
Interpretability – difficult to explain decisions
Training time can be very long

Training Process Summary


Input raw data (images, text, etc.)
Pass data through multiple neural layers
Use forward propagation to compute predictions
Calculate error using a loss function
Use backpropagation to adjust weights
Repeat until the model converges (lea
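
The training steps above can be illustrated on the smallest possible case. The sketch below is not a deep network: it trains a single linear neuron (one weight w and one bias b, no activation function) with plain gradient descent, and the gradients are written out by hand instead of being computed by a framework's backpropagation. The data, learning rate, and number of epochs are assumptions chosen for the example.

```python
# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0       # parameters to learn
learning_rate = 0.05

for epoch in range(2000):
    # Forward pass: compute predictions
    predictions = [w * x + b for x in xs]
    # Loss: mean squared error between predictions and targets
    errors = [p - y for p, y in zip(predictions, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # Backward pass: gradients of the loss with respect to w and b
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Update: step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3), loss)
```

In a real deep learning project, a framework such as TensorFlow or PyTorch performs the backward pass automatically across many layers.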
Generative AI
What Is Generative AI?
Generative AI (Artificial Intelligence) refers to systems that can generate new content — such
as text, images, audio, video, or code — that resembles human-created data.
Instead of just analyzing or classifying data, Generative AI creates new, original data.
Examples of Generative AI in Action
Type of Content What It Can Generate Tools / Models
Text Articles, emails, summaries, poems ChatGPT, GPT-4, Claude
Images Artwork, photorealistic images DALL·E, Midjourney, Stable Diffusion
Code Python, HTML, JavaScript, etc. GitHub Copilot, CodeWhisperer
Audio Music, speech synthesis Jukebox, ElevenLabs, Voicemod
Video Short clips, AI avatars Sora (OpenAI), Runway

How Does Generative AI Work?


Generative AI uses models from deep learning, particularly:
1. Generative Adversarial Networks (GANs)
Two networks: a generator (creates) and a discriminator (judges).
The generator improves until it can fool the discriminator.
Used for image generation, style transfer, deepfakes.
2. Variational Autoencoders (VAEs)
Encodes input to a latent space, then decodes it back.
Useful for data compression and generation.
3. Transformers (e.g., GPT, BERT)
Deep learning models for text and multimodal generation.
Power tools like ChatGPT, Bard, and Claude.

Key Features of Generative AI


Feature Description
Creativity Generates new content beyond existing data
Personalization Can adapt output to user context or preference
Multimodality Works across text, images, audio, and more
Interactivity Used in chatbots, assistants, and content tools
Applications of Generative AI
Industry Use Case
Marketing Ad copy, blog writing, image generation
Education Automated tutoring, summarization
Entertainment Game art, music creation, AI avatars

Healthcare Medical imaging, drug discovery
Finance Report generation, data simulation
Design Logo creation, website mockups

Challenges and Concerns


Concern Explanation
Bias in Output Trained data may carry social or cultural bias
Misinformation Can generate fake news or deepfakes
Intellectual Property Content generation may infringe on copyrighted work
Hallucinations Models may invent facts or make errors
Ethical Use Responsible usage in sensitive domains is critical

Popular Tools and Models


Tool / Model Use
ChatGPT / GPT-4 Text generation, Q&A, summarization
DALL·E 3 Image creation from text
Sora AI-generated video from text
Midjourney Artistic images from prompts
Stable Diffusion Open-source image generation
GitHub Copilot AI for code writing

The Future of Generative AI


More interactive & creative tools
Smarter agents that remember context
Wider use in business automation
Greater need for ethical guidelines and trustworthy AI

Introduction To Linear Regression

Definition
Linear Regression is a supervised machine learning algorithm used to predict a continuous
(numeric) value based on one or more input features (independent variables).
It models the relationship between variables by fitting a straight line (called the regression line)
through the data.
It answers questions like: “How does the price of a house change with its size?”
The Equation
For Simple Linear Regression (1 input feature):
$y = mx + b$
Where:
y = predicted value (dependent variable)
x = input feature (independent variable)
m = slope (how much y changes with x)
b = intercept (value of y when x = 0)
For Multiple Linear Regression (multiple features):
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
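The two equations above translate directly into code. The following is a minimal sketch; the function names `predict_simple` and `predict_multiple` and the coefficient values are assumptions chosen only to illustrate the arithmetic.

```python
def predict_simple(x, m, b):
    """Return y = m*x + b for a single input feature."""
    return m * x + b


def predict_multiple(features, coefficients, intercept):
    """Return y = b0 + b1*x1 + ... + bn*xn for a list of features."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical coefficients, not fitted to any data in this report
print(predict_simple(3, m=2, b=1))            # 2*3 + 1 = 7
print(predict_multiple([1, 2], [10, 20], 5))  # 5 + 10*1 + 20*2 = 55
```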

Use Cases
Scenario Input (X) Output (Y)
House price prediction Size, bedrooms, location Price
Salary estimation Experience, education level Annual salary
Sales forecasting Ad spend, past sales Future sales
Temperature prediction Time of day, humidity Temperature

Types of Linear Regression


Type Description
Simple Linear One independent variable
Multiple Linear Two or more independent variables
Polynomial Regression Extends linear model with nonlinear features
Ridge / Lasso Regularized versions that penalize large coefficients to reduce overfitting
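Polynomial regression is still linear in its coefficients: only the input features are nonlinear. The sketch below illustrates this with NumPy, assuming hypothetical data that follows y = x² exactly; the helper name `fit_polynomial` is an assumption made for this example.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Fit y ~ c0 + c1*x + ... + c_d*x^d by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns are x^0, x^1, ..., x^degree, so the model stays linear in c
    features = np.vander(x, degree + 1, increasing=True)
    coefficients, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coefficients


# Hypothetical data that follows y = x^2 exactly
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
coefficients = fit_polynomial(x, y, degree=2)
print(np.round(coefficients, 6))  # approximately [0, 0, 1]
```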

Assumptions of Linear Regression


Linear relationship between inputs and output
Independence of observations
Homoscedasticity – constant variance of errors
Normal distribution of residuals
No multicollinearity among independent variables

Performance Metrics
Metric Meaning
R² Score % of variance explained by the model
MSE (Mean Squared Error) Average squared error of predictions
RMSE Square root of MSE – expressed in the same units as the target variable
MAE Mean absolute error – average magnitude of errors
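The four metrics in the table can be computed by hand from paired actual and predicted values. The following is a sketch in plain Python; the function name `regression_metrics` and the sample values are assumptions made for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for paired actual/predicted values."""
    n = len(y_true)
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    total_ss = sum((actual - mean_true) ** 2 for actual in y_true)
    residual_ss = sum(e ** 2 for e in errors)
    # R^2 is undefined when every actual value is identical (total_ss == 0)
    r2 = 1 - residual_ss / total_ss if total_ss else float("nan")
    return {"r2": r2, "mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical actual and predicted values
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

For these values the errors are 1, 0, -1 and 0, so MSE and MAE are both 0.5 and R² is 1 - 2/20 = 0.9.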

Example (in Python using scikit-learn)


python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data
X = [[1000], [1500], [2000]] # Size of house
y = [200000, 250000, 300000] # Price

# Train-test split (fixed seed so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
print("MSE:", mean_squared_error(y_test, predictions))

Simple Linear Regression

What is Simple Linear Regression?


Simple linear regression is used when one independent variable (X) is used to predict one
dependent variable (Y). It assumes a linear relationship between the two variables.

Assumptions of Simple Linear Regression


For the model to be valid, the following assumptions should hold:
Linearity: The relationship between X and Y is linear.
Independence: Observations are independent of each other.
Homoscedasticity: The residuals (errors) have constant variance at every level of X.
Normality of residuals: The errors should be normally distributed.

How It Works (Mathematically)


You're trying to find the line:
$Y = \beta_0 + \beta_1 X$
Where:
$\beta_0$: Intercept (where the line crosses the Y-axis)
$\beta_1$: Slope (how much Y changes for a one-unit change in X)
We determine $\beta_0$ and $\beta_1$ using the least squares method:
$\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
$\beta_0 = \bar{Y} - \beta_1 \bar{X}$
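The least squares formulas for the slope and intercept can be sketched in plain Python as follows. The function name `fit_simple_linear_regression` and the sample data are assumptions made for this illustration; the data lie exactly on the line Y = 1 + 2X so the result is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared errors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator and denominator of the slope formula
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line Y = 1 + 2X
intercept, slope = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```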

Evaluating the Model


R-squared ($R^2$): Measures how much of the variance in Y is explained by X. Ranges
from 0 to 1.
Mean Squared Error (MSE): Average of the squares of the residuals.
Residual plots: Help check for violations of assumptions.

Example in Python (using scikit-learn)


python
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
X = np.array([[150], [160], [170], [180], [190]]) # height in cm
y = np.array([50, 60, 65, 70, 80]) # weight in kg

model = LinearRegression()
model.fit(X, y)

print("Intercept (β₀):", model.intercept_)
print("Slope (β₁):", model.coef_[0])

When to Use Simple Linear Regression


When you're interested in predicting a continuous variable based on another single variable.
When the relationship between variables appears linear.
When you're doing basic predictive modeling or explanatory analysis.

Step-by-Step Simple Linear Regression


Step 1: Collect Your Data


You need two numeric variables:
X (Independent) Y (Dependent)
1 2
2 3
3 5
4 4
5 6

Step 2: Calculate the Means


Find the average of X and Y.
$\bar{X} = \frac{1+2+3+4+5}{5} = 3.0$
$\bar{Y} = \frac{2+3+5+4+6}{5} = 4.0$

Step 3: Calculate the Slope (β₁)


Use the formula:
$\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
Let’s compute it step by step:
X Y $X - \bar{X}$ $Y - \bar{Y}$ $(X - \bar{X})(Y - \bar{Y})$ $(X - \bar{X})^2$
1 2 -2 -2 4 4
2 3 -1 -1 1 1
3 5 0 1 0 0
4 4 1 0 0 1
5 6 2 2 4 4
Now sum:
$\sum (X - \bar{X})(Y - \bar{Y}) = 4 + 1 + 0 + 0 + 4 = 9$
$\sum (X - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10$
$\beta_1 = \frac{9}{10} = 0.9$
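The deviation table and the two sums in this step can be reproduced with a few lines of Python. This is a sketch that uses the five (X, Y) pairs from Step 1; the variable names are chosen for this example.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# One row per observation:
# (X, Y, X - mean_x, Y - mean_y, product of deviations, squared X deviation)
rows = [
    (x, y, x - mean_x, y - mean_y, (x - mean_x) * (y - mean_y), (x - mean_x) ** 2)
    for x, y in zip(xs, ys)
]
for row in rows:
    print(*row)

sum_products = sum(row[4] for row in rows)
sum_squares = sum(row[5] for row in rows)
slope = sum_products / sum_squares
print("Sum of products:", sum_products)
print("Sum of squares:", sum_squares)
print("Slope:", slope)
```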

Step 4: Calculate the Intercept (β₀)


$\beta_0 = \bar{Y} - \beta_1 \cdot \bar{X} = 4.0 - (0.9 \cdot 3.0) = 1.3$

Step 5: Build the Regression Equation


$\hat{Y} = 1.3 + 0.9X$
You can now predict Y for any given X using this equation.

Step 6: Make Predictions (Optional)


Predict Y when $X = 6$:
$\hat{Y} = 1.3 + 0.9 \cdot 6 = 6.7$
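Steps 4 to 6 can be checked with a short script. The sketch below starts from the means and slope obtained in Steps 2 and 3; the function name `predict` is an assumption. Results are rounded because floating-point arithmetic can be off by a tiny amount from the hand-computed values.

```python
# Values taken from Steps 2 and 3
mean_x, mean_y = 3.0, 4.0
slope = 0.9

# Step 4: intercept from the means and the slope
intercept = mean_y - slope * mean_x


def predict(x):
    """Step 5: evaluate the fitted line at x."""
    return intercept + slope * x


# Step 6: prediction for a new input
print("Intercept:", round(intercept, 2))
print("Prediction for X = 6:", round(predict(6), 2))
```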

Step-by-Step Multiple Linear Regression


Step 1: Understand the Model


The equation for multiple linear regression is:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$
$Y$: Dependent variable (target)
$X_1, X_2, \dots, X_n$: Independent variables (predictors)
$\beta_0$: Intercept
$\beta_1, \beta_2, \dots, \beta_n$: Coefficients
$\epsilon$: Error term
Step 2: Gather the Data
X₁ (Study Hours) X₂ (Sleep Hours) Y (Test Score)
2 7 50
4 6 60
6 5 65
8 4 70
10 3 85

Step 3: Represent in Matrix Form


X matrix (with a column of 1s for the intercept):
$X = \begin{bmatrix} 1 & 2 & 7 \\ 1 & 4 & 6 \\ 1 & 6 & 5 \\ 1 & 8 & 4 \\ 1 & 10 & 3 \end{bmatrix}$
Y vector:
$Y = \begin{bmatrix} 50 \\ 60 \\ 65 \\ 70 \\ 85 \end{bmatrix}$
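The matrix form can be built in NumPy by stacking a column of ones next to the predictor columns. This is a minimal sketch that uses the data from Step 2; the variable names are chosen for this example.

```python
import numpy as np

study_hours = np.array([2, 4, 6, 8, 10], dtype=float)
sleep_hours = np.array([7, 6, 5, 4, 3], dtype=float)
test_scores = np.array([50, 60, 65, 70, 85], dtype=float)

# Leading column of ones lets the first coefficient act as the intercept
X = np.column_stack([np.ones_like(study_hours), study_hours, sleep_hours])
Y = test_scores

print(X.shape)  # (5, 3)
print(Y.shape)  # (5,)
```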

Step 4: Calculate Coefficients Using Matrix Algebra


The formula to compute the coefficient vector $\beta$ is:
$\beta = (X^T X)^{-1} X^T Y$
Where:
$X^T$ is the transpose of X
$X^T X$ is the matrix product of X transpose and X
$(X^T X)^{-1}$ is the inverse of that product
$X^T Y$ is the matrix product of X transpose and Y
This gives you the values of:
$\beta_0$: Intercept
$\beta_1$: Coefficient for X₁ (Study Hours)
$\beta_2$: Coefficient for X₂ (Sleep Hours)
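The coefficient formula only works when the columns of X are linearly independent. In the Step 2 table, X₂ = 8 − X₁/2 in every row, so that X matrix has rank 2 and its X-transpose-X product cannot be inverted. The sketch below checks the rank first and then applies the formula to a separate, hypothetical data set generated from Y = 1 + 2X₁ + 3X₂. The function name `normal_equation` is an assumption, and it solves the linear system with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same result when the inverse exists.

```python
import numpy as np


def normal_equation(X, Y):
    """Solve (X^T X) beta = X^T Y, refusing rank-deficient inputs."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("columns of X are linearly dependent; X^T X is singular")
    # Solving the linear system is numerically safer than forming the inverse
    return np.linalg.solve(X.T @ X, X.T @ Y)


# Table from Step 2: here X2 = 8 - X1 / 2 in every row
X_table = np.array([[1, 2, 7], [1, 4, 6], [1, 6, 5], [1, 8, 4], [1, 10, 3]])
print(np.linalg.matrix_rank(X_table))  # 2, fewer than the 3 columns

# Hypothetical full-rank data generated from Y = 1 + 2*X1 + 3*X2
X_demo = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]])
Y_demo = np.array([1, 3, 4, 6, 8])
print(normal_equation(X_demo, Y_demo))  # approximately [1, 2, 3]
```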
Step 5: Build the Final Equation
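For illustration, assume the calculation yields the equation below. The coefficients are hypothetical: in the table from Step 2, X₂ = 8 − X₁/2 in every row, so the two predictors are perfectly collinear, $X^T X$ has no inverse, and a unique fit cannot be computed from that data.
$\hat{Y} = 30 + 5X_1 - 2X_2$
This means:
Each extra hour of study increases the predicted score by 5 points, holding sleep hours fixed.
Each extra hour of sleep decreases the predicted score by 2 points, holding study hours fixed.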
Step 6: Make Predictions
For a student who studies 7 hours and sleeps 4 hours:
$\hat{Y} = 30 + 5(7) - 2(4) = 30 + 35 - 8 = 57$
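The prediction above is a direct evaluation of the equation. A small sketch, using the assumed coefficients from Step 5 and a hypothetical helper named `predict_score`:

```python
# Assumed coefficients from Step 5 (intercept, study hours, sleep hours)
INTERCEPT, STUDY_COEF, SLEEP_COEF = 30, 5, -2


def predict_score(study_hours, sleep_hours):
    """Apply Y_hat = 30 + 5*X1 - 2*X2."""
    return INTERCEPT + STUDY_COEF * study_hours + SLEEP_COEF * sleep_hours


print(predict_score(7, 4))  # 30 + 35 - 8 = 57
```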

Step 7: Evaluate the Model (Optional)


Metrics to use:
R-squared: How well the model explains the variance in Y.
Adjusted R-squared: Adjusts R² based on the number of predictors.
MSE / RMSE: Measures prediction error.
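Adjusted R-squared is computed from R², the number of samples n and the number of predictors p as 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below shows the calculation; the function name and the input values are assumptions made for illustration.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Return 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_samples - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r_squared) * (n_samples - 1) / degrees_of_freedom


# Hypothetical values: R^2 = 0.90 from 11 samples and 2 predictors
print(round(adjusted_r_squared(0.90, 11, 2), 3))  # 0.875
```

Adding a predictor that does not improve R² lowers the adjusted value, which is why it is preferred when comparing models with different numbers of predictors.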

Conclusion

Tic Tac Toe is a deceptively simple game that offers profound educational and technical value.
Through its 3×3 grid and minimal rules, it introduces players and developers alike to key principles in
logic, strategy, and artificial intelligence. Its finite number of game states makes it a "solved game,"
meaning the outcome can be predicted with perfect play. This characteristic allows learners to explore
concepts such as game trees, state evaluation, and minimax algorithms in a manageable environment.
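The "solved game" property can be verified with a short minimax search over the full game tree. The following is a sketch, not the project's implementation: the board is assumed to be a 9-character string read row by row, and the names `WIN_LINES`, `winner` and `minimax` are chosen for this example.

```python
from functools import lru_cache

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Value of the position for X with `player` to move: +1 win, 0 draw, -1 loss."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0
    next_player = "O" if player == "X" else "X"
    values = [
        minimax(board[:i] + player + board[i + 1:], next_player)
        for i, cell in enumerate(board)
        if cell == " "
    ]
    # X maximises the value, O minimises it
    return max(values) if player == "X" else min(values)


print(minimax(" " * 9, "X"))  # 0: perfect play from both sides ends in a draw
```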

From a programming perspective, implementing Tic Tac Toe is often a beginner’s first foray into
interactive application development. It challenges the developer to think critically about user input
validation, game state management, condition checking, and user interface feedback. It also serves as
a gateway into more advanced topics like AI-driven opponents and GUI development.

In terms of game theory, Tic Tac Toe serves as a practical example of zero-sum games and Nash
equilibrium. It demonstrates how certain outcomes (win, lose, or draw) are not just probable but
predictable when both players act rationally.

Beyond academics and software, Tic Tac Toe highlights important human-centric ideas such as turn-
taking, fairness, and strategic planning. It is universally recognizable and easy to understand, making
it an enduring tool in both education and entertainment.

In summary, Tic Tac Toe is more than just a simple game — it’s a versatile platform for learning,
experimentation, and understanding foundational principles in computer science and game theory.

Project Conclusion

This project on Tic Tac Toe successfully demonstrates the design, logic, and implementation of a
classic two-player game using both programming logic and user interaction principles. Through the
development process, we implemented core game functionalities such as turn-based play, win/draw
detection, game reset, and a simple user interface.

The project not only strengthened our understanding of conditional logic, loops, and event handling,
but also emphasized the importance of user experience and code organization. It served as an
excellent introduction to building interactive applications, providing a practical foundation for more
complex game or software development in the future.

Overall, the Tic Tac Toe project achieved its objectives: creating a functional, user-friendly game
while reinforcing key programming concepts and encouraging logical thinking and problem-solving
skills.
