Python Programming CSEAIML

Uploaded by

kashish verma
Copyright
© © All Rights Reserved

In [1]: # File handling

# open function
help(open)

Help on function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Open file and return a stream. Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed, unless closefd is set to False.)

mode is an optional string that specifies the mode in which the file
is opened. It defaults to 'r' which means open for reading in text
mode. Other common values are 'w' for writing (truncating the file if
it already exists), 'x' for creating and writing to a new file, and
'a' for appending (which on some Unix systems, means that all writes
append to the end of the file regardless of the current seek position).
In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getencoding() is called to get the current locale encoding.
(For reading and writing raw bytes use binary mode and leave encoding
unspecified.) The available modes are:

========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r' open for reading (default)
'w' open for writing, truncating the file first
'x' create a new file and open it for writing
'a' open for writing, appending to the end of the file if it exists
'b' binary mode
't' text mode (default)
'+' open a disk file for updating (reading and writing)
========= ===============================================================

The default mode is 'rt' (open for reading text). For binary random
access, the mode 'w+b' opens and truncates the file to 0 bytes, while
'r+b' opens the file without truncation. The 'x' mode implies 'w' and
raises an `FileExistsError` if the file already exists.

Python distinguishes between files opened in binary and text modes,
even when the underlying operating system doesn't. Files opened in
binary mode (appending 'b' to the mode argument) return contents as
bytes objects without any decoding. In text mode (the default, or when
't' is appended to the mode argument), the contents of the file are
returned as strings, the bytes having been first decoded using a
platform-dependent encoding or using the specified encoding if given.

buffering is an optional integer used to set the buffering policy.
Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
line buffering (only usable in text mode), and an integer > 1 to indicate
the size of a fixed-size chunk buffer. When no buffering argument is
given, the default buffering policy works as follows:

* Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying device's
"block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
On many systems, the buffer will typically be 4096 or 8192 bytes long.

* "Interactive" text files (files for which isatty() returns True)
use line buffering. Other text files use the policy described above
for binary files.

encoding is the name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed. See the codecs module for the list of supported encodings.

errors is an optional string that specifies how encoding errors are to
be handled---this argument should not be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register or run 'help(codecs.Codec)'
for a list of the permitted encoding error strings.

newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'. It works as
follows:

* On input, if newline is None, universal newlines mode is
enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
these are translated into '\n' before being returned to the
caller. If it is '', universal newline mode is enabled, but line
endings are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by the given
string, and the line ending is returned to the caller untranslated.

* On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '' or '\n', no translation takes place. If newline is any
of the other legal values, any '\n' characters written are translated
to the given string.

If closefd is False, the underlying file descriptor will be kept open
when the file is closed. This does not work when a file name is given
and must be True in that case.

A custom opener can be used by passing a callable as *opener*. The
underlying file descriptor for the file object is then obtained by
calling *opener* with (*file*, *flags*). *opener* must return an open
file descriptor (passing os.open as *opener* results in functionality
similar to passing None).

open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.

It is also possible to use a string or bytearray as a file for both
reading and writing. For strings StringIO can be used like a file
opened in a text mode, and for bytes a BytesIO can be used like a file
opened in a binary mode.
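
The last two paragraphs of the help text can be checked directly. The sketch below opens one scratch file in several modes and records the class of the returned object, then uses StringIO and BytesIO as in-memory stand-ins. The scratch path and variable names are assumptions made for this illustration only.

```python
import io
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:        # text mode
    text_type = type(f).__name__
    f.write("hello\n")

with open(path, "rb") as f:       # read binary
    rb_type = type(f).__name__

with open(path, "ab") as f:       # append binary
    ab_type = type(f).__name__

with open(path, "r+b") as f:      # read/write binary
    rwb_type = type(f).__name__

print(text_type, rb_type, ab_type, rwb_type)

# In-memory objects that behave like files
text_stream = io.StringIO("in-memory text")
byte_stream = io.BytesIO(b"in-memory bytes")
text_value = text_stream.read()
byte_value = byte_stream.read()
print(text_value, byte_value)
```

Printing the class names is a quick way to confirm which mode a file object was actually opened in.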

In [50]: file_obj = open("CSEAIML1_sample.txt")

# file_obj.read()
# print(file_obj.read())
# len(file_obj.readlines()) gives the number of lines in the file
lis_var = file_obj.readlines()
# each line read from the file is a string
type(lis_var[2])
lis_var[2].split()

Out[50]: ['Line',
 '3',
 '-',
 'File',
 'support',
 'various',
 'modes',
 'for',
 'reading',
 'and',
 'writing']

In [63]: print(file_obj.mode)
'''
To write, first close the file, then reopen it in the required mode:
w - write  - deletes the existing content of the file
a - append - adds content at the end of the file
'''
file_obj.close()
file_obj.closed

file_obj = open("CSEAIML1_sample.txt", 'a')
file_obj.write("the content write inside file")

file_obj.close()
file_obj = open("CSEAIML1_sample.txt")
file_obj.readlines()

r
Out[63]: ['Line 1 - This is python class\n',
 'Line 2 - We are learning python file handling\n',
 'Line 3 - File support various modes for reading and writing\n',
 'Line 4 - Python programming\n',
 'Line 5 - We are learning online classthe content write inside filethe content write inside filethe content write inside filethe content write inside filethe content write inside filethe content write inside filethe content write inside file']

In [29]: print(file_obj.mode)
'''
We first need to close the file.
We then open the file in the required mode, w or a:
w - write inside the file, overwriting the existing data
a - append the data at the end of the file
'''
file_obj.close()
file_obj.closed
file_obj = open("CSEAIML1_sample.txt", 'a')
file_obj.write("text that we want to write inside file")
'''Returns the number of characters written (which is always equal to
the length of the string). '''

a
Out[29]: 'Returns the number of characters written (which is always equal to\nthe length of the string). '

In [34]: file_obj.close()
file_obj.closed
file_obj = open("CSEAIML1_sample.txt")
file_obj.readlines()

Out[34]: ['Line 1 - This is python class\n',
 'Line 2 - We are learning python file handling\n',
 'Line 3 - File support various modes for reading and writing\n',
 'Line 4 - Python programming\n',
 'Line 5 - We are learning online classtext that we want to write inside filetext that we want to write inside filetext that we want to write inside file']

Write a program to create a file with personal information.

In [ ]: file_obj.write()
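
One possible way to approach this exercise is sketched below. The record values, the file name personal_info.txt, and the temporary directory are all placeholders invented for the illustration; replace them with your own details.

```python
import os
import tempfile

# Hypothetical record; every value here is a placeholder.
info = {"Name": "Ankit", "Roll No": "101", "Branch": "CSE (AI & ML)"}

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "personal_info.txt")

# write() needs the text to write as its argument; add "\n" to end each line.
with open(path, "w", encoding="utf-8") as f:
    for key, value in info.items():
        f.write(f"{key}: {value}\n")

# Read the file back to confirm what was stored.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

Writing one "key: value" pair per line keeps the file easy to read back with readlines() and split().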

In [19]: print(file_obj.mode)
file_obj.close()
file_obj = open("CSEAIML1_sample.txt", 'r')
# file_obj.write("We are learning online classes")
print(file_obj.read())

a+
Line 1 - This is python class
Line 2 - We are learning python file handling
Line 3 - File support various modes for reading and writing
Line 4 - Python programmingWe are learning online classesWe are learning online classes

In [1]: # print(file_obj, type(file_obj))


# file_obj.read(10)
list_file = file_obj.readlines()
''' What is roll no of Ankit?'''
str_var = list_file[2]
# print(str_var, type(str_var))
list_var = str_var.split()
list_var[len(list_var)-1]

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[1], line 6
4 list_file = file_obj.readlines()
5 ''' What is roll no of Ankit?'''
----> 6 str_var = list_file[2]
7 # print(str_var, type(str_var))
8 list_var = str_var.split()

IndexError: list index out of range
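
The IndexError above means list_file has fewer than three elements. That happens when the file has fewer than three lines, or when the file position is already at the end so readlines() returns an empty list. The sketch below reproduces the second case with an in-memory file and adds a guard; the helper name last_word is hypothetical.

```python
import io

f = io.StringIO("Line 1\nLine 2\n")  # stand-in for a short file
f.read()                             # consumes everything; position is now at the end
leftover = f.readlines()             # nothing is left to read, so this is []

f.seek(0)                            # move back to the start before reading again
lines = f.readlines()


def last_word(lines, index):
    """Return the last word of lines[index], or None if that line does not exist."""
    if index >= len(lines):
        return None
    words = lines[index].split()
    return words[-1] if words else None


print(leftover, lines)
print(last_word(lines, 1), last_word(lines, 2))
```

Negative indexing (words[-1]) is the idiomatic replacement for list_var[len(list_var)-1].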

In [ ]: # file_obj = open("CSEAIML1_sample.txt")
# file_obj.read(10)
# list_var = file_obj.readlines()
# list_var

In [ ]: str_var = list_var[2]
list_var2 = str_var.split()
list_var2[1:]

In [18]: # file_obj.write("Mukul is a good student")


# file_obj.read()
# file_obj.mode

In [13]: # writing inside file
# file_obj.write("Line 4 - Python programming")
# file_obj.mode()
# file_obj.close()
file_obj2 = open("CSEAIML1_sample.txt", 'w')
# file_obj2.mode
file_obj2.write("Line 4 - Python programming")
file_obj2.close()

In [14]: with open('CSEAIML1_sample.txt', 'r') as f:
    f.read()

Implementing all the functions in File Handling

In [15]: import os

def create_file(filename):
    try:
        with open(filename, 'w') as f:
            f.write('Hello, world!\n')
        print("File " + filename + " created successfully.")
    except IOError:
        print("Error: could not create file " + filename)

def read_file(filename):
    try:
        with open(filename, 'r') as f:
            contents = f.read()
            print(contents)
    except IOError:
        print("Error: could not read file " + filename)

def append_file(filename, text):
    try:
        with open(filename, 'a') as f:
            f.write(text)
        print("Text appended to file " + filename + " successfully.")
    except IOError:
        print("Error: could not append to file " + filename)

def rename_file(filename, new_filename):
    try:
        os.rename(filename, new_filename)
        print("File " + filename + " renamed to " + new_filename + " successfully.")
    except IOError:
        print("Error: could not rename file " + filename)

def delete_file(filename):
    try:
        os.remove(filename)
        print("File " + filename + " deleted successfully.")
    except IOError:
        print("Error: could not delete file " + filename)

if __name__ == '__main__':
    filename = "example.txt"
    new_filename = "new_example.txt"

    create_file(filename)
    read_file(filename)
    append_file(filename, "This is some additional text.\n")
    read_file(filename)
    rename_file(filename, new_filename)
    read_file(new_filename)
    delete_file(new_filename)

File example.txt created successfully.
Hello, world!

Text appended to file example.txt successfully.
Hello, world!
This is some additional text.

File example.txt renamed to new_example.txt successfully.
Hello, world!
This is some additional text.

File new_example.txt deleted successfully.
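
The helper functions above open every file inside a with block rather than calling close() by hand. The sketch below shows why that is convenient: the file is closed automatically when the block ends, even if an error occurs inside it. The scratch path is a placeholder chosen for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as f:
    f.write("Hello, world!\n")

with open(path) as f:
    contents = f.read()
    closed_inside = f.closed   # still open inside the block

closed_after = f.closed        # closed automatically once the block ends

print(contents, closed_inside, closed_after)
```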

Exercise: Create a file with personal information using the create, write, read, and rename functions.

In [5]: # Program to show various ways to read and
# write data in a file.
file1 = open("myfile.txt", "w")
L = ["This is Delhi \n", "This is Paris \n", "This is London \n"]

# \n is placed to indicate EOL (End of Line)
file1.write("Hello \n")
file1.writelines(L)
file1.close() # to change file access modes

file1 = open("myfile.txt", "r+")

print("Output of Read function is ")
print(file1.read())
print()

# seek(n) takes the file handle to the nth
# byte from the beginning.
file1.seek(0)

print("Output of Readline function is ")
print(file1.readline())
print()

file1.seek(0)

# To show difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()

file1.seek(0)

print("Output of Readline(9) function is ")
print(file1.readline(9))

file1.seek(0)
# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
print()
file1.close()

Output of Read function is
Hello
This is Delhi
This is Paris
This is London

Output of Readline function is
Hello

Output of Read(9) function is
Hello
Th

Output of Readline(9) function is
Hello

Output of Readlines function is
['Hello \n', 'This is Delhi \n', 'This is Paris \n', 'This is London \n']
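
Besides read(), readline(), and readlines(), a file object can be looped over directly, which yields one line at a time without loading the whole file into a list. The sketch below uses an in-memory StringIO with the same content as myfile.txt so it runs without touching the disk; that substitution is an assumption of this example.

```python
import io

# In-memory stand-in with the same content that the cell above writes to myfile.txt.
f = io.StringIO("Hello \nThis is Delhi \nThis is Paris \nThis is London \n")

cleaned = []
for line in f:                     # the file object yields one line per iteration
    cleaned.append(line.rstrip())  # rstrip() drops the trailing space and newline

print(cleaned)
```

For large files this pattern is usually preferable to readlines(), because only one line is held in memory at a time.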

In [1]: # Python program to illustrate
# Append vs write mode
file1 = open("myfile.txt","w")
L = ["This is Delhi \n","This is Paris \n","This is London \n"]
file1.writelines(L)
file1.close()

# Append mode adds new content at the end of the file
file1 = open("myfile.txt", "a")  # append mode
file1.write("Today \n")
file1.close()

file1 = open("myfile.txt","r")
print("Output of Readlines after appending")
print(file1.readlines())
print()
file1.close()

# Write mode overwrites the existing content
file1 = open("myfile.txt", "w")  # write mode
file1.write("Tomorrow \n")
file1.close()

file1 = open("myfile.txt","r")
print("Output of Readlines after writing")
print(file1.readlines())
print()
file1.close()

Output of Readlines after appending
['This is Delhi \n', 'This is Paris \n', 'This is London \n', 'Today \n']

Output of Readlines after writing
['Tomorrow \n']
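
Because 'w' silently discards existing content, the 'x' mode described in the help text can be a safer choice when a file must not be overwritten. The following is a minimal sketch under that assumption; the scratch path and the flag name overwritten are placeholders for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# 'x' creates the file and fails if it already exists.
with open(path, "x") as f:
    f.write("Tomorrow \n")

try:
    with open(path, "x") as f:   # second attempt: the file now exists
        f.write("This would replace the file\n")
    overwritten = True
except FileExistsError:
    overwritten = False          # the original content is left untouched

with open(path) as f:
    contents = f.read()

print(overwritten, repr(contents))
```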

Python seek() function


In Python, the seek() method changes the position of the file handle to a given position. The file handle acts
like a cursor that defines where data will next be read from or written to in the file.

The cursor stores the current position of the read and write operations in a file, and seek() can move it
forward or backward.

For instance, whenever we open a file to read from it, the cursor starts at position 0 and advances as we
progress through the content of the file. Some scenarios, however, require the file to be read from a
particular position. This is where seek() comes in.

The seek() method sets the current position in a file stream and returns the new position.

Syntax: fileObject.seek(offset[, whence])

offset -> A number giving the position to move to, measured from the reference point selected by whence.

whence -> (Optional) The reference point for offset. It defaults to 0, which means absolute file positioning.
The accepted values are listed below. For files opened in text mode, only seeks relative to the beginning of
the file are allowed, apart from seek(0, 1) and seek(0, 2); a nonzero offset with whence 1 or 2 requires
binary mode.

0 (default): refers to the beginning of the file

1: refers to the current position of the file pointer

2: refers to the end of the file
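
The three whence values can be tried out on a binary stream, where all of them are permitted. The sketch below uses an in-memory BytesIO holding one line of sample text instead of a file on disk; that substitution and the variable names are assumptions of this example.

```python
import io

# In-memory binary stream standing in for a file opened with mode 'rb'.
f = io.BytesIO(b"Line 4 - Python programming")

pos_start = f.seek(9)        # whence=0 (default): 9 bytes from the beginning
word = f.read(6)             # reads b"Python"; the position is now 15

pos_current = f.seek(1, 1)   # whence=1: 1 byte forward from the current position
pos_end = f.seek(-11, 2)     # whence=2: 11 bytes before the end of the stream
tail = f.read()              # reads everything from there to the end

print(pos_start, word, pos_current, pos_end, tail)
```

Each call to seek() returns the new absolute position, which is the same value tell() would report afterwards.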

In [3]: # Change the current file position to 4, and return the rest of the line:
f = open("CSEAIML1_sample.txt", "r")
f.seek(4)
print(f.readline())

4 - Python programming

In [6]: # Return the new position:


f = open("CSEAIML1_sample.txt", "r")
print(f.seek(10))

10

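In [2]: f = open("GfG.txt", "r")
f.seek(0)   # seek() requires an offset argument; 0 moves to the start of the file
f.closed    # closed is an attribute, not a method, so it takes no parentheses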

In [7]: # Python program to demonstrate
# the seek() and tell() methods

# Open the "CSEAIML1_sample.txt" text file
f = open("CSEAIML1_sample.txt", "r")

# The second parameter (whence) defaults to 0, so this sets the
# position to index 20, counted from the beginning of the file
f.seek(20)

'''
The tell() method goes hand-in-hand with the seek() method.
In this example, we use the seek() method
to set the file cursor at a specific position and then
use the tell() method to retrieve the position that was set.
'''
# Print the current position
print(f.tell())

print(f.readline())
f.close()

20
ramming

In [10]: '''
Exercise:
    Create a file
    Add content inside the file
    Use the seek() function
    Use the tell() function
'''

Out[10]: '\nExercise:\n    Create a file\n    Add content inside the file\n    Use the seek() function\n    Use the tell() function\n'

What Is a Python Package?
Python packages are a way to organize and structure your Python code into reusable components. A package is
like a folder that contains related Python files (modules) that work together to provide certain functionality.
Packages keep your code organized, make it easier to manage and maintain, and allow you to share
your code with others. They’re like a toolbox where you store and organize your tools (functions and
classes) for easy access and reuse in different projects.

Python packages are directories of Python modules that provide additional functionality.
Packages are essential for extending Python's capabilities beyond its built-in functions.
They provide reusable code for various purposes, including data manipulation, scientific computing,
and visualization.

Installing Python Packages: Packages can be installed using the pip package manager,
e.g., pip install numpy.
Importing Packages: After installation, packages can be imported into Python scripts
using the import statement, e.g., import numpy as np.
Exploring Package Functionalities: Once imported, the functionalities provided by
packages can be explored and utilized.
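
A quick way to explore a package is to import it and inspect it with built-in tools such as dir() and help(). The sketch below uses json from the standard library as the example package, chosen only because it needs no installation; the variable names are placeholders.

```python
import importlib
import json

# A package is a module that has a __path__ attribute (it maps to a directory).
is_package = hasattr(json, "__path__")

# dir() lists the names a package exposes; skip the private ones.
public_names = [name for name in dir(json) if not name.startswith("_")]

# A module inside the package can be imported by its dotted name.
decoder = importlib.import_module("json.decoder")

print(is_package)
print(public_names)
print(decoder.JSONDecoder is json.JSONDecoder)
```

Running help(json.dumps) in a notebook cell prints the documentation for a single function in the same way help(open) did earlier.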

Introduction to Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics
extension NumPy.
It provides an object-oriented API for embedding plots into applications.
Matplotlib produces high-quality figures suitable for publication.
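
The object-oriented API mentioned above works with explicit Figure and Axes objects instead of the module-level plt functions. The following is a minimal sketch, assuming Matplotlib is installed; it selects the non-interactive Agg backend and renders into an in-memory buffer so that it runs without a display, and both of those choices are assumptions of this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Create a figure and a single axes object, then draw on the axes.
fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Simple Plot")

# Render the figure as PNG bytes into memory instead of showing it.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)

print(ax.get_title(), len(png_bytes) > 0)
```

Holding on to fig and ax makes it easier to customize one plot among several, which is the pattern subplots rely on.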

Using Matplotlib
Installation: Matplotlib can be installed using pip, the Python package installer.
    pip install matplotlib
Importing Matplotlib: In Python scripts, import the matplotlib.pyplot module to access
Matplotlib's plotting functions.
    import matplotlib.pyplot as plt
Creating Basic Plots: Matplotlib provides various types of plots, such as line plots, scatter
plots, bar plots, and histograms.

Matplotlib Example: Below is a simple example of how to create a basic plot using Matplotlib.

In [9]: # Line Plot:
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a plot
plt.plot(x, y)

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Plot')

# Show plot
plt.show()

In [10]: # Scatter Plot
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a scatter plot
plt.scatter(x, y)

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Scatter Plot')

# Show plot
plt.show()

In [11]: # Bar Plot
import matplotlib.pyplot as plt

# Sample data
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

# Create a bar plot
plt.bar(categories, values, color='skyblue')

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot Example')

# Show plot
plt.show()

In [13]: # Histogram
import matplotlib.pyplot as plt
import numpy as np

# Generate random data
data = np.random.randn(1000)

# Create a histogram
plt.hist(data, bins=30, color='orange', edgecolor='black')

# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram Example')

# Show plot
plt.show()

In [15]: # Pie Chart
import matplotlib.pyplot as plt

# Sample data
sizes = [30, 20, 25, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']

# Create a pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%', colors=['gold', 'lightcoral', 'lightsky

# Add title
plt.title('Pie Chart Example')

# Show plot
plt.show()

In [19]: # Subplots
# import Matplotlib's pyplot module and NumPy library.
import matplotlib.pyplot as plt
import numpy as np

# Sample data
'''
Generate sample data for our plots. In this case,
x represents an array of values from 0 to 2π,
and y1 and y2 represent the sine and cosine values of x, respectively.
'''
x = np.linspace(0, 2 * np.pi, 400)
y1 = np.sin(x)
y2 = np.cos(x)

# Create subplots
'''
the subplots() function to create a figure (fig) and a set of subplots (axes).
The arguments (2, 1) indicate that we want to create 2 rows and 1 column of subplots.
'''
fig, axes = plt.subplots(2, 1)

# Plot data on subplots
'''
We plot the data on each subplot (axes). axes[0] refers to the first subplot (top subplo
and axes[1] refers to the second subplot (bottom subplot).
We plot the sine function on the first subplot and the cosine function on the second sub
We also set titles for each subplot.
'''
axes[0].plot(x, y1, color='blue')
axes[0].set_title('Sin Function')

axes[1].plot(x, y2, color='red')
axes[1].set_title('Cos Function')

# Adjust layout
'''
automatically adjust the layout of subplots to prevent overlapping labels, titles, etc.
'''
plt.tight_layout()

# Show plot
'''
display the plot containing both subplots.
'''
plt.show()

NumPy (Numerical Python)


NumPy (Numerical Python) is a powerful library for numerical computing in
Python and a fundamental package for scientific computing. It provides
support for arrays, matrices, and many mathematical functions.

Key Features of NumPy
Ndarray: A powerful n-dimensional array object.
Broadcasting: A set of rules for applying binary ufuncs (universal functions)
element-wise.
Vectorization: Elimination of the need for many loop constructs.
Standard Mathematical Functions: Comprehensive mathematical functions.
Linear Algebra: Tools for linear algebra and random number generation.
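
Broadcasting and vectorization from the list above are easiest to see in a small example. The sketch below is illustrative only; the array names and values are made up for the demonstration.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])   # shape (3,)
quantities = np.array([[1], [2]])       # shape (2, 1)

# Broadcasting: shapes (2, 1) and (3,) are stretched to a common shape (2, 3),
# so every quantity is multiplied by every price without an explicit loop.
totals = quantities * prices

# Vectorization: one array expression replaces a Python-level loop.
values = np.arange(5)
squares_loop = [int(v) ** 2 for v in values]
squares_vectorized = values ** 2

print(totals)
print(squares_loop, squares_vectorized)
```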

Installation

pip install numpy

Importing NumPy

import numpy as np

Numpy Example:
Here's a simple example of how to create and manipulate arrays using numpy.

In [21]: # Creating Arrays
# Create a NumPy array using np.array():
import numpy as np

# Creating a 1D array
arr1 = np.array([1, 2, 3, 4, 5])
print("1D array:", arr1)

# Creating a 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:\n", arr2)

1D array: [1 2 3 4 5]
2D array:
[[1 2 3]
[4 5 6]]

Array Attributes

NumPy arrays have several attributes:

ndim: Number of dimensions.
shape: Dimensions of the array.
size: Total number of elements.
dtype: Data type of the elements.
itemsize: Size of each element in bytes.

In [22]: print("Number of dimensions:", arr2.ndim)
print("Shape of the array:", arr2.shape)
print("Total number of elements:", arr2.size)
print("Data type of elements:", arr2.dtype)
print("Size of each element:", arr2.itemsize)

Number of dimensions: 2
Shape of the array: (2, 3)
Total number of elements: 6
Data type of elements: int32
Size of each element: 4
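
The int32 and 4-byte item size shown above are not universal: the default integer type NumPy picks for Python integers depends on the platform and the NumPy version, and many systems report int64 and 8 bytes instead. When the element type matters, it can be set explicitly, as in this sketch (the array names are placeholders).

```python
import numpy as np

# Request a specific element type instead of relying on the platform default.
arr64 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# astype() returns a converted copy with a different element type.
arr_f32 = arr64.astype(np.float32)

print(arr64.dtype, arr64.itemsize, arr64.nbytes)
print(arr_f32.dtype, arr_f32.itemsize)
```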

In [23]: import numpy as np

# Create a numpy array
arr = np.array([1, 2, 3, 4, 5])

# Perform some operations
print("Original array:", arr)
print("Sum of array elements:", np.sum(arr))
print("Mean of array elements:", np.mean(arr))
print("Maximum element:", np.max(arr))

Original array: [1 2 3 4 5]
Sum of array elements: 15
Mean of array elements: 3.0
Maximum element: 5

Array Operations

NumPy supports a wide range of array operations:

Arithmetic Operations: Element-wise addition, subtraction, multiplication, and division.
Universal Functions (ufuncs): Functions such as np.sqrt() that are applied element-wise to every item of an array.
Aggregations: Functions such as np.sum() and np.mean() that reduce an array to a single summary value.

In [25]: # Arithmetic operations
arr = np.array([1, 2, 3, 4])
print("Array:", arr)
print("Array + 2:", arr + 2)
print("Array * 2:", arr * 2)

# Aggregations (np.sum, np.mean) and a universal function (np.sqrt)
print("Sum of array:", np.sum(arr))
print("Mean of array:", np.mean(arr))
print("Square root of array:", np.sqrt(arr))

Array: [1 2 3 4]
Array + 2: [3 4 5 6]
Array * 2: [2 4 6 8]
Sum of array: 10
Mean of array: 2.5
Square root of array: [1. 1.41421356 1.73205081 2. ]

Slicing and Indexing

NumPy arrays can be sliced and indexed much like Python lists. Multi-dimensional arrays also accept one index or slice per axis, separated by commas.

In [27]: # Slicing
print("First two elements:", arr[:2])

# Indexing
print("Element at index 1:", arr[1])

# Slicing 2D arrays
print("First row:", arr2[0, :])
print("Second column:", arr2[:, 1])

First two elements: [1 2]
Element at index 1: 2
First row: [1 2 3]
Second column: [2 5]
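
One difference from lists is worth knowing: a basic slice of a NumPy array is a view onto the same data, whereas a list slice is a copy. The sketch below illustrates this; the variable names are placeholders.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
view = arr[:2]
view[0] = 99          # writes through to arr, because the slice is a view

lst = [1, 2, 3, 4]
part = lst[:2]
part[0] = 99          # lst is unaffected, because a list slice is a copy

independent = arr[:2].copy()
independent[1] = -1   # arr is unaffected, because copy() made a separate array

print(arr, lst, independent)
```

Calling copy() on a slice is the usual way to get list-like behaviour when the original array must stay unchanged.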

Reshaping Arrays

Change the shape of an array using reshape().

In [29]: arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped_arr = arr.reshape(3, 2)
print("Reshaped array:\n", reshaped_arr)

Reshaped array:
[[1 2]
[3 4]
[5 6]]
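
reshape() only rearranges the existing elements, so the new shape must hold exactly the same number of items. One dimension may be given as -1 to let NumPy work it out. A short sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(1, 7)            # six elements: 1 to 6

two_rows = arr.reshape(2, -1)    # -1 is inferred as 3, giving shape (2, 3)
flat = two_rows.reshape(-1)      # back to one dimension

try:
    arr.reshape(4, 2)            # 4 * 2 = 8 does not match the 6 elements
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(two_rows.shape, flat.shape, mismatch_error)
```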

In [32]: # Example Program: Element-wise Operations and Statistics


# Let's create a program that demonstrates element-wise operations and computes some basi
'''
The program creates a NumPy array and performs element-wise operations (squaring and dou
It computes basic statistics (mean, median, and standard deviation) on the array.
'''
import numpy as np

# Create an array
'''
Here, we create a 1D NumPy array called data with the elements [5, 10, 15, 20, 25].
'''
data = np.array([5, 10, 15, 20, 25])
# Perform element-wise operations
'''
This operation squares each element of the array data. The result is stored in data_squa
Element-wise squaring: [5^2, 10^2, 15^2, 20^2, 25^2] results in [25, 100, 225, 400, 625]
'''
data_squared = data ** 2
'''
This operation multiplies each element of the array data by 2. The result is stored in d
Element-wise doubling: [5*2, 10*2, 15*2, 20*2, 25*2] results in [10, 20, 30, 40, 50].
'''
data_doubled = data * 2

# Compute basic statistics


'''
This calculates the mean (average) of the array data.
The mean is the sum of all elements divided by the number of elements.
Mean: (5 + 10 + 15 + 20 + 25) / 5 = 75 / 5 = 15.
'''
mean = np.mean(data)
'''
This calculates the median of the array data. The median is the middle value when the el
Median: The sorted array is [5, 10, 15, 20, 25], and the middle value is 15.
'''
median = np.median(data)
'''
This calculates the standard deviation of the array data.
The standard deviation measures the amount of variation or dispersion of the elements.
'''
std_dev = np.std(data)

# Display the results


print("Original data:", data)
print("Squared data:", data_squared)
print("Doubled data:", data_doubled)
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std_dev)

Original data: [ 5 10 15 20 25]
Squared data: [ 25 100 225 400 625]
Doubled data: [10 20 30 40 50]
Mean: 15.0
Median: 15.0
Standard Deviation: 7.0710678118654755
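
The value 7.07... above is the population standard deviation, because np.std() divides by the number of elements by default. For the sample standard deviation, which divides by n - 1, pass ddof=1. The sketch below compares the two on the same data.

```python
import numpy as np

data = np.array([5, 10, 15, 20, 25])

population_std = np.std(data)        # divides the squared deviations by n = 5
sample_std = np.std(data, ddof=1)    # divides by n - 1 = 4

# The same values computed step by step from the definition
squared_deviations = (data - data.mean()) ** 2
manual_population = np.sqrt(squared_deviations.sum() / len(data))
manual_sample = np.sqrt(squared_deviations.sum() / (len(data) - 1))

print(population_std, sample_std)
```

Here the squared deviations sum to 250, so the two results are the square roots of 50 and 62.5.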

Pandas
Pandas is a powerful library for data manipulation and analysis in Python. It provides data
structures like Series and DataFrame which are essential for handling structured data.
Installation - pip install pandas
Importing Pandas - import pandas as pd

Key Features of Pandas

Series: One-dimensional labeled array capable of holding any data type.
DataFrame: Two-dimensional labeled data structure with columns of potentially
different types.
Data Cleaning: Handling missing data, data alignment, etc.
Data Transformation: Merging, reshaping, selecting, and slicing datasets.
Input/Output: Tools to read and write data in various formats (CSV, Excel, SQL,
etc.).

Pandas Example:

Here's a simple example of how to create a DataFrame and perform some basic operations
using pandas.

In [34]: # Creating Data Structures
# Series
# A Pandas Series is a one-dimensional array-like object containing an array of data and
import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print("Series:\n", s)

Series:
0 1
1 3
2 5
3 7
4 9
dtype: int64

In [41]: # DataFrame
# A DataFrame is a two-dimensional labeled data structure with columns of potentially di
# Creating a DataFrame
data = {
'Name': ['Ankit', 'Khusi', 'Pankaj'],
'Age': [24, 27, 22],
'City': ['New Delhi', 'Mumbai', 'Lucknow']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

DataFrame:
Name Age City
0 Ankit 24 New Delhi
1 Khusi 27 Mumbai
2 Pankaj 22 Lucknow

DataFrame Attributes

DataFrames have several attributes:

shape: Dimensions of the DataFrame.
columns: Column labels.
index: Row labels.

In [42]: print("Shape of DataFrame:", df.shape)
print("Column labels:", df.columns)
print("Row labels:", df.index)

Shape of DataFrame: (3, 3)
Column labels: Index(['Name', 'Age', 'City'], dtype='object')
Row labels: RangeIndex(start=0, stop=3, step=1)

DataFrame Operations
Selection: Selecting columns and rows.
Filtering: Filtering rows based on conditions.
Aggregation: Applying functions to DataFrame.

In [44]: # Selecting a column
print("Name column:\n", df['Name'])

# Selecting multiple columns
print("Name and Age columns:\n", df[['Name', 'Age']])

# Selecting rows by index
print("First row:\n", df.iloc[0])

# Filtering rows
filtered_df = df[df['Age'] > 23]
print("Filtered DataFrame:\n", filtered_df)

# Aggregation
mean_age = df['Age'].mean()
print("Mean age:", mean_age)

Name column:
0 Ankit
1 Khusi
2 Pankaj
Name: Name, dtype: object
Name and Age columns:
Name Age
0 Ankit 24
1 Khusi 27
2 Pankaj 22
First row:
Name Ankit
Age 24
City New Delhi
Name: 0, dtype: object
Filtered DataFrame:
Name Age City
0 Ankit 24 New Delhi
1 Khusi 27 Mumbai
Mean age: 24.333333333333332
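
Selection, filtering, and aggregation can also be combined. The sketch below rebuilds the same DataFrame so that it runs on its own, filters on two conditions with .loc while picking columns in the same step, and computes several statistics at once; the variable names selected and age_stats are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ankit', 'Khusi', 'Pankaj'],
    'Age': [24, 27, 22],
    'City': ['New Delhi', 'Mumbai', 'Lucknow'],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
selected = df.loc[(df['Age'] > 23) & (df['City'] != 'Mumbai'), ['Name', 'Age']]
print(selected)

# Several aggregations on one column in a single call
age_stats = df['Age'].agg(['min', 'max', 'mean'])
print(age_stats)
```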

Data Cleaning

Handling Missing Data: Filling or dropping missing values.
Renaming Columns: Changing column names.
Removing Duplicates: Dropping duplicate rows.

In [46]: # Handling missing data
df_with_nan = df.copy()
df_with_nan.loc[1, 'Age'] = None
print("DataFrame with NaN:\n", df_with_nan)

# Filling missing values
df_filled = df_with_nan.fillna(0)
print("Filled DataFrame:\n", df_filled)

# Dropping missing values
df_dropped = df_with_nan.dropna()
print("Dropped NaN DataFrame:\n", df_dropped)

# Renaming columns
df_renamed = df.rename(columns={'Name': 'Full Name', 'Age': 'Years'})
print("Renamed DataFrame:\n", df_renamed)

# Removing duplicates
df_duplicates = df.append(df.iloc[0], ignore_index=True)
print("DataFrame with duplicates:\n", df_duplicates)
df_no_duplicates = df_duplicates.drop_duplicates()
print("DataFrame without duplicates:\n", df_no_duplicates)

DataFrame with NaN:
Name Age City
0 Ankit 24.0 New Delhi
1 Khusi NaN Mumbai
2 Pankaj 22.0 Lucknow
Filled DataFrame:
Name Age City
0 Ankit 24.0 New Delhi
1 Khusi 0.0 Mumbai
2 Pankaj 22.0 Lucknow
Dropped NaN DataFrame:
Name Age City
0 Ankit 24.0 New Delhi
2 Pankaj 22.0 Lucknow
Renamed DataFrame:
Full Name Years City
0 Ankit 24 New Delhi
1 Khusi 27 Mumbai
2 Pankaj 22 Lucknow
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18532\3535841614.py in ?()
15 df_renamed = df.rename(columns={'Name': 'Full Name', 'Age': 'Years'})
16 print("Renamed DataFrame:\n", df_renamed)
17
18 # Removing duplicates
---> 19 df_duplicates = df.append(df.iloc[0], ignore_index=True)
20 print("DataFrame with duplicates:\n", df_duplicates)
21 df_no_duplicates = df_duplicates.drop_duplicates()
22 print("DataFrame without duplicates:\n", df_no_duplicates)

C:\anaconda3\Lib\site-packages\pandas\core\generic.py in ?(self, name)
5985 and name not in self._accessors
5986 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5987 ):
5988 return self[name]
-> 5989 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'append'
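
The AttributeError above occurs because DataFrame.append was removed in pandas 2.0. The same step can be written with pd.concat, as in the sketch below, which rebuilds the DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ankit', 'Khusi', 'Pankaj'],
    'Age': [24, 27, 22],
    'City': ['New Delhi', 'Mumbai', 'Lucknow'],
})

# df.iloc[[0]] (double brackets) keeps the first row as a one-row DataFrame,
# which pd.concat can stack underneath the original rows.
df_duplicates = pd.concat([df, df.iloc[[0]]], ignore_index=True)
print("DataFrame with duplicates:\n", df_duplicates)

# drop_duplicates() keeps the first occurrence of each repeated row.
df_no_duplicates = df_duplicates.drop_duplicates()
print("DataFrame without duplicates:\n", df_no_duplicates)
```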

Input/Output

Reading from CSV: Reading data from a CSV file.
Writing to CSV: Writing data to a CSV file.
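
pd.read_csv raises FileNotFoundError when the named file does not exist in the current working directory, so it helps to write the data first and read it back afterwards. The sketch below performs that round trip through an in-memory buffer rather than a file on disk; the buffer is an assumption made so the example runs anywhere.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Ankit', 'Khusi', 'Pankaj'],
    'Age': [24, 27, 22],
    'City': ['New Delhi', 'Mumbai', 'Lucknow'],
})

# Write the DataFrame as CSV text; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Rewind the buffer and read the same data back.
buffer.seek(0)
df_from_csv = pd.read_csv(buffer)

print(csv_text)
print(df_from_csv)
```

With a real file, the same two calls take a path such as 'output.csv' in place of the buffer.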

In [48]: # Reading from CSV
df_from_csv = pd.read_csv('example.csv')
print("DataFrame from CSV:\n", df_from_csv)

# Writing to CSV
df.to_csv('output.csv', index=False)

---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[48], line 2
1 # Reading from CSV
----> 2 df_from_csv = pd.read_csv('example.csv')
3 print("DataFrame from CSV:\n", df_from_csv)
5 # Writing to CSV

File C:\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:912, in read_csv(filepa


th_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converte
rs, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values,
keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_form
at, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize,
compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escap
echar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_
memory, memory_map, float_precision, storage_options, dtype_backend)
899 kwds_defaults = _refine_defaults_read(
900 dialect,
901 delimiter,
(...)
908 dtype_backend=dtype_backend,
909 )
910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)

File C:\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:577, in _read(filepath_


or_buffer, kwds)
574 _validate_names(kwds.get("names", None))
576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
579 if chunksize or iterator:
580 return parser

File C:\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:1407, in TextFileReade


r.__init__(self, f, engine, **kwds)
1404 self.options["has_index_names"] = kwds["has_index_names"]
1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)

File C:\anaconda3\Lib\site-packages\pandas\io\parsers\readers.py:1661, in TextFileReade


r._make_engine(self, f, engine)
1659 if "b" not in mode:
1660 mode += "b"
-> 1661 self.handles = get_handle(
1662 f,
1663 mode,
1664 encoding=self.options.get("encoding", None),
1665 compression=self.options.get("compression", None),
1666 memory_map=self.options.get("memory_map", False),
1667 is_text=is_text,
1668 errors=self.options.get("encoding_errors", "strict"),
1669 storage_options=self.options.get("storage_options", None),
1670 )
1671 assert self.handles is not None
1672 f = self.handles.handle

File C:\anaconda3\Lib\site-packages\pandas\io\common.py:859, in get_handle(path_or_buf,


mode, encoding, compression, memory_map, is_text, errors, storage_options)
854 elif isinstance(handle, str):
855 # Check whether the filename is to be opened in binary mode.
856 # Binary mode does not support 'encoding' and 'newline'.
857 if ioargs.encoding and "b" not in ioargs.mode:
858 # Encoding
--> 859 handle = open(
860 handle,
861 ioargs.mode,
862 encoding=ioargs.encoding,
863 errors=errors,
864 newline="",
865 )
866 else:
867 # Binary mode
868 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'example.csv'

In [51]: # Example Program: Data Analysis with Pandas
# This program demonstrates data manipulation and analysis using Pandas.
import pandas as pd

# Sample data
data = {
    'Name': ['Ankit', 'Barkha', 'Chirag', 'Dravid', 'Esha'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New Delhi', 'Lucknow', 'Kanpur', 'Gurgao', 'Lucknow'],
    'Salary': [70000, 80000, 60000, 90000, 85000]
}
df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:\n", df)

# Calculate the mean salary
mean_salary = df['Salary'].mean()
print("Mean Salary:", mean_salary)

# Filter rows where Age > 25
filtered_df = df[df['Age'] > 25]
print("Filtered DataFrame (Age > 25):\n", filtered_df)

# Group by 'City' and calculate the mean salary
grouped_df = df.groupby('City')['Salary'].mean().reset_index()
print("Mean Salary by City:\n", grouped_df)

# Add a new column 'Age Group' based on age
# (bins are right-inclusive: (20, 25], (25, 30], (30, 35])
df['Age Group'] = pd.cut(df['Age'], bins=[20, 25, 30, 35], labels=['20-25', '25-30', '30-35'])
print("DataFrame with Age Group:\n", df)

Original DataFrame:
     Name  Age       City  Salary
0   Ankit   24  New Delhi   70000
1  Barkha   27    Lucknow   80000
2  Chirag   22     Kanpur   60000
3  Dravid   32     Gurgao   90000
4    Esha   29    Lucknow   85000
Mean Salary: 77000.0
Filtered DataFrame (Age > 25):
     Name  Age     City  Salary
1  Barkha   27  Lucknow   80000
3  Dravid   32   Gurgao   90000
4    Esha   29  Lucknow   85000
Mean Salary by City:
        City   Salary
0     Gurgao  90000.0
1     Kanpur  60000.0
2    Lucknow  82500.0
3  New Delhi  70000.0
DataFrame with Age Group:
     Name  Age       City  Salary Age Group
0   Ankit   24  New Delhi   70000     20-25
1  Barkha   27    Lucknow   80000     25-30
2  Chirag   22     Kanpur   60000     20-25
3  Dravid   32     Gurgao   90000     30-35
4    Esha   29    Lucknow   85000     25-30
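
The binning step above can be checked in isolation. The sketch below uses made-up ages (the names `ages` and `groups` are illustrative) to show how `pd.cut` treats values that fall exactly on a bin edge. With the default `right=True`, each bin includes its right edge and excludes its left one, so 25 lands in '20-25' and an age of exactly 20 falls outside every bin.

```python
import pandas as pd

# Illustrative ages chosen to sit on and around the bin edges
ages = pd.Series([20, 24, 25, 26, 30, 35])
groups = pd.cut(ages, bins=[20, 25, 30, 35], labels=['20-25', '25-30', '30-35'])

# Print each age next to the group it was assigned to
for age, group in zip(ages, groups):
    print(age, '->', group)
```

Passing `include_lowest=True` to `pd.cut` would place the value 20 in the first bin instead of leaving it unassigned.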

GUI programming with Tkinter:


Tkinter is the standard GUI (Graphical User Interface) library for Python. It provides a fast and
easy way to create desktop applications.

Key Features of Tkinter


Widgets: Various widgets such as buttons, labels, text boxes, and more.
Geometry Management: Organizing widgets in the window.
Event Handling: Responding to user inputs like mouse clicks and keyboard
events.
Customization: Configuring the appearance and behavior of widgets.
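
The features listed above can be sketched together in one small window. This is an illustrative example, not part of the original notebook: the names `describe_key` and `build_demo` are hypothetical, and `tkinter` is imported inside the function so that the text-formatting helper can be used without a display.

```python
def describe_key(keysym: str) -> str:
    """Build the status text for a key press (plain function, no GUI needed)."""
    return f"You pressed: {keysym}"


def build_demo():
    """Create a small window that uses grid geometry and a key-press binding."""
    import tkinter as tk  # imported here so describe_key works without a display

    root = tk.Tk()
    root.title("Grid and Events Sketch")

    # Geometry management: place widgets in rows and columns with grid()
    tk.Label(root, text="Type in the box:").grid(row=0, column=0, padx=5, pady=5)
    entry = tk.Entry(root)
    entry.grid(row=0, column=1, padx=5, pady=5)

    # Customization: appearance is configured through widget options such as fg
    status = tk.Label(root, text="Waiting for a key...", fg="blue")
    status.grid(row=1, column=0, columnspan=2, pady=5)

    # Event handling: <Key> fires for every key press inside the entry
    entry.bind("<Key>", lambda event: status.config(text=describe_key(event.keysym)))
    return root
```

Calling `build_demo().mainloop()` opens the window and starts the event loop.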

Installation
Tkinter is included with the Python standard library, so a separate installation is usually
not required. If your Python distribution does not include Tkinter, install it with your
system package manager (for example, sudo apt-get install python3-tk on Debian-based
systems).
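
A quick way to check whether Tkinter is present in the current interpreter is to try importing it. The helper name `tkinter_available` below is hypothetical.

```python
def tkinter_available() -> bool:
    """Return True if the tkinter module can be imported in this interpreter."""
    try:
        import tkinter  # noqa: F401 (imported only to test availability)
    except ImportError:
        return False
    return True


print("Tkinter available:", tkinter_available())
```

Running `python -m tkinter` from a terminal is another check: it opens a small test window when Tkinter is installed correctly.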

Importing Tkinter

import tkinter as tk
from tkinter import messagebox

Creating a Basic Tkinter Application

Steps to Create a Tkinter Application:

Import Tkinter: Import the Tkinter module.


Create the main window: Create an instance of the Tk class.
Add widgets: Add various widgets to the main window.
Run the application: Start the application's main event loop.

In [56]: import tkinter as tk
from tkinter import messagebox

# Step 1: Create the main window
root = tk.Tk()  # Initializes the main application window.
root.title("Basic Tkinter App")  # Sets the title of the window.
root.geometry("300x200")  # Sets the dimensions of the window (width x height in pixels).

# Step 2: Add widgets
label = tk.Label(root, text="Hello, Tkinter!")  # Creates a label widget with the text "Hello, Tkinter!".
label.pack(pady=10)  # Packs the label into the window with 10 pixels of vertical padding.

def on_button_click():  # Called when the button is clicked.
    messagebox.showinfo("Information", "Button Clicked!")

# Creates a button widget with the text "Click Me" that calls on_button_click when clicked.
button = tk.Button(root, text="Click Me", command=on_button_click)
button.pack(pady=10)  # Packs the button into the window with 10 pixels of vertical padding.

# Step 3: Run the application
root.mainloop()  # Starts the Tkinter event loop, which waits for user interaction.

In [59]: # Simple Tkinter Application


'''
a simple application that includes a label, an entry, and a button.
When the button is clicked, the text from the entry is displayed in a message box.
'''
import tkinter as tk
from tkinter import messagebox

# Create the main window


# Initializes the main application window with a title and specific dimensions.
root = tk.Tk()
root.title("Simple Tkinter App")
root.geometry("300x200")

# Add a label
# Creates and packs a label prompting the user to enter their name.
label = tk.Label(root, text="Enter your name:")
label.pack(pady=10)

# Add an entry
# Creates and packs an entry widget for the user to input their name.
entry = tk.Entry(root)
entry.pack(pady=10)

# Function to display the entered name
# show_name retrieves the text from the entry widget
# and displays it in a message box.
def show_name():
    name = entry.get()
    messagebox.showinfo("Name", f"Hello, {name}!")

# Add a button
'''
Creates and packs a button that calls the show_name function when clicked.
'''
button = tk.Button(root, text="Submit", command=show_name)
button.pack(pady=10)

# Run the application


root.mainloop()

In [7]: import tkinter as tk

# Create a window
window = tk.Tk()

# Set window title


window.title("My First Tkinter Window")

# Set window size


window.geometry("300x200")

# Add a label
label = tk.Label(window, text="Hello, Tkinter!")
label.pack()

# Run the event loop


window.mainloop()

Tkinter Widgets

Tkinter provides a variety of widgets to create interactive user interfaces. Here are
some of the most commonly used widgets in Tkinter:
Label: Displays text or an image.
Button: Triggers an action when clicked.
Entry: A single-line text field for user input.
Text: A multi-line text field.
Frame: A container for other widgets.
Checkbutton: A checkbox widget.
Radiobutton: A radio button widget.
Listbox: Displays a list of items.
Scrollbar: Adds a scrollbar to another widget.
Menu: Creates a menu.
Scale: A slider for selecting a numeric value.
Combobox: A combination of a dropdown list and an entry field (requires ttk).

In [60]: # Label: Displays static text or an image.


import tkinter as tk

root = tk.Tk()
root.title("Label Example")

label = tk.Label(root, text="Hello, Tkinter!")


label.pack(pady=10)

root.mainloop()

In [61]: # Button: Triggers a function when clicked.


import tkinter as tk
from tkinter import messagebox

def on_button_click():
    messagebox.showinfo("Information", "Button Clicked!")

root = tk.Tk()
root.title("Button Example")

button = tk.Button(root, text="Click Me", command=on_button_click)


button.pack(pady=10)

root.mainloop()

In [62]: # Entry: A single-line text field for user input.


import tkinter as tk

root = tk.Tk()
root.title("Entry Example")

entry = tk.Entry(root)
entry.pack(pady=10)

root.mainloop()

In [63]: # Text: A multi-line text field for user input.


import tkinter as tk

root = tk.Tk()
root.title("Text Example")

text = tk.Text(root, height=5, width=30)


text.pack(pady=10)
root.mainloop()

In [65]: # Frame: A container for organizing other widgets.


import tkinter as tk

root = tk.Tk()
root.title("Frame Example")

frame = tk.Frame(root, borderwidth=2, relief=tk.SUNKEN)


frame.pack(padx=10, pady=10)

label = tk.Label(frame, text="Inside Frame")


label.pack()

root.mainloop()

In [66]: # LabelFrame: A LabelFrame is a container widget like Frame, but with a label.
import tkinter as tk

root = tk.Tk()
root.title("LabelFrame Example")

labelframe = tk.LabelFrame(root, text="This is a LabelFrame", padx=10, pady=10)


labelframe.pack(padx=10, pady=10)

label = tk.Label(labelframe, text="Inside LabelFrame")


label.pack()

root.mainloop()

In [68]: # Checkbutton: A Checkbutton widget is a checkbox that can be toggled on or off.


import tkinter as tk

root = tk.Tk()
root.title("Checkbutton Example")

check_var = tk.BooleanVar()

checkbutton = tk.Checkbutton(root, text="I agree", variable=check_var)


checkbutton.pack()

root.mainloop()

In [69]: # Radiobutton: A Radiobutton widget lets the user select exactly one option from a set.
import tkinter as tk

root = tk.Tk()
root.title("Radiobutton Example")

radio_var = tk.IntVar()

radiobutton1 = tk.Radiobutton(root, text="Option 1", variable=radio_var, value=1)


radiobutton1.pack()

radiobutton2 = tk.Radiobutton(root, text="Option 2", variable=radio_var, value=2)


radiobutton2.pack()

root.mainloop()

In [70]: # Listbox: A Listbox widget displays a list of items.


import tkinter as tk

root = tk.Tk()
root.title("Listbox Example")

listbox = tk.Listbox(root)
listbox.pack()

for item in ["Item 1", "Item 2", "Item 3"]:
    listbox.insert(tk.END, item)

root.mainloop()

In [71]: # Scale: A Scale widget is a slider that allows the user to select a value from a range.
import tkinter as tk

root = tk.Tk()
root.title("Scale Example")

scale = tk.Scale(root, from_=0, to=100, orient=tk.HORIZONTAL)


scale.pack()

root.mainloop()

In [72]: # Scrollbar: A Scrollbar widget provides a way to scroll content that is too large
# to fit in the visible area of another widget.
import tkinter as tk

root = tk.Tk()
root.title("Scrollbar Example")

text = tk.Text(root, wrap=tk.NONE)
text.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)

# Moving the scrollbar calls text.yview to scroll the text widget
scrollbar = tk.Scrollbar(root, command=text.yview)
scrollbar.pack(side=tk.RIGHT, fill=tk.Y)

# Scrolling the text widget calls scrollbar.set to update the scrollbar position
text.config(yscrollcommand=scrollbar.set)

for i in range(100):
    text.insert(tk.END, f"Line {i+1}\n")

root.mainloop()

In [73]: # Combobox (from ttk module)


# A Combobox widget is a drop-down list.
import tkinter as tk
from tkinter import ttk

root = tk.Tk()
root.title("Combobox Example")

combobox = ttk.Combobox(root, values=["Option 1", "Option 2", "Option 3"])


combobox.pack()

root.mainloop()

In [74]: # Menu: A Menu widget creates a menu bar.


import tkinter as tk

def say_hello():
    print("Hello!")

root = tk.Tk()
root.title("Menu Example")

menubar = tk.Menu(root)
root.config(menu=menubar)
file_menu = tk.Menu(menubar, tearoff=0)
menubar.add_cascade(label="File", menu=file_menu)
file_menu.add_command(label="New", command=say_hello)
file_menu.add_command(label="Exit", command=root.quit)

root.mainloop()

In [75]: # Tkinter Application


# Here is an example that combines several widgets into a single application.
import tkinter as tk
from tkinter import messagebox
from tkinter import ttk

# Function to handle button click
def on_button_click():
    # Read the current value of every input widget
    name = name_entry.get()
    selected_option = radio_var.get()
    checked = check_var.get()
    combobox_value = combobox.get()
    text_content = text.get("1.0", tk.END).strip()

    # Build the summary message, one field per line
    msg = f"Name: {name}\n"
    msg += f"Selected Option: {selected_option}\n"
    msg += f"Checked: {checked}\n"
    msg += f"Combobox Value: {combobox_value}\n"
    msg += f"Text Content: {text_content}"

    messagebox.showinfo("Information", msg)

# Main application window


root = tk.Tk()
root.title("Comprehensive Tkinter App")
root.geometry("400x400")

# Label
label = tk.Label(root, text="Enter your name:")
label.pack(pady=5)

# Entry
name_entry = tk.Entry(root)
name_entry.pack(pady=5)

# Radiobuttons
radio_var = tk.IntVar()
radiobutton1 = tk.Radiobutton(root, text="Option 1", variable=radio_var, value=1)
radiobutton1.pack(pady=5)
radiobutton2 = tk.Radiobutton(root, text="Option 2", variable=radio_var, value=2)
radiobutton2.pack(pady=5)

# Checkbutton
check_var = tk.BooleanVar()
checkbutton = tk.Checkbutton(root, text="I agree", variable=check_var)
checkbutton.pack(pady=5)

# Combobox
combobox = ttk.Combobox(root, values=["Choice 1", "Choice 2", "Choice 3"])
combobox.pack(pady=5)

# Text
text = tk.Text(root, height=5, width=30)
text.pack(pady=5)

# Button
button = tk.Button(root, text="Submit", command=on_button_click)
button.pack(pady=20)

# Run the application


root.mainloop()
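
One possible refinement of the application above, sketched under the assumption that the message text should be checkable without opening a window: the string building done in on_button_click is moved into a plain function. The name `build_summary` is hypothetical and does not appear in the original program.

```python
def build_summary(name, selected_option, checked, combobox_value, text_content):
    """Return the multi-line summary that the message box displays."""
    lines = [
        f"Name: {name}",
        f"Selected Option: {selected_option}",
        f"Checked: {checked}",
        f"Combobox Value: {combobox_value}",
        f"Text Content: {text_content}",
    ]
    return "\n".join(lines)


# Example call with made-up values instead of widget contents
print(build_summary("Esha", 2, True, "Choice 1", "Some notes"))
```

Inside on_button_click, the values read from the widgets would be passed to `build_summary`, and the result handed to `messagebox.showinfo`.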

Python Programming with an IDE


Integrated Development Environments (IDEs) provide a comprehensive environment
for coding, testing, and debugging Python programs. Popular Python IDEs include
PyCharm, Visual Studio Code (VS Code), and Jupyter Notebook.

Key Features of IDEs


Code Editor: Supports syntax highlighting, code completion, and error detection.
Debugger: Allows step-by-step execution of code to identify and fix bugs.
Integrated Terminal: Provides a terminal within the IDE for running scripts and
commands.
Project Management: Manages files and directories in a project.
Plugins/Extensions: Enhances functionality through additional tools and
integrations.

List of Python IDEs:

https://www.techradar.com/best/best-ide-for-python
https://www.geeksforgeeks.org/top-python-ide/

PyCharm
Visual Studio Code (VS Code)
Jupyter Notebook
Google Colab
Sublime Text
Spyder
Atom
Thonny
IDLE
PyDev (Eclipse)
Komodo Edit

Machine Learning
The following is a basic machine learning application in Python that uses the popular
scikit-learn library to train a simple linear regression model:

Data Generation: Synthetic data is generated using NumPy to create a linear
relationship with some added noise.
Data Splitting: The dataset is split into training and testing sets using
train_test_split from scikit-learn.
Model Training: A linear regression model is trained on the training data using
LinearRegression from scikit-learn.
Prediction: The trained model is used to make predictions on the testing data.
Visualization: The actual testing data points and the predicted values are plotted
to visualize the performance of the model.
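
The data splitting step can be looked at on its own before running the full program. The sketch below uses a made-up, noiseless dataset (an assumption for illustration) to show the shapes that `train_test_split` returns for `test_size=0.2`, and that each feature row stays paired with its target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 illustrative samples with one feature and a noiseless linear target
X = np.arange(100).reshape(100, 1)
y = 4 + 3 * X

# 20% of the rows go to the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set:", X_train.shape, y_train.shape)
print("Testing set:", X_test.shape, y_test.shape)
```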

In [78]: # Importing necessary libraries

# NumPy: a fundamental package for numerical computing with Python.
# It provides support for multi-dimensional arrays and matrices.
import numpy as np

# scikit-learn (sklearn): a popular machine learning library for Python.
# It provides simple and efficient tools for data mining and data analysis.
from sklearn.model_selection import train_test_split  # Splits data into training and testing sets
from sklearn.linear_model import LinearRegression  # Linear regression model

# Matplotlib: a plotting library for Python.
# It provides a MATLAB-like interface for creating plots and visualizations.
import matplotlib.pyplot as plt

# Generating synthetic data

# Setting a random seed for reproducibility
np.random.seed(0)

# Generating 100 random data points for the feature X, uniformly distributed in [0, 2)
X = 2 * np.random.rand(100, 1)

# Generating the target variable y with added noise
# The relationship between X and y is y = 4 + 3*X + epsilon, where epsilon is random noise
# drawn from a standard normal distribution
y = 4 + 3 * X + np.random.randn(100, 1)

# Splitting the data into training and testing sets

# Splitting the dataset into 80% training and 20% testing sets
# The test_size parameter determines the proportion of the dataset to include in the test split
# The random_state parameter ensures reproducibility of the results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model

# Creating an instance of the LinearRegression class
lin_reg = LinearRegression()

# Fitting the model to the training data
# This step learns the parameters (intercept and coefficient) of the linear regression model
lin_reg.fit(X_train, y_train)

# Making predictions

# Predicting the target variable for the testing data


y_pred = lin_reg.predict(X_test)

# Plotting the results

# Plotting the actual testing data points


plt.scatter(X_test, y_test, color='blue', label='Actual')

# Plotting the predicted values


plt.plot(X_test, y_pred, color='red', label='Predicted')

# Adding a title to the plot


plt.title('Linear Regression')

# Labeling the x-axis


plt.xlabel('X')
# Labeling the y-axis
plt.ylabel('y')

# Adding a legend to the plot


plt.legend()

# Displaying the plot


plt.show()
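
LinearRegression fits its parameters by ordinary least squares. The sketch below solves the same kind of least-squares problem directly with NumPy to show what "learning the parameters" means. As a simplifying assumption it fits on all 100 generated points rather than only the training split, so its estimates will differ slightly from those of `lin_reg`. The names `X_b`, `theta`, `intercept` and `slope` are illustrative.

```python
import numpy as np

# Same synthetic data as in the cell above: y = 4 + 3*X + noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add a column of ones so the intercept is learned as an ordinary coefficient
X_b = np.hstack([np.ones((100, 1)), X])

# Solve the least-squares problem: minimize ||X_b @ theta - y||^2
theta, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)
intercept, slope = theta.ravel()

print("Estimated intercept:", intercept)
print("Estimated slope:", slope)
```

The estimates should land near the true values 4 and 3, with the gap coming from the added noise.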

In [87]: # Importing necessary libraries

# NumPy: a fundamental package for numerical computing with Python.
# It provides support for multi-dimensional arrays and matrices.
import numpy as np

# scikit-learn (sklearn): a popular machine learning library for Python.
# It provides simple and efficient tools for data mining and data analysis.
from sklearn.datasets import load_iris  # Dataset
from sklearn.model_selection import train_test_split  # Splits data into training and testing sets
from sklearn.linear_model import LogisticRegression  # Logistic regression model
from sklearn.metrics import accuracy_score, classification_report  # Metrics for evaluation

# Loading the Iris dataset
# The Iris dataset comprises 150 samples of iris flowers, with four measured
# features (sepal length, sepal width, petal length, petal width) per sample,
# and is commonly used for classification tasks in machine learning
# due to its balanced classes and well-separated clusters.
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Making predictions
y_pred = log_reg.predict(X_test)

# Evaluating the model


accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Printing the evaluation results


print("Accuracy:", accuracy)
print("\nClassification Report:")
print(report)

Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
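
The columns of the classification report can be reproduced by hand from true and predicted labels. The sketch below uses hypothetical labels (not the Iris predictions, which are all correct) so that the metrics differ from 1.0. Here `per_class_metrics` is an illustrative helper, not a scikit-learn function. The support column of the report is the number of true samples in each class.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for three classes, with two mistakes
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 1]

# Accuracy is the fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("Accuracy:", accuracy)

for label in (0, 1, 2):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
    print(label, round(precision, 2), round(recall, 2), round(f1, 2))
```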

Random Forest Regression on the Diabetes Dataset: The task is to develop a regression model
using the Random Forest algorithm that predicts the progression of diabetes in patients from
their baseline characteristics.

The Diabetes dataset contains 442 patient records with ten baseline features (age, sex, BMI,
blood pressure, and six blood serum measurements), alongside a quantitative measure of disease
progression one year after baseline, which serves as the regression target. The dataset is a
convenient resource for exploring and evaluating regression algorithms and for studying the
relationship between patient attributes and the progression of diabetes.

In [85]: # Importing necessary libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes

# Load the Diabetes dataset


diabetes = load_diabetes()
X = diabetes.data # Features
y = diabetes.target # Target variable

# Convert the dataset into a DataFrame


df = pd.DataFrame(data=X, columns=diabetes.feature_names)
df['target'] = y

# Display basic information about the dataset


print("Dataset information:")
print(df.info())

# Summary statistics of the dataset


print("\nSummary statistics:")
print(df.describe())

# Visualize the distribution of the target variable


plt.figure(figsize=(8, 6))
sns.histplot(df['target'], kde=True, color='blue')
plt.title('Distribution of Target Variable (Diabetes Progression)')
plt.xlabel('Diabetes Progression')
plt.ylabel('Frequency')
plt.show()

# Visualize the correlation matrix


plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()

# Visualize pairplots of features against the target variable


sns.pairplot(df, x_vars=diabetes.feature_names, y_vars=['target'], kind='scatter', diag_
plt.suptitle('Pairplot of Features vs Target Variable', y=1.02)
plt.show()

Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442 entries, 0 to 441
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 442 non-null float64
1 sex 442 non-null float64
2 bmi 442 non-null float64
3 bp 442 non-null float64
4 s1 442 non-null float64
5 s2 442 non-null float64
6 s3 442 non-null float64
7 s4 442 non-null float64
8 s5 442 non-null float64
9 s6 442 non-null float64
10 target 442 non-null float64
dtypes: float64(11)
memory usage: 38.1 KB
None

Summary statistics:
age sex bmi bp s1 \
count 4.420000e+02 4.420000e+02 4.420000e+02 4.420000e+02 4.420000e+02
mean -2.511817e-19 1.230790e-17 -2.245564e-16 -4.797570e-17 -1.381499e-17
std 4.761905e-02 4.761905e-02 4.761905e-02 4.761905e-02 4.761905e-02
min -1.072256e-01 -4.464164e-02 -9.027530e-02 -1.123988e-01 -1.267807e-01
25% -3.729927e-02 -4.464164e-02 -3.422907e-02 -3.665608e-02 -3.424784e-02
50% 5.383060e-03 -4.464164e-02 -7.283766e-03 -5.670422e-03 -4.320866e-03
75% 3.807591e-02 5.068012e-02 3.124802e-02 3.564379e-02 2.835801e-02
max 1.107267e-01 5.068012e-02 1.705552e-01 1.320436e-01 1.539137e-01

s2 s3 s4 s5 s6 \
count 4.420000e+02 4.420000e+02 4.420000e+02 4.420000e+02 4.420000e+02
mean 3.918434e-17 -5.777179e-18 -9.042540e-18 9.293722e-17 1.130318e-17
std 4.761905e-02 4.761905e-02 4.761905e-02 4.761905e-02 4.761905e-02
min -1.156131e-01 -1.023071e-01 -7.639450e-02 -1.260971e-01 -1.377672e-01
25% -3.035840e-02 -3.511716e-02 -3.949338e-02 -3.324559e-02 -3.317903e-02
50% -3.819065e-03 -6.584468e-03 -2.592262e-03 -1.947171e-03 -1.077698e-03
75% 2.984439e-02 2.931150e-02 3.430886e-02 3.243232e-02 2.791705e-02
max 1.987880e-01 1.811791e-01 1.852344e-01 1.335973e-01 1.356118e-01

target
count 442.000000
mean 152.133484
std 77.093005
min 25.000000
25% 87.000000
50% 140.500000
75% 211.500000
max 346.000000
C:\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)
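
In the summary statistics above, every feature has a mean close to zero and the same standard deviation (about 0.0476). This is because load_diabetes returns the features already mean-centered and scaled so that each column's sum of squares equals 1. A short check, with a loose tolerance chosen here as an assumption:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X = load_diabetes().data

# Each feature column is centered on zero ...
means_are_zero = np.allclose(X.mean(axis=0), 0.0, atol=1e-6)
# ... and scaled so that its squared values sum to one
squares_sum_to_one = np.allclose((X ** 2).sum(axis=0), 1.0, atol=1e-3)

print("Column means close to 0:", means_are_zero)
print("Column sums of squares close to 1:", squares_sum_to_one)
```

This is why values such as age and sex appear as small decimals and not in their original units.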

In [84]: # Importing necessary libraries

# NumPy: a fundamental package for numerical computing with Python.
# It provides support for multi-dimensional arrays and matrices.
import numpy as np

# scikit-learn (sklearn): a popular machine learning library for Python.
# It provides simple and efficient tools for data mining and data analysis.
from sklearn.datasets import load_diabetes  # Dataset
from sklearn.model_selection import train_test_split  # Splits data into training and testing sets
from sklearn.ensemble import RandomForestRegressor  # Random Forest Regressor
from sklearn.metrics import mean_squared_error, r2_score  # Metrics for evaluation

# Loading the Diabetes dataset
diabetes = load_diabetes()
X = diabetes.data  # Features
y = diabetes.target  # Target variable

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the random forest regressor


rf_reg = RandomForestRegressor(random_state=42)
rf_reg.fit(X_train, y_train)

# Making predictions
y_pred = rf_reg.predict(X_test)

# Evaluating the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Printing the evaluation results


print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

Mean Squared Error: 2952.0105887640448
R^2 Score: 0.4428225673999313
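
To help read these two numbers, the sketch below computes both metrics by hand on a few hypothetical values. The functions `mse` and `r2` are illustrative re-implementations of the standard definitions, not the scikit-learn functions themselves. An R^2 of about 0.44 means the model accounts for roughly 44% of the variance in the test targets, and the square root of the mean squared error (about 54) is the typical prediction error in the units of the target.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical targets and predictions, each off by 10
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

print("Mean Squared Error:", mse(y_true, y_pred))
print("R^2 Score:", r2(y_true, y_pred))
```

A model that always predicts the mean of the targets gets an R^2 of 0 under this definition, which gives a baseline for judging the score above.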
