ML Lab File

PyCharm is a powerful and feature-rich IDE for Python development that provides a comprehensive set of tools for writing, testing, and debugging Python code. Spyder is an open-source IDE designed specifically for scientific computing and data analysis with Python that provides features well-suited for scientific libraries. Jupyter Notebook is a web-based interactive computing environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text.

Uploaded by

The Champ
Copyright: © All Rights Reserved

PYTHON

Python is a high-level, interpreted programming language that is widely
used for web development, scientific computing, data analysis, artificial
intelligence, and other applications. It was first released in 1991 by Guido
van Rossum and has since become one of the most popular programming
languages in the world, known for its simplicity, readability, and
flexibility.

Python is open-source, which means that it is freely available to use,
distribute, and modify. It is designed to be easy to learn, with a syntax
that emphasizes readability and reduces the cost of program
maintenance. Python supports multiple programming paradigms,
including procedural, functional, and object-oriented programming, and
has a large standard library that provides tools and modules for a wide
range of tasks. It also has a vibrant community of developers who
contribute to its development and create third-party libraries and
frameworks that extend its capabilities.

VARIOUS IDES FOR PYTHON

WHAT IS IDE?

An Integrated Development Environment (IDE) is a software application
that provides comprehensive facilities for software development. It
typically includes a source code editor, compiler or interpreter, debugger,
and other tools that facilitate the development of software applications.

The purpose of an IDE is to provide developers with an all-in-one
platform for writing, testing, and debugging code, as well as managing
projects and dependencies. IDEs typically provide features such as code
highlighting, code completion, version control integration, and project
management tools, making it easier for developers to write high-quality
code and manage complex projects.

An IDE (Integrated Development Environment) for Python is a software
application that provides a comprehensive set of tools and features for
Python development. It typically includes a code editor, debugger, code
completion, syntax highlighting, and other features that make it easier to
write, test, and debug Python code.

Python IDEs come in a range of options, from simple and lightweight
tools to more powerful and feature-rich environments. Some of the most
popular IDEs for Python development include PyCharm, Visual Studio
Code, Spyder, IDLE, Jupyter Notebook, and Sublime Text. IDEs have
become an essential tool for modern software development, as they
streamline the development process and improve the productivity of
developers.

IDE: PyCharm

Description: PyCharm is a powerful and feature-rich IDE for Python
development. It is designed for professional developers and provides a
comprehensive set of tools for writing, testing, and debugging Python code.

Features: Code completion, debugging, code analysis, version control
integration, refactoring tools, testing tools, support for web development
frameworks like Django, Flask, and Pyramid.

1. Editor: PyCharm includes a Python code editor with advanced features
   like syntax highlighting, code completion, and refactoring tools.
2. Debugger: PyCharm includes a powerful debugger that allows users to
   step through code and inspect variables to find and fix bugs.
3. Code Analysis: PyCharm includes code analysis tools that can detect
   errors and suggest improvements to code.

IDE: Spyder

Description: Spyder is an open-source IDE designed specifically for
scientific computing and data analysis with Python. It provides a
user-friendly interface and a range of features that are well-suited to
working with scientific libraries like NumPy, Pandas, and Matplotlib.

Features: Code highlighting, debugging, IPython console, variable
explorer, support for scientific libraries like NumPy, Pandas, and
Matplotlib, project management tools.

1. Editor: Spyder includes a Python code editor with syntax highlighting,
   indentation, and code completion.
2. Debugger: Spyder includes a debugger that allows users to step through
   code and inspect variables to find and fix bugs.
3. IPython Console: Spyder includes an IPython console that allows users
   to experiment with Python code and run individual statements or scripts.

IDE: Jupyter Notebook

Description: Jupyter Notebook is a web-based interactive computing
environment that allows users to create and share documents that contain
live code, equations, visualizations, and narrative text. It is an
open-source project that supports multiple programming languages,
including Python, R, and Julia. Jupyter Notebook is a powerful tool for
data analysis, scientific computing, and education, as it provides a
flexible and interactive environment for working with code and data. It is
widely used in the scientific community and is becoming increasingly
popular in other fields as well.

Features: Code cells with syntax highlighting, Markdown cells for text
formatting, integration with scientific libraries, support for data
visualization, ability to share notebooks with others.

1. Notebook Interface: Jupyter Notebook provides a web-based interface
   that allows users to create and edit notebooks, which can contain a mix
   of code cells and Markdown cells for documentation and text formatting.
2. Code Execution: Jupyter Notebook allows users to run code cells
   interactively and view the output directly in the notebook interface.
3. Markdown Support: Jupyter Notebook supports Markdown cells, which can
   be used to format text, add links and images, and create headings and
   lists.

IDE: IDLE

Description: IDLE is the Integrated Development and Learning Environment
that comes bundled with the standard Python distribution. It is a simple,
lightweight IDE designed for beginners to learn and experiment with
Python. It provides a basic set of features that can help users write and
run Python code easily. IDLE is a good option for beginners who are just
starting to learn Python, as it provides a simple and straightforward
environment for experimentation and learning.

Features: Basic code editing features, interactive Python console,
debugger, support for running Python scripts.

1. Interactive Console: IDLE includes a Python shell that allows users to
   experiment with Python code and run individual statements or scripts.
2. File Explorer: IDLE includes a file explorer that allows users to
   browse and manage files and directories.
3. Code Execution: IDLE provides a simple way to execute Python code and
   run scripts directly from the editor.

There are many other IDEs available for Python development, but these
are some of the most popular and widely used ones. Ultimately, the
choice of IDE depends on personal preference and the specific needs of
the project.

PANDAS SERIES
Pandas Series is a one-dimensional labeled array-like object provided by
the Pandas library in Python. It is similar to a column in a spreadsheet or
a SQL table. A series can hold various data types, such as integers,
floats, strings, and Python objects. The series has two main components,
the data, and the index, where the index labels the data points in the
series.

In other words, a Pandas series is a collection of data arranged in a
one-dimensional array-like structure, where each element is labeled by an
index. It allows for efficient data analysis and manipulation, making it a
popular data structure in data science and machine learning
applications.
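
The two components described above, the data and the index, can be inspected directly. The following is a minimal sketch; the student names and marks are made up for illustration and do not come from the lab file.

```python
import pandas as pd

# Hypothetical marks for three students; the labels form the index
marks = pd.Series([82, 91, 77], index=["asha", "ben", "chen"])

print(marks.index.tolist())   # the index labels
print(marks.to_numpy())       # the underlying data as a NumPy array
print(marks["ben"])           # look up a value by its label
```

When no index is passed, Pandas numbers the elements from 0 upward instead.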

DIFFERENT WAYS TO CREATE PANDAS SERIES

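1. CREATING AN EMPTY SERIES:

The Series() function of Pandas is used to create a series. The most
basic series that can be created is an empty series.
CODE:
import pandas as pd

# pass dtype explicitly; the default for empty data differs across versions
my_series = pd.Series(dtype="float64")
print(my_series)

This creates an empty series with no elements and a dtype of float64.
Without the dtype argument, the result depends on the Pandas version:
older releases default to float64 (1.x also emits a warning), while
Pandas 2.0 and later default to object.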

2. CREATING A SERIES FROM ARRAY:

To create a series from a NumPy array, import the numpy module, build the
array with the array() function, and pass it to Series().

CODE:
import numpy as np
import pandas as pd

my_array = np.array([10, 20, 30, 40, 50])
my_series = pd.Series(my_array)
print(my_series)

By default, the index of the series runs from 0 to the length of the
series minus 1.

3. CREATING A SERIES FROM LIST:

To create a series from a list, first create the list and then pass it to
Series().
CODE:
import pandas as pd

my_list = [10, 20, 30, 40, 50]
my_series = pd.Series(my_list)
print(my_series)

4. CREATING A SERIES FROM DICTIONARY:

To create a series from a dictionary, first create the dictionary and then
pass it to Series(). The dictionary keys are used to construct the index
of the series.
CODE:
import pandas as pd

my_dict = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
my_series = pd.Series(my_dict)
print(my_series)
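
Because the dictionary keys become index labels, an explicit index argument can select and reorder them. The sketch below is illustrative only; the label 'z' is deliberately not a key in the dictionary, to show that a missing label produces a missing value.

```python
import pandas as pd

my_dict = {'a': 10, 'b': 20, 'c': 30}

# Select and reorder by label; 'z' is not a key, so its value is missing
my_series = pd.Series(my_dict, index=['c', 'a', 'z'])
print(my_series)
```

The missing entry is stored as NaN, which also turns the dtype of the series into float64.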

5. CREATING A SERIES FROM DATAFRAME:

To create a Pandas Series from a DataFrame, you can select a
single column from the DataFrame by its column name or by its
position.
CODE:
import pandas as pd

# create a sample dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
df = pd.DataFrame(data)

# create a Pandas Series from a single column of the DataFrame
age_series = df['age']
print(age_series)

In this example, we create a DataFrame with three columns ('name',
'age', 'country'). We then create a Pandas Series called age_series
by selecting the 'age' column from the DataFrame using df['age'].
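
The example above selects the column by name. Selecting by position can be sketched with iloc, as below; the variable name age_by_position is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'country': ['USA', 'Canada', 'Australia']})

# 'age' is the second column, so its position is 1 (positions start at 0)
age_by_position = df.iloc[:, 1]

print(type(age_by_position).__name__)      # a Series, like df['age']
print(age_by_position.equals(df['age']))   # same values and labels
```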
PANDAS DATAFRAME

A Pandas DataFrame is a two-dimensional, size-mutable, potentially
heterogeneous tabular data structure with labeled axes (rows and
columns). In other words, data is aligned in a tabular fashion in rows and
columns. A Pandas DataFrame consists of three principal components: the
data, the rows, and the columns.

In practice, a Pandas DataFrame is usually created by loading a dataset
from existing storage, such as a SQL database, a CSV file, or an Excel
file. A DataFrame can also be created from lists, from a dictionary, from
a list of dictionaries, and so on.
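
Creation from a list of dictionaries is mentioned above but not shown in the sections that follow. A minimal sketch, using made-up records, might look like this: each dictionary becomes one row, and keys that are absent from a row are filled with missing values.

```python
import pandas as pd

# Hypothetical records; each dictionary is one row
records = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 32, 'city': 'Boston'},
]

df = pd.DataFrame(records)
print(df)
```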
DIFFERENT WAYS TO CREATE PANDAS DATAFRAME

1. CREATING AN EMPTY DATAFRAME:

To create an empty dataframe using the pandas library in Python,
you can call the pd.DataFrame() function with no arguments.

CODE:
import pandas as pd

# create an empty dataframe
df = pd.DataFrame()

# print the dataframe
print(df)

2. CREATING A DATAFRAME FROM LISTS:

To create a dataframe from lists, pass the lists as values of a
dictionary, where each list represents a column in the dataframe.

CODE:
import pandas as pd

# create lists for the columns
names = ['Alice', 'Bob', 'Charlie', 'David']
ages = [25, 32, 18, 47]
cities = ['New York', 'San Francisco', 'Los Angeles', 'Boston']

# create a dictionary with the lists as values
data = {'name': names, 'age': ages, 'city': cities}

# create the dataframe from the dictionary
df = pd.DataFrame(data)

# print the dataframe
print(df)

3. CREATING A DATAFRAME FROM DICTIONARY:

To create a pandas dataframe from a dictionary, you can pass
the dictionary to the pd.DataFrame() function. Each key in the
dictionary represents a column name, and each value holds
the corresponding values in that column.
CODE:
import pandas as pd

# create a dictionary with the data
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 32, 18, 47],
        'city': ['New York', 'San Francisco', 'Los Angeles', 'Boston']}

# create the dataframe from the dictionary
df = pd.DataFrame(data)

# print the dataframe
print(df)

4. CREATING A DATAFRAME FROM NUMPY ARRAYS:

To create a pandas dataframe from numpy arrays, you can pass the
arrays as values of a dictionary, where each array represents a
column in the dataframe.
CODE:
import pandas as pd
import numpy as np

# create numpy arrays for the columns
names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
ages = np.array([25, 32, 18, 47])
cities = np.array(['New York', 'San Francisco', 'Los Angeles', 'Boston'])

# create a dictionary with the numpy arrays as values
data = {'name': names, 'age': ages, 'city': cities}

# create the dataframe from the dictionary
df = pd.DataFrame(data)

# print the dataframe
print(df)

5. CREATING A DATAFRAME FROM ANOTHER DATAFRAME:

To create a new pandas dataframe from an existing dataframe,
you can pass the existing dataframe to the pd.DataFrame() function.
The new dataframe has the same data as the existing one, but it is
not guaranteed to be an independent copy: depending on the Pandas
version and settings, it may share the underlying data with the
original. Use df1.copy() when an independent copy is needed.

CODE:
import pandas as pd

# create the original dataframe
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 32, 18, 47],
        'city': ['New York', 'San Francisco', 'Los Angeles', 'Boston']}
df1 = pd.DataFrame(data)

# create a new dataframe from the original one
df2 = pd.DataFrame(df1)

# print both dataframes
print(df1)
print(df2)
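
When the new dataframe must not affect the original, an explicit copy is the safer route. The following sketch uses DataFrame.copy() on a small made-up frame and changes one value in the copy to show that the original stays as it was.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 32]})

# copy() makes an independent (deep) copy of the data
df2 = df1.copy()
df2.loc[0, 'age'] = 99

print(df1.loc[0, 'age'])  # the original is unchanged
print(df2.loc[0, 'age'])  # only the copy holds the new value
```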

6. CREATING A DATAFRAME FROM CSV FILE:

To create a pandas dataframe from a CSV file, you can use the
pd.read_csv() function. This function reads a CSV file into a
dataframe, where each row in the CSV file becomes a row in the
dataframe and each column in the CSV file becomes a column in
the dataframe.

Suppose we have a CSV file named data.csv with the following
content:
name,age,city
Alice,25,New York
Bob,32,San Francisco
Charlie,18,Los Angeles
David,47,Boston

We can read this file into a dataframe using the following code:
CODE:
import pandas as pd

# read the CSV file into a dataframe
df = pd.read_csv('data.csv')

# print the dataframe
print(df)
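
To try read_csv() without a file on disk, the CSV text can be wrapped in io.StringIO, which behaves like an open file. This is only a sketch for experimentation; the variable csv_text is an assumption standing in for the contents of data.csv.

```python
import io
import pandas as pd

# CSV content held in a string instead of a file (stand-in for data.csv)
csv_text = """name,age,city
Alice,25,New York
Bob,32,San Francisco
"""

# read_csv accepts any file-like object, so StringIO works in place of a path
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # the age column is parsed as an integer type
```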
EXTRACTING ROWS AND COLUMNS FROM DATAFRAME

# Extracting specific rows from dataframe
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# iloc selects by integer position: the first row is position 0
first_row = df.iloc[0]
print(first_row)

# loc selects by index label, and the end label is included
second_third_rows = df.loc[1:2]
print(second_third_rows)

# Extracting specific columns from dataframe
a_column = df['A']            # a single column, returned as a Series
print(a_column)

a_c_columns = df[['A', 'C']]  # a list of names returns a DataFrame
print(a_c_columns)

# Extracting both rows and columns
subset = df.loc[1:2, ['B', 'C']]
print(subset)
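
One detail in the listing above is easy to miss: loc slices by label and includes the end label, while iloc slices by position and excludes the end position. A short sketch on the same dataframe makes the difference visible.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

by_label = df.loc[1:2]      # labels 1 and 2, end label included
by_position = df.iloc[1:2]  # position 1 only, end position excluded

print(len(by_label), len(by_position))
```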
DATA CLEANING

import pandas as pd
import numpy as np
from scipy import stats  # needed for stats.zscore below
from sklearn.preprocessing import LabelEncoder

# Load the dataset
data = pd.read_csv('data.csv')

# Print the shape of the original data
print("Shape of the original data:", data.shape)

# Handle missing values by dropping rows that contain any
data = data.dropna()

# Print the shape of the data after handling missing values
print("Shape of the data after handling missing values:", data.shape)

# Remove duplicates
data = data.drop_duplicates()

# Print the shape of the data after removing duplicates
print("Shape of the data after removing duplicates:", data.shape)

# Handle outliers: z-scores are only defined for numeric columns,
# so text columns such as 'category' are left out of the calculation
numeric_cols = data.select_dtypes(include=np.number).columns
z = np.abs(stats.zscore(data[numeric_cols]))
data = data[(z < 3).all(axis=1)]

# Print the shape of the data after handling outliers
print("Shape of the data after handling outliers:", data.shape)

# Handle categorical variables
le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])

# Print the first 5 rows of the data after handling categorical variables
print("First 5 rows of the data after handling categorical variables:\n",
      data.head())

# Remove irrelevant features
data = data.drop(['id', 'date'], axis=1)

# Print the first 5 rows of the data after removing irrelevant features
print("First 5 rows of the data after removing irrelevant features:\n",
      data.head())

# Save the cleaned data to a new CSV file
data.to_csv('cleaned_data.csv', index=False)
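
The outlier step above relies on z-scores. As a rough illustration of what that filter does, the sketch below computes the z-score by hand with pandas alone (no scipy) on made-up data: twenty ordinary values and one extreme value. The column name value and the data are assumptions for this example only.

```python
import pandas as pd

# Hypothetical data: twenty ordinary values and one extreme value
data = pd.DataFrame({'value': list(range(10, 30)) + [500]})

# z-score = (value - mean) / standard deviation (ddof=0: population std)
z = (data['value'] - data['value'].mean()) / data['value'].std(ddof=0)

# keep only the rows whose absolute z-score is below 3
cleaned = data[z.abs() < 3]

print(data.shape, cleaned.shape)
```

The threshold of 3 is a common convention rather than a rule. On very small datasets no point can reach a z-score of 3, so the filter would remove nothing.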
Write a program to extract a subset of data from a data frame:

import pandas as pd

df = pd.read_csv("C:\\Users\\surjit\\Documents\\pokemon_data.csv")
print(df.head(5))

# keep only the rows whose "Type 1" column equals "Fire"
df = df.loc[df["Type 1"] == "Fire"]

#print(df)
#df.to_csv("C:\\Users\\surjit\\Documents\\pokemon_data1.csv")
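
The same loc pattern extends to several conditions and to a chosen set of columns. The sketch below uses a few made-up rows in place of the CSV file; the column names Name and Attack are assumptions about the dataset, not confirmed by the program above.

```python
import pandas as pd

# Hypothetical rows standing in for the Pokemon CSV
df = pd.DataFrame({'Name': ['Charmander', 'Squirtle', 'Vulpix', 'Charizard'],
                   'Type 1': ['Fire', 'Water', 'Fire', 'Fire'],
                   'Attack': [52, 48, 41, 84]})

# Combine conditions with & and wrap each one in parentheses;
# the second argument to loc picks the columns to keep
strong_fire = df.loc[(df['Type 1'] == 'Fire') & (df['Attack'] > 50),
                     ['Name', 'Attack']]

print(strong_fire)
```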
Write a program to handle categorical data:

import pandas as pd

# importing this package for Label Encoding
# (One Hot Encoding uses pd.get_dummies, which needs no extra import)
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# Reading CSV file:
df = pd.read_csv("C:\\Users\\varinder\\Documents\\machine_learning_Lab\\datasets\\melb_data.csv")

# Finding attributes whose data type is object, i.e. that hold text values.
object_attributes = (df.dtypes == "object")

# Making list of object attributes
object_attributes_list = list(object_attributes[object_attributes].index)

# Extracting data of object attributes from the whole dataframe.
categorical_df = df[object_attributes_list]

# Creating a new dataframe as labeled data for column "Type"
# (.copy() so that adding columns below does not modify a view of df)
labeled_data = categorical_df[["Type"]].copy()

# Column-wise distinct value count of dataframe labeled_data
print("Categorical data in Type column", "\n",
      labeled_data["Type"].value_counts(), "\n")

# Applying fit_transform to the column Type and placing the values in the
# new column Type(Label_Encoding)
labeled_data["Type(Label_Encoding)"] = le.fit_transform(categorical_df["Type"])

# After using fit_transform, column-wise distinct value count of labeled_data
print("After using Label Encoding", "\n", labeled_data.value_counts(), "\n")

# Applying one hot encoding using the pandas get_dummies method
one_hot_encoding = pd.get_dummies(categorical_df["Type"])

# Merging the one hot encoding dataframe with the labeled dataframe
for column in one_hot_encoding.columns:
    labeled_data[column] = one_hot_encoding[column]

# Column-wise distinct value count of dataframe labeled_data
print("After using One Hot Encoding", "\n",
      labeled_data.value_counts(), "\n")
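
The difference between the two encodings is easier to see on a tiny example. The sketch below uses pandas only: it swaps in the category codes of a pandas categorical for LabelEncoder, which is a different tool that also numbers the sorted distinct values. The values 'h', 'u', and 't' are made up for illustration.

```python
import pandas as pd

# Hypothetical values for a "Type" column
types = pd.Series(['h', 'u', 't', 'h'], name='Type')

# Integer codes: categories are sorted, so h -> 0, t -> 1, u -> 2
codes = types.astype('category').cat.codes

# One hot encoding: one indicator column per distinct value
one_hot = pd.get_dummies(types)

print(codes.tolist())
print(one_hot.columns.tolist())
```

Integer codes keep a single column but imply an ordering that may not exist in the data, while one hot encoding avoids that at the cost of one column per distinct value.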
Write a program to handle missing values and duplicate values:

import pandas as pd  # importing pandas
import numpy as np   # importing numpy

df = pd.read_csv("C:\\Users\\varinder\\Documents\\modified_1.csv")  # reading csv

df["Null column"] = np.nan  # generating a null column

# filling the null values with the mean of the "Attack" column
df["Null column"] = df["Null column"].fillna(df["Attack"].mean())

# dropping duplicate values in the "Total" column of the dataframe,
# keeping the first appearance of every value
df = df.drop_duplicates(subset=["Total"], keep="first", ignore_index=True)

print(df)
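
The same two steps can be tried on a small in-memory dataframe, which makes the effect easy to check by eye. The values below are made up; only the column names Attack and Total are taken from the program above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Attack value and a repeated Total
df = pd.DataFrame({'Attack': [50.0, np.nan, 70.0, 60.0],
                   'Total': [300, 300, 400, 500]})

# fill the missing value with the column mean (mean() skips NaN)
df['Attack'] = df['Attack'].fillna(df['Attack'].mean())

# keep the first row for each distinct Total and renumber the index
df = df.drop_duplicates(subset=['Total'], keep='first', ignore_index=True)

print(df)
```

Here the second row is first filled and then dropped, because it shares its Total with the first row.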
