python unit 3 4

The document provides an overview of indexing options in pandas DataFrames, including methods like .iloc, .loc, boolean indexing, and MultiIndexing. It also explains the two main data structures in pandas, Series and DataFrame, detailing their creation, manipulation, and operations. Additionally, it covers various operations that can be performed on Series and DataFrames, such as accessing data, applying functions, handling missing values, and performing arithmetic operations.

unit-3

1. List some indexing options with a DataFrame

Indexing in a pandas DataFrame in Python is essential for accessing, modifying, and managing data efficiently.
Here are some common indexing options available in pandas:

### 1. Basic Indexing with `.iloc` and `.loc`

- **`.iloc`**: Indexing by position. It accepts integer-based indexing for both rows and columns.

```python

import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

df = pd.DataFrame(data)

# Access the first row

print(df.iloc[0])

# Access the element in the second row and third column

print(df.iloc[1, 2])

```

- **`.loc`**: Indexing by label. It allows label-based indexing for both rows and columns.

```python

# Access the first row by label

print(df.loc[0])
# Access the element in the second row and column 'C'

print(df.loc[1, 'C'])

```

### 2. Boolean Indexing

Boolean indexing uses a boolean mask to filter data.

```python

# Filter rows where column 'A' is greater than 1

filtered_df = df[df['A'] > 1]

print(filtered_df)

```

### 3. Setting and Resetting Index

- **Setting an Index**: Using `set_index` to change the DataFrame index.

```python

df.set_index('A', inplace=True)

print(df)

```

- **Resetting an Index**: Using `reset_index` to revert to the default integer index.

```python

df.reset_index(inplace=True)

print(df)

```
### 4. MultiIndex (Hierarchical Indexing)

MultiIndex allows for more complex indexing structures with multiple levels.

```python

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]

index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))

df_multi = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

print(df_multi)

# Access a specific level

print(df_multi.loc['A'])

```

### 5. Indexing with Slices

Slicing is a powerful tool to access portions of data.

```python

# Slice rows from index 1 to 3

print(df.iloc[1:3])

# Slice columns by name

print(df.loc[:, 'A':'B'])

```

### 6. Indexing with `.at` and `.iat`

- **`.at`**: Access a single value for a row/column label pair.


```python

# Access the value at row index 0 and column 'A'

print(df.at[0, 'A'])

```

- **`.iat`**: Access a single value for a row/column pair by integer position.

```python

# Access the value at row position 0 and column position 0

print(df.iat[0, 0])

```

### 7. Indexing with `.xs`

Cross-section selection with `.xs` can be useful for accessing data at a particular level.

```python

# Access data for a specific level in a MultiIndex DataFrame

print(df_multi.xs('A', level='Upper'))

```

### 8. Indexing with `.query`

The `query` method provides a string-based query interface.

```python

# Query to filter rows where column 'A' is greater than 1

query_df = df.query('A > 1')

print(query_df)

```
### 9. Indexing with `.filter`

The `filter` method is useful for selecting specific rows or columns.

```python

# Filter columns

print(df.filter(items=['A', 'B']))

# Filter columns whose labels contain 'A'

print(df.filter(like='A', axis=1))

```

### 10. Indexing with `.get`

The `get` method can access a column and return a default value if the column does not exist.

```python

# Get column 'A' or return a default value if 'A' does not exist

print(df.get('A', 'Default Value'))

```

### 11. Indexing with `.take`

The `take` method allows for position-based indexing, especially useful for large DataFrames.

```python

# Take rows by integer positions

print(df.take([0, 2]))

```
### 12. Indexing with `.reindex`

The `reindex` method can change the index of a DataFrame to a new set of labels.

```python

# Reindex DataFrame to a new index

new_index = [0, 1, 2, 3]

df_reindexed = df.reindex(new_index)

print(df_reindexed)

```

These are some of the most common and powerful indexing techniques available in pandas for DataFrame
manipulation. Each method has its unique use case and can be combined for more complex data operations.

2. What are the two data structures of pandas? Explain with examples

Pandas, a powerful data manipulation library in Python, primarily works with two data structures: `Series` and
`DataFrame`. These structures are designed to handle a wide range of data types and are optimized for
performance and ease of use.

### 1. Series

A `Series` is a one-dimensional labeled array capable of holding any data type (integer, string, float, python
objects, etc.). It is similar to a column in an Excel spreadsheet or a database table.

#### Creating a Series

You can create a `Series` by passing a list or array of values. An optional index can also be provided.

```python
import pandas as pd

# Create a Series from a list

data = [10, 20, 30, 40]

s = pd.Series(data)

print(s)

```

Output:

```

0 10

1 20

2 30

3 40

dtype: int64

```

#### Custom Index in Series

You can provide custom indices for the `Series`.

```python

# Create a Series with a custom index

s = pd.Series(data, index=['a', 'b', 'c', 'd'])

print(s)

```
Output:

```

a 10

b 20

c 30

d 40

dtype: int64

```

#### Accessing Elements in a Series

Elements in a `Series` can be accessed using the index.

```python

# Access element by index

print(s['a'])

# Access multiple elements by indices

print(s[['a', 'c']])

```

### 2. DataFrame

A `DataFrame` is a two-dimensional labeled data structure with columns of potentially different types. It is
similar to a table in a database or an Excel spreadsheet.

#### Creating a DataFrame


You can create a `DataFrame` by passing a dictionary of lists (or other data structures) where the keys are the
column names.

```python
# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
```

Output:

```

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

2 Charlie 35 70000

```

#### Accessing Elements in a DataFrame

You can access rows and columns using various methods.


- **Accessing Columns**:

```python

# Access a single column

print(df['Name'])

# Access multiple columns

print(df[['Name', 'Salary']])

```

- **Accessing Rows**:

```python

# Access rows by index using iloc (position-based)

print(df.iloc[0])

# Access rows by index using loc (label-based)

print(df.loc[0])

```

#### Adding and Dropping Columns

You can easily add or drop columns in a `DataFrame`.

```python

# Add a new column

df['Department'] = ['HR', 'Engineering', 'Marketing']

print(df)
# Drop a column

df = df.drop('Age', axis=1)

print(df)

```

Output:

```

Name Age Salary Department

0 Alice 25 50000 HR

1 Bob 30 60000 Engineering

2 Charlie 35 70000 Marketing

Name Salary Department

0 Alice 50000 HR

1 Bob 60000 Engineering

2 Charlie 70000 Marketing

```

#### Filtering Data

You can filter the `DataFrame` based on conditions.

```python

# Filter rows where Salary is greater than 55000

filtered_df = df[df['Salary'] > 55000]

print(filtered_df)

```
Output:

```

Name Salary Department

1 Bob 60000 Engineering

2 Charlie 70000 Marketing

```

### Summary

- **Series**: A one-dimensional array-like object with a labeled index, used for handling and operating on
single columns or rows of data.

- **DataFrame**: A two-dimensional table of data with labeled axes (rows and columns), suitable for working
with tabular data with multiple columns of different data types.

Both `Series` and `DataFrame` are essential components of the pandas library and are designed to be highly
flexible and intuitive for data manipulation and analysis.

3. Explain the Series with possible operations

A `Series` in pandas is a one-dimensional labeled array capable of holding any data type, such as integers,
floats, strings, and Python objects. It can be thought of as a column in a DataFrame or a single row of data.
Here is a detailed explanation of the Series along with possible operations:

### Creating a Series

You can create a `Series` using a variety of inputs such as lists, arrays, dictionaries, or scalar values.

```python

import pandas as pd
# Creating a Series from a list

data = [10, 20, 30, 40]

s = pd.Series(data)

print(s)

```

Output:

```

0 10

1 20

2 30

3 40

dtype: int64

```

### Custom Index

You can create a Series with a custom index.

```python

s = pd.Series(data, index=['a', 'b', 'c', 'd'])

print(s)

```

Output:

```

a 10
b 20

c 30

d 40

dtype: int64

```

### Accessing Data in a Series

You can access data using labels or positions.

- **By Label**:

```python

print(s['a']) # Output: 10

print(s[['a', 'c']]) # Output: a 10, c 30

```

- **By Position** (integer keys like `s[0]` are deprecated on a labeled Series; use `.iloc`):

```python
print(s.iloc[0])       # Output: 10
print(s.iloc[[0, 2]])  # Output: a 10, c 30
```

### Vectorized Operations

Pandas `Series` support vectorized operations which means operations are performed element-wise.

```python
# Addition

print(s + 5) # Adds 5 to each element

# Multiplication

print(s * 2) # Multiplies each element by 2

```

Output:

```

a 15

b 25

c 35

d 45

dtype: int64

a 20

b 40

c 60

d 80

dtype: int64

```

### Boolean Indexing

You can filter data in a Series using boolean indexing.

```python
print(s[s > 20]) # Output: c 30, d 40

```

### Applying Functions

You can apply functions to a Series using the `apply` method.

```python
def square(x):
    return x * x

print(s.apply(square))  # Squares each element in the Series
```

Output:

```

a 100

b 400

c 900

d 1600

dtype: int64

```

### Operations on Index

The index of a Series can be manipulated as well.


```python

# Renaming the index

s.index = ['x', 'y', 'z', 'w']

print(s)

```

Output:

```

x 10

y 20

z 30

w 40

dtype: int64

```

### Handling Missing Data

You can handle missing data (NaN values) using methods like `isnull`, `notnull`, `fillna`, and `dropna`.

```python

data = [10, 20, None, 40]

s = pd.Series(data)

# Checking for missing values

print(s.isnull()) # Output: False, False, True, False

# Filling missing values


print(s.fillna(0)) # Replaces NaN with 0

# Dropping missing values

print(s.dropna()) # Removes NaN values

```

Output:

```

0 False

1 False

2 True

3 False

dtype: bool

0 10.0

1 20.0

2 0.0

3 40.0

dtype: float64

0 10.0

1 20.0

3 40.0

dtype: float64

```

### Arithmetic Operations


You can perform arithmetic operations between Series.

```python

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

s2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])

# Addition

print(s1 + s2) # Output: a 5, b 7, c NaN, d NaN

# Subtraction

print(s1 - s2) # Output: a -3, b -3, c NaN, d NaN

```
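The NaN results above come from index alignment: labels present on only one side have no counterpart to combine with. A small sketch using the standard method forms (`add`, `sub`, `mul`, `div`) shows how a `fill_value` avoids the NaNs by treating a missing label as 0:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])

# add() with fill_value=0 treats a label missing on one side as 0,
# so non-overlapping labels no longer produce NaN
result = s1.add(s2, fill_value=0)
print(result)  # a 5.0, b 7.0, c 3.0, d 6.0
```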

### Descriptive Statistics

You can compute various statistics for a Series.

```python

s = pd.Series([1, 2, 3, 4, 5])

# Sum

print(s.sum()) # Output: 15

# Mean

print(s.mean()) # Output: 3.0


# Standard Deviation

print(s.std()) # Output: 1.58

# Minimum and Maximum

print(s.min()) # Output: 1

print(s.max()) # Output: 5

```
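The same statistics (and a few more) are available in one call via `describe()`; a quick sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# describe() bundles count, mean, std, min, quartiles and max in one Series
stats = s.describe()
print(stats)
```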

### String Operations

If the Series contains strings, you can use string operations.

```python

s = pd.Series(['apple', 'banana', 'cherry'])

# Convert to uppercase

print(s.str.upper()) # Output: APPLE, BANANA, CHERRY

# Check for substring

print(s.str.contains('a')) # Output: True, True, False

```

### Summary

A `Series` in pandas is a versatile one-dimensional data structure that supports a wide range of operations,
including element-wise arithmetic, boolean indexing, handling missing data, and applying functions. It also
provides powerful statistical and string manipulation capabilities.
5. Explain the DataFrame with possible operations

A `DataFrame` in pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data


structure with labeled axes (rows and columns). It can be thought of as a dictionary of Series objects, or a table
of data similar to an Excel spreadsheet or SQL table. Below are various ways to create, manipulate, and analyze
data using a pandas DataFrame, along with examples of possible operations:

### Creating a DataFrame

A `DataFrame` can be created from various data structures such as dictionaries, lists, and NumPy arrays.

```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
```

Output:

```

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000
2 Charlie 35 70000

```

### Accessing Data

- **Accessing Columns**:

```python

# Single column

print(df['Name'])

# Multiple columns

print(df[['Name', 'Salary']])

```

- **Accessing Rows**:

```python

# By integer location

print(df.iloc[0]) # First row

# By label

print(df.loc[0]) # First row

```

- **Accessing a Subset**:

```python

# Specific element

print(df.at[0, 'Name'])
# By condition

print(df[df['Age'] > 30])

```

### Adding and Dropping Data

- **Adding a Column**:

```python

df['Department'] = ['HR', 'Engineering', 'Marketing']

print(df)

```

- **Dropping a Column**:

```python

df = df.drop('Age', axis=1)

print(df)

```

- **Adding a Row** (`DataFrame.append` was removed in pandas 2.0; use `pd.concat` instead):

```python
new_row = pd.DataFrame([{'Name': 'David', 'Age': 40, 'Salary': 80000, 'Department': 'Finance'}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

- **Dropping a Row**:

```python
df = df.drop(0) # Drop the first row

print(df)

```

### Modifying Data

- **Renaming Columns**:

```python

df.rename(columns={'Name': 'Employee Name'}, inplace=True)

print(df)

```

- **Updating Values**:

```python

df.at[1, 'Salary'] = 65000

print(df)

```

### Handling Missing Data

- **Identifying Missing Data**:

```python

print(df.isnull())

```

- **Filling Missing Data**:

```python
df.fillna(0, inplace=True)

print(df)

```

- **Dropping Missing Data**:

```python

df.dropna(inplace=True)

print(df)

```

### Data Aggregation and Grouping

- **Summarizing Data**:

```python

print(df.describe())

```

- **Grouping Data**:

```python
group = df.groupby('Department')

# numeric_only=True restricts the mean to numeric columns
# (recent pandas raises an error when averaging string columns)
print(group.mean(numeric_only=True))
```
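Beyond a single statistic, a group can be summarized with several functions at once via `agg`. A minimal sketch on a hypothetical table with the same columns as this section:

```python
import pandas as pd

# Hypothetical employee table mirroring the columns used above
df = pd.DataFrame({
    'Department': ['HR', 'Engineering', 'HR', 'Engineering'],
    'Salary': [50000, 60000, 55000, 65000]
})

# agg() computes several statistics per group in one pass
summary = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max'])
print(summary)
```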

### Merging and Joining Data

- **Merging DataFrames**:

```python
data1 = {'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]}

data2 = {'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]}

df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

merged_df = pd.merge(df1, df2, on='Key', how='inner')

print(merged_df)

```

- **Joining DataFrames**:

```python

df1 = df1.set_index('Key')

df2 = df2.set_index('Key')

joined_df = df1.join(df2, how='inner')

print(joined_df)

```

### Reshaping Data

- **Pivoting Data**:

```python
data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
    'City': ['New York', 'Los Angeles', 'New York'],
    'Temperature': [30, 40, 35]
}

df = pd.DataFrame(data)
pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print(pivot_df)
```

- **Melting Data**:

```python

melted_df = pd.melt(df, id_vars=['Date'], value_vars=['City', 'Temperature'])

print(melted_df)

```

### Sorting Data

- **Sorting by Columns**:

```python

sorted_df = df.sort_values(by='Salary', ascending=False)

print(sorted_df)

```

- **Sorting by Index**:

```python

sorted_df = df.sort_index()

print(sorted_df)

```

### Working with Dates

- **Converting to Datetime**:

```python
df['Date'] = pd.to_datetime(df['Date'])

print(df)

```

- **Extracting Date Components**:

```python

df['Year'] = df['Date'].dt.year

df['Month'] = df['Date'].dt.month

df['Day'] = df['Date'].dt.day

print(df)

```

### String Operations

- **String Methods**:

```python

df['Name'] = df['Name'].str.upper()

print(df)

df['Name'] = df['Name'].str.replace('CHARLIE', 'Charles')

print(df)

```

### Visualization

- **Plotting**:

```python
import matplotlib.pyplot as plt

df['Salary'].plot(kind='bar')

plt.show()

```

### Summary

A `DataFrame` in pandas is a versatile and powerful data structure for data manipulation and analysis. It
supports a wide range of operations, including:

- **Creation**: From dictionaries, lists, arrays, etc.

- **Access**: By columns, rows, or conditions.

- **Modification**: Adding/dropping columns and rows, updating values.

- **Handling Missing Data**: Identifying, filling, or dropping.

- **Aggregation and Grouping**: Summarizing and grouping data.

- **Merging and Joining**: Combining multiple DataFrames.

- **Reshaping**: Pivoting and melting.

- **Sorting**: By index or columns.

- **Date and String Operations**: Working with dates and strings.

- **Visualization**: Basic plotting capabilities.

These capabilities make pandas DataFrames an essential tool for data analysis and manipulation in Python.

6. Explain the concept of Index objects

In pandas, an `Index` object represents the labeled axis (either rows or columns) of a DataFrame or Series. It
plays a crucial role in aligning data during operations, enabling label-based data access, and ensuring data
integrity. Understanding how to work with `Index` objects is fundamental to effective data manipulation in
pandas.
### Creating Index Objects

An `Index` is automatically created when you create a Series or DataFrame, but you can also create and
manipulate `Index` objects directly.

```python

import pandas as pd

# Creating a Series with a custom index

data = [10, 20, 30, 40]

index = ['a', 'b', 'c', 'd']

s = pd.Series(data, index=index)

print(s.index)

```

Output:

```

Index(['a', 'b', 'c', 'd'], dtype='object')

```
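As noted above, an `Index` can also be constructed directly with `pd.Index`; a brief sketch (the `name` attribute and the set-style methods are standard pandas API):

```python
import pandas as pd

# Building an Index directly and attaching it to a Series
idx = pd.Index(['a', 'b', 'c', 'd'], name='letters')
s = pd.Series([10, 20, 30, 40], index=idx)
print(s.index.name)  # letters

# Index objects behave like ordered sets
other = pd.Index(['c', 'd', 'e'])
print(list(idx.intersection(other)))  # ['c', 'd']
print(list(idx.union(other)))         # ['a', 'b', 'c', 'd', 'e']
```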

### Types of Index Objects

Pandas provides several types of `Index` objects to accommodate different use cases:

- **`Index`**: The most basic type of index.

- **`RangeIndex`**: Represents a range of values (used by default for integer indices).


- **`MultiIndex`**: A hierarchical index for multi-level indexing.

- **`DatetimeIndex`**: For datetime objects.

- **`TimedeltaIndex`**: For time delta/duration data.

- **`PeriodIndex`**: For periods (timespans).
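A short sketch constructing a few of these index types with their standard constructors:

```python
import pandas as pd

# RangeIndex: the default integer index
r = pd.RangeIndex(start=0, stop=4)

# DatetimeIndex: produced by date_range
dates = pd.date_range('2021-01-01', periods=3, freq='D')

# MultiIndex: two levels built from tuples
mi = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)],
                               names=('Upper', 'Lower'))

print(type(r).__name__)      # RangeIndex
print(type(dates).__name__)  # DatetimeIndex
print(mi.nlevels)            # 2
```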

### Basic Index Operations

#### Accessing Index Values

You can access the values of an `Index` object using standard indexing operations.

```python

print(s.index[0]) # Output: 'a'

print(s.index[:2]) # Output: Index(['a', 'b'], dtype='object')

```

#### Modifying Index Values

While the values in an `Index` object are immutable, you can reassign the entire index.

```python

# Reassigning the entire index

s.index = ['w', 'x', 'y', 'z']

print(s)

```

Output:
```

w 10

x 20

y 30

z 40

dtype: int64

```

### Common Index Methods

#### `set_index` and `reset_index`

These methods are used to change and reset the index of a DataFrame.

```python

df = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35],

'Salary': [50000, 60000, 70000]

})

# Setting 'Name' as the index

df = df.set_index('Name')

print(df)

# Resetting the index

df = df.reset_index()
print(df)

```

Output:

```

Age Salary

Name

Alice 25 50000

Bob 30 60000

Charlie 35 70000

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

2 Charlie 35 70000

```

#### `reindex`

The `reindex` method allows you to conform the DataFrame to a new index, filling in missing values if
necessary.

```python

# Reindexing the DataFrame to a new index

new_index = ['Alice', 'Bob', 'David']

df_reindexed = df.reindex(new_index, fill_value=0)

print(df_reindexed)
```

Output:

```

Name Age Salary

Alice 25 50000

Bob 30 60000

David 0 0

```

#### `sort_index`

Sorting a DataFrame by its index.

```python

# Sorting by index

df_sorted = df.sort_index()

print(df_sorted)

```

#### `rename`

Renaming index labels.

```python

# Renaming index labels

df_renamed = df.rename(index={'Alice': 'Alicia', 'Bob': 'Robert'})


print(df_renamed)

```

Output:

```

Age Salary

Name

Alicia 25 50000

Robert 30 60000

Charlie 35 70000

```

### Hierarchical Indexing with MultiIndex

A `MultiIndex` allows you to work with higher dimensional data in a 2D DataFrame.

```python

# Creating a DataFrame with MultiIndex

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))

df_multi = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

print(df_multi)

```

Output:

```

Value
Upper Lower

A one 10

two 20

B one 30

two 40

```

#### Accessing MultiIndex Levels

You can access and manipulate specific levels of a `MultiIndex`.

```python

# Accessing data at a specific level

print(df_multi.loc['A'])

# Accessing a specific element

print(df_multi.loc[('A', 'one')])

```

Output:

```

Value

Lower

one 10

two 20

Value 10
Name: (A, one), dtype: int64

```

### Summary

Index objects in pandas are powerful tools for managing and manipulating the axes of Series and DataFrames.
They provide a consistent way to label and align data, which is crucial for data analysis tasks. Understanding
and effectively using index objects enables you to perform a wide range of operations, from basic data access
to complex hierarchical indexing.

7. Explain reindexing with its operations on DataFrames and Series

Reindexing in pandas allows you to align an existing DataFrame or Series to a new set of labels. This can involve
adding, removing, or reordering the labels. Reindexing is particularly useful when you need to conform your
data to a specific structure or when aligning data from different sources. Below is a detailed explanation of
reindexing along with examples of various operations.

### Reindexing a Series

Reindexing a Series involves creating a new Series with the specified index, filling in missing values if necessary.

```python

import pandas as pd

# Creating a Series

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

print(s)

```

Output:

```
a 1

b 2

c 3

dtype: int64

```

#### Basic Reindexing

You can reindex a Series to match a new set of labels.

```python

# Reindexing the Series

s_reindexed = s.reindex(['a', 'b', 'c', 'd'])

print(s_reindexed)

```

Output:

```

a 1.0

b 2.0

c 3.0

d NaN

dtype: float64

```

#### Filling Missing Values


When reindexing, you can fill missing values using the `fill_value` parameter.

```python

s_reindexed_filled = s.reindex(['a', 'b', 'c', 'd'], fill_value=0)

print(s_reindexed_filled)

```

Output:

```

a 1

b 2

c 3

d 0

dtype: int64

```

### Reindexing a DataFrame

Reindexing a DataFrame can involve changing both row and column labels.

```python

# Creating a DataFrame

data = {

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35],

'Salary': [50000, 60000, 70000]

}
df = pd.DataFrame(data)

print(df)

```

Output:

```

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

2 Charlie 35 70000

```

#### Reindexing Rows

Reindexing the rows of a DataFrame.

```python

# Reindexing the rows

df_reindexed = df.reindex([0, 2, 1, 3])

print(df_reindexed)

```

Output:

```

Name Age Salary

0 Alice 25.0 50000.0

2 Charlie 35.0 70000.0


1 Bob 30.0 60000.0

3 NaN NaN NaN

```

#### Reindexing Columns

Reindexing the columns of a DataFrame.

```python

# Reindexing the columns

df_reindexed_columns = df.reindex(columns=['Name', 'Salary', 'Age', 'Department'])

print(df_reindexed_columns)

```

Output:

```

Name Salary Age Department

0 Alice 50000 25 NaN

1 Bob 60000 30 NaN

2 Charlie 70000 35 NaN

```

#### Filling Missing Values

Filling missing values when reindexing.

```python
df_reindexed_filled = df_reindexed.fillna({'Age': 0, 'Salary': 0, 'Name': 'Unknown'})

print(df_reindexed_filled)

```

Output:

```

Name Age Salary

0 Alice 25.0 50000.0

2 Charlie 35.0 70000.0

1 Bob 30.0 60000.0

3 Unknown 0.0 0.0

```

### Advanced Reindexing Techniques

#### Reindexing with `method` Parameter

You can use the `method` parameter to fill missing values based on certain rules:

- **`ffill` (forward fill)**: Propagate the last valid observation forward.

- **`bfill` (backward fill)**: Use the next valid observation to fill gaps.

```python

# Creating a Series with missing values

s = pd.Series([1, 2, 3], index=[0, 2, 4])

s_reindexed_ffill = s.reindex(range(5), method='ffill')

print(s_reindexed_ffill)
s_reindexed_bfill = s.reindex(range(5), method='bfill')

print(s_reindexed_bfill)

```

Output:

```

0 1

1 1

2 2

3 2

4 3

dtype: int64

0 1

1 2

2 2

3 3

4 3

dtype: int64

```

#### Reindexing to Align DataFrames

Reindexing is often used to align two DataFrames with different indexes.

```python
# Creating two DataFrames

df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({'B': [4, 5, 6]}, index=['b', 'c', 'd'])

# Aligning the two DataFrames

df1_aligned, df2_aligned = df1.align(df2, join='outer')

print(df1_aligned)

print(df2_aligned)

```

Output:

```

     A
a  1.0
b  2.0
c  3.0
d  NaN

     B
a  NaN
b  4.0
c  5.0
d  6.0

```

### Summary
Reindexing in pandas is a powerful tool for aligning and restructuring data. Key operations include:

- **Basic Reindexing**: Aligning to a new set of labels.

- **Filling Missing Values**: Using `fill_value` or methods like `ffill` and `bfill`.

- **Reindexing DataFrames**: Changing both row and column labels.

- **Advanced Techniques**: Using the `method` parameter for forward and backward filling, and aligning
multiple DataFrames.

Understanding and utilizing reindexing effectively can greatly enhance your data manipulation capabilities in
pandas.

8. How to drop entries from an axis

Dropping entries from a DataFrame or Series in pandas is a common operation that allows you to remove rows
or columns based on their labels or indices. Here’s a comprehensive guide on how to do this, with various
examples:

### Dropping Entries from a DataFrame

You can drop rows or columns using the `drop` method. The `axis` parameter is used to specify whether to
drop rows (`axis=0`) or columns (`axis=1`).

#### Dropping Rows

By default, `drop` removes rows (i.e., `axis=0`).

```python
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}

df = pd.DataFrame(data)

# Dropping rows by index labels
df_dropped = df.drop([1, 3])
print(df_dropped)
```

Output:

```

Name Age Salary

0 Alice 25 50000

2 Charlie 35 70000

```

#### Dropping Columns

To drop columns, set the `axis` parameter to 1.

```python

# Dropping columns by column names

df_dropped_columns = df.drop(['Age', 'Salary'], axis=1)

print(df_dropped_columns)

```
Output:

```

Name

0 Alice

1 Bob

2 Charlie

3 David

```

### Dropping Entries from a Series

You can drop entries from a Series in a similar manner.

```python

# Creating a Series

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Dropping entries by index labels

s_dropped = s.drop(['b', 'd'])

print(s_dropped)

```

Output:

```

a 1

c 3
dtype: int64

```

### In-Place Dropping

By default, the `drop` method returns a new DataFrame or Series without the specified entries. To modify the
original object, use the `inplace` parameter.

```python

# Dropping rows in place

df.drop([1, 3], inplace=True)

print(df)

```

Output:

```

Name Age Salary

0 Alice 25 50000

2 Charlie 35 70000

```

### Dropping with Conditions

You can also drop rows based on conditions using boolean indexing.

```python

# Creating a DataFrame
df = pd.DataFrame(data)

# Dropping rows where Age > 30

df_dropped_condition = df[df['Age'] <= 30]

print(df_dropped_condition)

```

Output:

```

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

```

### Dropping Duplicates

The `drop_duplicates` method is used to remove duplicate rows.

```python
# Creating a DataFrame with duplicates
data_with_duplicates = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Bob'],
    'Age': [25, 30, 35, 30],
    'Salary': [50000, 60000, 70000, 60000]
}

df_duplicates = pd.DataFrame(data_with_duplicates)

# Dropping duplicate rows
df_no_duplicates = df_duplicates.drop_duplicates()
print(df_no_duplicates)
```

Output:

```

Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

2 Charlie 35 70000

```

### Dropping NaN Values

You can drop rows or columns containing NaN values using the `dropna` method.

```python
# Creating a DataFrame with NaN values
data_with_nan = {
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, None, 35, 40],
    'Salary': [50000, 60000, None, 80000]
}

df_nan = pd.DataFrame(data_with_nan)

# Dropping rows with any NaN values
df_dropped_nan = df_nan.dropna()
print(df_dropped_nan)
```

Output:

```

Name Age Salary

0 Alice 25.0 50000.0

```

### Dropping Columns with NaN Values

To drop columns with NaN values, set the `axis` parameter to 1.

```python

# Dropping columns with any NaN values

df_dropped_nan_columns = df_nan.dropna(axis=1)

print(df_dropped_nan_columns)

```

Output (every column in this DataFrame contains at least one NaN, so all columns are dropped):

```
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

### Summary

Dropping entries in pandas is a flexible operation that can be performed in various ways:

- **Dropping Rows**: Using `drop` with `axis=0` (default).

- **Dropping Columns**: Using `drop` with `axis=1`.

- **In-Place Dropping**: Modifying the original object using `inplace=True`.

- **Dropping with Conditions**: Using boolean indexing.

- **Dropping Duplicates**: Using `drop_duplicates`.

- **Dropping NaN Values**: Using `dropna`.

These operations help in cleaning and preparing data for analysis by removing unnecessary or problematic
data.

9. With examples, explain indexing, selection & filtering

Indexing, selection, and filtering are fundamental operations in pandas that allow you to access and manipulate
data in Series and DataFrames. Below are explanations and examples of these operations.

### Indexing

Indexing refers to accessing elements in a Series or DataFrame by their labels or positions.

#### Indexing in Series

```python

import pandas as pd
# Creating a Series

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Accessing elements by index label

print(s['a']) # Output: 1

# Accessing elements by position (use .iloc; integer keys on a labeled Series are deprecated)

print(s.iloc[0]) # Output: 1

```

#### Indexing in DataFrames

```python
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}

df = pd.DataFrame(data)

# Accessing a single column
print(df['Name'])

# Accessing multiple columns
print(df[['Name', 'Salary']])

# Accessing rows by index label
print(df.loc[0])

# Accessing rows by position
print(df.iloc[0])
```

### Selection

Selection involves accessing subsets of data using different methods like `.loc`, `.iloc`, and boolean indexing.

#### Selection with `.loc`

`.loc` is used for label-based indexing.

```python

# Selecting a single row by index label

print(df.loc[0])

# Selecting multiple rows by index labels

print(df.loc[[0, 2]])

# Selecting a subset of rows and columns

print(df.loc[0:2, ['Name', 'Salary']])

```
#### Selection with `.iloc`

`.iloc` is used for position-based indexing.

```python

# Selecting a single row by position

print(df.iloc[0])

# Selecting multiple rows by positions

print(df.iloc[[0, 2]])

# Selecting a subset of rows and columns

print(df.iloc[0:3, 0:2])

```

### Filtering

Filtering involves selecting elements that meet certain conditions.

#### Filtering in Series

```python

# Filtering elements in a Series

filtered_s = s[s > 2]

print(filtered_s)

```
Output:

```

c 3

d 4

dtype: int64

```

#### Filtering in DataFrames

```python

# Filtering rows based on a column value

filtered_df = df[df['Age'] > 30]

print(filtered_df)

```

Output:

```

Name Age Salary

2 Charlie 35 70000

3 David 40 80000

```

#### Filtering with Multiple Conditions

```python

# Filtering rows based on multiple conditions

filtered_df = df[(df['Age'] > 30) & (df['Salary'] > 60000)]


print(filtered_df)

```

Output:

```

Name Age Salary

3 David 40 80000

```

### Advanced Indexing and Selection

#### Setting a New Index

```python

# Setting a new index

df_indexed = df.set_index('Name')

print(df_indexed)

# Accessing rows by the new index

print(df_indexed.loc['Alice'])

```

Output:

```

Age Salary

Name

Alice 25 50000
Bob 30 60000

Charlie 35 70000

David 40 80000

Age 25

Salary 50000

Name: Alice, dtype: int64

```

#### Conditional Selection with `query`

```python

# Using query for conditional selection

filtered_df_query = df.query('Age > 30 & Salary < 80000')

print(filtered_df_query)

```

Output:

```

Name Age Salary

2 Charlie 35 70000

```

### Summary

Indexing, selection, and filtering in pandas are powerful tools that allow you to access and manipulate data
efficiently:
- **Indexing**: Accessing elements by labels or positions.

- **Selection**: Using `.loc` for label-based and `.iloc` for position-based indexing.

- **Filtering**: Selecting data based on conditions.

These operations enable you to work effectively with data, facilitating tasks like data cleaning, exploration, and
analysis.
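One subtlety worth remembering when choosing between `.loc` and `.iloc`: label slices with `.loc` include the end label, while position slices with `.iloc` follow Python's half-open convention and exclude the end position. A small sketch using a DataFrame like the ones above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# .loc slices are inclusive of the end label...
loc_slice = df.loc[0:2]    # rows 0, 1 and 2

# ...while .iloc slices exclude the end position
iloc_slice = df.iloc[0:2]  # rows 0 and 1 only

print(len(loc_slice), len(iloc_slice))  # 3 2
```

With the default integer index this difference is easy to miss, so it is worth checking which accessor a slice is going through.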

10.explain sorting & ranking with respect to pandas

Sorting and ranking are essential operations in pandas for organizing and analyzing data. Here's an explanation
of both concepts with examples:

### Sorting in Pandas

Sorting refers to ordering the data based on one or more columns or indices. You can sort data in ascending or
descending order.

#### Sorting by Index

You can sort a DataFrame or Series by its index using the `sort_index` method.

```python

import pandas as pd

# Creating a DataFrame

df = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 35, 40],

'Salary': [50000, 60000, 70000, 80000]


})

# Setting a custom index

df.set_index('Name', inplace=True)

# Sorting by index

df_sorted_index = df.sort_index()

print(df_sorted_index)

```

Output:

```

Age Salary

Name

Alice 25 50000

Bob 30 60000

Charlie 35 70000

David 40 80000

```

#### Sorting by Values

You can sort a DataFrame by its column values using the `sort_values` method.

```python

# Sorting by a single column

df_sorted_age = df.sort_values(by='Age')
print(df_sorted_age)

# Sorting by multiple columns

df_sorted_age_salary = df.sort_values(by=['Age', 'Salary'])

print(df_sorted_age_salary)

# Sorting in descending order

df_sorted_descending = df.sort_values(by='Age', ascending=False)

print(df_sorted_descending)

```

Output:

```

Age Salary

Name

Alice 25 50000

Bob 30 60000

Charlie 35 70000

David 40 80000

Age Salary

Name

Alice 25 50000

Bob 30 60000

Charlie 35 70000

David 40 80000
Age Salary

Name

David 40 80000

Charlie 35 70000

Bob 30 60000

Alice 25 50000

```

### Ranking in Pandas

Ranking assigns ranks to data, which is useful for understanding the relative standing of elements. The `rank`
method provides this functionality.

#### Ranking in a Series

```python

# Creating a Series

s = pd.Series([7, 1, 4, 2, 6, 3, 5])

# Ranking the Series

s_ranked = s.rank()

print(s_ranked)

# Ranking the Series in descending order

s_ranked_desc = s.rank(ascending=False)

print(s_ranked_desc)

```
Output:

```

0 7.0

1 1.0

2 4.0

3 2.0

4 6.0

5 3.0

6 5.0

dtype: float64

0 1.0

1 7.0

2 4.0

3 6.0

4 2.0

5 5.0

6 3.0

dtype: float64

```

#### Ranking in a DataFrame

```python

# Creating a DataFrame

df_rank = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 35, 40],

'Salary': [50000, 60000, 70000, 80000]

})

# Ranking the DataFrame by column

df_rank['Age_rank'] = df_rank['Age'].rank()

print(df_rank)

# Ranking with method options (e.g., 'min', 'max', 'dense', 'first')

df_rank['Salary_rank'] = df_rank['Salary'].rank(method='min')

print(df_rank)

df_rank['Salary_dense_rank'] = df_rank['Salary'].rank(method='dense')

print(df_rank)

```

Output:

```

Name Age Salary Age_rank

0 Alice 25 50000 1.0

1 Bob 30 60000 2.0

2 Charlie 35 70000 3.0

3 David 40 80000 4.0

Name Age Salary Age_rank Salary_rank

0 Alice 25 50000 1.0 1.0


1 Bob 30 60000 2.0 2.0

2 Charlie 35 70000 3.0 3.0

3 David 40 80000 4.0 4.0

Name Age Salary Age_rank Salary_rank Salary_dense_rank

0 Alice 25 50000 1.0 1.0 1.0

1 Bob 30 60000 2.0 2.0 2.0

2 Charlie 35 70000 3.0 3.0 3.0

3 David 40 80000 4.0 4.0 4.0

```

### Summary

**Sorting** and **ranking** in pandas are crucial for organizing and analyzing data:

- **Sorting**:

- `sort_index`: Sorts by index labels.

- `sort_values`: Sorts by column values, with options for ascending/descending order and sorting by multiple
columns.

- **Ranking**:

- `rank` method: Assigns ranks to Series or DataFrame columns, with options for different ranking methods
(`average`, `min`, `max`, `first`, `dense`).

These operations enable you to structure your data in a meaningful way, making it easier to perform further
analysis and derive insights.
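The tie-handling options only show their differences once duplicate values appear; the unique ages and salaries above make `min` and `dense` look identical to the default. A small sketch with ties (values invented for illustration):

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2])

# 'average' (default): tied values share the mean of their positions
avg = s.rank().tolist()                  # [3.5, 1.0, 3.5, 2.0]

# 'min': all ties get the lowest rank in the group
low = s.rank(method='min').tolist()      # [3.0, 1.0, 3.0, 2.0]

# 'dense': like 'min', but the next distinct value's rank is not skipped
dense = s.rank(method='dense').tolist()  # [3.0, 1.0, 3.0, 2.0]

# 'first': ties broken by order of appearance
first = s.rank(method='first').tolist()  # [3.0, 1.0, 4.0, 2.0]

print(avg, low, dense, first)
```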

unit-4
1.list text & binary data loading functions in pandas

Pandas provides a variety of functions for loading text and binary data into DataFrames and Series.
Below is a list of these functions along with brief descriptions:

### Text Data Loading Functions

1. **`pd.read_csv`**: Reads a comma-separated values (CSV) file into a DataFrame.

```python

df = pd.read_csv('file.csv')

```

2. **`pd.read_table`**: Reads a general delimited file (like TSV) into a DataFrame.

```python

df = pd.read_table('file.tsv')

```

3. **`pd.read_excel`**: Reads an Excel file into a DataFrame.

```python

df = pd.read_excel('file.xlsx')

```

4. **`pd.read_json`**: Reads a JSON string or file into a DataFrame.

```python

df = pd.read_json('file.json')

```
5. **`pd.read_html`**: Reads HTML tables into a list of DataFrames.

```python

dfs = pd.read_html('https://example.com')

```

6. **`pd.read_sql`**: Reads SQL query or database table into a DataFrame.

```python

from sqlalchemy import create_engine

engine = create_engine('sqlite:///mydb.sqlite')

df = pd.read_sql('SELECT * FROM my_table', engine)

```

7. **`pd.read_sql_query`**: Reads a SQL query into a DataFrame.

```python

df = pd.read_sql_query('SELECT * FROM my_table', engine)

```

8. **`pd.read_sql_table`**: Reads a SQL database table into a DataFrame.

```python

df = pd.read_sql_table('my_table', engine)

```

9. **`pd.read_clipboard`**: Reads data from the clipboard into a DataFrame.

```python

df = pd.read_clipboard()
```

10. **`pd.read_fwf`**: Reads a table of fixed-width formatted lines into a DataFrame.

```python

df = pd.read_fwf('file.txt')

```

11. **`pd.read_parquet`**: Reads a Parquet file into a DataFrame.

```python

df = pd.read_parquet('file.parquet')

```

12. **`pd.read_orc`**: Reads an ORC file into a DataFrame.

```python

df = pd.read_orc('file.orc')

```

### Binary Data Loading Functions

1. **`pd.read_pickle`**: Loads a pickled pandas object (DataFrame or Series) from a file.

```python

df = pd.read_pickle('file.pkl')

```

2. **`pd.read_feather`**: Loads a Feather-format file into a DataFrame.


```python

df = pd.read_feather('file.feather')

```

3. **`pd.read_hdf`**: Reads from an HDF5 file into a DataFrame.

```python

df = pd.read_hdf('file.h5', 'key')

```

4. **`pd.read_msgpack`**: Reads a MessagePack file into a DataFrame. (Note: msgpack support was deprecated in pandas 0.25 and removed in pandas 1.0; use Parquet or Feather instead.)

```python

df = pd.read_msgpack('file.msgpack')

```

5. **`pd.read_stata`**: Reads a Stata file into a DataFrame.

```python

df = pd.read_stata('file.dta')

```

6. **`pd.read_sas`**: Reads a SAS file into a DataFrame.

```python

df = pd.read_sas('file.sas7bdat')

```

7. **`pd.read_spss`**: Reads an SPSS file into a DataFrame.


```python

df = pd.read_spss('file.sav')

```

### Example Usage

Here’s an example of loading different types of data into a pandas DataFrame:

#### Loading a CSV File

```python

import pandas as pd

# Load CSV file

df_csv = pd.read_csv('data.csv')

print(df_csv.head())

```

#### Loading an Excel File

```python

# Load Excel file

df_excel = pd.read_excel('data.xlsx')

print(df_excel.head())

```
#### Loading a JSON File

```python

# Load JSON file

df_json = pd.read_json('data.json')

print(df_json.head())

```

#### Loading a SQL Query

```python

from sqlalchemy import create_engine

# Create a SQLAlchemy engine

engine = create_engine('sqlite:///mydb.sqlite')

# Load data from SQL query

df_sql = pd.read_sql_query('SELECT * FROM my_table', engine)

print(df_sql.head())

```

### Summary

Pandas provides a wide range of functions for loading data from various text and binary formats into
DataFrames and Series. These functions are highly flexible and support numerous options for
customizing the data import process. This makes pandas a powerful tool for data analysis and
manipulation in Python.
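To illustrate that flexibility, here is a minimal sketch of a few common `read_csv` options — `sep`, `names`, and `na_values` — using an in-memory string in place of a real file (the data is invented):

```python
import io
import pandas as pd

raw = "alice;30;NYC\nbob;NA;LA\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',                        # custom delimiter
    names=['name', 'age', 'city'],  # supply headers the file lacks
    na_values=['NA'],               # treat 'NA' as missing
)

print(df)
print(df['age'].isna().sum())  # one missing age
```

The same keyword arguments apply when reading from an actual file path.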

2.the optional arrangements for reading & writing data in text format

In Python, there are several optional arrangements for reading and writing data in text format. Here's an
overview of some commonly used methods:

### Reading Data:

1. **Using `open()` function:**

```python

with open('file.txt', 'r') as f:

    data = f.read()

```

2. **Reading Line by Line:**

```python

with open('file.txt', 'r') as f:

    for line in f:

        print(line, end='')  # process each line

```

3. **Using `readlines()` to Get a List of Lines:**

```python

with open('file.txt', 'r') as f:

    lines = f.readlines()

```
4. **Using `readline()` to Read One Line at a Time:**

```python

with open('file.txt', 'r') as f:

    line = f.readline()

```

### Writing Data:

1. **Using `open()` with 'w' or 'a' mode:**

```python

with open('file.txt', 'w') as f:

    f.write("data to write\n")

```

2. **Writing Multiple Lines:**

```python

lines_to_write = ["line 1\n", "line 2\n", "line 3\n"]

with open('file.txt', 'w') as f:

    f.writelines(lines_to_write)

```

3. **Appending to a File:**

```python

with open('file.txt', 'a') as f:

    f.write("appending data\n")

```
4. **Using `print()` with File Argument:**

```python

with open('file.txt', 'w') as f:

    print("data to write", file=f)

```

These methods offer flexibility depending on your specific needs for reading from and writing to text files in
Python.

3.how do you read text files in pieces

Reading text files in pieces can be beneficial when dealing with large files or when you only need to process
parts of a file at a time. One common approach is to read the file in chunks or lines. Here's how you can do it in
Python:

### Reading in Chunks:

```python

chunk_size = 1024 # Adjust the size according to your needs

with open('file.txt', 'r') as f:

    while True:

        chunk = f.read(chunk_size)

        if not chunk:

            break

        # Process the chunk

```

### Reading Line by Line:


```python

with open('file.txt', 'r') as f:

    while True:

        line = f.readline()

        if not line:

            break

        # Process the line

```

### Using `readlines()` with a Limit:

```python

hint = 4096 # readlines() takes a size hint in bytes, not a number of lines

with open('file.txt', 'r') as f:

    lines = f.readlines(hint)

    while lines:

        # Process the lines

        lines = f.readlines(hint)

```

These methods allow you to read text files in manageable pieces, which can be useful for memory
management and processing large files efficiently. Adjust the chunk size or number of lines according to your
specific requirements.
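Since this document centres on pandas, it is worth noting that `pd.read_csv` can do the same chunked reading for tabular files via its `chunksize` argument, which returns an iterator of DataFrames. A sketch using an in-memory buffer in place of a large file:

```python
import io
import pandas as pd

# Ten rows of invented data under a single column 'x'
raw = "x\n" + "\n".join(str(i) for i in range(10))

total = 0
for chunk in pd.read_csv(io.StringIO(raw), chunksize=4):
    total += chunk['x'].sum()  # process each 4-row DataFrame in turn

print(total)  # sum of 0..9 = 45
```

Each chunk is a full DataFrame, so any of the operations from the earlier sections apply to it.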

4.explain how to write data to text format in python

Here's a detailed explanation of how to write data to text format in Python:

### Using the `open()` function:


The `open()` function in Python is used to open files. You can specify the file name and mode ('w' for writing) as
arguments. If the file does not exist, it will be created. If the file already exists, its contents will be overwritten.

```python

with open('file.txt', 'w') as f:

    f.write("Hello, world!\n")

```

In this example, we open a file named "file.txt" in writing mode ('w'). We then use the `write()` method to write
the string "Hello, world!\n" to the file. The '\n' character represents a newline, so the text will be written on a
new line.

### Appending to an Existing File:

If you want to add new data to an existing file without overwriting its contents, you can use the 'a' mode
(append) instead of 'w'.

```python

with open('file.txt', 'a') as f:

    f.write("This is a new line\n")

```

This code will append the string "This is a new line\n" to the end of the file "file.txt".

### Writing Multiple Lines:

You can write multiple lines of text by passing a list of strings to the `writelines()` method.
```python

lines_to_write = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open('file.txt', 'w') as f:

    f.writelines(lines_to_write)

```

Each string in the list will be written to the file on a separate line.

### Using `print()` with File Argument:

Another way to write data to a file is by using the `print()` function with a file argument.

```python

with open('file.txt', 'w') as f:

    print("Hello, world!", file=f)

```

This will achieve the same result as using the `write()` method.

### Closing the File:

It's important to close the file after writing to it. Using the `with` statement automatically closes the file when
the block of code inside it exits. However, if you're not using `with`, you should call the `close()` method
explicitly.

```python
f = open('file.txt', 'w')

f.write("Hello, world!\n")

f.close()

```

Closing the file ensures that any buffers are flushed and resources are released properly.

That's a comprehensive guide on how to write data to text format in Python!
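Putting the pieces together, a small round-trip sketch — write some lines, append one more, then read everything back (a temporary file is used so nothing is left behind):

```python
import os
import tempfile

# Create a scratch file path
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Write, then append
with open(path, 'w') as f:
    f.writelines(["Line 1\n", "Line 2\n"])

with open(path, 'a') as f:
    print("Line 3", file=f)

# Read everything back
with open(path) as f:
    lines = f.readlines()

os.remove(path)
print(lines)  # ['Line 1\n', 'Line 2\n', 'Line 3\n']
```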

5.how to handle other delimited formats

Handling other delimited formats in Python involves similar principles to handling text files, but you need to
specify the delimiter used to separate fields or values in the file. Here's how you can handle common delimited
formats like CSV (Comma-Separated Values) and TSV (Tab-Separated Values):

### CSV (Comma-Separated Values):

CSV files are a popular format for storing tabular data, where each line represents a row and commas separate
individual values within each row.

#### Reading CSV:

```python

import csv

with open('data.csv', newline='') as csvfile:

    csvreader = csv.reader(csvfile, delimiter=',')

    for row in csvreader:

        print(', '.join(row)) # Process the row

```
#### Writing CSV:

```python

import csv

data = [

    ['Name', 'Age', 'City'],

    ['John', '30', 'New York'],

    ['Alice', '25', 'Los Angeles'],

]

with open('data.csv', 'w', newline='') as csvfile:

    csvwriter = csv.writer(csvfile, delimiter=',')

    for row in data:

        csvwriter.writerow(row)

```

### TSV (Tab-Separated Values):

TSV files are similar to CSV files, but tabs ('\t') are used as the delimiter instead of commas.

#### Reading TSV:

```python

with open('data.tsv', 'r') as tsvfile:

    tsvreader = csv.reader(tsvfile, delimiter='\t')

    for row in tsvreader:

        print('\t'.join(row)) # Process the row

```

#### Writing TSV:

```python

data = [

    ['Name', 'Age', 'City'],

    ['John', '30', 'New York'],

    ['Alice', '25', 'Los Angeles'],

]

with open('data.tsv', 'w', newline='') as tsvfile:

    tsvwriter = csv.writer(tsvfile, delimiter='\t')

    for row in data:

        tsvwriter.writerow(row)

```

### Custom Delimiters:

For other delimited formats (e.g., pipe-delimited, colon-delimited), you can specify the delimiter accordingly
when reading or writing files using the `csv.reader` and `csv.writer` functions.

```python

# Reading a pipe-delimited file

with open('data.txt', 'r') as customfile:

    customreader = csv.reader(customfile, delimiter='|')

    for row in customreader:

        print('|'.join(row)) # Process the row

# Writing a pipe-delimited file

data = [

    ['Name', 'Age', 'City'],

    ['John', '30', 'New York'],

    ['Alice', '25', 'Los Angeles'],

]

with open('data.txt', 'w', newline='') as customfile:

    customwriter = csv.writer(customfile, delimiter='|')

    for row in data:

        customwriter.writerow(row)

```

This approach allows you to handle various delimited formats in Python effectively. Just make sure to specify
the correct delimiter when reading or writing the file.
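The `csv` module can also map rows to dictionaries keyed by the header, which is often more readable than positional indexing. A sketch with `DictWriter`/`DictReader`, done in memory via `io.StringIO` so no file is needed:

```python
import csv
import io

buf = io.StringIO()

# Write rows as dictionaries; fieldnames define the header and column order
writer = csv.DictWriter(buf, fieldnames=['Name', 'Age', 'City'], delimiter='|')
writer.writeheader()
writer.writerow({'Name': 'John', 'Age': '30', 'City': 'New York'})
writer.writerow({'Name': 'Alice', 'Age': '25', 'City': 'Los Angeles'})

# Read them back as dictionaries keyed by the header row
buf.seek(0)
rows = list(csv.DictReader(buf, delimiter='|'))
print(rows[0]['City'])  # New York
```

The same pattern works with real files by passing a file object opened with `newline=''` instead of the buffer.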

6.explain how to read Microsoft Excel files

Working with Microsoft Excel files typically involves reading, manipulating, and writing Excel files within a Python
environment. The most common library used for this purpose is `openpyxl`. Here's how you can perform
various actions with Excel files using `openpyxl`:

### Reading Excel Files:

1. **Installing `openpyxl`:**

If you haven't already installed `openpyxl`, you can do so via pip:


```

pip install openpyxl

```

2. **Reading Excel Data:**

```python

from openpyxl import load_workbook

wb = load_workbook('example.xlsx')

sheet = wb.active # Get the active sheet

for row in sheet.iter_rows(values_only=True):

    print(row) # Process each row

```

3. **Accessing Specific Cells:**

```python

cell_value = sheet['A1'].value # Accessing cell A1

```

4. **Iterating Through Rows and Columns:**

```python

for row in sheet.iter_rows(min_row=2, max_row=10, min_col=1, max_col=3, values_only=True):

    print(row) # Process each row within the specified range

```

### Writing Excel Files:


1. **Creating a New Workbook:**

```python

from openpyxl import Workbook

wb = Workbook()

sheet = wb.active

```

2. **Adding Data to Cells:**

```python

sheet['A1'] = 'Data 1'

sheet.cell(row=2, column=2, value='Data 2')

```

3. **Saving the Workbook:**

```python

wb.save('new_file.xlsx')

```

### Modifying Existing Excel Files:

1. **Loading an Existing Workbook:**

```python

from openpyxl import load_workbook

wb = load_workbook('existing_file.xlsx')

sheet = wb.active
```

2. **Updating Cell Values:**

```python

sheet['A1'] = 'New Value'

```

3. **Saving Changes:**

```python

wb.save('existing_file.xlsx')

```

### Additional Functionalities:

- **Formatting Cells:** You can apply various formatting options to cells, such as font style, color, borders, etc.

- **Creating Charts:** `openpyxl` allows you to create basic charts within Excel files.

Working with Excel files in Python via `openpyxl` gives you a wide range of options for data manipulation, from
simple reading and writing to more advanced tasks like cell formatting and chart creation.

7.how to interact with web apis

Interacting with web APIs (Application Programming Interfaces) in Python involves sending HTTP requests to
the API endpoints and processing the responses. Here's a basic guide on how to interact with web APIs using
Python:

### Using the `requests` Library:

The `requests` library in Python provides a simple way to interact with web APIs. You can install it via pip:
```bash

pip install requests

```

### Making GET Requests:

```python

import requests

response = requests.get('https://api.example.com/data')

if response.status_code == 200:

    data = response.json() # Convert response to JSON

    # Process the data

else:

    print('Error:', response.status_code)

```

### Making POST Requests:

```python

import requests

payload = {'key': 'value'}

response = requests.post('https://api.example.com/endpoint', json=payload)

if response.status_code == 200:

    data = response.json() # Convert response to JSON

    # Process the data

else:

    print('Error:', response.status_code)

```

### Handling Authentication:

```python

import requests

# Basic Authentication

response = requests.get('https://api.example.com/data', auth=('username', 'password'))

# API Key Authentication

headers = {'Authorization': 'Bearer YOUR_API_KEY'}

response = requests.get('https://api.example.com/data', headers=headers)

```

### Handling Query Parameters:

```python

import requests

params = {'param1': 'value1', 'param2': 'value2'}

response = requests.get('https://api.example.com/data', params=params)

```

### Handling Responses:


```python

import requests

response = requests.get('https://api.example.com/data')

if response.ok: # Check if status code is in the range 200-299

    data = response.json()

    # Process the data

else:

    print('Error:', response.status_code)

```

### Error Handling:

```python

import requests

try:

    response = requests.get('https://api.example.com/data')

    response.raise_for_status() # Raise an exception for HTTP errors

    data = response.json()

    # Process the data

except requests.exceptions.HTTPError as err:

    print('HTTP error:', err)

except requests.exceptions.RequestException as err:

    print('Connection error:', err)

```
### Rate Limiting and Throttling:

Some APIs have rate limits or throttling mechanisms to prevent abuse. Be sure to read the documentation of
the API you're using and implement appropriate rate limiting in your code if required.

By following these steps, you can effectively interact with various web APIs in Python using the `requests`
library. Remember to always handle errors gracefully and follow any rate limits or guidelines provided by the
API documentation.

8.brief about how to interact with databases

Interacting with databases in Python involves connecting to the database, executing SQL queries, and
processing the results. Here's a brief overview of how you can interact with databases using Python:

### Choosing a Database Library:

There are several libraries in Python for interacting with different types of databases. Some popular ones
include:

1. **SQLite:** Included with Python's standard library, suitable for small to medium-sized databases.

2. **MySQL and MariaDB:** `mysql-connector-python`, `pymysql`, and `mysqlclient` are commonly used
libraries.

3. **PostgreSQL:** `psycopg2` is a popular library for interacting with PostgreSQL databases.

4. **MongoDB:** `pymongo` is a Python driver for MongoDB.

### Connecting to the Database:

```python

import sqlite3

# For SQLite

conn = sqlite3.connect('example.db')

# For other databases (e.g., PostgreSQL with psycopg2)

# conn = psycopg2.connect(database="dbname", user="username", password="password",

#                         host="host", port="port")

```

### Executing SQL Queries:

```python

# Create a cursor object

cursor = conn.cursor()

# Execute SQL queries

cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Alice", 30))

# Commit changes

conn.commit()

```

### Fetching Data:

```python

# Fetch data

cursor.execute("SELECT * FROM users")


rows = cursor.fetchall()

for row in rows:

    print(row)

```

### Closing the Connection:

```python

# Close the cursor and connection

cursor.close()

conn.close()

```

### Error Handling:

```python

try:

    cursor.execute("SELECT * FROM users")  # database operations go here

except sqlite3.Error as e:

    print("SQLite error:", e)

# With PostgreSQL (psycopg2), catch psycopg2.Error in the same way

```

### ORM (Object-Relational Mapping):

Alternatively, you can use ORM libraries like SQLAlchemy or Django ORM for working with databases in a more
object-oriented way.

```python

# Example using SQLAlchemy

from sqlalchemy import create_engine, Column, Integer, String

from sqlalchemy.ext.declarative import declarative_base

from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///example.db')

Base = declarative_base()

class User(Base):

    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)

    name = Column(String)

    age = Column(Integer)

Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)

session = Session()

user = User(name='Bob', age=25)

session.add(user)

session.commit()

```

These are the basic steps for interacting with databases in Python. Depending on your specific requirements
and the type of database you're using, you may need to adjust the code accordingly and handle more complex
scenarios such as transactions, concurrency, and security.
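The sqlite3 steps above can be exercised end-to-end without touching disk by using an in-memory database — a convenient pattern for quick experiments (a sketch; the table and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # nothing written to disk
cursor = conn.cursor()

cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Parameterized inserts guard against SQL injection
cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                   [("Alice", 30), ("Bob", 25)])
conn.commit()

cursor.execute("SELECT name FROM users WHERE age > ?", (26,))
names = [row[0] for row in cursor.fetchall()]
print(names)  # ['Alice']

cursor.close()
conn.close()
```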
