KPMG Interview
"emp_id": "KGS123",
"emp_contact": "9876543210",
"emp_location": "Bangalore"}
emp_detail_dict_clone = emp_detail_dict
emp_detail_dict["emp_contact"] = "0123456789"
emp_detail_dict["emp_location"] = "Kochin"
print(emp_detail_dict_clone)
Assignment: emp_detail_dict_clone = emp_detail_dict does not create a copy; it binds
emp_detail_dict_clone to the same dictionary object as emp_detail_dict. Both variables now point
to the same dictionary in memory.
Modification: When you update emp_detail_dict, you are modifying the original dictionary object.
Since emp_detail_dict_clone references the same object, these changes are reflected when you
access emp_detail_dict_clone.
Output:
{'emp_id': 'KGS123', 'emp_contact': '0123456789', 'emp_location': 'Kochin'}
If instead a shallow copy is created:
import copy
# Creating a shallow copy
emp_detail_dict_clone = copy.copy(emp_detail_dict)
In this case, the changes to emp_detail_dict are not reflected in emp_detail_dict_clone, because
copy.copy() creates a new dictionary object (although it still shares references to any nested objects).
If emp_detail_dict_clone = copy.deepcopy(emp_detail_dict) is used instead, copy.deepcopy() creates a
completely independent copy of the dictionary, so changes to the original dictionary (including nested
objects) do not affect the deep copy.
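The difference between the two only becomes visible with nested data. A minimal sketch (the nested "emp_address" key is illustrative, not part of the original question):
import copy

emp = {"emp_id": "KGS123", "emp_address": {"city": "Bangalore"}}

shallow = copy.copy(emp)     # new outer dict, but the inner dict is shared
deep = copy.deepcopy(emp)    # fully independent copy, including nested objects

emp["emp_address"]["city"] = "Kochin"   # mutate the nested dict in place

print(shallow["emp_address"]["city"])   # Kochin -> shallow copy sees the change
print(deep["emp_address"]["city"])      # Bangalore -> deep copy is unaffected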
+------+------+-------------+
| 1| Gokul|+919098903987|
| 2|Deepak|+917898920928|
| 12| Sunil|+918987678923|
| 10| Pawan|+443290390924|
+------+------+-------------+
==================================================================================
Table: OnlineCourses
+--------+--------------+---------+--------------+
|CourseID|EnrollmentDate|StudentID|CompletionDate|
+--------+--------------+---------+--------------+
| 1| 2021-01-01| 1| 2021-01-15|
| 1| 2021-01-01| 2| null|
| 1| 2021-01-01| 3| 2021-01-10|
| 2| 2021-02-01| 1| 2021-02-10|
| 2| 2021-02-01| 2| null|
| 2| 2021-02-01| 3| null|
| 3| 2021-03-01| 1| 2021-03-10|
| 3| 2021-03-01| 2| 2021-03-15|
| 3| 2021-03-01| 3| null|
+--------+--------------+---------+--------------+
sql
SELECT
    CourseID,
    COUNT(*) AS TotalEnrollments,
    SUM(CASE WHEN CompletionDate IS NOT NULL THEN 1 ELSE 0 END) AS TotalCompletions,
    ROUND(
        (1.0 * SUM(CASE WHEN CompletionDate IS NULL THEN 1 ELSE 0 END) / COUNT(*)) * 100,
        2
    ) AS DropRate
FROM
    OnlineCourses
GROUP BY
    CourseID;
Explanation:
TotalEnrollments: COUNT(*) gives the total number of enrollments for each course.
TotalCompletions: SUM(CASE WHEN CompletionDate IS NOT NULL THEN 1 ELSE 0 END) counts the
number of students who completed the course.
DropRate: SUM(CASE WHEN CompletionDate IS NULL THEN 1 ELSE 0 END) counts the number of students
who did not complete the course, and COUNT(*) gives the total number of enrollments.
1.0 * ensures that the division is done in floating-point arithmetic to get a decimal result.
(1.0 * SUM(CASE WHEN CompletionDate IS NULL THEN 1 ELSE 0 END) / COUNT(*)) * 100 computes
the percentage of students who dropped out.
This query will give you the CourseID, total number of enrollments, total number of completions, and
the drop rate as a percentage for each course in the OnlineCourses table.
Given Data
CourseID  EnrollmentDate  StudentID  CompletionDate
1         2021-01-01      1          2021-01-15
1         2021-01-01      2          NULL
1         2021-01-01      3          2021-01-10
2         2021-02-01      1          2021-02-10
2         2021-02-01      2          NULL
2         2021-02-01      3          NULL
3         2021-03-01      1          2021-03-10
3         2021-03-01      2          2021-03-15
3         2021-03-01      3          NULL
Step-by-Step Calculation
For CourseID 1: 3 enrollments, 1 with a NULL CompletionDate.
DropRate: (1 / 3) * 100 = 33.33
For CourseID 2: 3 enrollments, 2 with a NULL CompletionDate.
DropRate: (2 / 3) * 100 = 66.67
For CourseID 3: 3 enrollments, 1 with a NULL CompletionDate.
DropRate: (1 / 3) * 100 = 33.33
Result Set
CourseID  TotalEnrollments  TotalCompletions  DropRate
1         3                 2                 33.33
2         3                 1                 66.67
3         3                 2                 33.33
This result shows the total number of enrollments, completions, and the calculated drop rate for
each course.
=========================================================================
In Apache Spark, the partitioning of data when reading from a file is influenced by several factors.
Even if you have a small file (e.g., 10 MB) and the default maximum partition size is 128 MB, the file
may still be divided into multiple partitions. Here are some reasons why this can happen:
Partitioning strategy: Spark's partitioning is not based solely on file size. When reading files into a
DataFrame, the effective split size is derived from spark.sql.files.maxPartitionBytes (default 128 MB),
spark.sql.files.openCostInBytes (default 4 MB), and the cluster's default parallelism: Spark spreads the
total input bytes across the available cores and may pick a split size well below 128 MB. As a result,
even a 10 MB file can end up in two or three partitions, and folders of small files are packed or split
according to these settings rather than the 128 MB ceiling alone.
Hadoop Configuration:
In the case of files read from Hadoop-compatible file systems (like HDFS), the Hadoop block size
configuration can influence how Spark splits the file. If the Hadoop block size is smaller than Spark's
partition size, it can affect the number of partitions created.
Example Configuration
To verify how many partitions a DataFrame has and to investigate further, you can use the following
code:
python
df = spark.read.format("csv").option("header", "true").load("path/to/your/file.csv")

# Number of partitions Spark actually created for this DataFrame
num_partitions = df.rdd.getNumPartitions()
print(num_partitions)
Summary
Even though the file size (10 MB) is smaller than the default maximum partition size (128 MB), Spark
may still create multiple partitions due to its internal partitioning strategy, file format, block size, and
configuration settings. Understanding these factors can help you fine-tune partitioning behavior in
your Spark jobs.
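As a sketch of how to influence this behaviour (the config values here are illustrative, not recommendations), you can adjust the file-split settings before reading, or repartition afterwards:
python
# Raise the max split size so a small file is not broken up further
spark.conf.set("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024))  # 128 MB
spark.conf.set("spark.sql.files.openCostInBytes", str(16 * 1024 * 1024))     # 16 MB (illustrative)

df = spark.read.format("csv").option("header", "true").load("path/to/your/file.csv")
print(df.rdd.getNumPartitions())

# Or set the partition count explicitly after reading
df_one = df.coalesce(1)       # collapse to a single partition without a shuffle
df_eight = df.repartition(8)  # force eight partitions (incurs a shuffle)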
======================================================================
managed identity?
A managed identity is a feature provided by cloud platforms like Microsoft Azure that allows services
and applications to authenticate and interact with other services securely, without the need to
manage credentials directly. Managed identities simplify the authentication process by providing a
system-assigned or user-assigned identity that can be used to access resources and services.
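For example, a minimal Python sketch using the azure-identity and azure-storage-blob packages (the storage account URL is a placeholder):
python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# On an Azure-hosted resource (VM, Function, Databricks, etc.) with a managed identity
# assigned, DefaultAzureCredential picks up that identity - no stored keys or secrets.
credential = DefaultAzureCredential()

blob_service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",  # placeholder
    credential=credential,
)

for container in blob_service.list_containers():
    print(container.name)
DefaultAzureCredential resolves to the managed identity when the code runs on an Azure resource that has one assigned, so no secrets appear in code or configuration.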
==============================================================================
df = spark.read.format("csv").load(path)
2) df =
spark.read.format("csv").option("inferSchema",True).load(path)
3) df = spark.read.format("csv").schema(schema).load(path)
which will have used more processing time and compute power?
how many jobs for three lines above?
CSV Reading without a Schema (line 1): Cheaper than inference; Spark typically runs one small job just
to read the first line and determine the number of columns, and every column is treated as a string.
CSV Reading with Schema Inference (line 2): The most processing time and compute power, because
Spark has to scan the data an additional time to infer the column types.
CSV Reading with Explicit Schema (line 3): Generally the most efficient, as it avoids the overhead of
schema inference entirely.
Number of Jobs:
CSV Reading without a Schema: Typically 1 job at load time (to read the first row for the column layout).
CSV Reading with Schema Inference: Typically 2 jobs at load time (one to read the first row, one to scan
the data and infer types); the actual data processing happens later, when an action is called.
CSV Reading with Explicit Schema: Typically 0 jobs at load time; the read stays lazy until an action is
triggered.
By specifying the schema explicitly, you reduce the overall processing time and computational
overhead, especially for large datasets or when schema inference is costly.
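A short sketch of the three variants for comparison (the file path and column names are illustrative):
python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

path = "path/to/data.csv"  # illustrative

# 1) No schema, no inference: every column is read as a string (_c0, _c1, ...)
df1 = spark.read.format("csv").load(path)

# 2) Schema inference: Spark scans the data an extra time to work out the types
df2 = spark.read.format("csv").option("inferSchema", True).load(path)

# 3) Explicit schema: no inference scan; the read stays lazy until an action runs
schema = StructType([
    StructField("id", IntegerType(), True),    # illustrative columns
    StructField("name", StringType(), True),
])
df3 = spark.read.format("csv").schema(schema).load(path)
df3.printSchema()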
===========================================================================
file_path = "/mnt/rawcontainer/empinfo/empdata.csv"
what happens if emp_id is defined as StringType in the schema but the data in this file actually contains integers?
What Happens
Schema Mismatch During Data Read: When you provide a schema with emp_id as string while your
data actually contains integer values for emp_id, Spark will interpret the emp_id column as a string.
This mismatch may not immediately raise an error but can lead to data inconsistency or issues in
downstream processing.
Type Inconsistency: If you have data that does not conform to the expected schema, you might
encounter type conversion issues when performing operations that require numeric types (e.g.,
sorting, aggregations).
Conversion Issues: If you later try to convert this column to int or perform operations assuming it is
an integer, Spark might throw errors or produce incorrect results.
Type Casting Errors: When performing transformations or actions that require integer operations,
such as mathematical calculations or aggregations, Spark might produce runtime errors or
unexpected behavior if it encounters non-numeric strings in a column expected to be numeric.
python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([
    StructField("emp_id", StringType(), True),   # declared as string though the data holds integers
    StructField("emp_name", StringType(), True),
    StructField("emp_age", StringType(), True),
    StructField("emp_location", StringType(), True)
])
df = spark.read.format("csv").schema(schema).load("path/to/csv")
If the CSV file has the header:
emp_id,emp_name,emp_age,emp_location
And you have specified emp_id as StringType, no error will occur during the read operation, but
issues may arise later if you attempt to perform operations assuming emp_id is an integer.
Correct the Schema: If you realize that the column should be an integer, modify the schema
definition to use IntegerType for the emp_id column.
python
schema_correct = StructType([
    StructField("emp_id", IntegerType(), True),   # corrected to an integer type
    StructField("emp_name", StringType(), True),
    StructField("emp_age", StringType(), True),
    StructField("emp_location", StringType(), True)
])
df_correct = spark.read.format("csv").schema(schema_correct).load("path/to/csv")
Cast Column to Correct Type: If you've already loaded the DataFrame and need to correct the column
type, you can use the cast method to convert the column to the appropriate type.
python
from pyspark.sql.functions import col
df_casted = df.withColumn("emp_id", col("emp_id").cast(IntegerType()))
Ensure that all values in the emp_id column are valid integers; otherwise, casting will result in null
values for invalid entries.
Handle Conversion Errors: Before casting, you might want to handle or clean invalid values to avoid
conversion errors:
python
from pyspark.sql.functions import col, when
from pyspark.sql.types import IntegerType

# Cast only values that look like integers; anything else becomes null
df_filtered = df.withColumn(
    "emp_id",
    when(col("emp_id").rlike("^[0-9]+$"), col("emp_id").cast(IntegerType()))
    .otherwise(None)
)
Summary
Immediate Effect: No error will be raised just by specifying a wrong type during schema definition,
but you may face issues with data processing or transformations later on.
Resolution: Update the schema to the correct type, cast columns if needed, and handle any
conversion issues to ensure data consistency and correctness.
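To make the sorting pitfall concrete, a small sketch (assuming the df read above, with emp_id still typed as a string):
python
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType

# With emp_id stored as a string, ordering is lexicographic: "10" sorts before "2"
df.orderBy("emp_id").show()

# After casting to IntegerType, ordering is numeric: 2 sorts before 10
df.withColumn("emp_id", col("emp_id").cast(IntegerType())).orderBy("emp_id").show()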
=============================================================================
multiple csv in one folder with schema change?
df = spark.read.format("csv") \
.option("header", "true") \ # Indicates that the first row of each CSV file contains column headers
.option("inferSchema", "true") \ # Automatically infers schema from the data; can be used if
schema is unknown
.load(path)
Detailed Explanation
option("header", "true"): Indicates that the first line of each CSV file contains column headers. Set
this to "false" if your files do not have headers.
option("inferSchema", "true"): Allows Spark to automatically infer the schema based on the data. If
you have a predefined schema and want to avoid schema inference, you can skip this option or set it
to "false".
load(path): Loads all CSV files from the specified path. Spark will read all files in the folder and
concatenate them into a single DataFrame.
When dealing with multiple CSV files that have different schemas (i.e., some files have more columns
than others), you need to handle schema variations gracefully to ensure you can read all files into a
single DataFrame. Here’s how you can approach this:
Strategies for Handling Schema Variations
Schema Merging: Use schema merging to handle different schemas. Spark can automatically merge
schemas when reading files, but it might not always work perfectly if there are significant differences
between schemas.
Read and Process Files Individually: Read each set of files with their respective schemas and then
combine them.
Define a Union Schema: Define a schema that encompasses all possible columns and use it to read
the files. This schema should include all columns from all files, with some columns possibly
containing null values if they don't exist in all files.
Approach 1: Schema Merging
Spark can merge schemas automatically when reading self-describing formats such as Parquet and ORC
(via the mergeSchema option). The plain CSV source does not perform this kind of schema merging, so if
the column names or types differ across CSV files, this approach will generally not produce the expected
results; prefer Approach 2 or Approach 3 below.
python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ReadCSVWithSchemaMerging") \
    .getOrCreate()

# Read all CSV files from the folder with schema merging
# (mergeSchema is honoured by Parquet/ORC; the CSV source largely ignores it)
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("mergeSchema", "true") \
    .load("path/to/csv/folder")

df.printSchema()
Approach 2: Read and Process Files Individually
If schema merging doesn’t work as expected, you can read files in groups according to their schema
and then merge them.
python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder \
    .appName("ReadAndMergeCSV") \
    .getOrCreate()

# Illustrative column definitions; replace with the actual columns of each file group
schema_set1 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])
schema_set2 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("location", StringType(), True)   # extra column only present in set 2
])
df_set1 = spark.read.format("csv") \
.option("header", "true") \
.schema(schema_set1) \
.load("path/to/csv/folder/set1/*.csv")
df_set2 = spark.read.format("csv") \
.option("header", "true") \
.schema(schema_set2) \
.load("path/to/csv/folder/set2/*.csv")
df_combined.printSchema()
Approach 3: Define a Union Schema
Define a comprehensive schema that includes all columns from all files, then read the files using this
schema. Columns not present in some files will be filled with null values.
python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder \
    .appName("ReadCSVWithUnionSchema") \
    .getOrCreate()

# Superset of all columns seen across the files (illustrative names)
union_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("location", StringType(), True)
])
df = spark.read.format("csv") \
.option("header", "true") \
.schema(union_schema) \
.load("path/to/csv/folder")
df.printSchema()
Summary
Schema Merging: The mergeSchema option gives automatic schema merging for self-describing formats
such as Parquet and ORC; it is not a reliable solution for plain CSV.
Read and Process Individually: Handle files with different schemas separately and then union them.
Union Schema: Define a schema that includes all possible columns from all files and use it to read the
files.
Here’s how you can read an Excel file with multiple sheets into a single DataFrame in Spark:
First, include the spark-excel library as a dependency when starting your Spark session.
python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ReadExcelWithSheets") \
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:0.14.0") \
    .getOrCreate()
Replace spark-excel_2.12:0.14.0 with the version that matches your Spark and Scala versions. Check
spark-excel documentation for the correct version.
Since the library supports reading individual sheets, you can read each sheet into a separate
DataFrame and then union them into a single DataFrame. Here’s how you can do it:
python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ReadExcelWithSheets") \
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:0.14.0") \
    .getOrCreate()

excel_path = "path/to/excel/file.xlsx"

# List of sheet names (illustrative; use the actual sheet names in your workbook)
sheet_names = ["Sheet1", "Sheet2"]

df_list = []
for sheet in sheet_names:
    df = spark.read.format("com.crealytics.spark.excel") \
        .option("sheetName", sheet) \
        .option("header", "true") \
        .load(excel_path)
    df_list.append(df)

# Union all per-sheet DataFrames into one
df_combined = df_list[0]
for df in df_list[1:]:
    df_combined = df_combined.unionByName(df)

df_combined.printSchema()
Explanation:
The loop reads each sheet into its own DataFrame and then unions them. Make sure column names
across sheets are consistent if you want to union them directly.
python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ReadExcelWithSheets") \
    .getOrCreate()

excel_path = "path/to/excel/file.xlsx"
# Read every sheet into a dict of {sheet_name: DataFrame}, then stack them
dfs = pd.read_excel(excel_path, sheet_name=None)
combined_df = pd.concat(dfs.values(), ignore_index=True)
# Convert the combined pandas DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(combined_df)
spark_df.printSchema()
Explanation:
pd.read_excel with sheet_name=None returns a dictionary of DataFrames keyed by sheet name;
pd.concat stacks them into one pandas DataFrame, and spark.createDataFrame converts the result
into a Spark DataFrame.
Summary
com.crealytics.spark.excel Library: Use it to read and union multiple sheets from an Excel file directly
in Spark.
Pandas: Read Excel with Pandas, combine sheets, and convert to a Spark DataFrame if needed.
Choose the method that best fits your environment and data processing requirements.
=============================================================================
Here's a detailed breakdown of the output you can expect from pd.read_excel(excel_path, sheet_name=None), given a workbook with the following two sheets:
Sheet1:
| ID | Name | Age |
|----|-------|-----|
| 1 | Alice | 30 |
| 2 | Bob | 25 |
Sheet2:
| ID | Product | Price |
|----|-----------|-------|
| 1 | Laptop | 1000 |
| 2 | Smartphone| 500 |
python
import pandas as pd

excel_path = "path/to/example.xlsx"
# Read all sheets at once: returns a dict keyed by sheet name
dfs = pd.read_excel(excel_path, sheet_name=None)
print(dfs.keys())
Output
When you run pd.read_excel(excel_path, sheet_name=None), dfs will be a dictionary with the
following structure:
python
{
    'Sheet1': pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob'], 'Age': [30, 25]}),
    'Sheet2': pd.DataFrame({'ID': [1, 2], 'Product': ['Laptop', 'Smartphone'], 'Price': [1000, 500]})
}
Explanation
Keys: The keys of the dictionary are the sheet names from the Excel file ('Sheet1', 'Sheet2' in this
case).
Values: The values are DataFrames corresponding to each sheet. Each DataFrame contains the data
from the respective sheet in the Excel file.
Example Output
python
print(dfs.keys())
Output:
dict_keys(['Sheet1', 'Sheet2'])
This shows that there are two sheets in the Excel file, named 'Sheet1' and 'Sheet2'.
Accessing DataFrames:
python
sheet1_df = dfs['Sheet1']
sheet2_df = dfs['Sheet2']
print(sheet1_df)
print(sheet2_df)
Output:
ID Name Age
0 1 Alice 30
1 2 Bob 25
ID Product Price
0 1 Laptop 1000
1 2 Smartphone 500
Summary
pd.read_excel(excel_path, sheet_name=None) returns one pandas DataFrame per sheet, keyed by sheet
name; from there you can inspect, combine, or convert the individual sheets as needed.