PySpark and Python preparation notes
Interview questions on the ability to write efficient code, handle edge cases, and think
about improvements or alternative methods.
Warm-up task: find the paint color with the lowest price from a dictionary
of paint colors and prices.
This task tests the candidate's knowledge of dictionary operations, function definitions, and the
use of the min() function with a key argument in Python.
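One way the task might be solved (the function and variable names below are illustrative, not from the original notes):

```python
def cheapest_paint(prices):
    """Return the (color, price) pair with the lowest price."""
    if not prices:  # edge case: empty dictionary
        return None
    # min() with a key compares dictionary items by their price
    # (the second element of each (color, price) tuple)
    return min(prices.items(), key=lambda item: item[1])

paint_prices = {"red": 12.99, "blue": 9.49, "green": 10.75}
print(cheapest_paint(paint_prices))  # → ('blue', 9.49)
```

Returning None for an empty dictionary is one defensible edge-case choice; raising a ValueError (which min() does by default on an empty sequence) is another, and discussing that trade-off is part of what the question probes.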
1. What is PySpark, and how does it differ from traditional Python data processing?
2. Can you explain the difference between an RDD and a DataFrame in PySpark?
3. Why do we need to define a schema in PySpark DataFrames?
4. How would you create a DataFrame from a list of tuples in PySpark?
5. What are some common use cases for Spark in a data engineering context?
6. How would you filter rows in this DataFrame where value is greater than 1?
7. What is lazy evaluation in Spark, and how does it apply to transformations in this
DataFrame?
8. Can you explain how Spark handles data across partitions, and why this is beneficial
for big data?
9. How would you perform a groupBy operation on this DataFrame, for example,
grouping by value?
10. What are some performance optimization techniques you could apply in PySpark?
11. How would you add a new column to the DataFrame with a transformed version of
value, for example, doubling each value?
12. If we need to save this DataFrame to a file or a database, how would we do that in
PySpark?
13. Can you explain what happens when you use df.show() versus df.collect()?