Zazmic Inc. Senior/Middle+ Data Engineer Hiring Test (AWS, Snowflake, Databricks, Python, SQL): Candidate Report for Siddhesh Kalgaonkar

Siddhesh Kalgaonkar scored 75.2% on the Zazmic Inc. Senior/Middle+ Data Engineer Hiring Test, demonstrating proficiency in SQL and AWS but weakness in Databricks and Python. The test covered a range of data engineering concepts; suspicious activity was flagged in coding patterns and code similarity. The report details his performance across skills and lists each question answered correctly or incorrectly.


Siddhesh Kalgaonkar
Data Engineer

PDF generated at: 13 May 2024 08:18:27 UTC
View this report on HackerRank

Score: 75.2% (225 / 300)
Scored in Zazmic Inc. Senior/Middle+ Data Engineer Hiring Test (AWS, Snowflake, Databricks, Python, SQL) in 40 min 2 sec on 13 May 2024 10:36:26 EEST

Candidate Information

Email: [email protected]
Test: Zazmic Inc. Senior/Middle+ Data Engineer Hiring Test (AWS, Snowflake, Databricks, Python, SQL)
Taken on: 13 May 2024 10:36:26 EEST
Time taken: 40 min 2 sec / 106 min
Work Experience: > 5 years
Invited by: Stanislav
Invited on: 12 May 2024 13:49:29 EEST

Suspicious Activity detected

Coding patterns: 2 questions
Code similarity: 1 question

Skill Distribution

No.  Skill                      Score
1    SQL (Basic)                100%
2    Databricks (Basic)         0%
3    Snowflake                  50%
4    Databricks (Intermediate)  50%
5    AWS (Basic)                100%
6    AWS (Advanced)             0%
7    Python (Basic)             10%
8    Python (Intermediate)      95%
9    SQL (Intermediate)         100%

Tags Distribution

SQL: 100%
Easy: 25%
Databricks: 33%
Databricks Workflow: 0%
Data Engineering: 50%
Data Visualization: 0%
Data Analysis: 0%
Performance Tuning: 100%
Azure Data Factory: 100%
Azure Datalake Storage: 100%
Medium: 93%
Data Warehouse Database: 50%
AWS: 50%
Hard: 50%
Amazon S3: 0%
Querying data: 0%
Spark: 0%
Jobs: 0%
Azure Datalake: 0%
Azure Workflow: 0%
Python: 62%
Strings: 63%
OOPS: 100%
Exception Handling: 100%
Collections: 100%
Simple Joins: 100%
Interviewer Guidelines: 100%
Database: 100%
Hash Map: 100%
Sub-Queries: 100%
Union: 100%

Questions

No. | Question | Type | Skill | Time Taken | Score
1 | SQL Order of Operations | Multiple Choice | SQL (Basic) | 1 min 1 sec | 5/5
2 | Databricks Workflow | Multiple Choice | Databricks (Basic) | 53 sec | 0/5
3 | In SQL, what's the difference between an inner join versus an outer join? | Multiple Choice | - | 23 sec | 5/5
4 | Performance Tuning | Multiple Choice | Snowflake | 21 sec | 5/5
5 | Data Ingestion in Databricks | Multiple Choice | Databricks (Intermediate) | 1 min 4 sec | 5/5
6 | What data structure is often referred to as "Last in, first out"? | Multiple Choice | - | 6 sec | 5/5
7 | Data Warehousing | Multiple Choice | Snowflake | 1 min 18 sec | 0/5
8 | AWS IAM Identities | Multiple Choice | AWS (Basic) | 25 sec | 5/5
9 | At the command line, what command sends the contents of a file to standard out? | Multiple Choice | - | 10 sec | 5/5
10 | AWS: Optimize Cloud Storage | Multiple Choice | AWS (Advanced) | 1 min 11 sec | 0/5
11 | What HTTP request method/methods should not affect the state on the server? | Multiple Choice | - | 20 sec | 5/5
12 | Data Warehouse Database - 2 | Multiple Choice | Snowflake | 2 min 2 sec | 5/5
13 | Querying Data | Multiple Choice | Snowflake | 1 min 42 sec | 0/5
14 | Databricks Job Configuration | Multiple Choice | Databricks (Intermediate) | 30 sec | 0/5
15 | Which of the following statements are true about Python 3 vs Python 2? | Multiple Choice | Python (Basic) +1 | 19 sec | 1.67/5
16 | Python: Fancy tuple | Coding | Python (Intermediate) | 3 min 37 sec | 75/75
17 | Stop Words (hackerrank) | Coding | Python (Basic) | 13 min 56 sec | 4/50
18 | Counting Votes (hackerrank) | DbRank | SQL (Intermediate) | 3 min 32 sec | 50/50
19 | Count of Blood Groups (hackerrank) | DbRank | SQL (Intermediate) | 6 min 42 sec | 50/50

1. SQL Order of Operations [Correct]

Multiple Choice SQL Easy

Question description

Which is the correct order of execution of operations in SQL?

Candidate's Solution

Options: (Expected answer indicated with a tick)

SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY

SELECT, FROM, GROUP BY, WHERE, HAVING, ORDER BY

FROM, GROUP BY, WHERE, ORDER BY, HAVING, SELECT


SELECT, FROM, GROUP BY, HAVING, WHERE, ORDER BY

No comments.
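
For reference, standard SQL evaluates clauses in a logical order that differs from the written order: FROM/JOIN first, then WHERE, GROUP BY, HAVING, SELECT, and finally ORDER BY. A minimal annotated sketch (table and column names are hypothetical):

SELECT dept, COUNT(*) AS n            -- 5. SELECT: project and compute expressions
FROM employees e                      -- 1. FROM/JOIN: build the working row set
JOIN offices o ON o.id = e.office_id  -- 1. (joins are resolved with FROM)
WHERE e.active = TRUE                 -- 2. WHERE: filter individual rows
GROUP BY dept                         -- 3. GROUP BY: collapse rows into groups
HAVING COUNT(*) > 5                   -- 4. HAVING: filter whole groups
ORDER BY n DESC;                      -- 6. ORDER BY: sort the final result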

2. Databricks Workflow [Incorrect]

Multiple Choice Databricks Databricks Workflow Data Engineering Data Visualization Data Analysis Easy

Question description

User A is working as a data engineer for a fintech startup. The compliance department has made a policy to hide
personal information in order to meet the GDPR (General Data Protection Regulation).

Which of the following approaches is most appropriate to ensure data privacy and security while processing sensitive
data in Databricks clusters in order to meet the compliance requirements?

Interviewer guidelines
Fine-grained access controls provide the ability to restrict access to sensitive data at a granular level. This ensures that only authorized individuals or processes have access to the sensitive data within the Databricks environment. User A can then enforce data privacy and security by granting permissions only to the necessary personnel and preventing unauthorized access to sensitive information.
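
In practice such controls are expressed as SQL grants. A minimal sketch, assuming Unity Catalog and hypothetical catalog, table, and group names:

-- Revoke broad access to the table holding personal information,
-- then grant SELECT only to the compliance-approved group.
REVOKE SELECT ON TABLE fintech.payments.customers FROM `all_users`;
GRANT SELECT ON TABLE fintech.payments.customers TO `pii_readers`;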

Candidate's Solution

Options: (Expected answer indicated with a tick)

Enable encryption at rest for Databricks storage.

Use fine-grained access controls to restrict access to sensitive data.

Implement network isolation for Databricks clusters.


Enable audit logging for all Databricks activities.

No comments.

3. In SQL, what's the difference between an inner join versus an outer join? [Correct]

Multiple Choice

Question description

In SQL, what's the difference between an inner join versus an outer join?

Candidate's Solution

Options: (Expected answer indicated with a tick)

Inner join includes only rows present in either table but not present in both tables. Outer join includes all rows from both tables.

Inner join includes only rows present in both tables. Outer join includes all rows present in either table but not present in both tables.

Inner join includes only rows present in both tables. Outer join includes all rows from the first table excluding the rows from the second table.

Inner join includes only rows present in both tables. Outer join includes all rows from both tables.

No comments.
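
A quick illustration of the distinction (hypothetical tables a and b sharing a key column k):

-- INNER JOIN: only rows whose key is present in both a and b
SELECT a.k, a.x, b.y FROM a INNER JOIN b ON a.k = b.k;

-- FULL OUTER JOIN: every row from both sides; unmatched columns come back NULL
SELECT a.k, a.x, b.y FROM a FULL OUTER JOIN b ON a.k = b.k;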


4. Performance Tuning [Correct]

Multiple Choice Easy Performance Tuning

Question description

Which of the following Snowflake features can be used to boost the efficiency of complex queries with several joins and
subqueries?

Interviewer guidelines
Materialised views are precomputed views that enable quicker access to data by storing the results of a query in a table-like format. By generating precomputed views of the data used in the queries, materialised views can be used to enhance the efficiency of complex queries with several joins and subqueries. This lessens the need to process the same data repeatedly and enhances query performance.
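
As an illustration, a minimal Snowflake sketch (materialized views are an Enterprise-edition feature; table and column names are hypothetical). Snowflake materialized views are defined over a single table, so the usual pattern is to precompute an expensive aggregate that complex joins and subqueries then reuse:

CREATE MATERIALIZED VIEW daily_store_sales AS
SELECT store_id,
       sale_date,
       SUM(amount) AS total_amount
FROM sales
GROUP BY store_id, sale_date;

-- Later joins and subqueries read the precomputed rows instead of rescanning sales.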

Candidate's Solution

Options: (Expected answer indicated with a tick)

Snowpipe, which provides continuous data ingestion into a data warehouse

external functions, which allow users to run custom code outside of Snowflake and integrate the results into their queries

data sharing, which allows for the sharing of data between different Snowflake accounts

materialized views, which create precomputed views of complex queries for faster access to data

No comments.

5. Data Ingestion in Databricks [Correct]

Multiple Choice Databricks Azure Data Factory Data Engineering Azure Datalake Storage Medium


Question description

A data engineer needs to ingest data in Databricks after every five minutes for the next sixty days. The engineer has
decided to use the advanced feature Auto Loader instead of COPY INTO.

What could be the reason for this?

Interviewer guidelines
Auto Loader is specifically designed for structured streaming workloads in Databricks. It is optimized to handle continuous and incremental data ingestion, making it well-suited for scenarios where data needs to be ingested every five minutes for a long duration. By leveraging structured streaming capabilities, Auto Loader processes the incoming data in multiple batches, allowing for efficient loading and handling of re-uploaded data. This means that if a portion of the data has already been ingested in a previous batch, Auto Loader will only process and load the new or modified data, minimizing redundant processing and improving performance.
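
A minimal sketch of this kind of ingestion in Delta Live Tables SQL (assumes a DLT pipeline; the landing path and table name are hypothetical):

-- Incrementally pick up new files as they land; already-ingested files are
-- tracked and skipped, so each five-minute batch only processes new data.
CREATE OR REFRESH STREAMING LIVE TABLE raw_events
AS SELECT * FROM cloud_files('/mnt/landing/events', 'json');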

Candidate's Solution

Options: (Expected answer indicated with a tick)

Autoloader is made for structured streaming. Also, it splits the processing into multiple batches, which helps to load a subset of re-uploaded data a bit easier.

Autoloader is expensive for large data but provides better primitives around schema inference and evolution.

Autoloader provides the additional layer of security which supersedes COPY INTO.

Auto Loader provides an additional protective layer, but COPY INTO does not.

No comments.

6. What data structure is often referred to as "Last in, first out"? [Correct]


Multiple Choice

Question description

What data structure is often referred to as "Last in, first out"?

Candidate's Solution

Options: (Expected answer indicated with a tick)

Stack

Queue

List

Set

No comments.

7. Data Warehousing [Incorrect]

Multiple Choice Data Warehouse Database Easy

Question description

A healthcare organization that has recently migrated its data warehouse to Snowflake wants to design the schema
for storing patient records including patient demographics, medical history, and medication orders.

Which of the following Snowflake schema design options is most suitable in this situation?

Interviewer guidelines


Snowflake does not require primary keys, although they are still a good idea for indexing and data structure. In this instance, patient_id is the ideal primary key because it is a unique identifier for every patient. The ideal schema design is Option 2, which uses patient_id as the primary key without adding any extra indexing or partitioning that can slow down queries.

Candidate's Solution

Options: (Expected answer indicated with a tick)

CREATE TABLE patients(
    patient_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    dob DATE,
    sex VARCHAR(10),
    height FLOAT,
    weight FLOAT,
    blood_type VARCHAR(10),
    smoker BOOLEAN,
    drinker BOOLEAN
);

CREATE TABLE patients(
    patient_id INTEGER,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    dob DATE,
    sex VARCHAR(10),
    height FLOAT,
    weight FLOAT,
    blood_type VARCHAR(10),
    smoker BOOLEAN,
    drinker BOOLEAN,
    PRIMARY KEY (patient_id)
);

CREATE TABLE patients(
    patient_id INTEGER,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    dob DATE,
    sex VARCHAR(10),
    height FLOAT,
    weight FLOAT,
    blood_type VARCHAR(10),
    smoker BOOLEAN,
    drinker BOOLEAN,
    PRIMARY KEY (patient_id, dob)
);

CREATE TABLE patients(
    patient_id INTEGER,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    dob DATE,
    sex VARCHAR(10),
    height FLOAT,
    weight FLOAT,
    blood_type VARCHAR(10),
    smoker BOOLEAN,
    drinker BOOLEAN,
    PRIMARY KEY (dob, patient_id)
);

No comments.

Candidate Report Page 11 of 36


Siddhesh Kalgaonkar

8. AWS IAM Identities [Correct]

Multiple Choice AWS Easy

Question description

This diagram shows how a typical user logs into AWS using IAM credentials and accesses various services using roles
and group permissions.

[IAM sign-in and role/group permissions diagram not reproduced]

Which of the following is an identity in the AWS IAM?

Candidate's Solution

Options: (Expected answer indicated with a tick)

Groups

Roles

Users

All of these

No comments.

9. At the command line, what command sends the contents of a file to standard out? [Correct]

Multiple Choice

Question description

At the command line, what command sends the contents of a file to standard out?

Candidate's Solution

Options: (Expected answer indicated with a tick)


less

grep

cat

echo

No comments.

10. AWS: Optimize Cloud Storage [Incorrect]

Multiple Choice Hard AWS Amazon S3

Question description

A video streaming service aims to expand its presence globally. With a vast and growing volume of high-definition
videos and the demand for low-latency streaming, the company's IT team seeks to optimize its cloud storage solution.
They prioritize rapid content delivery, durability of data, cost-effectiveness, and the capability of analytics and machine
learning operations on this data. Which storage configuration best meets these requirements?

Interviewer guidelines
Implementing the Amazon S3 Standard storage class ensures quick access to video content. Integrating with Amazon Redshift Spectrum offers efficient analytics directly on the S3 data, making it suitable for future machine learning operations. Option 1's use of Kinesis Video Streams, by contrast, focuses on real-time video analytics; while analytics is part of the requirements, the scenario also calls for future machine learning operations on the stored data.
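
To illustrate the Redshift Spectrum side, querying S3-resident data starts from an external schema. A minimal sketch (the IAM role, Glue catalog database, and table names are hypothetical):

-- Expose the S3-backed tables registered in the Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'video_metadata'
IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Analytics then run directly against the data in S3
SELECT title, COUNT(*) AS views
FROM spectrum.playback_events
GROUP BY title;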

Candidate's Solution

Options: (Expected answer indicated with a tick)


Use Amazon S3 Standard storage class for the video content. Implement Amazon Kinesis Video Streams for real-time video analytics.

Store video content in Amazon EFS (Elastic File System) with provisioned throughput mode. Use AWS Lambda for serverless processing and analysis.

Implement Amazon S3 Standard storage class for quick access and integrate with Amazon Redshift Spectrum for analytics and machine learning operations.

Use Amazon S3 Standard storage class with Amazon CloudFront for low-latency content delivery. Integrate with Amazon Athena for querying stored data.

No comments.

11. What HTTP request method/methods should not affect the state on the server? [Correct]

Multiple Choice

Question description

What HTTP request method/methods should not affect the state on the server?

Candidate's Solution

Options: (Expected answer indicated with a tick)

GET

POST


PUT

PATCH

No comments.

12. Data Warehouse Database - 2 [Correct]

Multiple Choice Hard Data Warehouse Database

Question description

A data analysis project involves two tables: inventory and orders. The inventory table has the following columns.

Table 1: inventory

item_no  quantity  name
2471     43        Television
3478     54        Refrigerator
7265     89        Laptop
6370     20        e-reader

The orders table has the following columns: order_id, customer_id, order_item, and others.

Table 2: orders

order_id  cust_id  order_item
1972      101      Television
3586      107      Refrigerator
2914      115      Grocery
5760      222      Drill machine

Query:

with recursive orderdb as (
    select i.item_no, i.quantity, o.cust_id, o.order_id
    from inventory i
    join orders o on i.name = o.order_item
)
select * from orderdb
where quantity = (select max(quantity) from orderdb)

What is the output of this query?

Interviewer guidelines
The query starts by creating a common table expression (CTE) named "orderdb" (it is declared with RECURSIVE but contains no recursive member, so it evaluates as an ordinary CTE). The CTE uses a JOIN operation to combine the 'inventory' and 'orders' tables, and it returns the item number, quantity, customer_id, and order_id for each order item.

Candidate's Solution

Options: (Expected answer indicated with a tick)

item_no  quantity  cust_id  order_id
2471     43        101      1972

item_no  quantity  cust_id  order_id
3478     54        107      3586

item_no  quantity  cust_id  order_id  max
3478     54        107      3586      89

item_no  quantity  cust_id  order_id  max
2471     43        101      1972      54
3478     54        107      3586      54

No comments.

13. Querying Data [Incorrect]

Multiple Choice Medium Querying data

Question description

A retail company using Snowflake for data warehousing wants to analyze the sales performance of its different
product categories across multiple stores.

Which of the following SQL queries is the most efficient and accurate way to retrieve the total sales revenue for each
product category across all stores?

Interviewer guidelines
First, select all distinct store IDs, then retrieve the total sales revenue for each product category across those stores. This avoids grouping by individual store IDs, which could lead to duplicate calculations.
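
Concretely, a single aggregation grouped only by product_category already sums across every store; a sketch of that approach (with product_category added to the select list so each total is identifiable):

SELECT product_category,
       SUM(sales_revenue) AS total_revenue
FROM sales_data
GROUP BY product_category;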

Candidate's Solution

Options: (Expected answer indicated with a tick)

SELECT SUM(sales_revenue) FROM sales_data GROUP BY product_category, store_id;

SELECT SUM(sales_revenue) FROM sales_data GROUP BY product_category;

SELECT SUM(sales_revenue) FROM sales_data WHERE store_id = 'all' GROUP BY product_category;

SELECT SUM(sales_revenue) FROM sales_data WHERE store_id IS NULL GROUP BY product_category;

No comments.

14. Databricks Job Configuration [Incorrect]

Multiple Choice Spark Databricks Jobs Azure Datalake Azure Workflow Medium

Question description

User A is responsible for managing and monitoring 12 production jobs that transfer data to Azure Data Lake (ADL)
which are utilized by the data science team at a large e-commerce store. One day, User A noticed that Job 6 took much
longer than expected, resulting in delays for subsequent jobs and disrupting the team's workflow.

What should User A do to prevent such issues from occurring in the future?

Interviewer guidelines
In order to avoid delays in subsequent jobs due to slow or stalled jobs, User A should create new shared or task-
scoped clusters. These clusters ensure that each job runs in a completely isolated environment, preventing resource
contention and allowing jobs to run independently. Option 1, terminating the current job and manually running the
next job, is not a scalable solution and may result in unnecessary downtime. Option 2, including data quality checks
before each job, is important but does not directly address the issue of job delays. Option 4, increasing the number of
worker nodes, may be helpful but is not as effective as creating new clusters for each job. Therefore, the correct
answer is option 3.

Candidate's Solution

Options: (Expected answer indicated with a tick)

Immediately terminate the current job and run the next job manually.

Include data quality checks before each job to prevent such problems.


Create new shared or task-scoped clusters to ensure each job runs in an isolated environment.

Increase the number of worker nodes to address these problems.

No comments.

15. Which of the following statements are true about Python 3 vs Python 2? [Partially correct]

Multiple Choice Python Medium

Question description

Which of the following statements are true about Python 3 vs Python 2? (More than one)

Candidate's Solution

Options: (Expected answer indicated with a tick)

Print is now a function and not a statement

Math library is imported by default

All strings are now Unicode

Division of integers returns a float

No comments.


16. Python: Fancy tuple [Correct]

Coding Python Strings Medium OOPS Exception Handling Collections

Question description

Implement the class FancyTuple.


- The constructor takes 0 to 5 parameters.
- The elements of FancyTuple can be accessed as named properties: first, second, third, fourth, and fifth. The expression FancyTuple("dog", "cat").first returns "dog" and FancyTuple("dog", "cat").second returns "cat".
- An AttributeError exception is raised if a nonexistent element of the tuple is accessed. The expression FancyTuple("dog", "cat").third raises an AttributeError exception.
- len() returns the number of elements: len(FancyTuple("dog", "cat")) returns 2.

Your implementation of the class will be tested by a provided code stub on several input files. Each input file contains parameters to test your implementation with. First, the provided code stub initializes an instance of FancyTuple. Next, it tests the implementation by accessing its elements and checking its length. The results of these operations are printed to standard output by the provided code.

INPUT FORMAT FOR CUSTOM TESTING

Input from stdin will be processed as follows and passed to the function.

The first line has the integer n, the number of elements in the tuple.
The next n lines contain the elements of the tuple.
The following line has the integer q, the number of operations to be performed on a FancyTuple instance.
The next q lines contain the standalone operations to be performed on the FancyTuple.

SAMPLE CASE 0

Sample Input 0

STDIN Function
----- --------
3 → n=3
dog → first item
cat → second item
mouse → third item
6 → q=6
first → first function...
second
third


fourth
fifth
len

Sample Output 0

dog
cat
mouse
AttributeError
AttributeError
3

Explanation 0
The code initializes t = FancyTuple("dog", "cat", "mouse"). Then, there are 6 operations to be performed:
1. t.first returns "dog" because "dog" is the first element of the tuple.
2. t.second returns "cat" because "cat" is the second element of the tuple.
3. t.third returns "mouse" because "mouse" is the third element of the tuple.
4. t.fourth raises the AttributeError exception because the tuple has 3 elements.
5. t.fifth raises the AttributeError exception because the tuple has 3 elements.
6. len(t) returns 3 because the tuple has 3 elements.

Interviewer guidelines
Setter's solution:

from collections import namedtuple

class FancyTuple():
    def __init__(self, *items):
        Tuples = namedtuple("tuples", ["first", "second", "third", "fourth", "fifth"][:len(items)])
        self.values = Tuples(*items)

    @property
    def first(self):
        return str(self.values.first)

    @property
    def second(self):
        return self.values.second

    @property
    def third(self):
        return self.values.third

    @property
    def fourth(self):
        return self.values.fourth

    @property
    def fifth(self):
        return self.values.fifth

    def __len__(self):
        return len(self.values)

Candidate's Solution (Language used: Python 3)

#!/bin/python3

import math
import os
import random
import re
import sys


class FancyTuple:
    def __init__(self, *args):
        self._args = args

    def __getattr__(self, name):
        if name in ("first", "second", "third", "fourth", "fifth"):
            try:
                return self._args[{"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}[name]]
            except IndexError:
                raise AttributeError(name)
        raise AttributeError(name)

    def __len__(self):
        return len(self._args)


if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')

    n = int(input())
    items = [input() for _ in range(n)]

    t = FancyTuple(*items)

    q = int(input())
    for _ in range(q):
        command = input()
        if command == "len":
            fptr.write(str(len(t)) + "\n")
        else:
            try:
                elem = getattr(t, command)
            except AttributeError:
                fptr.write("AttributeError\n")
            else:
                fptr.write(elem + "\n")
    fptr.close()

TESTCASE DIFFICULTY TYPE STATUS SCORE TIME TAKEN MEMORY USED

TestCase 0 Easy Sample Success 1 0.02 sec 10.3 KB

TestCase 1 Easy Sample Success 1 0.0226 sec 10.4 KB

TestCase 2 Easy Sample Success 1 0.018 sec 10.3 KB

TestCase 3 Easy Hidden Success 3 0.0193 sec 10.4 KB

TestCase 4 Easy Hidden Success 4 0.0235 sec 10.4 KB

TestCase 5 Easy Hidden Success 4 0.0205 sec 10.3 KB

TestCase 6 Easy Hidden Success 4 0.0255 sec 10.3 KB

TestCase 7 Easy Hidden Success 4 0.0203 sec 10.3 KB

TestCase 8 Easy Hidden Success 4 0.019 sec 10.4 KB


TestCase 9 Easy Hidden Success 4 0.0209 sec 10.3 KB

TestCase 10 Easy Hidden Success 9 0.023 sec 10.4 KB

TestCase 11 Easy Hidden Success 9 0.0195 sec 10.4 KB

TestCase 12 Easy Hidden Success 9 0.0235 sec 10.4 KB

TestCase 13 Easy Hidden Success 9 0.0216 sec 10.5 KB

TestCase 14 Easy Hidden Success 9 0.0219 sec 10.4 KB

No comments.

17. Stop Words (hackerrank) [Partially correct]

Coding Python Strings Easy

Question description

In NLP, stop words are commonly used words like "a", "is", and "the". They are typically filtered out during processing.

Implement a function that takes a string text and an integer k, and returns the list of words that occur in the text at
least k times. The words must be returned in the order of their first occurrence in the text.

Example
text = "a mouse is smaller than a dog but a dog is stronger"
k=2

The list of stop words that occur at least k = 2 times is ["a", "is", "dog"]. "a" occurs 3 times, "is" and "dog" both occur 2
times. No other word occurs at least 2 times. The answer is in order of first appearance in text.

Function Description
Complete the function stop_words in the editor below.


stop_words has the following parameter(s):


string text: the input text.
int k: the threshold occurrence count for a word to be a stop word

Returns
string[]: the list of stop words in order of their first occurrence

Constraints
text has at most 50000 characters.
Every character in text is either an English lowercase letter or a space.
text starts and ends with a letter. No two consecutive characters are spaces, i.e. text is a valid sentence.
There will be at least one stop word in the text.
1 ≤ k ≤ 18

INPUT FORMAT FOR CUSTOM TESTING

Input from stdin will be processed as follows and passed to the function.

The first line contains the string text.


The second line contains the integer k.

SAMPLE CASE 0

Sample Input 0

STDIN Function
----- --------
the brown fox jumps over the brown dog and runs away to a brown house → text
2 →k

Sample Output 0

the
brown

Explanation 0
"the" occurs 2 times and "brown" occurs 3 times. These words are returned in the order of their first occurrence in
the text.


SAMPLE CASE 1

Sample Input 1

foo bar foo baz foo


3

Sample Output 1

foo

Explanation 1
"foo" occurs 3 times.

Interviewer guidelines
Setter's solution:

def stop_words(text, k):
    cnt = {}
    for i, w in enumerate(text.split(" ")):
        if w not in cnt:
            cnt[w] = [0, i]
        cnt[w][0] += 1
    stop_words = [(i, w) for w, (c, i) in cnt.items() if c >= k]
    res = [w for _, w in sorted(stop_words)]
    return res

Candidate's Solution (Language used: Python 3)

#!/bin/python3

import math
import os
import random
import re
import sys


#
# Complete the 'stop_words' function below.
#
# The function is expected to return a STRING_ARRAY.
# The function accepts following parameters:
#  1. STRING text
#  2. INTEGER k
#

def stopWords(text, k):
    words = text.lower().split()
    word_counts = {}
    for i, word in enumerate(words):
        # Note: the count is only incremented when a word repeats consecutively;
        # any non-adjacent repeat resets its count to 1, which is why most of
        # the hidden test cases below fail.
        if i > 0 and words[i-1] == word:
            word_counts[words[i-1]] += 1
        else:
            word_counts[word] = 1

    frequent_words = []

    for word, count in word_counts.items():
        if count >= k:
            frequent_words.append(word)

    return frequent_words

if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')

    text = input()

    k = int(input().strip())

    result = stopWords(text, k)

    fptr.write('\n'.join(result))
    fptr.write('\n')

    fptr.close()

TESTCASE DIFFICULTY TYPE STATUS SCORE TIME TAKEN MEMORY USED

TestCase 0 Easy Sample Wrong Answer 0 0.0215 sec 10.3 KB

TestCase 1 Easy Sample Wrong Answer 0 0.0192 sec 10.4 KB


TestCase 2 Easy Sample Success 1 0.0201 sec 10.4 KB

TestCase 3 Easy Hidden Success 3 0.0235 sec 10.9 KB

TestCase 4 Easy Hidden Wrong Answer 0 0.0212 sec 10.9 KB

TestCase 5 Easy Hidden Wrong Answer 0 0.0208 sec 11 KB

TestCase 6 Easy Hidden Wrong Answer 0 0.0212 sec 11.1 KB

TestCase 7 Easy Hidden Wrong Answer 0 0.0218 sec 11 KB

TestCase 8 Easy Hidden Wrong Answer 0 0.0204 sec 10.9 KB

TestCase 9 Easy Hidden Wrong Answer 0 0.0208 sec 11 KB

TestCase 10 Easy Hidden Wrong Answer 0 0.024 sec 10.9 KB

TestCase 11 Easy Hidden Wrong Answer 0 0.0214 sec 11 KB

TestCase 12 Easy Hidden Wrong Answer 0 0.0206 sec 10.9 KB

TestCase 13 Easy Hidden Wrong Answer 0 0.0206 sec 11 KB

TestCase 14 Easy Hidden Wrong Answer 0 0.0221 sec 11 KB

No comments.

18. Counting Votes (hackerrank) [Correct]

DbRank Medium Simple Joins Interviewer Guidelines SQL Database Hash Map Sub-Queries


Question description

Given a database of votes won by different candidates in an election, find the number of votes won by female candidates whose age is less than 50.

SCHEMA

There are 2 tables: Candidates and Results.

Candidates

Name    Type     Description
id      INTEGER  It is the primary key.
gender  STRING   The gender of the candidate.
age     INTEGER  Age of the candidate.
party   STRING   The party to which the candidate belongs.

Results

Name             Type     Description
constituency_id  INTEGER  It is the constituency from which the candidate is contesting.
candidate_id     INTEGER  It is the primary key.
votes            INTEGER  The number of votes won by the candidate.

SAMPLE DATA TABLES

Candidates

id gender age party

1 M 55 Democratic

2 M 51 Democratic

3 F 49 Democratic

4 M 60 Republic


5 F 61 Republic

6 F 48 Republic

Results

constituency_id candidate_id votes

1 1 847529

1 4 283409

2 2 293841

2 5 394385

3 3 429084

3 6 303890

Expected Output:
732974

Explanation:

There are three female candidates contesting the election. Two of them are less than 50 years old. The sum of
their votes is 429084 + 303890 = 732974.

Interviewer guidelines

SOLUTION

MySQL solution

SELECT SUM(votes)
FROM (SELECT id
      FROM candidates
      WHERE gender = 'F' AND age < 50) AS r
JOIN results res
  ON r.id = res.candidate_id;

Candidate's Solution (Language used: MySQL)

/*
Enter your query below.
Please append a semicolon ";" at the end of the query
*/

select sum(r.votes)
from candidates c
join results r on c.id = r.candidate_id
where gender = 'F' and age < 50;

Time taken: 0.02 sec

No comments.

19. Count of Blood Groups (hackerrank) [Correct]

DbRank Medium Interviewer Guidelines SQL Database Hash Map Sub-Queries Union

Question description

A blood bank maintains two tables - DONOR, with information about the people who are willing to donate blood and
ACCEPTOR, with information about the people who are in need of blood. The bank wants to know the number of
males and the number of females with a particular blood group.

In the output, each row must contain the following attributes:


1. Gender (GENDER).
2. Blood Group (BG).
3. Number of people with that gender and that blood group.

The schema of the two tables is given below:

TABLE SCHEMA

DONOR

Name    Type       Description
DID     Integer    It is the id of the donor.
NAME    String     It is the name of the donor.
GENDER  Character  It is the gender of the donor.
CITY    String     It is the city where the donor lives.
BG      String     It is the blood group of the donor.
AMOUNT  Integer    It is the amount of blood in pints which the donor can donate.

ACCEPTOR

Name    Type       Description
AID     Integer    It is the id of the acceptor.
NAME    String     It is the name of the acceptor.
GENDER  Character  It is the gender of the acceptor.
CITY    String     It is the city where the acceptor lives.
BG      String     It is the blood group of the acceptor.
AMOUNT  Integer    It is the amount of blood in pints which the acceptor needs.

SAMPLE CASE 0

Sample Input For Custom Testing

DONOR

DID  NAME     GENDER  CITY                  BG   AMOUNT
1    MARIA    F       Warne, NH             AB+  7
2    RUBY     F       East Natchitoche, PA  AB+  3
3    CHARLES  M       East Natchitoche, PA  A-   6
4    DOROTHY  F       East Natchitoche, PA  AB+  9
5    MICHAEL  M       Warne, NH             A+   1

ACCEPTOR

AID  NAME      GENDER  CITY                  BG   AMOUNT
1    LINDA     F       Warne, NH             A+   9
2    CHARLES   M       Warne, NH             AB+  8
3    RICHARD   M       East Natchitoche, PA  AB+  3
4    LINDA     F       East Natchitoche, PA  A+   1
5    PATRICIA  F       Warne, NH             A+   5

Sample Output

F A+ 3
F AB+ 3
M A+ 1
M A- 1
M AB+ 2

Explanation
There are 3 females with A+ blood group.
Similarly, there are 3 females with AB+ blood group. And so on.

Interviewer guidelines

SOLUTION

MySQL solution

SELECT t1.gender, t1.bg, SUM(t1.num)
FROM (
    SELECT gender, bg, COUNT(*) num
    FROM donor
    GROUP BY 1, 2
    UNION
    SELECT gender, bg, COUNT(*) num
    FROM acceptor
    GROUP BY 1, 2) t1
GROUP BY 1, 2;

-- Note: UNION deduplicates, so if the same (gender, bg, count) row comes out of
-- both subqueries, one copy is dropped and the total undercounts; UNION ALL, as
-- in the candidate's solution below, avoids this.


Candidate's Solution (Language used: MySQL)

/*
Enter your query below.
Please append a semicolon ";" at the end of the query
*/

select gender, bg, count(*)
from (select gender, bg from donor
      union all
      select gender, bg from acceptor) a
group by gender, bg;

Time taken: 0.02 sec

No comments.
