Simulado Databricks

Uploaded by

Gregory Pierroti


Databricks Academy

Practice Exam
Databricks Certified Associate Developer for Apache Spark 3.0 - Python

Overview

This is a practice exam for the Databricks Certified Associate Developer for Apache Spark 3.0 - Python exam. The questions here are retired questions from the actual exam that are representative of the questions one will receive while taking the actual exam. After taking this practice exam, one should know what to expect while taking the actual Associate Developer for Apache Spark 3.0 - Python exam.

Just like the actual exam, it contains 60 multiple-choice questions. Each of these questions has one correct answer. The correct answer for each question is listed at the bottom in the Correct Answers section.

There are a few more things to be aware of:

1. This practice exam is for the Python version of the actual exam, but it's incredibly similar to the Scala version of the actual exam, as well. There is a practice exam for the Scala version, too.
2. There is a two-hour time limit to take the actual exam.
3. In order to pass the actual exam, testers will need to correctly answer at least 42 of the 60 questions.
4. During the actual exam, testers will be able to reference a PDF version of the Apache Spark documentation. Please use this version of the documentation while taking this practice exam.
5. During the actual exam, testers will not be able to test code in a Spark session. Please do not use a Spark session when taking this practice exam.
6. These questions are representative of questions that are on the actual exam, but they are no longer on the actual exam.

If you have more questions, please review the Databricks Academy Certification FAQ.

Once you've completed the practice exam, evaluate your score using the correct answers at the bottom of this document. If you're ready to take the exam, head to Databricks Academy to register.

Exam Questions

Question 1
Which of the following statements about the Spark driver is incorrect?
A. The Spark driver is the node in which the Spark application's main method runs to coordinate the Spark application.
B. The Spark driver is horizontally scaled to increase overall processing throughput.
C. The Spark driver contains the SparkContext object.
D. The Spark driver is responsible for scheduling the execution of data by various worker nodes in cluster mode.
E. The Spark driver should be as close as possible to worker nodes for optimal performance.

Question 2
Which of the following describes nodes in cluster-mode Spark?
A. Nodes are the most granular level of execution in the Spark execution hierarchy.
B. There is only one node and it hosts both the driver and executors.
C. Nodes are another term for executors, so they are processing engine instances for performing computations.
D. There are driver nodes and worker nodes, both of which can scale horizontally.
E. Worker nodes are machines that host the executors responsible for the execution of tasks.

Question 3
Which of the following statements about slots is true?
A. There must be more slots than executors.
B. There must be more tasks than slots.
C. Slots are the most granular level of execution in the Spark execution hierarchy.
D. Slots are not used in cluster mode.
E. Slots are resources for parallelization within a Spark application.

Question 4
Which of the following is a combination of a block of data and a set of transformers that will run on a single executor?
A. Executor
B. Node
C. Job
D. Task
E. Slot

Question 5
Which of the following is a group of tasks that can be executed in parallel to compute the same set of operations on potentially multiple machines?
A. Job
B. Slot
C. Executor
D. Task
E. Stage

Question 6
Which of the following describes a shuffle?
A. A shuffle is the process by which data is compared across partitions.
B. A shuffle is the process by which data is compared across executors.
C. A shuffle is the process by which partitions are allocated to tasks.
D. A shuffle is the process by which partitions are ordered for write.
E. A shuffle is the process by which tasks are ordered for execution.

Question 7
DataFrame df is very large with a large number of partitions, more than there are executors in the cluster. Based on this situation, which of the following is incorrect? Assume there is one core per executor.
A. Performance will be suboptimal because not all executors will be utilized at the same time.
B. Performance will be suboptimal because not all data can be processed at the same time.
C. There will be a large number of shuffle connections performed on DataFrame df when operations inducing a shuffle are called.
D. There will be a lot of overhead associated with managing resources for data processing within each task.
E. There might be risk of out-of-memory errors depending on the size of the executors in the cluster.

Question 8
Which of the following operations will trigger evaluation?
A. DataFrame.filter()
B. DataFrame.distinct()
C. DataFrame.intersect()
D. DataFrame.join()
E. DataFrame.count()

Question 9
Which of the following describes the difference between transformations and actions?
A. Transformations work on DataFrames/Datasets while actions are reserved for native language objects.
B. There is no difference between actions and transformations.
C. Actions are business logic operations that do not induce execution while transformations are execution triggers focused on returning results.
D. Actions work on DataFrames/Datasets while transformations are reserved for native language objects.
E. Transformations are business logic operations that do not induce execution while actions are execution triggers focused on returning results.

Question 10
Which of the following DataFrame operations is always classified as a narrow transformation?
A. DataFrame.sort()
B. DataFrame.distinct()
C. DataFrame.repartition()
D. DataFrame.select()
E. DataFrame.join()

Question 11
Spark has a few different execution/deployment modes: cluster, client, and local.
Which of the following describes Spark's execution/deployment mode?
A. Spark's execution/deployment mode determines where the driver and executors are physically located when a Spark application is run.
B. Spark's execution/deployment mode determines which tasks are allocated to which executors in a cluster.
C. Spark's execution/deployment mode determines which node in a cluster of nodes is responsible for running the driver program.
D. Spark's execution/deployment mode determines exactly how many nodes the driver will connect to when a Spark application is run.
E. Spark's execution/deployment mode determines whether results are run interactively in a notebook environment or in batch.

Question 12
Which of the following cluster configurations will ensure the completion of a Spark application in light of a worker node failure?

[Figure: diagram of the numbered cluster configuration scenarios]

Note: each configuration has roughly the same compute power using 100GB of RAM and 200 cores.
A. Scenario #1
B. They should all ensure completion because worker nodes are fault-tolerant.
C. Scenario #4
D. Scenario #5
E. Scenario #8

Question 13
Which of the following describes out-of-memory errors in Spark?
A. An out-of-memory error occurs when either the driver or an executor does not have enough memory to collect or process the data allocated to it.
B. An out-of-memory error occurs when Spark's storage level is too lenient and allows data objects to be cached to both memory and disk.
C. An out-of-memory error occurs when there are more tasks than there are executors regardless of the number of worker nodes.
D. An out-of-memory error occurs when the Spark application calls too many transformations in a row without calling an action regardless of the size of the data object on which the transformations are operating.
E. An out-of-memory error occurs when too much data is allocated to the driver for computational purposes.

Question 14
Which of the following is the default storage level for persist() for a non-streaming DataFrame/Dataset?
A. MEMORY_AND_DISK
B. MEMORY_AND_DISK_SER
C. DISK_ONLY
D. MEMORY_ONLY_SER
E. MEMORY_ONLY

Question 15
Which of the following describes a broadcast variable?
A. A broadcast variable is a Spark object that needs to be partitioned onto multiple worker nodes because it's too large to fit on a single worker node.
B. A broadcast variable can only be created by an explicit call to the broadcast() operation.
C. A broadcast variable is entirely cached on the driver node so it doesn't need to be present on any worker nodes.
D. A broadcast variable is entirely cached on each worker node so it doesn't need to be shipped or shuffled between nodes with each stage.
E. A broadcast variable is saved to the disk of each worker node to be easily read into memory when needed.

Question 16
Which of the following operations is most likely to induce a skew in the size of your data's partitions?
A. DataFrame.collect()
B. DataFrame.cache()
C. DataFrame.repartition(n)
D. DataFrame.coalesce(n)
E. DataFrame.persist()

Question 17
Which of the following data structures are Spark DataFrames built on top of?
A. Arrays
B. Strings
C. RDDs
D. Vectors
E. SQL Tables

Question 18
Which of the following code blocks returns a DataFrame containing only column storeId and column division from DataFrame storesDF?
A. storesDF.select("storeId").select("division")
B. storesDF.select(storeId, division)
C. storesDF.select("storeId", "division")
D. storesDF.select(col("storeId", "division"))
E. storesDF.select(storeId).select(division)

Question 19
Which of the following code blocks returns a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction?
A sample of DataFrame storesDF is below (? marks characters that are illegible in this copy):

storeId | open  | openDate   | division      | sqft  | numberOfEmployees | customerSatisfaction
0       | true  | 1100746394 | Utah          | ?     | 61                | ?
1       | true  | 944572255  | West Virginia | 18132 | 96                | 3.46
2       | false | 925495628  | West Virginia | 79520 | 45                | 6.93
3       | true  | 1397353092 | Texas         | ?     | ?                 | ?7.19
4       | true  | 986505057  | Delaware      | ?     | ?                 | ?5.24

A. storesDF.drop("sqft", "customerSatisfaction")
B. storesDF.select("storeId", "open", "openDate", "division")
C. storesDF.select(-col(sqft), -col(customerSatisfaction))
D. storesDF.drop(sqft, customerSatisfaction)
E. storesDF.drop(col(sqft), col(customerSatisfaction))

Question 20
The code block shown below contains an error. The code block is intended to return a DataFrame containing only the rows from DataFrame storesDF where the value in DataFrame storesDF's "sqft" column is less than or equal to 25,000. Assume DataFrame storesDF is the only defined language variable. Identify the error.

Code block:
    storesDF.filter(sqft <= 25000)

A. The column name sqft needs to be quoted like storesDF.filter("sqft" <= 25000).
B. The column name sqft needs to be quoted and wrapped in the col() function like storesDF.filter(col("sqft") <= 25000).
C. The sign in the logical condition inside filter() needs to be changed from <= to >.
D. The sign in the logical condition inside filter() needs to be changed from <= to >=.
E. The column name sqft needs to be wrapped in the col() function like storesDF.filter(col(sqft) <= 25000).

Question 21
The code block shown below should return a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:
    storesDF.__1__(__2__ __3__ __4__)

A. 1. filter
   2. (col("sqft") <= 25000)
   3. |
   4. (col("customerSatisfaction") >= 30)
B. 1. drop
   2. (col(sqft) <= 25000)
   3. |
   4. (col(customerSatisfaction) >= 30)
C. 1. filter
   2. col("sqft") <= 25000
   3. |
   4. col("customerSatisfaction") >= 30
D. 1. filter
   2. col("sqft") <= 25000
   3. or
   4. col("customerSatisfaction") >= 30
E. 1. filter
   2. (col("sqft") <= 25000)
   3. or
   4. (col("customerSatisfaction") >= 30)

Question 22
Which of the following operations can be used to convert a DataFrame column from one type to another type?
A. col().cast()
B. convert()
C. castAs()
D. col().coerce()
E. col()

Question 23
Which of the following code blocks returns a new DataFrame with a new column sqft100 that is 1/100th of column sqft in DataFrame storesDF? Note that column sqft100 is not in the original DataFrame storesDF.
A. storesDF.withColumn("sqft100", col("sqft") * 100)
B. storesDF.withColumn("sqft100", sqft / 100)
C. storesDF.withColumn(col("sqft100"), col("sqft") / 100)
D. storesDF.withColumn("sqft100", col("sqft") / 100)
E. storesDF.newColumn("sqft100", sqft / 100)

Question 24
Which of the following code blocks returns a new DataFrame from DataFrame storesDF where column numberOfManagers is the constant integer 1?
A. storesDF.withColumn("numberOfManagers", col(1))
B. storesDF.withColumn("numberOfManagers", 1)
C. storesDF.withColumn("numberOfManagers", lit(1))
D. storesDF.withColumn("numberOfManagers", lit("1"))
E. storesDF.withColumn("numberOfManagers", IntegerType(1))

Question 25
The code block shown below contains an error. The code block intends to return a new DataFrame where column storeCategory from DataFrame storesDF is split at the underscore character into column storeValueCategory and column storeSizeCategory. Identify the error.
A sample of DataFrame storesDF is displayed below:

storeId | open  | openDate   | storeCategory
0       | true  | 1100746394 | VALUE_MEDIUM
1       | true  | 944572255  | MAINSTREAM_SMALL
2       | false | 925495628  | PREMIUM_LARGE
3       | true  | 1397353092 | VALUE_MEDIUM
4       | true  | 986505057  | VALUE_LARGE
5       | true  | 955988614  | PREMIUM_LARGE

Code block:
    (storesDF.withColumn(
        "storeValueCategory", col("storeCategory").split("_")[0]
      )
      .withColumn(
        "storeSizeCategory", col("storeCategory").split("_")[1]
      )
    )

A. The split() operation comes from the imported functions object. It accepts a string column name and split character as arguments. It is not a method of a Column object.
B. The split() operation comes from the imported functions object. It accepts a Column object and split character as arguments. It is not a method of a Column object.
C. The index values of 0 and 1 should be provided as second arguments to the split() operation rather than indexing the result.
D. The index values of 0 and 1 are not correct - they should be 1 and 2, respectively.
E. The withColumn() operation cannot be called twice in a row.

Question 26
Which of the following operations can be used to split an array column into an individual DataFrame row for each element in the array?
A. extract()
B. split()
C. explode()
D. arrays_zip()
E. unpack()

Question 27
Which of the following code blocks returns a new DataFrame where column storeCategory is an all-lowercase version of column storeCategory in DataFrame storesDF? Assume DataFrame storesDF is the only defined language variable.
A. storesDF.withColumn("storeCategory", lower(col("storeCategory")))
B. storesDF.withColumn("storeCategory", col("storeCategory").lower())
C. storesDF.withColumn("storeCategory", tolower(col("storeCategory")))
D. storesDF.withColumn("storeCategory", lower("storeCategory"))
E. storesDF.withColumn("storeCategory", lower(storeCategory))

Question 28
The code block shown below contains an error.
The code block is intended to return a new DataFrame where column division from DataFrame storesDF has been renamed to column state and column managerName from DataFrame storesDF has been renamed to column managerFullName. Identify the error.

Code block:
    (storesDF.withColumnRenamed("state", "division")
      .withColumnRenamed("managerFullName", "managerName"))

A. Both arguments to operation withColumnRenamed() should be wrapped in the col() operation.
B. The operations withColumnRenamed() should not be called twice, and the first argument should be ["state", "division"] and the second argument should be ["managerFullName", "managerName"].
C. The old columns need to be explicitly dropped.
D. The first argument to operation withColumnRenamed() should be the old column name and the second argument should be the new column name.
E. The operation withColumnRenamed() should be replaced with withColumn().

Question 29
Which of the following code blocks returns a DataFrame where rows in DataFrame storesDF containing missing values in every column have been dropped?
A. storesDF.nadrop("all")
B. storesDF.na.drop("all", subset = "sqft")
C. storesDF.dropna()
D. storesDF.na.drop()
E. storesDF.na.drop("all")

Question 30
Which of the following operations fails to return a DataFrame where every row is unique?
A. DataFrame.distinct()
B. DataFrame.drop_duplicates(subset = None)
C. DataFrame.drop_duplicates()
D. DataFrame.dropDuplicates()
E. DataFrame.drop_duplicates(subset = "all")

Question 31
Which of the following code blocks will not always return the exact number of distinct values in column division?
A. storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
B. storesDF.agg(approx_count_distinct(col("division"), 0).alias("divisionDistinct"))
C. storesDF.agg(countDistinct(col("division")).alias("divisionDistinct"))
D. storesDF.select("division").dropDuplicates().count()
E. storesDF.select("division").distinct().count()

Question 32
The code block shown below should return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:
    [beginning illegible in this copy] sqftMean"))

A. 1. agg
   2. mean
   3. col("sqft")
B. 1. mean
   [remaining entries illegible in this copy]
C. [illegible in this copy]
The remaining options are incomplete in this copy; one surviving option reads:
   1. agg
   2. average
   3. col("sqft")

Question 33
Which of the following code blocks returns the number of rows in DataFrame storesDF?
A. storesDF.withColumn("numberOfRows", count())
B. storesDF.withColumn(count().alias("numberOfRows"))
C. storesDF.countDistinct()
D. storesDF.count()
E. storesDF.agg(count())

Question 34
Which of the following code blocks returns the sum of the values in column sqft in DataFrame storesDF grouped by distinct value in column division?
A. storesDF.groupBy.agg(sum(col("sqft")))
B. storesDF.groupBy("division").agg(sum())
C. storesDF.agg(groupBy("division").sum(col("sqft")))
D. storesDF.groupby.agg(sum(col("sqft")))
E. storesDF.groupBy("division").agg(sum(col("sqft")))

Question 35
Which of the following code blocks returns a DataFrame containing summary statistics only for column sqft in DataFrame storesDF?
A. storesDF.summary("mean")
B. storesDF.describe("sqft")
C. storesDF.summary(col("sqft"))
D. storesDF.describeColumn( [argument illegible in this copy]
E. storesDF.summary()

Question 36
Which of the following operations can be used to sort the rows of a DataFrame?
A. sort() and orderBy()
B. orderby()
C. [illegible]() and orderby()
D. orderBy()
E. [illegible]()

Question 37
The code block shown below contains an error. The code block is intended to return a 15 percent sample of rows from DataFrame storesDF without replacement. Identify the error.

Code block:
    storesDF.sample(True, fraction = 0.15)

A. There is no argument specified to the seed parameter.
B. There is no argument specified to the withReplacement parameter.
C. The sample() operation does not sample without replacement - sampleBy() should be used instead.
D. The sample() operation is not reproducible.
E. The first argument True sets the sampling to be with replacement.

Question 38
Which of the following operations can be used to return the top n rows from a DataFrame?
A. DataFrame.n()
B. DataFrame.take(n)
C. DataFrame.head
D. DataFrame.show(n)
E. DataFrame.collect(n)

Question 39
The code block shown below should extract the value for column sqft from the first row of DataFrame storesDF. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:
    [illegible in this copy]

A. 1. storesDF
   2. first
   3. [illegible in this copy]
B. 1. storesDF
   [remaining entries illegible in this copy]
C. 1. storesDF
   2. first
   3. ["sqft"]
D. 1. storesDF
   2. first()
   3. sqft
E. 1. storesDF
   2. first()
   3. col("sqft")

Question 40
Which of the following lines of code prints the schema of a DataFrame?
A. print(storesDF)
B. storesDF.schema
C. print(storesDF.schema())
D. DataFrame.printSchema()
E. DataFrame.schema()

Question 41
In what order should the below lines of code be run in order to create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance and apply it to column customerSatisfaction in table stores?

Lines of code:
1. spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
2. spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
3. spark.udf.register(assessPerformance, "ASSESS_PERFORMANCE")
4. spark.sql("SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores")

A. 3, 4
B. 1, 4
C. 3, 2
D. 2
E. 1, 2

Question 42
In what order should the below lines of code be run in order to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance and apply it to column customerSatisfaction in DataFrame storesDF?

Lines of code:
1. assessPerformanceUDF = udf(assessPerformance, IntegerType)
2. assessPerformanceUDF = spark.register.udf("ASSESS_PERFORMANCE", assessPerformance)
3. assessPerformanceUDF = udf(assessPerformance, IntegerType())
4. storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
5. storesDF.withColumn("result", assessPerformance(col("customerSatisfaction")))
6. storesDF.withColumn("result", ASSESS_PERFORMANCE(col("customerSatisfaction")))

A. 3, 4
B. 2, 6
C. 3, 5
D. 1, 4
E. 2, 5

Question 43
Which of the following operations can execute a SQL query on a table?
A. spark.query()
B. DataFrame.sql()
C. spark.sql()
D. DataFrame.createOrReplaceTempView()
E. DataFrame.createTempView()

Question 44
Which of the following code blocks creates a single-column DataFrame from Python list years which is made up of integers?
A. spark.createDataFrame([years], IntegerType())
B. spark.createDataFrame(years, IntegerType())
C. spark.DataFrame(years, IntegerType())
D. spark.createDataFrame(years)
E. spark.createDataFrame(years, IntegerType)

Question 45
Which of the following operations can be used to cache a DataFrame only in Spark's memory assuming the default arguments can be updated?
A. DataFrame.clearCache()
B. DataFrame.storageLevel
C. storageLevel
D. DataFrame.persist()
E. [illegible in this copy]

Question 46
The code block shown below contains an error. The code block is intended to return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle. Identify the error.

Code block:
    storesDF.repartition(4)

A. The repartition operation will only work if the DataFrame has been cached to memory.
B. The repartition operation requires a column on which to partition rather than a number of partitions.
C. The number of resulting partitions, 4, is not achievable for an 8-partition DataFrame.
D. The repartition operation induced a full shuffle. The coalesce operation should be used instead.
E. The repartition operation cannot guarantee the number of result partitions.
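Because a Spark session may not be used while taking this practice exam, the difference between merging existing partitions and redistributing rows can be reasoned about with a toy model in plain Python. The sketch below is only an illustration under stated assumptions: partitions are modelled as Python lists, and the functions coalesce and repartition here are hypothetical stand-ins written for this example, not Spark's implementation or API.

```python
from typing import List

Partition = List[int]


def coalesce(parts: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole partitions into at most n groups.

    Rows stay with their original partition, so nothing is redistributed,
    and the partition count can only go down, never up.
    """
    if n >= len(parts):
        return [list(p) for p in parts]  # cannot increase the partition count
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        merged[i * n // len(parts)].extend(part)
    return merged


def repartition(parts: List[Partition], n: int) -> List[Partition]:
    """Toy model: deal every row out across n new partitions (a full shuffle)."""
    rows = [row for part in parts for row in part]
    out: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


# Eight partitions of three rows each (made-up data for illustration).
eight = [[i * 10 + j for j in range(3)] for i in range(8)]

four_coalesced = coalesce(eight, 4)      # neighbouring partitions merged whole
four_shuffled = repartition(eight, 4)    # every row reassigned
twelve_coalesced = coalesce(eight, 12)   # still 8 partitions in this model
twelve_shuffled = repartition(eight, 12)

print(len(four_coalesced), len(four_shuffled))      # 4 4
print(len(twelve_coalesced), len(twelve_shuffled))  # 8 12
```

In this model both calls can reduce eight partitions to four, but only the row-level redistribution can produce more partitions than it started with.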
Question 47
Which of the following code blocks will always return a new 12-partition DataFrame from the 8-partition DataFrame storesDF?
A. storesDF.coalesce(12)
B. storesDF.repartition()
C. storesDF.repartition(12)
D. storesDF.coalesce()
E. storesDF.coalesce(12, "storeId")

Question 48
Which of the following Spark config properties represents the number of partitions used in wide transformations like join()?
A. spark.sql.shuffle.partitions
B. spark.shuffle.partitions
C. spark.shuffle.io.maxRetries
D. spark.shuffle.file.buffer
E. spark.default.parallelism

Question 49
In what order should the below lines of code be run in order to return a DataFrame containing a column openDateString, a string representation of Java's SimpleDateFormat? Note that column openDate is of type integer and represents a date in the UNIX epoch format - the number of seconds since midnight on January 1st, 1970. An example of Java's SimpleDateFormat is "Sunday, Dec 4, 2008 1:05 PM".

A sample of storesDF is displayed below:

storeId | openDate
0       | 1100746394
1       | 1474410343
2       | 1116610009
3       | 1180035265
4       | 1408024997

Lines of code:
1. storesDF.withColumn("openDateString", from_unixtime(col("openDate"), simpleDateFormat))
2. simpleDateFormat = "EEEE, MMM d, yyyy h:mm a"
3. storesDF.withColumn("openDateString", from_unixtime(col("openDate"), SimpleDateFormat()))
4. storesDF.withColumn("openDateString", date_format(col("openDate"), simpleDateFormat))
5. storesDF.withColumn("openDateString", date_format(col("openDate"), SimpleDateFormat()))
6. simpleDateFormat = "wd, MMM d, yyyy h:mm a"

A. 2, 3
B. 2, 1
C. 6, 5
D. 2, 4
E. 6, 1

Question 50
Which of the following code blocks returns a DataFrame containing a column month, an integer representation of the month from column openDate from DataFrame storesDF? Note that column openDate is of type integer and represents a date in the UNIX epoch format - the number of seconds since midnight on January 1st, 1970.
A sample of storesDF is displayed below:

storeId | openDate
0       | 1100746394
1       | 1474410343
2       | 1116610009
3       | 1180035265
4       | 1408024997

A. (storesDF.withColumn("openTimestamp", col("openDate").cast("Timestamp"))
     .withColumn("month", month(col("openTimestamp"))))
B. storesDF.withColumn("month", getMonth(col("openDate")))
C. (storesDF.withColumn("openDateFormat", col("openDate").cast("Date"))
     .withColumn("month", month(col("openDateFormat"))))
D. storesDF.withColumn("month", substr(col("openDate"), 4, 2))
E. storesDF.withColumn("month", month(col("openDate")))

Question 51
Which of the following operations performs an inner join on two DataFrames?
A. DataFrame.innerJoin()
B. DataFrame.join()
C. Standalone join() function
D. DataFrame.merge()
E. DataFrame.crossJoin()

Question 52
Which of the following code blocks returns a new DataFrame that is the result of an outer join between DataFrame storesDF and DataFrame employeesDF on column storeId?
A. storesDF.join(employeesDF, "storeId", "outer")
B. storesDF.join(employeesDF, "storeId")
C. storesDF.join(employeesDF, "outer", col("storeId"))
D. storesDF.join(employeesDF, "outer", storesDF.storeId == employeesDF.storeId)
E. storesDF.merge(employeesDF, "outer", col("storeId"))

Question 53
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId and column employeeId which are in both DataFrames. Identify the error.

Code block:
    storesDF.join(employeesDF, [col("storeId"), col("employeeId")])

A. The join() operation is a standalone function rather than a method of DataFrame - the join() operation should be called where its first two arguments are storesDF and employeesDF.
B. There must be a third argument to join() because the default to the how parameter is not "inner".
C. The col("storeId") and col("employeeId") arguments should not be separate elements of a list - they should be tested to see if they're equal to one another like col("storeId") == col("employeeId").
D. There is no DataFrame.join() operation - DataFrame.merge() should be used instead.
E. The references to "storeId" and "employeeId" should not be inside the col() function - removing the col() function should result in a successful join.

Question 54
Which of the following Spark properties is used to configure the broadcasting of a DataFrame without the use of the broadcast() operation?
A. spark.sql.autoBroadcastJoinThreshold
B. spark.sql.broadcastTimeout
C. spark.broadcast.blockSize
D. spark.broadcast.compress
E. spark.executor.memoryOverhead

Question 55
The code block shown below should return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:
    [illegible in this copy]

A. 1. storesDF
   2. crossJoin
   3. employeesDF, "storeId"
B. 1. storesDF
   2. join
   3. employeesDF, "cross"
C. 1. storesDF
   2. crossJoin
   3. employeesDF, "storeId"
D. 1. storesDF
   2. join
   3. employeesDF, "storeId", "cross"
E. 1. storesDF
   2. [illegible in this copy]
   3. employeesDF

Question 56
Which of the following operations performs a position-wise union on two DataFrames?
A. The standalone concat() function
B. The standalone unionAll() function
C. The standalone union() function
D. DataFrame.unionByName()
E. DataFrame.union()

Question 57
Which of the following code blocks writes DataFrame storesDF to file path filePath as parquet?
A. storesDF.write.option("parquet").path(filePath)
B. storesDF.write.path(filePath)
C. storesDF.write().parquet(filePath)
D. storesDF.write(filePath)
E. storesDF.write.parquet(filePath)

Question 58
The code block shown below contains an error.
The code block is intended to write DataFrame storesDF to file path filePath as parquet and partition by values in column division. Identify the error.

Code block:
    storesDF.write.repartition("division").parquet(filePath)

A. The argument division to operation repartition() should be wrapped in the col() function to return a Column object.
B. There is no parquet() operation for DataFrameWriter - the save() operation should be used instead.
C. There is no repartition() operation for DataFrameWriter - the partitionBy() operation should be used instead.
D. DataFrame.write is an operation - it should be followed by parentheses to return a DataFrameWriter.
E. The mode() operation must be called to specify that this write should not overwrite existing files.

Question 59
Which of the following code blocks reads a parquet at the file path filePath into a DataFrame?
A. spark.read().parquet(filePath)
B. spark.read().path(filePath, source = "parquet")
C. spark.read.path(filePath, source = "parquet")
D. spark.read.parquet(filePath)
E. spark.read().path(filePath)

Question 60
Which of the following code blocks reads JSON at the file path filePath into a DataFrame with the specified schema schema?
A. spark.read().schema(schema).format(json).load(filePath)
B. spark.read().schema(schema).format("json").load(filePath)
C. spark.read.schema("schema").format("json").load(filePath)
D. spark.read.schema("schema").format("json").load(filePath)
E. spark.read.schema(schema).format("json").load(filePath)

Correct Answers
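Scoring against the key can be sketched as follows. This is a minimal sketch, assuming responses and the answer key are held as dictionaries from question number to option letter. The 60-question total and the 42-correct passing mark come from the overview; the function names and the placeholder data are hypothetical, and the placeholder key uses made-up question numbers so that it cannot be mistaken for real answers.

```python
from typing import Dict

TOTAL_QUESTIONS = 60  # stated in the overview
PASSING_CORRECT = 42  # minimum number of correct answers stated in the overview


def count_correct(responses: Dict[int, str], answer_key: Dict[int, str]) -> int:
    """Count questions whose response letter matches the key (case-insensitive).

    Unanswered questions are simply counted as incorrect.
    """
    return sum(
        1
        for question, correct in answer_key.items()
        if responses.get(question, "").strip().upper() == correct.strip().upper()
    )


def passes(correct: int) -> bool:
    """Return True when the number of correct answers meets the passing mark."""
    return correct >= PASSING_CORRECT


# Placeholder data only: question numbers 101-103 do not exist in this exam.
placeholder_key = {101: "A", 102: "B", 103: "C"}
placeholder_responses = {101: "a", 102: "D"}  # 103 left unanswered

demo_correct = count_correct(placeholder_responses, placeholder_key)
passing_fraction = PASSING_CORRECT / TOTAL_QUESTIONS

print(demo_correct)              # 1
print(passes(42), passes(41))    # True False
```

The passing mark of 42 out of 60 corresponds to answering 70 percent of the questions correctly.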

No ratings yet
Pyspark Interview 1738079940
6 pages
Mongodb Cheat Sheet
No ratings yet
Mongodb Cheat Sheet
10 pages
DataEngineer Roadmap
No ratings yet
DataEngineer Roadmap
12 pages
20 PySpark Problems
No ratings yet
20 PySpark Problems
22 pages
SparkInternals All
No ratings yet
SparkInternals All
90 pages
Pyspark Cashing & Persisting - Complete Guide
No ratings yet
Pyspark Cashing & Persisting - Complete Guide
3 pages
HDP Developer-Enterprise Spark 1-Python Lab Guide-Rev 1
No ratings yet
HDP Developer-Enterprise Spark 1-Python Lab Guide-Rev 1
168 pages
Microsoft - Pass4sure - DP 203.free - pdf.2024 Mar 29
No ratings yet
Microsoft - Pass4sure - DP 203.free - pdf.2024 Mar 29
21 pages
Python For Data Engineering Guide
No ratings yet
Python For Data Engineering Guide
4 pages
Data Engineering & GCP Basic Services 2. Data Storage in GCP 3. Database Offering by GCP 4. Data Processing in GCP 5. ML/AI Offering in GCP
No ratings yet
Data Engineering & GCP Basic Services 2. Data Storage in GCP 3. Database Offering by GCP 4. Data Processing in GCP 5. ML/AI Offering in GCP
3 pages
Lecture 4 - Pair RDD and DataFrame
No ratings yet
Lecture 4 - Pair RDD and DataFrame
38 pages
Ambari Operations
No ratings yet
Ambari Operations
194 pages
SCD Type-1,2 Implementation in Pyspark
No ratings yet
SCD Type-1,2 Implementation in Pyspark
6 pages
De Mod 5 Deploy Workloads With Databricks Workflows
No ratings yet
De Mod 5 Deploy Workloads With Databricks Workflows
19 pages
4 - Action and RDD Transformations
No ratings yet
4 - Action and RDD Transformations
25 pages
Apache Spark Architecture
No ratings yet
Apache Spark Architecture
7 pages
Dice Resume CV SN
No ratings yet
Dice Resume CV SN
5 pages
SCD Type 2. Pyspark
No ratings yet
SCD Type 2. Pyspark
7 pages
Databricksmcqsquestionsandanswers
No ratings yet
Databricksmcqsquestionsandanswers
5 pages
L02 - Spark SQL For Data Processing: CBG1C04 Big Data Programming
No ratings yet
L02 - Spark SQL For Data Processing: CBG1C04 Big Data Programming
23 pages
Learning Apache Spark With Python
No ratings yet
Learning Apache Spark With Python
10 pages
Spark Syllabus 1
No ratings yet
Spark Syllabus 1
3 pages
Hadoop and Mapreduce Cheat Sheet
No ratings yet
Hadoop and Mapreduce Cheat Sheet
1 page
Hive Cheat Sheet - Quick Reference
No ratings yet
Hive Cheat Sheet - Quick Reference
19 pages

Simulado Databricks

Uploaded by

Simulado Databricks

Uploaded by

You might also like