
Extracting Data from an API on Databricks

Ryan Chynoweth
5 min read · Feb 11, 2024

Introduction
Databricks integrates seamlessly with the application and data infrastructure of organizations. Its ability to extract data from various sources, perform transformations, and integrate with data sinks simplifies system integration and differentiates Databricks. This stands in contrast to cloud data warehouses, which rely on external functions for these tasks, increasing complexity and cost.

We will cover Databricks’ ability to consume data from external APIs and save that data to a table in Databricks Unity Catalog. We will explore two primary methods: a single-threaded approach and a distributed option for executing requests in parallel. In both scenarios we will use the Python requests library to perform these actions.

Associated code can be found on my GitHub. It is worth noting that within Databricks, I utilize Python imports to modularize the code, housing the functions responsible for calling the REST APIs in the libs directory.
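As a rough illustration, a repo layout compatible with that import style might look like the following. The folder and notebook names here are assumptions; only the libs package and the api_extract module are implied by the import used later in this post.

repo_root/
├── libs/
│   ├── __init__.py
│   └── api_extract.py      # holds the APIExtract class shown below
└── notebooks/
    └── ingest_api_data     # notebook that runs: from libs.api_extract import APIExtract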

Single-Threaded Option


The first approach involves consuming a single API endpoint at a time,
with the execution taking place on the driver. This method is ideal for
scenarios where you need to access one or just a few endpoints
periodically. To deploy this solution, engineers should consider selecting
the single-node compute option for the cluster since the code operates on
a single machine and doesn’t require additional virtual machines for
processing data.
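For reference, a single-node job cluster can be described with a spec along these lines when defining the compute programmatically. This is only a sketch: the node type and runtime version are placeholders, and field names can vary slightly by cloud and API version.

# Hedged sketch of a single-node cluster spec (values are placeholders).
single_node_cluster = {
    "spark_version": "14.3.x-scala2.12",   # example Databricks runtime
    "node_type_id": "Standard_DS3_v2",     # placeholder; cloud-specific
    "num_workers": 0,                      # no workers -> driver only
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}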

As an example, we are going to call the https://cat-fact.herokuapp.com/facts/ endpoint, which is available through Postman. See below for the example function that calls an API.

import requests
import json
from pyspark.sql.functions import udf

class APIExtract():
    """
    Class used for transformations requiring an API call.
    """

    def __init__(self):
        self.api_udf = udf(self.call_simple_rest_api)

    def call_simple_rest_api(self, url="https://cat-fact.herokuapp.com/facts/"):
        """ Example REST API call to an open API from Postman """
        # public REST API from Postman: https://documenter.getpostman.com/view/8854915
        response = requests.get(url)
        return json.loads(response.text)

In a Python notebook, we can then import this class and call the API with the following code.

from libs.api_extract import APIExtract

api_extract_client = APIExtract()

# store the response so we can convert it to a DataFrame later
data = api_extract_client.call_simple_rest_api()
data

Once executed you should have the following output.

'[
{
"status": {
"verified": true,
"sentCount": 1
},
"_id": "58e00b5f0aac31001185ed24",
"user": "58e007480aac31001185ecef",
"text": "When asked if her husband had any hobbies, Mary Todd Lincoln is s
"__v": 0,
"source": "user",
"updatedAt": "2020-08-23T20:20:01.611Z",
"type": "cat",
"createdAt": "2018-02-19T21:20:03.434Z",
"deleted": false,
"used": false
},
...
...
...
{
"status": {
"verified": true,
"sentCount": 1
},
"_id": "58e00af60aac31001185ed1d",
"user": "58e007480aac31001185ecef",
"text": "It was illegal to slay cats in ancient Egypt, in large part becau
"__v": 0,
"source": "user",
"updatedAt": "2020-09-16T20:20:04.164Z",
"type": "cat",
"createdAt": "2018-01-15T21:20:02.945Z",
"deleted": false,
"used": true
}
]'

Next, we want to save the data to a table in Unity Catalog, which can be done using the code below.

df = spark.createDataFrame(data)
df.write.saveAsTable('cat_data')
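If the notebook's default catalog and schema are not where the table should live, the same write can target a fully qualified three-level Unity Catalog name. The catalog and schema below ('main' and 'default') are placeholders; substitute your own.

# Placeholder catalog ('main') and schema ('default') names.
df.write.saveAsTable('main.default.cat_data')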

We have now extracted data from an API, converted the response to a DataFrame, then saved the DataFrame to a table in Unity Catalog.

Parallel API Calls


In some scenarios, there may be a need to collect data from multiple API endpoints concurrently, or to paginate through a single API endpoint. This can be achieved efficiently by running the requests in parallel across multiple cores.

To begin, we will need to make a couple of additions to our notebook to enable the parallel execution of API calls. First, we create a PySpark DataFrame containing the request parameters. In this example, we only have one column, url, but additional columns could be included for filtering, authentication, payloads, and other purposes.
# Create a list of dictionaries with the URL values
request_params = [
    {"url": "https://cat-fact.herokuapp.com/facts/"},
    {"url": "https://dog.ceo/api/breeds/list/all/"},
    {"url": "https://world.openpetfoodfacts.org/api/v0/product/20106836.json"},
    {"url": "https://world.openfoodfacts.org/api/v0/product/737628064502.json"},
    {"url": "https://openlibrary.org/api/books?bibkeys=ISBN:0201558025,LCCN:93005405&format=json"}
]

# Create DataFrame from the list of dictionaries
request_df = spark.createDataFrame(request_params)
request_df.show()

You should then have the following output.

+--------------------+
| url|
+--------------------+
|https://cat-fact....|
|https://dog.ceo/a...|
|https://world.ope...|
|https://world.ope...|
|https://openlibra...|
+--------------------+
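The request DataFrame here carries only a url column, but the same pattern extends to additional request parameters. Below is a minimal sketch of a variant UDF that also reads a JSON-encoded headers column; the function name and column name are assumptions for illustration, not part of the original class.

import json
import requests
from pyspark.sql.functions import udf, col

# Hypothetical variant: accept a headers column so per-endpoint
# authentication or content negotiation can ride along in the DataFrame.
def call_rest_api_with_headers(url, headers_json=None):
    headers = json.loads(headers_json) if headers_json else {}
    response = requests.get(url, headers=headers)
    return json.loads(response.text)

api_with_headers_udf = udf(call_rest_api_with_headers)

# Usage, assuming request_df also has a 'headers' column of JSON strings:
# response_df = request_df.withColumn(
#     'response', api_with_headers_udf(col('url'), col('headers'))
# )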

Now we can call the api_udf function created in our APIExtract class above.

from pyspark.sql.functions import col

response_df = request_df.withColumn('response', api_extract_client.api_udf(col('url')))
response_df.show()

You end up with the following DataFrame, which can be saved to a table using response_df.write.saveAsTable('parallel_api_calls').
+--------------------+--------------------+
| url| response|
+--------------------+--------------------+
|https://cat-fact....|[{createdAt=2018-...|
|https://dog.ceo/a...|{message={pyrenee...|
|https://world.ope...|{status_verbose=p...|
|https://world.ope...|{status_verbose=p...|
|https://openlibra...|{LCCN:93005405={p...|
+--------------------+--------------------+

Notice that all the data, regardless of endpoint, is saved to a single DataFrame and table. You will likely need to split the data into separate datasets, which you can do with the following code as a batch or streaming process from the ingestion table.

df = (spark.read
    .table('parallel_api_calls')
    .filter(col('url') == 'https://cat-fact.herokuapp.com/facts/')
)

df.write.saveAsTable('cat_facts')
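Since the ingestion table is a Delta table, the same split can also run incrementally with Structured Streaming instead of a batch read. A minimal sketch, assuming a checkpoint location of your choosing:

from pyspark.sql.functions import col

# Hedged sketch: incrementally split the ingestion table as a stream.
# The checkpoint path is a placeholder; point it at your own location.
stream_df = (spark.readStream
    .table('parallel_api_calls')
    .filter(col('url') == 'https://cat-fact.herokuapp.com/facts/')
)

(stream_df.writeStream
    .option('checkpointLocation', '/tmp/checkpoints/cat_facts')
    .trigger(availableNow=True)   # requires a recent runtime; trigger(once=True) otherwise
    .toTable('cat_facts')
)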

Running as a Task in Workflows


Users often need to extract data as part of a larger job. This can be
seamlessly integrated into Databricks workflows as a task. Below, we
present an example job that involves ingesting data from OpenWeather,
followed by the execution of dependent tasks for further processing.

The source code for this pipeline can be found here.
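As a rough illustration of what such a job could look like when defined programmatically, here is a sketch of a Jobs API 2.1 style payload with an ingestion task and a dependent transform task. The job name, task keys, notebook paths, and cluster key are placeholders, not the actual pipeline from the linked repo.

# Hedged sketch of a multi-task job definition (Jobs API 2.1 style).
# All names and paths below are placeholders.
job_spec = {
    "name": "openweather_ingest_pipeline",
    "job_clusters": [
        {"job_cluster_key": "ingest_cluster", "new_cluster": single_node_cluster}  # spec sketched earlier
    ],
    "tasks": [
        {
            "task_key": "ingest_openweather",
            "job_cluster_key": "ingest_cluster",
            "notebook_task": {"notebook_path": "/Repos/user/project/ingest_api_data"},
        },
        {
            "task_key": "transform_weather",
            "depends_on": [{"task_key": "ingest_openweather"}],
            "job_cluster_key": "ingest_cluster",
            "notebook_task": {"notebook_path": "/Repos/user/project/transform_weather"},
        },
    ],
}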

Before concluding, let’s consider the cost of extracting data from an API. Utilizing single-node compute on Databricks for extracting data from an API can prove to be highly cost-effective. While some may argue it is cheaper to use cloud function solutions, the streamlined architecture of having ingestion and transformations in a single pipeline often outweighs any marginal cost differences and allows users to more easily save data directly into a table without staging the data as files. Typically, the cost associated with extracting data from the API is insignificant compared to the overall expense of running data pipelines on entire datasets and tables.

With the provided parallel API code, I would recommend consolidating the ingestion process into a single task within Databricks. This approach allows you to leverage economies of scale, where the cost of data extraction is spread over multiple data sources and the cluster is highly utilized.

Conclusion
Consuming data from an API and saving the response to a table is extremely simple on Databricks. This can be done in a single-node manner for smaller use cases or can be distributed to run in parallel for scale.

Disclaimer: these are my own thoughts and opinions and not a reflection of my employer.
