
Earthquake Data: USGS API Data

1. Refer to the site below for the API data schema:

   https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php

2. Refer to the steps below to get data from the source (a minimal fetch sketch follows this list).

   a. First, get the historical data (last month's data):
      https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.geojson
   b. After that, get data for each day (the URL below pulls data for the past day):
      https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
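A minimal sketch of the fetch step in Python, assuming the requests library; the local file names and the main-guard structure are illustrative, not part of the original spec.

import json
import requests

FEED_BASE = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary"

def fetch_feed(feed_name: str) -> dict:
    # Pull a GeoJSON summary feed (all_month for history, all_day for the daily pull).
    response = requests.get(f"{FEED_BASE}/{feed_name}.geojson", timeout=60)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Historical (last month) pull plus the daily pull; output file names are illustrative.
    for feed in ("all_month", "all_day"):
        with open(f"{feed}.json", "w") as f:
            json.dump(fetch_feed(feed), f)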

3. Once you get this data from the source, perform the steps below.
   a. First, get the data from the source.
   b. Using Spark, flatten all the columns from the source.
      i. Flatten the column names; if there are nested columns, unnest them (a PySpark flattening sketch follows this list).
      Example:
      {
        "test": "ha",
        "Feature": [
          {"Type": "abc", "Name": "abc"},
          {"Type": "pqr", "Name": "pqr"}
        ]
      }

      After flattening the above JSON, the target table should contain the columns:
      test, feature_type, feature_name

   c. Store the flattened data in the target location:
      earthquakeanalysis/raw/<date in YYYYMMDD>/<target file>.parquet

   d. In the target data, the detail URL is available at:

      features.properties.detail

      Base URL: https://earthquake.usgs.gov/
      Endpoint URL: earthquakes/feed/v1.0/detail/<id>.geojson


   e. This detail field contains a URL; pick up this URL and pull its data using the REST API.
   f. Using PySpark, flatten all the columns in this data and store it in the location below:
      earthquakeanalysis/raw/<date in YYYYMMDD>/<ids>_<target file>.parquet
   g. The "ids" value above comes from the same URL or from the previously copied data.
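A minimal PySpark sketch of the flattening described in 3.b, applied to the small example JSON above; the app name and output path are illustrative.

import json
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten_example").getOrCreate()

# The small nested record from the example in 3.b.
sample = {"test": "ha",
          "Feature": [{"Type": "abc", "Name": "abc"},
                      {"Type": "pqr", "Name": "pqr"}]}
df = spark.read.json(spark.sparkContext.parallelize([json.dumps(sample)]))

# Explode the nested array, then promote the struct fields to flat columns.
flat_df = (df
           .withColumn("feature_struct", F.explode("Feature"))
           .select(F.col("test"),
                   F.col("feature_struct.Type").alias("feature_type"),
                   F.col("feature_struct.Name").alias("feature_name")))

# Store the flattened output at the raw target location (date and file name are illustrative).
flat_df.write.mode("overwrite").parquet("earthquakeanalysis/raw/20241019/example.parquet")

The same explode-and-select pattern applies to the real feed, where features is an array of structs with nested properties and geometry fields.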

4. Once this is done, questions for generating the analysis layer will be shared with you.

Step 1: API Request


There are two scenarios (a sketch for landing the raw response follows):
1. Using PySpark (Dataproc or Databricks) with the Python requests library.
2. Using Cloud Dataflow with the Python requests library.

Landing locations:
gs://earthquake_analysis/pyspark/landing/20241019/*.json
gs://earthquake_analysis/dataflow/landing/20241019/*.json
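A hedged sketch of scenario 1: fetching the daily feed with requests and landing the raw JSON in GCS. It assumes the google-cloud-storage client library and a bucket named earthquake_analysis; adjust names for your project.

from datetime import datetime, timezone

import requests
from google.cloud import storage

FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"

def land_daily_feed(bucket_name: str = "earthquake_analysis") -> str:
    # Fetch the daily feed and drop the raw JSON into the PySpark landing folder.
    response = requests.get(FEED_URL, timeout=60)
    response.raise_for_status()

    run_date = datetime.now(timezone.utc).strftime("%Y%m%d")
    blob_path = f"pyspark/landing/{run_date}/all_day.json"

    storage.Client().bucket(bucket_name).blob(blob_path).upload_from_string(
        response.text, content_type="application/json")
    return f"gs://{bucket_name}/{blob_path}"

For the Dataflow scenario the same request logic applies, with the output written under the dataflow/landing prefix instead.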

Step 2. Flattening the data


1. Using PySpark
2. Using Cloud Dataflow

While flattening, also perform the transformations below (a transformation sketch follows the example record):

- Convert the "time" and "updated" columns from epoch milliseconds to timestamps.
- Generate an "area" column based on the existing "place" column.

Silver location: gs://earthquake_analysis/Silver/20241019/*.json

Flatten the historical and daily data based on the example record below:

{
  "mag": 0.89,
  "place": "6 km NW of The Geysers, CA",
  "time": 1729308248850,
  "updated": 1729308343908,
  "tz": null,
  "url": "https://earthquake.usgs.gov/earthquakes/eventpage/nc75076006",
  "detail": "https://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/nc75076006.geojson",
  "felt": null,
  "cdi": null,
  "mmi": null,
  "alert": null,
  "status": "automatic",
  "tsunami": 0,
  "sig": 12,
  "net": "nc",
  "code": "75076006",
  "ids": ",nc75076006,",
  "sources": ",nc,",
  "types": ",nearby-cities,origin,phase-data,",
  "nst": 9,
  "dmin": 0.01303,
  "rms": 0.02,
  "gap": 77,
  "magType": "md",
  "type": "earthquake",
  "title": "M 0.9 - 6 km NW of The Geysers, CA",
  "geometry": {
    "longitude": -122.813163757324,
    "latitude": 38.8125,
    "depth": 3.25999999046326
  }
}
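A minimal PySpark sketch of the two transformations, assuming a DataFrame already flattened to the columns shown above; taking the text after " of " in place as the area is an assumption about the intended derivation.

from pyspark.sql import functions as F

def add_silver_columns(flat_df):
    # "time" and "updated" arrive as epoch milliseconds; cast them to timestamps.
    df = (flat_df
          .withColumn("time", (F.col("time") / 1000).cast("timestamp"))
          .withColumn("updated", (F.col("updated") / 1000).cast("timestamp")))

    # Derive "area" from "place", e.g. "6 km NW of The Geysers, CA" -> "The Geysers, CA".
    # Splitting on " of " is an assumed rule; fall back to the full place string.
    return df.withColumn(
        "area",
        F.when(F.col("place").contains(" of "),
               F.element_at(F.split(F.col("place"), " of "), -1))
         .otherwise(F.col("place")))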

Step 3: Load data into BigQuery

- Add two extra columns:
  1. Insert date: insert_dt (Timestamp)

BQ table: earthquake_db.earthquake_data
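A hedged sketch of the load step, assuming the spark-bigquery connector is installed on the cluster and that earthquake_analysis can serve as the temporary staging bucket; both are assumptions, not part of the spec.

from pyspark.sql import functions as F

def load_to_bigquery(silver_df):
    # Add the audit timestamp, then append into the target BigQuery table.
    (silver_df
     .withColumn("insert_dt", F.current_timestamp())
     .write
     .format("bigquery")
     .option("table", "earthquake_db.earthquake_data")
     .option("temporaryGcsBucket", "earthquake_analysis")  # assumed staging bucket
     .mode("append")
     .save())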

Do the below analysis using PySpark and BigQuery (a sample aggregation sketch follows this list):

1. Count the number of earthquakes by region.
2. Find the average magnitude by region.
3. Find how many earthquakes happen on the same day.
4. Find how many earthquakes happen on the same day and in the same region.
5. Find the average number of earthquakes that happen on the same day.
6. Find the average number of earthquakes that happen on the same day and in the same region.
7. Find the region name that had the highest-magnitude earthquake last week.
8. Find the region names that have magnitudes higher than 5.
9. Find the regions with the highest frequency and intensity of earthquakes.
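As a starting point for questions 1 and 2, a PySpark sketch over the silver DataFrame; the same aggregation can be written as a GROUP BY in BigQuery SQL. It assumes the derived "area" column represents the region.

from pyspark.sql import functions as F

def region_summary(silver_df):
    # Q1/Q2: earthquake count and average magnitude per region.
    return (silver_df
            .groupBy("area")
            .agg(F.count("*").alias("earthquake_count"),
                 F.round(F.avg("mag"), 2).alias("avg_magnitude"))
            .orderBy(F.desc("earthquake_count")))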

Cloud Composer
- Historical load: manual; this is a one-time activity.
- Daily load: ingestion -> transformation -> BigQuery load (a minimal DAG sketch follows).
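A minimal Cloud Composer (Airflow) sketch for the daily load, assuming the three steps are wrapped as Python callables; the DAG id, schedule, and operator choice are assumptions.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; in practice these call the ingestion, flattening,
# and BigQuery-load logic sketched in the earlier steps.
def ingest(**context): ...
def transform(**context): ...
def load_to_bq(**context): ...

with DAG(
    dag_id="earthquake_daily_load",
    start_date=datetime(2024, 10, 19),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingestion", python_callable=ingest)
    transform_task = PythonOperator(task_id="transformation", python_callable=transform)
    load_task = PythonOperator(task_id="bq_load", python_callable=load_to_bq)

    ingest_task >> transform_task >> load_task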
