Exercise Combining Two Data Sources

The document outlines a process for combining two web-scraping programs to gather S&P 500 company ticker symbols and additional financial information from Yahoo Finance. It instructs the reader to store the data in a pandas dataframe and save it as a CSV file, with the program running every 15 seconds without overwriting previous data. It also includes guidance on using the time, os, and datetime modules to time the runs, check whether a file exists, and record timestamps.

Uploaded by

blabla blabla

1 Combining two data sources

Now that we have created a web scraper to get a list of company ticker symbols, and another
web scraper to get more financial information for them from the Yahoo Finance website, it's
time to combine these information sources.

Modify the programs we created in the previous lecture into a single program that scrapes
the list of S&P 500 companies for the ticker symbols and gets the additional information for
each of them from the Yahoo Finance website.

Furthermore, put this data into a pandas dataframe (in whatever format you think is
good/appropriate for further analysis) and save it to a csv file.
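
One possible shape for the combined program is sketched below. The functions get_tickers() and get_financial_info() are hypothetical stand-ins for the two scrapers from the previous lecture, and the symbols, column names, and values they return are placeholders rather than real scraped data. The sketch only shows how one row per ticker could be collected into a pandas dataframe and written to a csv file.

```python
import os
import tempfile

import pandas as pd


def get_tickers():
    """Hypothetical stand-in for the S&P 500 ticker scraper."""
    return ["AAA", "BBB"]  # placeholder symbols, not real scraped data


def get_financial_info(ticker):
    """Hypothetical stand-in for the Yahoo Finance scraper."""
    return {"price": 0.0, "volume": 0}  # placeholder fields and values


def build_dataframe():
    """Collect one row per ticker: the symbol plus its scraped fields."""
    records = []
    for ticker in get_tickers():
        info = get_financial_info(ticker)
        records.append({"ticker": ticker, **info})
    return pd.DataFrame(records)


df = build_dataframe()

# The file name and location are assumptions; index=False keeps the
# dataframe's row index out of the csv file.
out_path = os.path.join(tempfile.gettempdir(), "sp500_snapshot.csv")
df.to_csv(out_path, index=False)
```

One row per ticker with one column per scraped field is only one reasonable layout; it keeps the file easy to reload with pd.read_csv for further analysis.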

Challenge:
Make this web scraper run every 15 seconds, and make sure not to overwrite your previous data file.

1. You can use the time module to get the current time and also have your program wait
for a specific amount of time

(a) You can use time.time() to get the current epoch time
(b) You can store the result of time.time() in a variable to keep track of the time that an
event occurred and reference it later (or subtract it from a later time.time() call to get
the time difference, in seconds, between the event and now)
(c) You can use time.sleep(seconds) to make your program wait for a specific number of
seconds

2. You can use the os module to check if a specific file exists

(a) You can use os.path.isfile(pathToFile) to check if the file at the location referenced
in pathToFile exists.
E.g. os.path.isfile("test/rt.txt") checks if the file rt.txt exists in the folder test (relative
to the current working directory, which is usually the folder the program is run from).

3. Also use the datetime module and create an extra column that keeps track of the time the
information was recorded

(a) You can use datetime.datetime.now() to get the current date and time
(b) You can use the .timestamp() method on a datetime object to get the epoch timestamp
(e.g. datetime.datetime.now().timestamp() gives the current datetime's epoch time)
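
The three hints above can be combined roughly as follows. This is a minimal sketch, not a reference solution: the helper names next_free_path, add_timestamp, and run_every, the "_1", "_2" file-name suffix scheme, and the "recorded_at" column name are all assumptions made for illustration. Rows are plain dictionaries here so that the sketch needs only the standard library.

```python
import datetime
import os
import time


def next_free_path(stem, ext=".csv"):
    """Return stem+ext if unused, else stem_1+ext, stem_2+ext, and so on."""
    candidate = stem + ext
    counter = 1
    # os.path.isfile is True only if a regular file already exists there.
    while os.path.isfile(candidate):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate


def add_timestamp(rows):
    """Return copies of the row dicts with a shared epoch timestamp added."""
    recorded_at = datetime.datetime.now().timestamp()
    return [{**row, "recorded_at": recorded_at} for row in rows]


def run_every(task, interval_seconds=15, repeats=3):
    """Call task() `repeats` times, starting one call every interval_seconds."""
    for i in range(repeats):
        started = time.time()
        task()
        elapsed = time.time() - started
        if i < repeats - 1:
            # Sleep only for the remainder of the interval, so a slow
            # scrape does not push every later run further back.
            time.sleep(max(0.0, interval_seconds - elapsed))
```

In the exercise, task would be a function that scrapes once, passes the rows through add_timestamp, and writes them to next_free_path("sp500"), so each 15-second run produces a new file instead of replacing the previous one. Appending every run to a single file is an equally valid way to avoid overwriting.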
