The project developed a Python command-line tool to fetch research papers from PubMed, focusing on studies by authors affiliated with pharmaceutical or biotech companies. It utilized the PubMed API to extract key metadata and structured the results in a CSV file for analysis. The tool is designed for modularity and maintainability, with plans for future enhancements including improved affiliation detection and advanced filtering options.

Uploaded by

mpv09149

# **Research Paper Fetcher: Approach, Methodology, and Results**

## **1. Introduction**

The objective of this project was to develop a Python-based **command-line tool**
to fetch research papers from **PubMed**, focusing on identifying studies authored
by researchers affiliated with **pharmaceutical or biotech companies**. The results
were structured and exported into a CSV file for further analysis.

## **2. Approach**

To ensure modularity and maintainability, the project followed a structured
workflow:

1. **Fetching Research Papers**: Querying the **PubMed API** with user-supplied
search terms.
2. **Filtering Non-Academic Authors**: Identifying researchers affiliated with
**pharmaceutical or biotech companies**.
3. **Data Extraction**: Collecting key information like **title, authors,
affiliations, and corresponding author email**.
4. **Exporting Data**: Saving results in a **CSV file**.
5. **Command-Line Interface (CLI)**: Providing user-friendly interaction.
6. **Packaging with Poetry**: Ensuring the tool is well-structured and easily
installable.
7. **Publishing to TestPyPI**: Making the package publicly accessible for testing.

## **3. Methodology**

### **3.1 Fetching Data from PubMed**

- Used the **PubMed API** to fetch research articles.
- Extracted critical metadata:
  - **PubmedID**
  - **Title**
  - **Publication Date**
  - **Authors & Affiliations**
  - **Corresponding Author Email**
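
The report does not say which endpoints were used. As a minimal sketch, assuming the "PubMed API" is NCBI E-utilities (`esearch.fcgi` to find IDs, `efetch.fcgi` to download XML records), the metadata above could be extracted as follows. The function names and the output dictionary keys are hypothetical, and the email lookup assumes addresses appear inside affiliation text, since PubMed records carry no dedicated email field.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the report's "PubMed API" is NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def esearch_url(query, retmax=20):
    """Build the URL that returns PubMed IDs matching a search query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Build the URL that returns full XML records for the given PubMed IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def first_email(affiliations):
    """Return the first email address found in any affiliation string, else None."""
    for text in affiliations:
        match = EMAIL_RE.search(text)
        if match:
            return match.group(0)
    return None


def parse_articles(xml_text):
    """Extract the metadata fields listed above from an efetch XML response."""
    papers = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        title = article.find(".//ArticleTitle")
        pub_date = article.find(".//JournalIssue/PubDate")
        date_parts = []
        if pub_date is not None:
            date_parts = [pub_date.findtext(tag) for tag in ("Year", "Month", "Day")]
        authors = []
        for author in article.iter("Author"):
            names = [author.findtext("ForeName"), author.findtext("LastName")]
            authors.append({
                "name": " ".join(n for n in names if n),
                "affiliations": [a.text or "" for a in author.iter("Affiliation")],
            })
        all_affiliations = [a for person in authors for a in person["affiliations"]]
        papers.append({
            "pubmed_id": article.findtext("MedlineCitation/PMID"),
            # Titles can contain inline markup, so join all text fragments.
            "title": "".join(title.itertext()) if title is not None else "",
            # Date parts are kept as PubMed supplies them (the month may be "Aug").
            "publication_date": "-".join(p for p in date_parts if p),
            "authors": authors,
            "email": first_email(all_affiliations),
        })
    return papers
```

The URL builders are kept separate from the parser so the parsing logic can be unit-tested against saved XML without network access. The actual HTTP call (for example with `urllib.request.urlopen`) is left out of the sketch.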

### **3.2 Filtering Industry-Affiliated Authors**

- Identified **non-academic affiliations** based on keywords:
  - "Pharmaceutical"
  - "Biotech"
  - Specific companies (e.g., Pfizer, Moderna, Johnson & Johnson)
- Extracted **author names and their respective company affiliations**.
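
A minimal sketch of the keyword heuristic described above, assuming a case-insensitive substring match. The keyword tuple contains only the terms and example companies named in this report, and the function names and the author dictionary shape (`"name"`, `"affiliations"`) are hypothetical.

```python
# Keywords and example companies named in the report; a real list would be longer.
INDUSTRY_KEYWORDS = ("pharmaceutical", "biotech", "pfizer", "moderna", "johnson & johnson")


def is_industry_affiliation(affiliation):
    """True if the affiliation text contains any industry keyword (case-insensitive)."""
    text = affiliation.lower()
    return any(keyword in text for keyword in INDUSTRY_KEYWORDS)


def non_academic_authors(authors):
    """Return (name, matching affiliations) for each author with an industry match.

    `authors` is a hypothetical list of dicts with "name" and "affiliations" keys.
    """
    matches = []
    for author in authors:
        companies = [a for a in author["affiliations"] if is_industry_affiliation(a)]
        if companies:
            matches.append((author["name"], companies))
    return matches
```

Plain substring matching is easy to audit but has known blind spots: an affiliation such as "Department of Pharmaceutical Sciences" at a university would also match, and companies missing from the keyword list are not detected.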

### **3.3 Exporting Data to CSV**

- Implemented CLI functionality with options:
  - `-h` / `--help`: Display usage guide.
  - `-f` / `--file`: Specify the filename for saving results.
- Stored output in a **well-structured CSV file**.
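
The options above map naturally onto the standard library's `argparse` and `csv` modules. The sketch below is one possible arrangement, not the project's actual code: the positional `query` argument, the fallback to standard output when `--file` is omitted, and the function names are assumptions. The column names are taken from the example output in the Results section.

```python
import argparse
import csv
import sys

# Column names as they appear in the report's example output.
COLUMNS = [
    "PubmedID",
    "Title",
    "Publication Date",
    "Non-Academic Authors",
    "Company Affiliations",
    "Corresponding Author Email",
]


def build_parser():
    """Create the CLI parser; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Fetch PubMed papers with industry-affiliated authors.",
    )
    # Assumption: the search query is passed as a positional argument.
    parser.add_argument("query", help="PubMed search query")
    parser.add_argument("-f", "--file", help="filename for saving results as CSV")
    return parser


def write_csv(rows, stream):
    """Write a header line followed by one CSV line per paper."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def main(argv=None):
    args = build_parser().parse_args(argv)
    rows = []  # Placeholder: the fetch and filter steps would fill this list.
    if args.file:
        # newline="" lets the csv module control line endings itself.
        with open(args.file, "w", newline="", encoding="utf-8") as handle:
            write_csv(rows, handle)
    else:
        # Assumption: print to standard output when no file is given.
        write_csv(rows, sys.stdout)
```

Under these assumptions, `main(["covid vaccine", "-f", "results.csv"])` would write the header line and any collected rows to `results.csv`.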

### **3.4 Project Structure & Packaging**

- Organized the project directory:
  - `src/pubmed_fetcher/` - Core logic
  - `scripts/get_papers.py` - CLI script
  - `tests/` - Unit testing
- Used **Poetry** for dependency and package management.
- Configured an **executable CLI command** (`get-papers-list`).

### **3.5 Publishing to TestPyPI**

- Configured **TestPyPI** as the repository.
- Published package using `poetry publish -r testpypi`.
- Resolved potential **naming conflicts** to ensure successful deployment.

## **4. Results**

The script successfully generated a **CSV file** containing research papers with
industry-affiliated authors. Example output:

| PubmedID | Title | Publication Date | Non-Academic Authors | Company Affiliations | Corresponding Author Email |
| -------- | ---------------------- | ---------------- | -------------------- | -------------------- | -------------------------- |
| 12345678 | COVID-19 Vaccine Study | 2023-08-15 | John Doe | Pfizer | [email protected] |
| 87654321 | mRNA Vaccine Research | 2022-11-20 | Jane Smith | Moderna | [email protected] |

### **Key Findings:**

- Successfully retrieved **relevant research papers** based on search queries.
- Correctly identified **pharmaceutical and biotech affiliations**.
- Extracted **author details & contact emails** for further study.

## **5. Conclusion**

This project efficiently automates the retrieval of **industry-affiliated research
papers** from **PubMed**, presenting results in a structured CSV format. The
approach ensures a **scalable and reusable** solution for filtering research papers
by **company affiliations**.

### **Future Enhancements:**

- **Improved Affiliation Detection**: Use **AI-based entity recognition** for more
precise filtering.
- **Database Storage**: Implement a **structured database** for better querying.
- **Advanced Filtering Options**: Add filters for **date range, author names, and
specific companies**.

This research tool provides an **automated, scalable, and efficient** method for
identifying industry-affiliated research studies in PubMed.
