A multi-type research and analysis system that combines web content extraction, PDF processing, and domain-specific research agents to answer queries across the research domains listed below.
- Developer Tools Research: Analyze development tools, APIs, and technologies
- Product Research: Compare products, features, and alternatives
- Educational Research: Evaluate institutions, programs and courses
- Financial Research: Analyze stocks, companies, and market trends
- Technical Documentation: Review APIs, SDKs, and technical specifications
- Industry Research: Study market trends, competitors, and industry analysis
- General Research: Fallback for uncategorized queries
- Web Content Extraction: Advanced web scraping and content analysis
- PDF Document Processing: Intelligent PDF analysis with relevance scoring
- Entity Extraction: Automated identification of companies, products, and technologies
- Context-Aware Analysis: Domain-specific insights and recommendations
- PDF Selection Options:
  - Auto-select relevant PDFs
  - Manual PDF selection
  - Relevance-based PDF ranking
- S3 Integration: Secure PDF storage and retrieval
- Relevance Scoring: Intelligent PDF filtering based on query context
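The README does not say how relevance is computed, so the following is only a minimal sketch of relevance-based PDF ranking, assuming the text of each PDF has already been extracted. It scores a document by the fraction of query terms it contains. The names `tokenize`, `relevance_score`, and `rank_pdfs`, and the stopword list, are hypothetical and not taken from the project's code.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stopword list; the project's real scoring logic is not shown in this README.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "and", "in", "on", "with"}


def tokenize(text: str) -> set:
    """Lowercase the text and return its distinct non-stopword tokens."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def relevance_score(query: str, document_text: str) -> float:
    """Fraction of query terms that also appear in the document (0.0 to 1.0)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(document_text)) / len(query_terms)


def rank_pdfs(query: str, pdf_texts: Dict[str, str], max_pdfs: int = 5) -> List[Tuple[str, float]]:
    """Return up to max_pdfs (name, score) pairs with a non-zero score, best first."""
    scored = [(name, relevance_score(query, text)) for name, text in pdf_texts.items()]
    relevant = [pair for pair in scored if pair[1] > 0]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:max_pdfs]
```

Auto-selection could then take the top results of `rank_pdfs`, while manual selection would bypass scoring entirely. A production system would more likely use embeddings or an LLM for scoring; term overlap is used here only because it is easy to follow.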
Each research type provides tailored output with domain-specific metrics and insights.
The system uses a multi-agent architecture with specialized nodes for different research types:
- Intent Detection Agent: Classifies research queries
- Content Extraction Agent: Processes web content
- PDF Processing Agent: Analyzes PDF documents
- Specialized Research Agents: Domain-specific analysis
- Analysis & Synthesis Agent: Combines insights and generates recommendations
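As a rough illustration of how an intent detection step can route a query to a specialized agent, here is a small sketch. It assumes simple keyword matching, whereas the actual system presumably classifies queries with an LLM. The keyword lists, agent functions, and `run_research` are hypothetical names invented for this example.

```python
from typing import Callable, Dict

# Keyword lists are illustrative assumptions, not the project's actual classifier.
INTENT_KEYWORDS = {
    "developer_tools": ["framework", "library", "tools to build"],
    "financial": ["stock", "market", "earnings"],
    "educational": ["university", "universities", "course", "degree"],
}


def detect_intent(query: str) -> str:
    """Return the first research type whose keywords appear in the query."""
    lowered = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "general"  # fallback for uncategorized queries


def developer_tools_agent(query: str) -> str:
    return f"[developer_tools] analysis for: {query}"


def financial_agent(query: str) -> str:
    return f"[financial] analysis for: {query}"


def educational_agent(query: str) -> str:
    return f"[educational] analysis for: {query}"


def general_agent(query: str) -> str:
    return f"[general] analysis for: {query}"


# Registry mapping each detected intent to its (stub) specialized agent.
AGENTS: Dict[str, Callable[[str], str]] = {
    "developer_tools": developer_tools_agent,
    "financial": financial_agent,
    "educational": educational_agent,
}


def run_research(query: str) -> str:
    """Classify the query, then dispatch to the matching agent or the general fallback."""
    agent = AGENTS.get(detect_intent(query), general_agent)
    return agent(query)
```

Keeping the agents in a registry means a new research type only needs a keyword entry and a handler, without changes to the dispatch logic.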
- Python 3.8+
- AWS S3 access (for PDF storage)
- Required API keys (configured via environment variables)
- Clone the repository
    git clone <repository-url>
    cd Advanced-Research-Agent
- Install dependencies
    cd advanced-agent
    pip install -r requirements.txt
- Configure environment variables
    cp .env.example .env
    # Edit .env with your API keys and configuration
- Run the application
    python main.py
Create a .env file with the following variables:
# API Keys
ANTHROPIC_API_KEY=your_anthropic_key
FIRECRAWL_API_KEY=your_firecrawl_key
# AWS Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-2
S3_BUCKET_NAME=your_bucket_name
# Application Settings
MAX_ENTITIES=10
MAX_PDFS=5
python main.py
- Regular Research: Auto-selects relevant PDFs
- Select Specific PDFs: Choose specific PDFs for analysis
- Find Relevant PDFs: Rank PDFs by relevance to query
- "Best tools to build AI agents"
- "Compare Python web frameworks"
- "Top universities for computer science"
- "Stock analysis for tech companies"
- "API documentation for payment processing"
1. LangChain
   Website: https://langchain.com
   Pricing: Freemium
   Open Source: Yes
   Tech Stack: Python, JavaScript, TypeScript
   Language Support: Python, JavaScript, TypeScript
   API: Available
   Integrations: OpenAI, Anthropic, Pinecone
1. ChatGPT
   Category: AI Assistant
   Website: https://chat.openai.com
   Price: Free + Premium
   Rating: 4.8/5
   Features: Natural language processing, Code generation
   Target: Developers, Content creators
- Environment variables for sensitive configuration
- AWS IAM roles for S3 access
- Secure API key management
- No hardcoded credentials in source code
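One way to follow these points is to read all settings from the environment at startup and fail early when a key is missing. The sketch below uses the variable names from the `.env` listing, but the `Settings` class, the `load_settings` function, and the choice of which variables are required are assumptions made for illustration. It does not handle the AWS credential variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these three variables are treated as mandatory in this sketch.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "FIRECRAWL_API_KEY", "S3_BUCKET_NAME")


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    firecrawl_api_key: str
    s3_bucket_name: str
    aws_region: str
    max_entities: int
    max_pdfs: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default); raise if a required key is absent."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        firecrawl_api_key=env["FIRECRAWL_API_KEY"],
        s3_bucket_name=env["S3_BUCKET_NAME"],
        # Defaults mirror the sample values in the .env listing.
        aws_region=env.get("AWS_REGION", "us-east-2"),
        max_entities=int(env.get("MAX_ENTITIES", "10")),
        max_pdfs=int(env.get("MAX_PDFS", "5")),
    )
```

Accepting an optional mapping keeps the function testable without touching the real process environment, and the frozen dataclass prevents settings from being changed after startup.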
Advanced-Research-Agent/
├── advanced-agent/
│   ├── main.py                  # Main application entry point
│   ├── src/
│   │   ├── workflow.py          # Workflow orchestration
│   │   ├── models.py            # Data models
│   │   ├── prompts.py           # LLM prompts
│   │   ├── firecrawl.py         # Web content extraction
│   │   ├── pdf_notetaker.py     # PDF processing
│   │   └── s3_pdf_service.py    # S3 PDF management
│   ├── pyproject.toml           # Project configuration
│   └── README.md                # Agent-specific documentation
├── simple-agent/                # Simplified version
├── .gitignore                   # Git ignore rules
└── README.md                    # This file
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with LangChain for LLM orchestration
- Uses Anthropic Claude for AI analysis
- Powered by Firecrawl for web content extraction
- PDF processing with pdfplumber and PyPDF2
For questions or support, please open an issue in the GitHub repository.
Note: This is a research and development tool. Please ensure compliance with relevant terms of service and data privacy regulations when using third-party APIs and services.