Internship Assignment Coding2024
Coding Challenge
AI/ML Engineering Intern
For this assignment, we would like you to demonstrate how you would approach the task of
gathering the required information for the case study using Python programming.
Specifically, we want you to write a Python script that searches the internet for information related to
Canoo, a publicly traded company listed on NASDAQ (ticker symbol: GOEV).
Your script should retrieve data from various online sources and store it in a CSV file for further
analysis.
Write Python Code for Retrieval: The First Step in RAG
[Figure: pipeline diagram. Queries -> List of query links -> Scrape the data from web links -> Store in database (.csv) -> Convert into vector database -> Run queries in vector database -> Text summarization -> Output: generate report]
Retrieval: The first step in RAG (retrieval-augmented generation) involves retrieving relevant documents or information from a large database
or corpus in response to a query or prompt. This retrieval is typically done using a vector similarity search,
where both the query and the documents are embedded into a high-dimensional space, and the most
relevant documents are selected based on their similarity to the query vector.
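The vector similarity search described above can be illustrated with a minimal sketch. It is not a prescribed implementation: the `embed` function below is a toy word-count vectorizer standing in for a learned embedding model, and the names `embed`, `cosine`, and `retrieve` are hypothetical helpers chosen for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query and return the top_k (score, text) pairs."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In a fuller pipeline, the scraped text stored in the CSV would be embedded once and kept in a vector database, and only the query would be embedded at search time. The ranking step would stay the same in spirit.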
Your task is to write a script that runs the following queries through an internet search API, then scrapes
the data from the resulting links and saves it in a structured tabular format in a *.csv file:
1. Identify the industry in which Canoo operates, along with its size, growth rate, trends, and key players.
2. Analyze Canoo's main competitors, including their market share, products or services offered, pricing
strategies, and marketing efforts.
3. Identify key trends in the market, including changes in consumer behavior, technological
advancements, and shifts in the competitive landscape.
4. Gather information on Canoo's financial performance, including its revenue, profit margins, return on
investment, and expense structure.
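One possible shape for the search, scrape, and save steps is sketched below. Several parts are assumptions and not requirements of the assignment: the query strings in `QUERIES` are illustrative paraphrases of the four topics, the column names `query`, `url`, and `text` are hypothetical, and the search step assumes a recent release of `duckduckgo_search` in which `DDGS().text(...)` yields dictionaries with an `href` key. The third-party imports are placed inside the functions that need them so the rest of the code can be exercised offline.

```python
import csv
from html.parser import HTMLParser
from typing import Callable, Iterable

# Illustrative query phrasings (assumption), one per topic in the list above.
QUERIES = [
    "Canoo industry size growth rate trends key players",
    "Canoo competitors market share products pricing marketing",
    "electric vehicle market trends consumer behavior technology",
    "Canoo GOEV financial performance revenue profit margins expenses",
]
FIELDS = ["query", "url", "text"]  # hypothetical CSV columns


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to its visible text, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def search_links(query: str, max_results: int = 5) -> list[str]:
    """Return result URLs for a query (needs network and duckduckgo_search)."""
    from duckduckgo_search import DDGS  # third-party, imported lazily

    return [hit["href"] for hit in DDGS().text(query, max_results=max_results)]


def fetch_html(url: str) -> str:
    """Download a page (needs network and requests)."""
    import requests  # third-party, imported lazily

    response = requests.get(url, timeout=15, headers={"User-Agent": "Mozilla/5.0"})
    response.raise_for_status()
    return response.text


def collect_rows(
    queries: Iterable[str],
    search: Callable[[str], list[str]] = search_links,
    fetch: Callable[[str], str] = fetch_html,
) -> list[dict]:
    """Search each query, scrape each link, and return one row per page."""
    rows = []
    for query in queries:
        for url in search(query):
            try:
                text = html_to_text(fetch(url))
            except Exception:
                continue  # skip pages that fail to download or parse
            rows.append({"query": query, "url": url, "text": text})
    return rows


def write_csv(rows: list[dict], path: str) -> None:
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(collect_rows(QUERIES), "canoo_data.csv")` would run the whole flow against the live web; the output filename here is only an example. Passing `search` and `fetch` as parameters keeps the scraping logic testable with stub functions, and sites that render content with JavaScript may need `selenium` or `scrapy` in place of the plain `requests` download.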
To help you get started, we have attached a sample report that provides an overview of the information we expect you to gather. Please note that this
report is for reference only, and you are not expected to generate a similar report. Instead, we want to see how you would approach the task of
extracting relevant data from various online sources using Python.
Coding Task
1. A brief summary of the steps you took to complete the task, including any challenges you faced and
how you overcame them.
2. A link to a GitHub repository containing your Python script and any necessary dependencies.
3. A sample output of the data retrieved from the internet, stored in a CSV file or other suitable format.
We look forward to reviewing your submission and assessing your skills in Python programming and
data collection. Good luck!
References
Libraries
• scrapy
• selenium
• BeautifulSoup
• duckduckgo_search
• requests
Similar tool
• GPT researcher
• https://github.com/assafelovic/gpt-researcher
Link
• https://en.wikipedia.org/wiki/Large_language_model
• https://www.ibm.com/docs/en/watsonx-as-a-service?topic=models-retrieval-augmented-generation