Data Science Portfolio For Success
YOUSSEF HOSNI
Actions:
● Choose three domains of interest depending on
your experience and research background.
Actions:
● Select the market you would like to work in.
● Do market research and find the intersection
between your interest and the market need.
Actions:
● Read the recent data science job requirements
for the companies you are interested in or read
about the projects these companies are currently
working on.
● If you do not find much information in the job
requirements or on the company website, you can
contact data scientists who work there.
● Find three case studies or business problems
that they are using data science to solve.
● Repeat this for each of the companies you are
interested in working for.
Actions:
● Discover the different areas of AI and know your
interests.
● Narrow your case studies down to at least
three that cover the basic machine learning tasks
and focus on your area of interest in AI.
● Define which case study is solved by which
machine learning task and which data type.
Actions:
● Brainstorm different AI and data science
solutions for your business problem.
● Evaluate them based on your criteria and select
the one that best meets them.
● Validate your potential solution.
Actions:
● Study the problem and define the success
metrics for your project, both the machine
learning and the business metrics.
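To make the difference between the two kinds of metrics concrete, here is a minimal sketch for a hypothetical churn-prediction project. The labels, the predictions, and the monetary values (`value_per_saved_customer`, `cost_per_offer`) are invented for illustration, and the business metric assumes, optimistically, that every correctly flagged customer is retained.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # churners correctly flagged
    fp: int  # loyal customers flagged by mistake
    fn: int  # churners the model missed
    tn: int  # loyal customers correctly left alone


def count_outcomes(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = churn, 0 = stay)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


# Machine learning metrics
def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c):
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Business metric (hypothetical): value of retained churners minus
# the cost of every retention offer sent, including the wasted ones.
def net_campaign_value(c, value_per_saved_customer, cost_per_offer):
    offers_sent = c.tp + c.fp
    return c.tp * value_per_saved_customer - offers_sent * cost_per_offer


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # made-up model output
counts = count_outcomes(y_true, y_pred)

print("precision:", precision(counts))
print("recall:", recall(counts))
print("F1:", f1_score(counts))
print("net value:", net_campaign_value(counts, 100.0, 10.0))
```

Two models with the same F1 score can produce different net values once the cost of false positives is priced in, which is why it helps to define both metrics before modeling.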
● Kaggle
● Scrape your own data.
● Ask for data.
● Use open datasets from universities, NGOs, or
governmental organizations.
Actions:
● Search for different data sources that can
provide you with a unique dataset that can fit
your project.
● Collect and store the data.
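One possible way to store collected data is to keep every raw record together with where and when it was collected, so the dataset can be traced back later. The sketch below uses Python's built-in `sqlite3` module; the table name `raw_records`, the helper names, and the example source label are assumptions made for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) a small SQLite store for raw records."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_records ("
        "id INTEGER PRIMARY KEY, "
        "source TEXT NOT NULL, "
        "collected_at TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def store_records(conn, source, records):
    """Save each record as JSON, tagged with its source and a UTC timestamp."""
    collected_at = datetime.now(timezone.utc).isoformat()
    rows = [(source, collected_at, json.dumps(r, sort_keys=True)) for r in records]
    conn.executemany(
        "INSERT INTO raw_records (source, collected_at, payload) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def load_records(conn, source):
    """Return the stored records for one source, in insertion order."""
    cursor = conn.execute(
        "SELECT payload FROM raw_records WHERE source = ? ORDER BY id", (source,)
    )
    return [json.loads(row[0]) for row in cursor]


conn = open_store()  # in-memory for the example; pass a file path to persist
saved = store_records(conn, "hypothetical-survey", [{"age": 34}, {"age": 29}])
print(saved, "records stored")
print(load_records(conn, "hypothetical-survey"))
```

Keeping the raw payload untouched and cleaning it in a later step means you can always rerun the cleaning with different rules.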
Actions:
● Clean the data.
● Explore the data.
● Engineer features to make the data ready for
modeling.
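The three steps above can be sketched on a tiny, made-up table using only the Python standard library. The column names, the values, and the derived feature are hypothetical; in a real project a dataframe library would usually do this work, and fill values should be computed on the training split only.

```python
from statistics import median

# Made-up raw data with typical problems: blanks, "n/a", stray
# whitespace, inconsistent casing, and an exact duplicate.
raw_rows = [
    {"age": "34", "income": "52000", "city": "Cairo"},
    {"age": "", "income": "61000", "city": "cairo "},
    {"age": "29", "income": "n/a", "city": "Giza"},
    {"age": "34", "income": "52000", "city": "Cairo"},
]


def to_float(value):
    """Parse a number, returning None for blanks or unparseable text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean(rows):
    """Clean: fix types, normalize text, and drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        record = {
            "age": to_float(row["age"]),
            "income": to_float(row["income"]),
            "city": row["city"].strip().title(),
        }
        key = tuple(record.items())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


def missing_counts(rows):
    """Explore: count missing values per column."""
    return {col: sum(1 for r in rows if r[col] is None) for col in rows[0]}


def impute_median(rows, column):
    """Fill missing values in one column with the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


def add_features(rows):
    """Feature engineering: one ratio feature plus one-hot city columns."""
    cities = sorted({r["city"] for r in rows})
    for r in rows:
        r["income_per_year_of_age"] = r["income"] / r["age"]
        for city in cities:
            r[f"city_{city}"] = 1 if r["city"] == city else 0
    return rows


rows = clean(raw_rows)
print("missing values:", missing_counts(rows))
for column in ("age", "income"):
    impute_median(rows, column)
rows = add_features(rows)
print(rows[0])
```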
● Model explainability
● In-memory vs. out-of-memory data
● Number of features and instances
● Categorical vs. numerical features
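Several of the criteria above can be checked with a quick profile of the dataset before shortlisting models. The sketch below is one rough way to do it: the function name, the sample rows, and the memory budget are assumptions, and `sys.getsizeof` gives only an approximate size for the values.

```python
import sys


def profile_dataset(rows, memory_budget_bytes):
    """Summarize size and feature types for a list of dict records."""
    columns = list(rows[0].keys())
    numerical = [
        col for col in columns
        if all(
            isinstance(r[col], (int, float)) and not isinstance(r[col], bool)
            for r in rows
        )
    ]
    categorical = [col for col in columns if col not in numerical]
    # Rough estimate only: sums the size of each stored value.
    approx_bytes = sum(sys.getsizeof(v) for r in rows for v in r.values())
    return {
        "n_instances": len(rows),
        "n_features": len(columns),
        "numerical": numerical,
        "categorical": categorical,
        "approx_bytes": approx_bytes,
        "fits_in_memory": approx_bytes <= memory_budget_bytes,
    }


# Made-up sample records.
sample_rows = [
    {"age": 34, "plan": "basic", "spend": 12.5},
    {"age": 29, "plan": "premium", "spend": 40.0},
    {"age": 51, "plan": "basic", "spend": 9.9},
]

profile = profile_dataset(sample_rows, memory_budget_bytes=10**9)
for key, value in profile.items():
    print(f"{key}: {value}")
```

A profile like this helps narrow the shortlist: many categorical features call for an encoding step or a model that handles categories, and data that does not fit in memory calls for a model that can be trained in batches.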
Actions:
● Select suitable models for your problem.
● Split the data.
● Train the model.
● Choose suitable evaluation metrics.
● Optimize the model hyperparameters.
● Test your model on the testing data.
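The workflow above can be sketched end to end with a deliberately simple setup. The example below assumes synthetic two-cluster data, a hand-written k-nearest-neighbors classifier, accuracy as the evaluation metric, and `k` as the only hyperparameter; all of these are illustrative choices, not recommendations for a particular problem.

```python
import random
from collections import Counter


def make_blobs(n_per_class, rng):
    """Synthetic, made-up data: two 2-D Gaussian clusters labelled 0 and 1."""
    data = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    return data


def split(data, rng, train=0.6, val=0.2):
    """Shuffle, then split into train, validation, and test sets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def knn_predict(train_set, point, k):
    """Predict by majority vote among the k closest training points."""
    def squared_distance(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train_set, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_set, eval_set, k):
    """Evaluation metric: fraction of correct predictions."""
    correct = sum(
        1 for point, label in eval_set
        if knn_predict(train_set, point, k) == label
    )
    return correct / len(eval_set)


rng = random.Random(42)  # fixed seed so the run is repeatable
data = make_blobs(100, rng)
train_set, val_set, test_set = split(data, rng)

# Tune the hyperparameter on the validation set only.
candidate_ks = [1, 3, 5, 7]
best_k = max(candidate_ks, key=lambda k: accuracy(train_set, val_set, k))

# Touch the test set once, at the very end.
test_accuracy = accuracy(train_set, test_set, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The point of the three-way split is that the test set plays no part in choosing the model or its hyperparameters, so the final number is an honest estimate of performance on new data.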
Actions:
● Deploy the trained model into production.
● Integrate the model into a mobile or a web
application.
● Monitor the model’s performance.
● Iterate.
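Monitoring can start very simply. The sketch below assumes that the true outcome for each prediction eventually becomes available, and tracks accuracy over the most recent predictions; the class name, the window size, and the alert threshold are hypothetical and would need tuning for a real application.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions."""

    def __init__(self, window_size, alert_threshold):
        self.outcomes = deque(maxlen=window_size)  # old entries drop off
        self.alert_threshold = alert_threshold

    def record(self, prediction, actual):
        """Log whether one production prediction turned out correct."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is logged."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.alert_threshold


monitor = AccuracyMonitor(window_size=4, alert_threshold=0.75)

# Made-up (prediction, actual) pairs: the model starts well, then degrades.
for prediction, actual in [(1, 1), (0, 0), (1, 1), (1, 1)]:
    monitor.record(prediction, actual)
print(monitor.accuracy(), monitor.needs_attention())

for prediction, actual in [(1, 0), (0, 1)]:
    monitor.record(prediction, actual)
print(monitor.accuracy(), monitor.needs_attention())
```

When the monitor flags a drop, that is the signal to iterate: inspect the recent data, retrain, and redeploy.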
You can also write a blog post that explains each step
and shows the insights you got from the data. You can
create a YouTube video that walks through the project
steps, presents the results and insights, and shows how
you answered the business questions and how the model
works in production.
Actions:
● Upload your project to GitHub and publish it on
your professional social media channels.
● Write a comprehensive README file for your
project.
● Record a short video of your project to
demonstrate how it works.
● Invite people to try your project and give you
feedback.
● Write a blog about your project.
● Record a long video explaining the project steps.
You will also get a custom domain that you can share on
your social media and resume. You can also get inspired
for your next project by browsing the projects added by
the community, organized by topic.
You will then evaluate the training jobs and look at some
metrics such as Precision, Recall, and F1 Score. Upon
evaluation, you will deploy the deep learning model on
AWS with the help of AWS API Gateway and Lambda
functions.
You will then test the API with Postman to confirm that
it returns inference results. Once that is done, you
will secure the endpoints and set up autoscaling to
prevent latency issues. Finally, you will build a web
application that has access to the AWS API and deploy it
to DigitalOcean.
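To give a feel for the serving side, here is a minimal sketch of a Lambda handler behind API Gateway. It assumes the Lambda proxy integration, where the request body arrives as a JSON string in `event["body"]` and the handler returns a dictionary with `statusCode` and `body`. The `predict` function and the `features` field are hypothetical stand-ins; loading and running the actual deep learning model is not shown.

```python
import json


def predict(features):
    """Hypothetical stand-in for the trained model.

    Labels an input "positive" when the mean of its features exceeds 0.5.
    """
    score = sum(features) / len(features)
    return {"label": "positive" if score > 0.5 else "negative", "score": score}


def lambda_handler(event, context):
    """Entry point that Lambda calls for each API Gateway request."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = payload["features"]
        if not isinstance(features, list) or not features:
            raise ValueError("'features' must be a non-empty list of numbers")
        result = predict([float(x) for x in features])
        status = 200
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed JSON, a missing field, or non-numeric input.
        result, status = {"error": str(exc)}, 400
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }


# Local check that mimics the request a client such as Postman would send.
request = {"body": json.dumps({"features": [0.9, 0.8, 0.4]})}
response = lambda_handler(request, None)
print(response["statusCode"], response["body"])
```

Calling the handler locally with a hand-built event is a quick way to catch input-handling bugs before deploying; the same JSON body can then be sent to the deployed endpoint.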
6.2. Kaggle
Kaggle is a popular platform that hosts a large collection
of datasets and competitions for data scientists and
machine learning enthusiasts. The platform was
launched in 2010 and has since grown to become one of
the most popular online communities for data scientists.
One of the main features of Kaggle is its vast collection
of datasets, which cover a wide range of topics and
come from a variety of sources. Users can browse and
search through the datasets on the website and
download them for free. This makes it a great resource
6.4. Data.gov
Data.gov is an online repository maintained by the U.S.
government that provides access to a vast collection of
datasets. The website was launched in 2009 with the
goal of promoting transparency and openness in
government by making federal data available to the
public.
6.7. Pudding.cool
Pudding.cool is a website that specializes in visual
storytelling using data visualization and interactive
graphics. They produce original content on a wide range
of topics, including pop culture, politics, and social
issues.
6.8. FiveThirtyEight
FiveThirtyEight is a popular website that specializes in
data journalism and statistical analysis. It was founded
by Nate Silver in 2008 and is now owned by ABC News.
The website covers a wide range of topics, including
politics, sports, economics, and culture. It is known
for its approach to reporting, which combines data
analysis with traditional reporting methods.
6.9. KDnuggets
KDnuggets covers a wide range of topics related to data
science, including news and trends, tutorials, job listings,
and educational resources. They also provide a
collection of open datasets that can be used for research
and analysis.
6.10. BuzzFeed
BuzzFeed is a digital media company that produces and
distributes news, entertainment, and lifestyle content
across a variety of platforms, including its website and
social media channels.
7.1. Datascienceportfol.io
datascienceportfol.io is a tool that will take your data
science portfolio presentation to the next level. You
can use it to build your own portfolio website for free to
showcase your projects in a recruiter-friendly way. In
addition to that you will get a personalized URL that you
7.2. Voilà
If you’re a data scientist who enjoys working with Python
and is interested in learning web development, then
Voilà is going to be your new best friend! This amazing
library allows you to effortlessly create impressive web
applications and interactive dashboards using your
Jupyter notebooks.
7.4. DagsHub
DagsHub is an innovative platform for managing and
collaborating on data science projects, offering tools