Xujia Wei - Data Science Portfolio
This portfolio highlights my experience in data analysis, statistical modeling,
and machine learning, applied to real-world challenges in urban planning,
finance, and predictive modeling. Through projects on housing market
analysis, economic inequality, and spam classification, I have developed
strong skills in Python, SQL, data visualization, and predictive analytics to
extract meaningful insights from complex datasets.
EXAMPLE #1
Bike Sharing Data Analysis and
Visualization
Summary
This project analyzes a bike-sharing dataset from Washington, DC, to understand user
behaviors, trends, and key factors affecting bike rentals. Through data wrangling, visualization,
and exploratory data analysis (EDA), insights were derived about rental patterns, peak usage
times, and seasonal variations.
Problem and Approach
Bike-sharing systems are widely used in urban environments, but understanding user behavior
and operational efficiency requires detailed data analysis. This project aimed to clean, process,
and visualize bike rental data to uncover trends and insights. The approach (sketched in code below) involved:
Data Cleaning: Processing raw data, handling missing values, and structuring it for
analysis.
Exploratory Data Analysis (EDA): Using statistical summaries and visualizations to identify
rental patterns.
Data Visualization: Creating informative plots to illustrate key trends, such as
daily/seasonal demand and correlations with weather conditions.
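A minimal sketch of this workflow in Pandas, assuming an hourly rentals file laid out like the common UCI Bike Sharing dataset (`hour.csv`, with columns such as `dteday`, `hr`, `season`, `weathersit`, and the rental count `cnt`; all names are illustrative, not the project's exact files):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the hourly rental data (file name and columns are assumed,
# following the common UCI Bike Sharing dataset layout).
df = pd.read_csv("hour.csv", parse_dates=["dteday"])

# Basic cleaning: drop duplicates and rows with missing counts.
df = df.drop_duplicates().dropna(subset=["cnt"])

# Average rentals by hour of day reveal the commuting peaks.
hourly = df.groupby("hr")["cnt"].mean()
hourly.plot(kind="bar", title="Average rentals by hour of day")
plt.xlabel("Hour")
plt.ylabel("Mean rentals")
plt.tight_layout()
plt.show()

# Seasonal variation: distribution of counts per season.
sns.boxplot(data=df, x="season", y="cnt")
plt.title("Rental counts by season")
plt.show()
```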
Contribution
Data Wrangling: Processed and transformed raw rental data using Pandas.
Visualization & EDA: Created multiple visualizations using Matplotlib and Seaborn to
analyze trends.
Statistical Insights: Identified key rental patterns, including peak usage times, seasonal
variations, and user preferences.
Critical Thinking: Answered open-ended analytical questions about the impact of various
factors on bike rental trends.
Results and Impact
Discovered that bike rental demand peaks during commuting hours, indicating a strong
usage pattern among working professionals.
Found a clear correlation between weather conditions and rental demand, with adverse
weather reducing usage (see the sketch after this list).
Highlighted the importance of weekend vs. weekday demand differences, providing
insights for bike-sharing system optimizations.
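Continuing under the same assumed column layout as the sketch above, a quick check of the weather and weekday effects described in this list:

```python
import pandas as pd

df = pd.read_csv("hour.csv")  # same assumed hourly rental file as above

# Correlation of weather variables with demand ('cnt'); negative values
# for humidity/windspeed would indicate adverse weather suppressing rentals.
print(df[["temp", "hum", "windspeed", "weathersit", "cnt"]].corr()["cnt"])

# Weekday vs. weekend demand, assuming a 0/1 'workingday' indicator column.
print(df.groupby("workingday")["cnt"].mean())
```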
EXAMPLE #2
COVID-19 Data Analysis and Estimation
Models
Summary
This project examines a dataset of daily COVID-19 cases across U.S. counties, incorporating
vaccination rates and related metadata to understand the factors influencing case trends. The
analysis involved statistical modeling techniques, including bootstrap sampling, bias-variance
tradeoff analysis, and multicollinearity detection. By leveraging these methods, the project aimed
to improve predictive accuracy and assess pandemic-related trends for data-driven insights.
Problem and Approach
Understanding COVID-19 trends and predicting case numbers is critical for public health planning.
However, estimating trends from noisy real-world data presents challenges such as bias,
variance, and data dependencies.
The approach involved:
Bootstrap Sampling: Generating resampled datasets to estimate the sampling distribution of a statistic (sketched below).
Bias-Variance Tradeoff: Evaluating predictive models to balance complexity and
generalization.
Multicollinearity Analysis: Identifying and mitigating redundant features in regression models (see the VIF sketch below).
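A minimal sketch of the bootstrap step, using a hypothetical array of daily case counts (values are illustrative, not from the actual dataset):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sample: daily new cases for one county (illustrative values).
cases = np.array([12, 18, 25, 30, 22, 40, 35, 28, 19, 45])

# Bootstrap: resample with replacement and recompute the statistic many
# times to approximate its sampling distribution.
boot_means = np.array([
    rng.choice(cases, size=cases.size, replace=True).mean()
    for _ in range(10_000)
])

# A 95% percentile confidence interval for the mean daily case count.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {cases.mean():.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```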
Contribution
Data Wrangling & Cleaning: Processed COVID-19 data using Pandas and NumPy.
Statistical Modeling: Applied SciPy and scikit-learn to perform bias-variance decomposition and
evaluate estimators.
Visualization & Insights: Used Matplotlib and Seaborn to create meaningful graphs explaining
pandemic trends.
Machine Learning Techniques: Explored regression models and feature selection to improve
prediction accuracy.
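For the multicollinearity analysis above, one standard diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others. A sketch using scikit-learn, with deliberately redundant hypothetical predictors:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def vif(X: pd.DataFrame) -> pd.Series:
    """VIF per column: regress each column on the rest and invert 1 - R^2."""
    scores = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        scores[col] = 1.0 / (1.0 - r2)
    return pd.Series(scores)

# Hypothetical predictors; 'vax_rate' and 'fully_vax_rate' are
# deliberately near-duplicates to show a large VIF.
rng = np.random.default_rng(0)
vax = rng.uniform(0.3, 0.9, size=200)
X = pd.DataFrame({
    "vax_rate": vax,
    "fully_vax_rate": vax * 0.95 + rng.normal(0, 0.01, 200),
    "median_age": rng.normal(38, 5, 200),
})
print(vif(X).round(1))  # VIF >> 10 flags redundant predictors
```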
Results and Impact
Demonstrated how bootstrap resampling quantifies the sampling variability of estimators.
Quantified the bias-variance tradeoff and used it to tune model complexity for better generalization.
Identified multicollinearity issues in COVID-19 predictors, leading to better feature selection.
Developed key takeaways for public health decision-making based on case and vaccination
trends.
EXAMPLE #3
IMDb Data Analysis with SQL
Summary
This project utilizes SQL to analyze the Internet Movie Database (IMDb), extracting insights
into movies, actors, and ratings. The goal was to formulate and execute SQL queries to
explore trends, relationships, and anomalies in the dataset.
Contribution
SQL Query Development: Executed complex SQL queries using SQLite to extract insights
from IMDb data (see the sketch after this list).
Database Management: Leveraged SQLAlchemy and pandas to manipulate relational
data.
Visualization & Reporting: Used Matplotlib, Seaborn, and Plotly to present key trends in
movies and ratings.
Trend Analysis: Investigated factors affecting movie ratings, including actor
collaborations and genre patterns.
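A sketch of the kind of query used, assuming an SQLite snapshot with `titles` and `ratings` tables keyed on `title_id` (the file name and schema are assumptions, not the project's actual database):

```python
import sqlite3
import pandas as pd

# Open a local IMDb snapshot (file and schema are assumed: a 'titles'
# table joined to 'ratings' on title_id, as in common IMDb SQLite dumps).
conn = sqlite3.connect("imdb.db")

query = """
SELECT t.primary_title,
       t.premiered,
       r.rating,
       r.votes
FROM titles AS t
JOIN ratings AS r ON r.title_id = t.title_id
WHERE r.votes >= 10000          -- ignore thinly rated titles
ORDER BY r.rating DESC
LIMIT 10;
"""

top_movies = pd.read_sql_query(query, conn)
print(top_movies)
conn.close()
```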
EXAMPLE #4
Cook County Housing Market Analysis
Summary
This project examines housing data from Cook County to analyze market trends, identify
influential property factors, and build predictive models for housing prices. The first phase (A1)
focused on exploratory data analysis (EDA), understanding pricing patterns, and ensuring
fairness in valuation. The second phase (A2) applied machine learning techniques to predict
property prices based on real estate attributes.
Problem and Approach
The real estate market involves complex pricing structures influenced by location, property
size, economic conditions, and other factors. The project addressed this by:
Cleaning and processing real estate data to detect trends.
Identifying key price determinants, such as square footage, neighborhood, and location.
Evaluating fairness in valuation across different neighborhoods.
Developing regression models to estimate home prices (sketched after this list).
Performing feature engineering to improve model accuracy.
Evaluating model performance using statistical validation techniques.
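A minimal sketch of the modeling pipeline, with hypothetical file and column names (`sale_price`, `sqft`, `bedrooms`, `neighborhood`):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the housing data (file and column names are illustrative).
df = pd.read_csv("cook_county_housing.csv")

# Feature engineering: log-transform the right-skewed sale price and
# one-hot encode the neighborhood indicator.
df["log_price"] = np.log(df["sale_price"])
X = pd.get_dummies(df[["sqft", "bedrooms", "neighborhood"]],
                   columns=["neighborhood"], drop_first=True)
y = df["log_price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Held-out RMSE in log-price units as a simple validation metric.
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"test RMSE (log price): {rmse:.3f}")
```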
Contribution
Data Wrangling & Preparation: Processed and structured Cook County housing datasets
using Pandas and NumPy.
Exploratory Data Analysis (EDA): Analyzed housing market trends and influential property
features.
Machine Learning Models: Built linear regression models to predict home prices with high
accuracy.
Bias & Fairness Analysis: Assessed whether property valuations were equitable across
different regions.
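One simple way to probe valuation fairness is to compare relative prediction error across regions; a sketch with hypothetical test-set results:

```python
import pandas as pd

# Hypothetical test-set results: true prices, model predictions, and
# each property's region (all values are illustrative).
results = pd.DataFrame({
    "region":    ["north", "north", "south", "south", "west", "west"],
    "true":      [250_000, 310_000, 120_000, 140_000, 500_000, 450_000],
    "predicted": [260_000, 300_000, 100_000, 115_000, 505_000, 460_000],
})

# Relative error per property; systematic over/under-valuation in one
# region would suggest an inequitable model.
results["rel_error"] = (results["predicted"] - results["true"]) / results["true"]
print(results.groupby("region")["rel_error"].mean().round(3))
```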
Results and Impact
Uncovered price trends across Cook County, highlighting key pricing drivers.
Improved housing price prediction accuracy through feature engineering and model
refinement.
Identified valuation biases, ensuring fairness in predictive pricing models.
Demonstrated the power of data science in real estate valuation and investment analysis.
Machine Learning & Regression Models: Built predictive models (linear regression, feature engineering) to estimate housing prices.
Data Visualization & Statistical Analysis: Utilized Seaborn, Matplotlib, and Scikit-learn to analyze pricing patterns and model performance.
EXAMPLE #5
Spam Email Classification Using Machine Learning
Summary
This project develops a binary classification model to distinguish between spam (junk,
commercial, or bulk) emails and ham (regular emails). The first phase (B1) focuses on
exploratory data analysis, feature engineering, and initial logistic regression modeling, while
the second phase (B2) builds on this foundation to optimize classification models, perform
cross-validation, and analyze model performance using real-world email datasets.
Problem and Approach
Spam detection is a crucial application in cybersecurity and email filtering, requiring robust
machine learning techniques to identify patterns in text data. The approach (sketched in code after this list) involved:
Extracting features from email text using NLP techniques (word frequency, n-grams,
stopword filtering).
Applying logistic regression to develop a baseline spam classifier.
Evaluating model accuracy and initial performance using confusion matrices and
precision-recall metrics.
Implementing advanced classification models (e.g., Naïve Bayes, Random Forest, SVM).
Performing hyperparameter tuning and cross-validation to improve model performance.
Generating ROC curves and AUC scores to assess classifier effectiveness.
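A minimal sketch of the baseline pipeline, combining TF-IDF features, logistic regression, and cross-validated hyperparameter tuning (the toy emails and labels are illustrative stand-ins for the real dataset):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical labeled data: raw email text and 0/1 spam labels.
emails = pd.Series([
    "Win a FREE prize now, click here!!!",
    "Meeting moved to 3pm, see agenda attached",
    "Cheap meds, limited time offer",
    "Can you review my pull request today?",
])
labels = pd.Series([1, 0, 1, 0])

# TF-IDF features feeding a logistic regression baseline, with the
# regularization strength tuned by cross-validated grid search.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]},
                    cv=2, scoring="roc_auc")
grid.fit(emails, labels)
print(grid.best_params_, grid.best_score_)
```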
Contribution
Feature Engineering for Text Data: Extracted relevant features from email text using TF-
IDF, bag-of-words, and tokenization techniques.
Supervised Machine Learning: Built and optimized classification models using Scikit-learn.
Model Performance Evaluation: Analyzed confusion matrices, precision-recall, and ROC-
AUC curves to assess model effectiveness (see the sketch after this list).
Overfitting Prevention & Validation: Applied cross-validation and hyperparameter tuning to
ensure model generalization.
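A sketch of the evaluation step with hypothetical classifier outputs:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical test labels and classifier scores in [0, 1].
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])
y_pred  = (y_score >= 0.5).astype(int)  # threshold the scores

print(confusion_matrix(y_true, y_pred))          # [[TN FP], [FN TP]]
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("ROC AUC:  ", roc_auc_score(y_true, y_score))
```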
Results and Impact
Achieved high classification accuracy using optimized spam detection models.
Identified key patterns in spam emails based on word frequency and NLP-based feature
extraction.
Improved model precision-recall tradeoff to reduce false positives in email classification.
Provided a scalable framework for real-world spam detection in email filtering systems.
EXAMPLE #6
City Planning in San Francisco
Summary
This project investigates the relationship between urban planning, economic disparity, and
accessibility in San Francisco through a data-driven approach. By applying Marxist theory, the
study examines how capitalist-driven urban development has shaped economic stratification,
labor force trends, and infrastructure accessibility. The analysis is conducted using historical
labor force and income inequality data, spatial visualizations, and regression models to
highlight systemic inequalities and propose equitable planning strategies.
Problem and Approach
San Francisco's urban development reflects a growing divide between affluent communities
and marginalized populations, driven by economic cycles and city planning decisions. This
project explores:
Labor Force & Income Inequality Trends: Analyzed changes in the civilian labor force over
time and their correlation with wealth disparity.
Impact of Economic Crises on Employment: Studied how the 2008 financial crisis and the
COVID-19 pandemic disrupted employment trends and exacerbated inequality.
Spatial Analysis of Unemployment Rates: Mapped unemployment distribution by zip code
from 2019 to 2022 to highlight areas most affected by economic downturns.
Accessibility in Urban Design: Examined the distribution of curb ramps to assess whether
public infrastructure investments are equitably allocated.
Contribution
Data Collection & Processing: Cleaned and merged labor force, income inequality, and curb ramp datasets for analysis.
Statistical Analysis: Conducted regression modeling to quantify the relationship between
workforce participation and economic disparity (sketched below).
Geospatial Visualization: Created heatmaps of unemployment rates and accessibility
distributions using Python-based mapping tools.
Urban Policy Recommendations: Proposed equitable city planning strategies that balance
economic growth with social inclusion.
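A minimal sketch of the regression step with scipy.stats.linregress, using hypothetical yearly series (values are illustrative, not the project's actual data):

```python
import numpy as np
from scipy import stats

# Hypothetical yearly series: SF civilian labor force (thousands) and a
# Gini-style income inequality index.
labor_force = np.array([480, 495, 510, 530, 545, 540, 520, 555])
gini_index  = np.array([0.46, 0.47, 0.48, 0.49, 0.50, 0.50, 0.49, 0.51])

# Simple linear regression quantifies the association between workforce
# size and inequality; r and the p-value summarize its strength.
res = stats.linregress(labor_force, gini_index)
print(f"slope={res.slope:.5f}, r={res.rvalue:.2f}, p={res.pvalue:.3f}")
```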
Results and Impact
Confirmed a strong correlation between labor force changes and income inequality in San Francisco.
Identified unemployment disparities across zoning districts and advocated targeted interventions.
Found curb ramp distribution to be roughly uniform across neighborhoods, challenging the assumption that infrastructure investment favors wealthier areas.
Highlighted economic crises' impact on spatial inequalities, urging inclusive urban planning.