Week 7: Data Processing Methods

Ⅱ. Questions

Dimensionality reduction techniques fall into two broad categories, linear and nonlinear:
★ Linear dimensionality reduction techniques. These focus on keeping dissimilar points far apart in the low-dimensional representation.
① PCA (Principal Component Analysis)
② MDS (Multidimensional Scaling), etc.
★ Nonlinear dimensionality reduction techniques (broadly, "nonlinear dimensionality reduction" ≈ "manifold learning"; strictly, the latter is a subset of the former). These techniques assume that the high-dimensional data actually lie on a nonlinear manifold of lower dimension than the ambient space, so they focus on keeping similar neighboring points close in the low-dimensional representation.
① Sammon mapping
② SNE (Stochastic Neighbor Embedding); t-SNE is based on SNE
③ Isomap (Isometric Mapping)
④ MVU (Maximum Variance Unfolding)
⑤ LLE (Locally Linear Embedding), etc.

Data dimensionality reduction

Q1. Principal Component Analysis (PCA)

Each principal component is a linear combination of the original influencing factors.
A1. Principal component analysis is a dimensionality reduction algorithm that converts many indicators into a few principal components. These principal components are linear combinations of the original variables, are uncorrelated with each other, and reflect most of the information in the original data. Generally speaking, when a research problem involves many variables and those variables are strongly correlated, principal component analysis can be used to simplify the data.
I. Introduction to principal component analysis
Principal component analysis replaces a larger number of original variables with a smaller number of new variables, and these fewer new variables retain as much of the information reflected by the original data as possible.
Principal component analysis is a data dimensionality reduction algorithm. Dimensionality reduction keeps the most important features of high-dimensional data (too many indicators) and removes noise and unimportant features, thereby improving data processing speed.
Ⅱ. Methodology
1) Standardize the data;
2) Compute the covariance matrix of the standardized samples;
3) Compute the eigenvalues and eigenvectors of the covariance matrix;
4) Compute each principal component's contribution rate and the cumulative contribution rate;
5) Generally, keep the leading principal components corresponding to the eigenvalues whose cumulative contribution rate exceeds 80% (see the sketch below).
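Below is a minimal NumPy sketch of the five steps above; the function name, the 80% threshold argument, and the toy data are illustrative assumptions, not the notes' own code.

import numpy as np

def pca(X, threshold=0.80):
    # 1) Standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2) Covariance matrix of the standardized samples.
    C = np.cov(Z, rowvar=False)
    # 3) Eigenvalues and eigenvectors (eigh suits the symmetric matrix C).
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4) Contribution rate and cumulative contribution rate.
    contrib = eigvals / eigvals.sum()
    cum_contrib = np.cumsum(contrib)
    # 5) Keep the leading components whose cumulative rate exceeds 80%.
    k = int(np.searchsorted(cum_contrib, threshold)) + 1
    return Z @ eigvecs[:, :k], contrib[:k]

scores, rates = pca(np.random.rand(100, 6))      # toy data: 100 samples, 6 indicators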
Reference: 清风数学建模学习笔记——主成分分析 (PCA) 原理详解及案例分析, Xiu Yan's blog, CSDN.

Q2. Multidimensional Scaling (MDS)

Multidimensional scaling (MDS) is another linear dimensionality reduction method. Unlike principal component analysis and linear discriminant analysis, its goal is not to preserve the maximum separability of the data; instead it focuses on the internal structure of the high-dimensional data. MDS concentrates on preserving the "similarity" information of the high-dimensional space, and in typical problems this "similarity" is defined by Euclidean distance.
A2. The goal of multidimensional scaling is to display the differences (or similarities) between objects in a graph. The distance between two points in the graph represents the difference between the corresponding objects: the greater the distance, the greater the difference between the two objects; the shorter the distance, the more similar they are.
Scaling: projection.
Key point: turn the multidimensional data into two dimensions.
Reference: sklearn 与机器学习系列专题之降维(三)一文弄懂 MDS 特征筛选&降维, 南上加南's blog, CSDN.
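A minimal scikit-learn illustration of metric MDS on toy data; the parameter choices and variable names are assumptions, not values from the notes.

import numpy as np
from sklearn.manifold import MDS

X = np.random.rand(50, 10)                   # toy data: 50 samples, 10 features
mds = MDS(n_components=2, random_state=0)    # Euclidean dissimilarity by default
X_2d = mds.fit_transform(X)                  # 2-D coordinates that preserve distances
print(mds.stress_)                           # residual mismatch between distance sets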

Q3. t-distributed Stochastic Neighbor Embedding (t-SNE)

Reduces high-dimensional data to 1–2 dimensions: t-distributed stochastic neighbor embedding.
A3. t-distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique used to represent high-dimensional data sets in a two- or three-dimensional space for visualization. In contrast to other dimensionality reduction algorithms such as PCA, t-SNE creates a reduced feature space in which similar samples are modeled by nearby points and dissimilar samples are modeled by distant points with high probability.
At a high level, t-SNE constructs a probability distribution over pairs of high-dimensional samples such that similar samples are assigned a high probability of being picked and dissimilar points a low probability. t-SNE then defines a similar distribution over the points in the low-dimensional embedding. Finally, t-SNE minimizes the Kullback-Leibler (KL) divergence between the two distributions with respect to the locations of the embedded points.
t-SNE is a machine learning algorithm for dimensionality reduction proposed by Laurens van der Maaten et al. in 2008. It is a nonlinear dimensionality reduction algorithm and is very well suited to reducing high-dimensional data to 2 or 3 dimensions for visualization. In the low-dimensional space, the algorithm places points with high similarity close together, while points with low similarity are pushed farther apart.
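A minimal scikit-learn t-SNE illustration; the toy data and the perplexity setting (the library default) are assumptions, not values from the notes.

import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(200, 50)                  # toy high-dimensional data
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)                 # similar samples land close in X_2d
print(tsne.kl_divergence_)                   # the KL divergence that was minimized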

Q4. Shepard Diagrams

Used with non-metric multidimensional scaling (NMDS).
A4. Shepard diagram - Oxford Reference
A plot of two measurements of the distances between objects. One measurement
is the true distance, and the other measurement is the apparent distance in some
representation of the objects. For example, the apparent distance between objects in a
photograph (two dimensions) and the real three-dimensional distance. The diagram is
used in multidimensional scaling to assess the extent of any distortion. Zero distortion
would correspond to a set of collinear points.
A Shepard plot compares the actual (or transformed) proximities with the predicted proximities. The figure reflects the extent to which the multidimensional scaling solution reproduces the actual proximities. A Shepard diagram is similar to a "predicted value versus actual value" plot; ideally, the points fall on the line Y = X.
NMDS weakens the dependence on the actual distance values and puts more emphasis on the ranking (rank order) of the values. For example, pairwise distances (1,2,3) and (10,20,30) among three samples have the same rank order, so they produce the same result in an NMDS analysis.
The NMDS analysis runs as follows (a code sketch follows the list):
1. Set the analysis dimension (usually a 2-dimensional plane);
2. Build an initial configuration and place the points according to the distance values (input data);
3. Judge the suitability of the model by comparing the fitted distances with the original data (stress criterion).
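A sketch of an NMDS run plus a Shepard diagram, assuming a precomputed Euclidean distance matrix; it uses scikit-learn's non-metric MDS together with SciPy and Matplotlib, and all names and data are illustrative.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

X = np.random.rand(40, 8)                    # toy data
D = squareform(pdist(X))                     # original pairwise distance matrix

# metric=False gives non-metric MDS: it fits the rank order of distances.
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
Z = nmds.fit_transform(D)

# Shepard diagram: original vs. embedded distances; low distortion means the
# points hug a monotone line.
plt.scatter(pdist(X), pdist(Z), s=8)
plt.xlabel("original distance")
plt.ylabel("embedded distance")
plt.show()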

Shepard diagram: a comparison of the original distances between samples/observations and the NMDS ordination result. The stress value measures how much the original dissimilarities change after NMDS ordination: the smaller the change, the better the ordination, and the more accurately it reflects the samples' original spatial positions or gradients. Non-metric fit R^2 = 1 - stress^2. In the goodness-of-fit plot, the larger a bubble, the greater the gap between an observation's position in ordination space and its original position.

PCA (Principal Component Analysis)

Idea: find a mapping matrix W from high dimension N to low dimension d such that the variance after projection is maximized (retaining as much information as possible).
Advantages:
Linear transformation: each new dimension is a linear combination of the original dimensions (highly interpretable, e.g. x' = 0.6*gender + 0.3*age; see the sketch below)
Preserves global structure
No hyperparameters; the result is stable
Relatively fast
Disadvantages:
Discriminative power and expressiveness are not outstanding
The result is easily affected by outliers (a side effect of preserving global structure)
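A short scikit-learn illustration of the interpretability point above: components_ exposes each new dimension as an explicit linear combination of the original features. The feature names and data here are hypothetical.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 3)                   # hypothetical columns: gender, age, income
pca = PCA(n_components=2)
pca.fit(X)
print(pca.components_)                       # rows are loadings, e.g. x' = w1*gender + w2*age + w3*income
print(pca.explained_variance_ratio_)         # variance retained per component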

MDS (Metric Multidimensional Scaling)

Distance metric: Euclidean distance
Optimization objective: the distance between any two instances in the low-dimensional embedding space equals their distance in the original space (a stress sketch follows).
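A minimal sketch of that objective as a quantity to minimize (raw stress, the sum of squared mismatches between the two sets of pairwise distances); the function name is illustrative.

import numpy as np
from scipy.spatial.distance import pdist

def raw_stress(X, Z):
    # Sum of squared differences between original and embedded pairwise
    # distances; metric MDS drives this quantity toward zero.
    return np.sum((pdist(X) - pdist(Z)) ** 2)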

t-SNE (t-distributed Stochastic Neighbor Embedding)

Key feature: distances between data points are represented as probability distributions.
Distance metric:
High-dimensional space: Gaussian distribution (the variance sigma differs for each x_i, so there are N distributions)
Low-dimensional space: Student's t-distribution with 1 degree of freedom
The t-distribution pushes points that are far apart in the high-dimensional space even farther apart in the low-dimensional space, which alleviates the crowding problem in the low-dimensional space.
Optimization objective: KL divergence (measures the difference between two probability distributions; see the sketch after this list).
Advantages:
Strong discriminative power after reduction
Can handle outliers
Disadvantages:
Poor performance (slow)
The result is strongly affected by hyperparameters, and optimization is difficult
Shape is unrelated to density (the density parameter is dropped from the t-distribution)
The result is hard to read and interpret
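A minimal sketch of the two pieces named above, the low-dimensional Student-t kernel and the KL objective, assuming the high-dimensional affinity matrix P has already been computed; all names are illustrative.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def kl_objective(P, Z, eps=1e-12):
    # Low-dimensional affinities from a Student-t kernel with 1 degree of freedom.
    num = 1.0 / (1.0 + squareform(pdist(Z)) ** 2)
    np.fill_diagonal(num, 0.0)               # a point is not its own neighbor
    Q = num / num.sum()
    # KL(P || Q): the quantity t-SNE minimizes over the embedding Z.
    return np.sum(P * np.log((P + eps) / (Q + eps)))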

Shepard Diagram: a visualization scheme
Goal: for dimensionality-reduced data, provide the main visualization content plus auxiliary information, helping the audience understand from which angles the data should be analyzed and what information can be obtained.
Global view:
1. Shape exploration (data separability): label the points and examine the shapes and distributions of different groups (the labels can come from actions such as clicks or add-to-favorites/add-to-cart, or from different targeting segments). Use grid search to try multiple parameter settings, cluster the resulting layouts to find the main spatial structures, and reflect the quality of the dimensionality reduction quantitatively.
2. Density differentiation: the probability-density formulas of t-SNE, UMAP, etc. all discard the density parameter in the low-dimensional space; a density map can help restore the density information (a sketch follows this list).
Local view:
1. Subgroup exploration (salient-feature analysis): for a particular subgroup, identify the features that are salient compared with the other groups;
2. Feature-space distribution analysis;
3. Boundary composition and related-factor analysis.
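A small sketch of the density-map idea from item 2 of the global view, overlaying a Gaussian kernel density estimate on a 2-D embedding; the embedding here is placeholder data.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

X_2d = np.random.randn(500, 2)               # placeholder for a t-SNE/UMAP embedding
density = gaussian_kde(X_2d.T)(X_2d.T)       # per-point density estimate
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=density, s=8, cmap="viridis")
plt.colorbar(label="estimated density")
plt.show()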
