A Summary of Principal Component Analysis in R


A brief summary of the PCA-related functions in R.
Here is the data set:

year    X1    X2    X3
1951     1  -2.7  -4.3
1952  -5.3  -5.9  -3.5
1953    -2  -3.4  -0.8
1954  -5.7  -4.7  -1.1
1955  -0.9  -3.8  -3.1
1956  -5.7  -5.3  -5.9
1957  -2.1    -5  -1.6
1958   0.6  -4.3  -0.2
1959  -1.7  -5.7     2
1960  -3.6  -3.6   1.3
1961     3  -3.1  -0.8
1962   0.1  -3.9  -1.1
1963  -2.6    -3  -5.2
1964  -1.4  -4.9  -1.7
1965  -3.9  -5.7  -2.5
1966  -4.7  -4.8  -3.3
1967    -6  -5.6  -4.9
1968  -1.7  -6.4  -5.1
1969  -3.4  -5.6  -2.9
1970  -3.1  -4.2    -2
1971  -3.8  -4.9  -3.9
1972    -2  -4.1  -2.4
1973  -1.7  -4.2    -2
1974  -3.6  -3.3    -2
1975  -2.7  -3.7   0.1
1976  -2.4  -7.6  -2.2
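The post never shows how the temprature object is created. The following is a minimal sketch that rebuilds it from the table, assuming the years are used as row names (the score output further down suggests this) and that X1, X2, X3 are the only columns passed to the PCA functions.

```r
# Rebuild the data set from the table; years become row names (an assumption
# based on the row labels of the score output), so only X1..X3 enter the PCA.
temprature <- data.frame(
  X1 = c( 1.0, -5.3, -2.0, -5.7, -0.9, -5.7, -2.1,  0.6, -1.7, -3.6,  3.0,  0.1, -2.6,
         -1.4, -3.9, -4.7, -6.0, -1.7, -3.4, -3.1, -3.8, -2.0, -1.7, -3.6, -2.7, -2.4),
  X2 = c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5.0, -4.3, -5.7, -3.6, -3.1, -3.9, -3.0,
         -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6),
  X3 = c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2,  2.0,  1.3, -0.8, -1.1, -5.2,
         -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2.0, -3.9, -2.4, -2.0, -2.0,  0.1, -2.2),
  row.names = as.character(1951:1976)
)

str(temprature)   # 26 observations of 3 numeric variables
```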

princomp

This is R's standard PCA function. It runs PCA on the covariance matrix by default and on the correlation matrix when cor = TRUE is set.

> pca <- princomp(temprature)
> summary(pca,loadings = T)
Importance of components:
                          Comp.1    Comp.2    Comp.3
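Standard deviation     2.3927483 1.6766875 1.0093123 # standard deviation: square root of each eigenvalue
Proportion of Variance 0.5991735 0.2942137 0.1066129 # proportion of variance explained
Cumulative Proportion  0.5991735 0.8933871 1.0000000 # cumulative proportion of variance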

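# Loadings matrix (the eigenvectors). The blank cells are not missing values:
# when printing loadings, R hides entries whose absolute value is below the cutoff (0.1 by default).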
Loadings:
   Comp.1 Comp.2 Comp.3
X1  0.800 -0.532  0.278
X2  0.238 -0.145 -0.960
X3  0.551  0.834 
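The blank cell in the loadings table comes from the print method, not from the data. Here is a small sketch of how to display every entry. It uses the built-in USArrests data as a stand-in (an assumption made so the block runs on its own); the same calls should apply to the pca object in this post.

```r
# Stand-in data: built-in USArrests (not the data set used in this post).
pca_demo <- princomp(USArrests, cor = TRUE)

# Default printing hides entries with |loading| < 0.1.
print(loadings(pca_demo))

# Lower the cutoff to show every entry.
print(loadings(pca_demo), cutoff = 0)

# Or drop the "loadings" class to get a plain numeric matrix.
full_loadings <- unclass(loadings(pca_demo))
round(full_loadings, 3)
```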

Related output

# Print the component scores for each observation; predict(pca) gives the same result
> pca$scores
         Comp.1      Comp.2      Comp.3
1951  2.1396991 -3.83416758 -0.86081047
1952 -3.2170709  0.65036989  0.46732300
1953  1.5047115  0.78326549 -0.98770616
1954 -1.9282363  2.69070902 -0.77199999
1955  1.0208400 -1.66238671 -0.32088258
1956 -4.7179457 -1.22586764 -0.24479186
1957  0.6034101  0.40169317  0.51297983
1958  3.7008125  0.03107027  0.60645276
1959  2.7423145  3.29340498  1.33340102
1960  1.3359150  3.41533524 -1.21947452
1961  5.5741402 -1.92082051  0.11577404
1962  2.8996923 -0.51171559  0.07392112
1963 -1.3065888 -2.62572541 -1.58383704
1964  1.1317614 -0.06871931  0.61074000
1965 -1.4985797  0.71048716  0.67510633
1966 -2.3656444  0.33808233 -0.42011674
1967 -4.4776216 -0.18852359 -0.02993928
1968 -1.3395824 -2.52711224  1.93314844
1969 -1.2956017  0.09625872  0.71413634
1970 -0.2267435  0.48388739 -0.53777663
1971 -2.0006306 -0.62674492 -0.07971504
1972  0.4560127 -0.44959984 -0.33175339
1973  0.8927389 -0.26104978 -0.14812577
1974 -0.4127266  0.61914798 -1.54132776
1975  1.3700344  1.95003688 -0.88520432
1976 -0.5851104  0.43868459  2.92047869
# Loadings matrix
> pca$loadings

Loadings:
   Comp.1 Comp.2 Comp.3
X1  0.800 -0.532  0.278
X2  0.238 -0.145 -0.960
X3  0.551  0.834  
> screeplot(pca,type = "lines") # scree plot

> biplot(pca) # biplot of observations and variables on the first two components

I have not worked out how to draw the ellipse plot yet; to be filled in later.
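One possible way to draw such an ellipse in base R is sketched below. It is only an illustration under stated assumptions: the ellipse is a 95% contour of a bivariate normal fitted to the first two component scores, the built-in USArrests data stands in for the post's data, and the names level, radius and ellipse_pts are made up for this example.

```r
# Stand-in data: built-in USArrests (not the data set used in this post).
pc <- princomp(USArrests, cor = TRUE)
sc <- pc$scores[, 1:2]            # scores on the first two components

centre <- colMeans(sc)            # centre of the ellipse
S      <- cov(sc)                 # covariance of the two score columns
level  <- 0.95                    # coverage level (an arbitrary choice)
radius <- sqrt(qchisq(level, df = 2))

# Map a unit circle through the Cholesky factor of S, then shift to the centre.
theta       <- seq(0, 2 * pi, length.out = 200)
circle      <- cbind(cos(theta), sin(theta))
ellipse_pts <- sweep(radius * circle %*% chol(S), 2, centre, "+")

plot(sc, asp = 1, xlab = "Comp.1", ylab = "Comp.2")
lines(ellipse_pts)
```

Every point on the curve has the same Mahalanobis distance from the centre, which is what makes it a normal-theory contour.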

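Notes

  1. To use the correlation matrix, set cor = TRUE (R is case-sensitive, so cor = true does not work).
  2. To get the full loadings (weight) matrix, first compute the covariance or correlation matrix with cov or cor, then use eigen to get the eigenvalues and eigenvectors (the full set of eigenvectors is rarely needed, though).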
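The second note can be sketched as follows. This is a minimal example on the built-in USArrests data (a stand-in, so the block runs on its own). One detail worth knowing: princomp divides by n when computing the covariance matrix, while cov divides by n - 1, so the covariance matrix is rescaled before the comparison.

```r
# Stand-in data: built-in USArrests (not the data set used in this post).
X <- as.matrix(USArrests)
n <- nrow(X)

# Covariance-based PCA by hand. princomp uses divisor n, cov() uses n - 1.
S_n <- cov(X) * (n - 1) / n
e   <- eigen(S_n, symmetric = TRUE)
pc  <- princomp(X)

sqrt(e$values)   # same as pc$sdev
e$vectors        # same as the loadings, possibly up to the sign of each column

# Correlation-based PCA by hand; the divisor cancels out in cor().
e_cor  <- eigen(cor(X), symmetric = TRUE)
pc_cor <- princomp(X, cor = TRUE)

sqrt(e_cor$values)   # same as pc_cor$sdev
```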

principal

Choosing the number of components with parallel analysis

> library(psych) # fa.parallel() and principal() come from the psych package
> fa.parallel(temprature,n.iter = 100,fa="pc",main="scree plot with parallel analysis")
Parallel analysis suggests that the number of factors =  NA  and the number of components =  1 

It also draws a scree plot and uses parallel analysis to suggest how many components to keep (here, 1).
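The idea behind parallel analysis can be sketched in base R: keep the components whose eigenvalues are larger than those obtained from random data of the same size. This is a simplified illustration, not the procedure psych actually runs (fa.parallel is more elaborate, for example it can also resample the observed data). USArrests is a stand-in data set, and the seed, n_iter and the variable names are arbitrary choices made for this example.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
X <- as.matrix(USArrests)         # stand-in data, not the post's data set
n <- nrow(X)
p <- ncol(X)

# Eigenvalues of the observed correlation matrix.
obs_eig <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from uncorrelated normal data of the same shape, repeated n_iter times.
n_iter  <- 100
sim_eig <- replicate(n_iter, {
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
})
rand_mean <- rowMeans(sim_eig)    # average random eigenvalue per component

# Count the leading components that beat the random benchmark.
n_keep <- sum(cumprod(obs_eig > rand_mean))
n_keep
```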

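Principal components with principal

Arguments
- r: a correlation matrix or a raw data frame
- rotate: the rotation method
- scores: whether to compute component scores
- nfactors: the number of components to extract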

> pca <- principal(temprature,rotate = "none",nfactors = 2,scores = T)
> pca
Principal Components Analysis
Call: principal(r = temprature, nfactors = 2, rotate = "none", scores = T)
Standardized loadings (pattern matrix) based upon correlation matrix
    PC1   PC2   h2    u2 com
X1 0.82 -0.11 0.69 0.313 1.0
X2 0.74 -0.51 0.81 0.193 1.8
X3 0.63  0.75 0.95 0.045 1.9

                       PC1  PC2
SS loadings           1.62 0.83
Proportion Var        0.54 0.28
Cumulative Var        0.54 0.82
Proportion Explained  0.66 0.34
Cumulative Proportion 0.66 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.17 
 with the empirical chi square  4.31  with prob <  NA 

Fit based upon off diagonal values = 0.73

Scores and weights

> pca$weights
         PC1        PC2
X1 0.5067548 -0.1384156
X2 0.4575790 -0.6118978
X3 0.3885133  0.9012161
> pca$scores
             PC1         PC2
1951  1.15613102 -2.14274863
1952 -1.40397385  0.29619057
1953  0.87645580  0.02225637
1954 -0.55355307  0.79852707
1955  0.52133957 -0.89617467
1956 -1.73868595 -1.09440483
1957  0.07029356  0.49584374
1958  1.25079371  0.60192968
1959  0.60373762  2.49012260
1960  0.84116086  1.19371677
1961  2.16014463 -0.45293824
1962  1.11189864  0.01144684
1963  0.02078349 -2.17040460
1964  0.25282918  0.35309338
1965 -0.80115014  0.56219311
1966 -0.79512763 -0.22411286
1967 -1.72761805 -0.45915894
1968 -1.07610798 -0.40767704
1969 -0.72471773  0.29431309
1970  0.06988130 -0.04170145
1971 -0.74303145 -0.50504203
1972  0.28630899 -0.34781995
1973  0.39653695 -0.13092458
1974  0.30440309 -0.47945781
1975  0.77438081  0.63703597
1976 -1.13311339  1.59589645
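The quantities printed for an unrotated solution can be related to the eigen-decomposition of the correlation matrix. The base R sketch below shows the relationships that the numbers above appear to follow: loadings are eigenvectors scaled by the square root of the eigenvalue, weights are loadings divided by the eigenvalue (for instance 0.82 / 1.62 is roughly 0.51 for X1 on PC1), and h2 is the row sum of squared loadings. It uses USArrests as a stand-in, does not call psych, and the column signs may differ from what principal reports.

```r
X <- as.matrix(USArrests)         # stand-in data, not the post's data set
k <- 2                            # number of components to keep

e      <- eigen(cor(X), symmetric = TRUE)
V      <- e$vectors[, 1:k, drop = FALSE]
lambda <- e$values[1:k]
dimnames(V) <- list(colnames(X), paste0("PC", 1:k))

# Standardized loadings: eigenvectors scaled by sqrt(eigenvalue).
pc_loadings <- sweep(V, 2, sqrt(lambda), "*")

# "SS loadings" equal the eigenvalues; h2 is the communality, u2 the uniqueness.
ss_loadings <- colSums(pc_loadings^2)
h2 <- rowSums(pc_loadings^2)
u2 <- 1 - h2

# Weights: eigenvectors divided by sqrt(eigenvalue), i.e. loadings / eigenvalue.
pc_weights <- sweep(V, 2, sqrt(lambda), "/")

# Scores: standardized data times the weights; each column has variance 1.
pc_scores <- scale(X) %*% pc_weights
head(pc_scores)
```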