
Study Notes on Data Analysis, Presentation and the R Language (1)

> x1=c(1,2,3,4,5,6,7,8,9)  # c() combines values into a vector
> x1
[1] 1 2 3 4 5 6 7 8 9
> mode(x1)
[1] "numeric"
> length(x1)
[1] 9      
> rbind(x1,x1)  # combine two vectors as rows of a matrix (cbind combines them as columns)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
x1    1    2    3    4    5    6    7    8    9
x1    1    2    3    4    5    6    7    8    9
> cbind(x1,x1)
      x1 x1
 [1,]  1  1
 [2,]  2  2
 [3,]  3  3
 [4,]  4  4
 [5,]  5  5
 [6,]  6  6
 [7,]  7  7
 [8,]  8  8
 [9,]  9  9      
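> mean(x1)  # arithmetic mean
[1] 5
> sum(x1)  # sum
[1] 45
> x2=1:100  # assumed: x2 is not defined in the original session; 1:100 matches every output below
> max(x2)  # maximum (min() gives the minimum)
[1] 100
> min(x1)
[1] 1
> var(x1)  # sample variance
[1] 7.5
> prod(x1)  # product of all elements (9! here)
[1] 362880
> prod(x2)  # 100!
[1] 9.332622e+157
> sd(x2)  # standard deviation
[1] 29.01149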
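To see what these summary functions compute, here is a minimal sketch that recomputes them by hand. It assumes x2 is 1:100, which is what the outputs above imply (maximum 100, product equal to 100!); the names manual_mean, manual_var and manual_sd are illustrative only.

```r
# Recompute mean, sample variance and standard deviation by hand.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x2 <- 1:100   # assumption: the session's x2, inferred from its outputs

n <- length(x1)
manual_mean <- sum(x1) / n                          # 45 / 9 = 5
manual_var  <- sum((x1 - manual_mean)^2) / (n - 1)  # divisor n - 1: 60 / 8 = 7.5
manual_sd   <- sqrt(sum((x2 - mean(x2))^2) / (length(x2) - 1))

print(c(mean = manual_mean, var = manual_var))
print(c(manual = manual_sd, builtin = sd(x2)))
# prod(1:100) is 100 factorial, hence the huge value above
print(all.equal(prod(x2), factorial(100)))
```

var() and sd() use the sample denominator n - 1, which is why var(x1) is 7.5 rather than 60 / 9.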

Some basic syntax

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> 1:10-1
 [1] 0 1 2 3 4 5 6 7 8 9
> 2:60*2+1
 [1]   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37  39
[19]  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75
[37]  77  79  81  83  85  87  89  91  93  95  97  99 101 103 105 107 109 111
[55] 113 115 117 119 121
> 1:10*2
 [1]  2  4  6  8 10 12 14 16 18 20
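The result of 1:10-1 (0 to 9, not 1 to 9) comes from operator precedence. The sketch below shows how R groups these expressions; variable names are only for illustration.

```r
# ':' binds more tightly than binary + - * /, but less tightly than unary minus.
a <- 1:10 - 1       # parsed as (1:10) - 1   -> 0 1 2 ... 9
b <- 1:(10 - 1)     # parentheses force 1:9  -> 1 2 ... 9
d <- 2:60 * 2 + 1   # ((2:60) * 2) + 1       -> 5 7 ... 121
e <- -1:2           # parsed as (-1):2       -> -1 0 1 2
print(a)
print(b)
print(c(first = d[1], last = d[length(d)], n = length(d)))
print(e)
```

When in doubt, add parentheses: they cost nothing and make the intended grouping explicit.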
> a=2:60*2+1
> a
 [1]   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43
[21]  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83
[41]  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117 119 121
> a[5]
[1] 13
> a[-5]
 [1]   5   7   9  11  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43  45
[21]  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83  85
[41]  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117 119 121
> a[1:5]
[1]  5  7  9 11 13
> a[-(1:5)]
 [1]  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43  45  47  49  51  53
[21]  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83  85  87  89  91  93
[41]  95  97  99 101 103 105 107 109 111 113 115 117 119 121
> a[c(2,4,7)]
[1]  7 11 17
> 
> a[3:8]
[1]  9 11 13 15 17 19
> a[a<20]
[1]  5  7  9 11 13 15 17 19
> a[a[3]]
[1] 21
> a[9]
[1] 21      
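The indexing forms above can be summarized in one small sketch: positive indices keep positions, negative indices drop them, a logical vector keeps the TRUE positions, and an index can itself be an element of the vector.

```r
a <- 2:60 * 2 + 1            # 5, 7, 9, ..., 121 (59 values)

picked  <- a[c(2, 4, 7)]     # positive indices keep those positions
dropped <- a[-(1:5)]         # negative indices drop positions 1 to 5
small   <- a[a < 20]         # a logical vector keeps the TRUE positions
nested  <- a[a[3]]           # a[3] is 9, so this is a[9]

print(picked)                # 7 11 17
print(length(dropped))       # 54 = 59 - 5
print(small)                 # 5 7 9 11 13 15 17 19
print(nested)                # 21
```

Positive and negative indices cannot be mixed in one subscript; a[c(-1, 2)] is an error.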
> seq(5,20)  # generate a sequence; you can also specify the step (by) or the number of values (length)
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> seq(5,121,by=2)
 [1]   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43
[21]  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83
[41]  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117 119 121
> seq(5,121,length=10)
 [1]   5.00000  17.88889  30.77778  43.66667  56.55556  69.44444  82.33333  95.22222
 [9] 108.11111 121.00000
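A quick check of the two seq() modes: with by the step is fixed, with length.out the number of points is fixed and the step becomes (to - from) / (length.out - 1). This is a minimal sketch; the variable names are illustrative.

```r
by_two  <- seq(5, 121, by = 2)
colon   <- 2:60 * 2 + 1
ten_pts <- seq(5, 121, length.out = 10)   # 'length=' above partially matches length.out

step <- (121 - 5) / (10 - 1)              # spacing for 10 evenly spaced points: 12.888...
print(all(by_two == colon))               # TRUE: the same 59 odd numbers
print(diff(ten_pts))                      # every gap equals step
```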
> letters[1:30]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z" NA  NA  NA  NA 
> letters  # built-in constant holding the 26 lowercase letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"
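> a=seq(2,40)
> which.max(a)  # locate positions: index of the largest element
[1] 39
> which(a==2)  # indices where the condition is TRUE
[1] 1
> which(a>5)
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[26] 30 31 32 33 34 35 36 37 38 39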
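which() turns a logical vector into the positions of its TRUE elements. The sketch below makes that relationship explicit and shows how ties are resolved; the vector passed to which.min() is a made-up example.

```r
a <- seq(2, 40)                       # 2, 3, ..., 40 (39 values)

cond <- a > 5                         # logical vector, TRUE from position 5 on
print(which(cond))                    # 5 6 ... 39
print(which(a == 2))                  # 1
print(which.max(a))                   # 39: position of the first maximum
print(which.min(c(3, 1, 2, 1)))       # 2: ties resolve to the first occurrence
```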
> a=1:20
> a
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> rev(a)  # reverse the order
 [1] 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
> sort(a)  # sort in ascending order
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> rev(sort(a))  # descending order
 [1] 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
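Because a = 1:20 is already sorted, rev(a) and rev(sort(a)) look identical above. On an unsorted vector the difference shows; this sketch uses a made-up vector v and also shows order(), which returns positions rather than values.

```r
v <- c(30, 10, 20)

print(sort(v))                        # 10 20 30
print(sort(v, decreasing = TRUE))     # 30 20 10, same as rev(sort(v))
print(rev(v))                         # 20 10 30: rev() only reverses, it does not sort
print(order(v))                       # 2 3 1: positions that would sort v
print(v[order(v)])                    # 10 20 30
```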
> a1=c(1:12)
> matrix(a1,nrow=4,ncol=3)  # build a matrix; filled column by column by default
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> matrix(a1,nrow=3,ncol=4)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> matrix(a1,nrow=4,ncol=3,byrow=T)  # byrow=T fills the matrix row by row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12      
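The fill order can be written as an index rule. A minimal sketch, with i and j chosen arbitrarily: by default element [i, j] of an nrow-row matrix comes from a1[(j - 1) * nrow + i] (column-major order); with byrow = TRUE it comes from a1[(i - 1) * ncol + j].

```r
a1 <- 1:12
m_col <- matrix(a1, nrow = 4, ncol = 3)                # filled column by column
m_row <- matrix(a1, nrow = 4, ncol = 3, byrow = TRUE)  # filled row by row

i <- 3; j <- 2
print(m_col[i, j] == a1[(j - 1) * 4 + i])   # TRUE, both are 7
print(m_row[i, j] == a1[(i - 1) * 3 + j])   # TRUE, both are 8
# filling by row is the same as filling a 3 x 4 matrix by column and transposing
print(identical(m_row, t(matrix(a1, nrow = 3, ncol = 4))))
```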

Matrix transpose, addition and subtraction

> a=b=matrix(a1,nrow=4,ncol=3,byrow=T)
> a
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> b
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> t(a)  # transpose
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> a+b  # addition, element by element
     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    8   10   12
[3,]   14   16   18
[4,]   20   22   24
> a-b  # subtraction, element by element
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0      
Matrix multiplication
> a=matrix(1:12,nrow=3,ncol=4)
> b=matrix(1:12,nrow=4,ncol=3)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> b
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> a%*%b  # matrix product: a 3x4 matrix times a 4x3 matrix gives a 3x3 matrix
     [,1] [,2] [,3]
[1,]   70  158  246
[2,]   80  184  288
[3,]   90  210  330      
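Each entry of a %*% b is the dot product of a row of a with a column of b, which is also why the inner dimensions (ncol(a) and nrow(b)) must match. This sketch recomputes the product with explicit loops purely for illustration; %*% is the way to do it in practice.

```r
a <- matrix(1:12, nrow = 3, ncol = 4)
b <- matrix(1:12, nrow = 4, ncol = 3)
p <- a %*% b

manual <- matrix(0, nrow = nrow(a), ncol = ncol(b))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(b))) {
    manual[i, j] <- sum(a[i, ] * b[, j])   # row i of a dotted with column j of b
  }
}
print(manual)                 # same numbers as a %*% b above
print(all(manual == p))       # TRUE
```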

Diagonal of a square matrix

> a=matrix(1:16,nrow=4)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
> diag(a)  # extract the diagonal; the result is a vector
[1]  1  6 11 16
> diag(diag(a))  # build a matrix with the vector on its diagonal
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    6    0    0
[3,]    0    0   11    0
[4,]    0    0    0   16
> diag(4)  # 4x4 identity matrix
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1      
> diag(seq(1,5))  # build a diagonal matrix from a vector
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    2    0    0    0
[3,]    0    0    3    0    0
[4,]    0    0    0    4    0
[5,]    0    0    0    0    5      
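diag() does three different things depending on its argument, which causes a common surprise: a single number n gives the n x n identity, not a matrix containing n. This sketch shows the three cases and one way to get a 1 x 1 matrix holding 5.

```r
m <- matrix(1:16, nrow = 4)

d  <- diag(m)                 # matrix -> its diagonal: 1 6 11 16
D  <- diag(d)                 # vector of length > 1 -> diagonal matrix
I4 <- diag(4)                 # single number n -> n x n identity matrix
one <- diag(5, nrow = 1)      # give nrow explicitly to get a 1 x 1 matrix holding 5

print(dim(diag(5)))           # 5 5: diag(5) is the 5 x 5 identity
print(one)
```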
> rnorm(16)  # 16 random numbers from the standard normal distribution
 [1]  0.79027687  1.14167897  1.27162428 -1.13071815 -1.46295346 -0.33647679
 [7] -0.20166697  0.02592894  0.20498691  1.51331875  1.35167580  1.40470721
[13] -0.16802030 -0.35107031 -0.51437608 -0.09406821
> a=matrix(rnorm(16),4,4)
> a
           [,1]       [,2]       [,3]        [,4]
[1,] -1.4205777  0.3643621 0.82097989  1.03121963
[2,]  0.1486225 -0.7520685 0.68004193 -0.03371108
[3,] -1.4458179 -0.8287518 1.48177576  0.09116119
[4,] -1.3000649 -0.1764955 0.02366358 -0.06364255
> solve(a)  # inverse matrix (the author's computer paused briefly here, though a 4x4 inverse is cheap to compute)
            [,1]       [,2]        [,3]       [,4]
[1,] -0.02472174  0.2219122 -0.07808618 -0.6299696
[2,] -0.31118268 -2.6935027  1.43350614 -1.5621103
[3,] -0.27601213 -1.4377140  1.51227778 -1.5445831
[4,]  1.26536035  2.4019986 -1.81803407  0.9137964      
> a
           [,1]       [,2]       [,3]        [,4]
[1,] -1.4205777  0.3643621 0.82097989  1.03121963
[2,]  0.1486225 -0.7520685 0.68004193 -0.03371108
[3,] -1.4458179 -0.8287518 1.48177576  0.09116119
[4,] -1.3000649 -0.1764955 0.02366358 -0.06364255
> b=c(1:4)
> b
[1] 1 2 3 4
> solve(a,b)  # solve the linear system a %*% x = b for x
[1] -2.335034 -7.646111 -4.792939  4.270441      
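Since rnorm() gives different numbers on every run, the values above cannot be reproduced exactly. This sketch fixes a seed (1 is an arbitrary choice) and then checks both results: a times its inverse is the identity, and the solution x really satisfies a %*% x = b.

```r
set.seed(1)                              # arbitrary seed, only so the run is repeatable
a <- matrix(rnorm(16), 4, 4)
b <- 1:4

a_inv <- solve(a)                        # inverse matrix
x <- solve(a, b)                         # solves a %*% x = b directly

print(all.equal(a %*% a_inv, diag(4)))   # TRUE: a times its inverse is the identity (up to rounding)
print(all.equal(as.vector(a %*% x), as.numeric(b)))  # TRUE: x solves the system
```

For solving a system, solve(a, b) is generally preferred over solve(a) %*% b, since it avoids forming the inverse explicitly.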

Eigenvalues and eigenvectors of a matrix (anyone who studied for the graduate entrance exam will remember these)

> a=diag(4)+1
> a
     [,1] [,2] [,3] [,4]
[1,]    2    1    1    1
[2,]    1    2    1    1
[3,]    1    1    2    1
[4,]    1    1    1    2
> a.e=eigen(a,symmetric=T)
> a.e
$values
[1] 5 1 1 1

$vectors
     [,1]       [,2]       [,3]       [,4]
[1,] -0.5  0.8660254  0.0000000  0.0000000
[2,] -0.5 -0.2886751 -0.5773503 -0.5773503
[3,] -0.5 -0.2886751 -0.2113249  0.7886751
[4,] -0.5 -0.2886751  0.7886751 -0.2113249      
> a.e$vectors%*%diag(a.e$values)%*%t(a.e$vectors)  # yes, that formula: V diag(values) t(V) rebuilds the original matrix
     [,1] [,2] [,3] [,4]
[1,]    2    1    1    1
[2,]    1    2    1    1
[3,]    1    1    2    1
[4,]    1    1    1    2
>      
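Besides rebuilding the matrix, each eigenpair can be checked directly against the definition A v = lambda v. For this matrix the eigenvalues also sum to the trace (2 * 4 = 8) and multiply to the determinant (5). A minimal sketch:

```r
a <- diag(4) + 1
e <- eigen(a, symmetric = TRUE)

# each column v_k of e$vectors satisfies a %*% v_k = lambda_k * v_k
for (k in seq_along(e$values)) {
  lhs <- as.vector(a %*% e$vectors[, k])
  rhs <- e$values[k] * e$vectors[, k]
  print(all.equal(lhs, rhs))
}
print(sum(e$values))   # equals the trace, sum(diag(a)) = 8
print(prod(e$values))  # equals the determinant, det(a) = 5
```

For a symmetric matrix the eigenvectors are orthonormal, so t(V) %*% V is the identity; that is why the transpose can stand in for the inverse in the reconstruction formula.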

The above covered two data types, vectors and matrices; next comes the array type.

> x=c(1:6)
> is.vector(x)
[1] TRUE
> is.array(x)
[1] FALSE
> is.matrix(x)
[1] FALSE
> dim(x)=c(2,3)
> is.vector(x)
[1] FALSE
> is.array(x)
[1] TRUE
> is.matrix(x)  # so a matrix is simply a 2-dimensional array
[1] TRUE
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6      
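Setting dim() works for more than two dimensions too. In this sketch a 3-dimensional array is an array but no longer a matrix, and elements are stored with the first index varying fastest.

```r
x <- 1:24
dim(x) <- c(2, 3, 4)          # now a 2 x 3 x 4 array

print(is.array(x))            # TRUE
print(is.matrix(x))           # FALSE: only 2-D arrays count as matrices
print(x[2, 3, 4])             # 24: the last element
print(x[1, 2, 1])             # 3: first index varies fastest
```

array(1:24, dim = c(2, 3, 4)) builds the same object in one call.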

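Data frames: laid out like a matrix, but different columns can hold different types.

Each column is a variable (so x3[1] below picks out one column, i.e. the values of one variable), and each row is an observation (a sample).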

> x1=seq(6,10)
> x2=seq(19,23)
> x1
[1]  6  7  8  9 10
> x2
[1] 19 20 21 22 23
> x3=data.frame(x1,x2)  # combine x1 and x2 into a data frame
> x3
  x1 x2
1  6 19
2  7 20
3  8 21
4  9 22
5 10 23
> x3[1]
  x1
1  6
2  7
3  8
4  9
5 10
> x3[2]
  x2
1 19
2 20
3 21
4 22
5 23      
> x=data.frame("重量"=x1,"運費"=x2)  # column names: 重量 = weight, 運費 = shipping cost
> x
  重量 運費
1    6   19
2    7   20
3    8   21
4    9   22
5   10   23      
> plot(x)  # scatter plot of the data frame x above: two columns (two variables) give two axes, so the plot is 2-D
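The column-of-variables idea is easiest to see with columns of different types. In this sketch, label is a hypothetical character column added only for illustration; it also shows the difference between single and double brackets.

```r
x1 <- seq(6, 10)
x2 <- seq(19, 23)
# 'label' is a hypothetical character column, added only to show mixed column types
x3 <- data.frame(x1, x2, label = letters[1:5], stringsAsFactors = FALSE)

print(sapply(x3, class))      # integer columns alongside a character column
print(x3[1])                  # single brackets: still a data frame, with one column
print(x3[[1]])                # double brackets: the column itself, a plain vector
print(x3$label)               # $ also extracts a column by name
print(x3[2, ])                # one row = one observation
```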
For loop
> a=numeric(0)  # start from an empty vector so no earlier value of a is left over
> for(i in 1:59) {a[i]=i*2+3}
> a
 [1]   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43
[21]  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83
[41]  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117 119 121

While loop

> a=numeric(0)  # start from an empty vector
> a[1]=5
> i=1
> while(a[i]<121){i=i+1;a[i]=a[i-1]+2}
> a
 [1]   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43
[21]  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79  81  83
[41]  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117 119 121
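Both loops produce the same 59 odd numbers from 5 to 121, which seq() or 2:60*2+1 give without a loop. This sketch builds all three versions side by side; preallocating with numeric(n) is a common habit that avoids growing the vector one element at a time.

```r
n <- 59

a_for <- numeric(n)                      # preallocate rather than growing the vector
for (i in seq_len(n)) {
  a_for[i] <- i * 2 + 3
}

a_while <- 5                             # a_while[1] = 5
i <- 1
while (a_while[i] < 121) {
  i <- i + 1
  a_while[i] <- a_while[i - 1] + 2
}

a_vec <- seq(5, 121, by = 2)             # the same values without an explicit loop
print(all(a_for == a_vec) && all(a_while == a_vec))   # TRUE
```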
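Generating vectors from various distributions

 > x1=round(runif(100,min=80,max=100))  # 100 uniform random numbers between 80 and 100, rounded to integers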
 > x1
   [1]  98  87  85  92  86  90  96  91  90  95  94  93  99  84  93  81  81  94  92  81
  [21]  84  82  85  88  81  94  97  99  81  89  85 100  90  82  87  87  83  96  83  92
  [41]  94  97  88  94  88  92  82  93  92  81  85  86  87  98  84  97  91  95  86  81
  [61]  85  83  81  94  90  81  89  96  92  86  96  89 100  81  97  82  87  96  91  86
  [81]  92  97  96  99  86  99  82  89  96  94  86  98  91  99  95  98 100  92  87  85
 > x1=round(rnorm(100,mean=80,sd=7))  # 100 normal random numbers (mean 80, sd 7), rounded
 > x1
   [1] 99 89 72 92 72 88 68 78 80 84 76 74 74 90 78 86 74 76 80 74 58 91 69 90 72 88 78
  [28] 90 79 76 82 77 80 85 77 74 70 89 85 91 67 82 64 85 75 82 82 88 85 84 86 71 69 87
  [55] 90 70 87 61 74 76 76 78 79 89 79 73 85 82 78 77 82 75 78 82 84 94 69 92 72 73 93
  [82] 76 90 77 76 82 87 82 81 82 75 78 77 88 68 74 78 82 74 90
 > x1[which(x1>100)]=100  # cap values at 100; none exceed 100 here, so x1 is unchanged
 > x1
   [1] 99 89 72 92 72 88 68 78 80 84 76 74 74 90 78 86 74 76 80 74 58 91 69 90 72 88 78
  [28] 90 79 76 82 77 80 85 77 74 70 89 85 91 67 82 64 85 75 82 82 88 85 84 86 71 69 87
  [55] 90 70 87 61 74 76 76 78 79 89 79 73 85 82 78 77 82 75 78 82 84 94 69 92 72 73 93
  [82] 76 90 77 76 82 87 82 81 82 75 78 77 88 68 74 78 82 74 90      
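A reproducible version of these steps: set.seed() (42 is an arbitrary choice) makes the random draws repeatable, and pmin() caps values element by element in one call. The sketch checks that pmin() gives the same result as the which() assignment used above.

```r
set.seed(42)                                   # arbitrary seed for a repeatable run
u <- round(runif(100, min = 80, max = 100))    # uniform between 80 and 100, rounded
z <- round(rnorm(100, mean = 80, sd = 7))      # normal with mean 80 and sd 7, rounded

z_which <- z
z_which[which(z_which > 100)] <- 100           # the approach used above
z_capped <- pmin(z, 100)                       # vectorized alternative: element-wise minimum

print(range(u))
print(all(z_which == z_capped))                # TRUE
```

pmax(z, 0) would likewise put a floor under the values.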

Generating a data frame and writing it to a file

> a1=round(rnorm(100,mean=90,sd=10))
> a1
  [1]  90  89  94 108  72  97  88  99  98  89  74 104  81  84  77  72  83  94 100  95
 [21] 100  94  87  90  87  92  81  97  78  98 100 101  90  82  94  92  83  80 103  91
 [41]  83  84  90  92  97  99 108  68  72  84  78  93  98  91  85 117 103  71  87 100
 [61]  96  88  93  89  90 130  90  94  75  85  87 105  94  75  88  96 104  88  86 102
 [81] 109  83  87  95 114  91  88  94  82  90 104  80  83  79  87  95  99  92  78  87
> a2=round(rnorm(100,mean=90,sd=50))
> a2
  [1]  94  14 -23  80 119 149  83 105  91 189 192  67 173  34  69  65 207 144  69 115
 [21] 145  95  49 103  29  59 103 126  66 137 112 104  84 167 137  78 123   2  63  60
 [41] 127  62  69 149  35  52 136  84  23  48 110  68  58 151  59 123 -20 157 161  48
 [61]  20 138 118  44  54 197  67 175 180  28  31  23  25  94  61  80 144  58  85  79
 [81]  67 117  26  -7  82 138 130  80  40  59 157 126  93 -60  48 123  37  75  18 230
> a3=round(rnorm(100,mean=90,sd=2))
> a3
  [1] 89 91 88 89 90 89 91 88 92 87 90 89 92 89 88 92 89 92 88 88 88 90 89 95 91 87 91
 [28] 94 90 90 94 93 91 89 91 92 93 88 89 87 91 88 90 91 94 90 89 88 93 92 88 91 88 90
 [55] 90 93 92 91 89 89 85 94 90 87 89 88 86 93 94 87 91 88 88 89 90 91 93 90 90 92 88
 [82] 89 91 89 91 89 90 93 92 91 90 89 90 89 91 90 86 92 92 93
> a4=round(rnorm(100,mean=90,sd=60))
> a4
  [1]  87  13 125 109 179  29  40  27 152 187  68   5 120 105 123 186 148 167 110 115
 [21] 114   8 119  64  29 107 120   6 123 206 141 124  96  66 -19 192  33 163 195 156
 [41] 167 116  72  69  45 146  98  54 127 102  83  68  43 129  26  83 138  53  92 218
 [61] 245  98 132  36  93  46  44 -21  15  87 143   6 112 143  79 145  41  69  99   0
 [81] 196  15  96 120  -4 126 104  63 156  62 -58  37 104 136  71 213  59 152   8 102
> a=data.frame(a1,a2,a3,a4)  # combine the vectors into a data frame

> write.table(x,file="d:\\mark.txt",col.names=F,row.names=F,sep=" ")  # write to a file: no header row, no row names, space-separated
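To check what write.table() produces, this sketch writes a small data frame to a temporary file (tempfile() replaces the d:\\mark.txt path above) and reads it back. The English column names weight and freight are hypothetical stand-ins for the columns of x.

```r
x <- data.frame(weight = 6:10, freight = 19:23)   # hypothetical English column names
f <- tempfile(fileext = ".txt")                    # temporary file instead of d:\\mark.txt

write.table(x, file = f, col.names = FALSE, row.names = FALSE, sep = " ")
lines <- readLines(f)
print(lines)                                       # "6 19" "7 20" ...

back <- read.table(f, sep = " ", header = FALSE)   # columns come back as V1, V2
print(back)
unlink(f)                                          # remove the temporary file
```

Without a header row the column names are lost, so read.table() falls back to V1, V2, ...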

a is a data frame, so colMeans() returns the mean of each column:
> colMeans(a)  # mean of every column
   a1    a2    a3    a4 
90.80 88.58 90.10 94.37 
> colMeans(a)[c("a1","a2","a3")]  # select some of the means by name
   a1    a2    a3 
90.80 88.58 90.10 
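The powerful apply() function: the first argument is a data frame (apply() converts it to a matrix internally), the second is the margin (1 applies the function to each row, 2 to each column), and the third is the function to apply, such as max, min, mean or sum.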
> apply(x,1,mean)
[1] 12.5 13.5 14.5 15.5 16.5
> apply(x,1,max)
[1] 19 20 21 22 23      
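With margin 2 the same call works column by column, and any function can be passed, including an anonymous one. This sketch rebuilds x with plain column names x1 and x2 (an assumption; the original uses Chinese names) and compares apply() with rowMeans() and colMeans().

```r
x <- data.frame(x1 = 6:10, x2 = 19:23)

row_means <- apply(x, 1, mean)                              # 12.5 13.5 14.5 15.5 16.5
col_means <- apply(x, 2, mean)                              # x1 = 8, x2 = 21
spread    <- apply(x, 2, function(col) max(col) - min(col)) # custom function per column

print(row_means)
print(col_means)
print(spread)                                               # 4 for both columns
```

Because apply() goes through as.matrix(), a data frame with a character column would turn every value into a string; sapply(x, mean) works column by column without that conversion.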
