Table of contents
- dataframe
  - data.frame() to create a dataframe
    - data frame is like a spreadsheet with row numbers and column names
  - common operations on data frames
    - subsetting: extract element(s)
      - use index numbers to extract elements
        - single bracket to return elements of the same type
        - double bracket to return elements of its own type
      - use column names to extract elements
      - use dollar sign "$" to extract elements
      - use logical value to extract elements
      - get elements matching specific conditions
- matrix
  - 2 ways to construct a matrix
    - use cbind() or rbind() to create a matrix
    - use matrix() to construct a matrix
  - common operations on matrices
    - subsetting: extract elements
    - summary operations on matrices in data analysis
      - get the row-wise sum via rowSums(), not rowsum()
      - also get the column-wise sum via colSums(), not colsum()
      - get the column-wise mean via colMeans(), not colmean()
- array
  - use array() to create an array
  - subsetting
    - use index number to extract elements
dataframe
- heterogeneous data structure
- contains elements of different classes
- 2 dimensional arrangement
players.name=c("KD","Curry","Klay","Green")
players.number=c(35,30,11,23)
players.2K=c(87,96,91,85)
players.gender=factor(c("male","male","male","male"),levels = c("male","female"))
data.frame() to create a dataframe
players=data.frame(players.name,players.number,players.gender,players.2K)
str(players)
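notice that in R versions before 4.0.0 the character vector players.name is converted to a factor by default; from R 4.0.0 on, strings stay character vectors
passing the parameter stringsAsFactors=FALSE keeps the strings as characters in every version
the parameter is spelled stringsAsFactors (not stringAsFactors), and the value must be the upper-case FALSE (not False)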
players=data.frame(players.name,players.number,players.gender,players.2K,stringsAsFactors=FALSE)
str(players)
players
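One way to check how each column was stored is to ask for the class of every column. The sketch below repeats the vectors from above so it runs on its own, and sets stringsAsFactors explicitly both ways so the result does not depend on the R version. The names players.f, players.s, classes.f and classes.s are made up for this illustration.

```r
# Same vectors as above, repeated so the block is self-contained
players.name   <- c("KD", "Curry", "Klay", "Green")
players.number <- c(35, 30, 11, 23)
players.2K     <- c(87, 96, 91, 85)
players.gender <- factor(c("male", "male", "male", "male"),
                         levels = c("male", "female"))

# Strings converted to factors (the default before R 4.0.0)
players.f <- data.frame(players.name, players.number, players.gender,
                        players.2K, stringsAsFactors = TRUE)

# Strings kept as character vectors
players.s <- data.frame(players.name, players.number, players.gender,
                        players.2K, stringsAsFactors = FALSE)

# sapply() applies class() to every column and returns one entry per column
classes.f <- sapply(players.f, class)
classes.s <- sapply(players.s, class)
print(classes.f)
print(classes.s)
```

Only players.name differs between the two results. players.gender was created with factor(), so it stays a factor either way.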
data frame is like a spreadsheet with row numbers and column names
common operations on data frames
subsetting: extract element(s)
4 ways
use index numbers to extract elements
single bracket to return elements of the same type
players[2]
# a single number in the brackets is a column number
typeof(players[2])
the result is "list": a single bracket returns a data frame, and a data frame is stored as a list of columns
double bracket to return elements of its own type
players[[2]]
typeof(players[[2]])
two numbers in the brackets are the row number and the column number
# for example, the line below extracts the element at row 1, column 2
players[1,2]
some other examples
players[1:3,2:3]
players[c(1,3),c(2,4)]
players[1,]
players[,1]
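The difference between the two bracket forms is easier to see by inspecting the class of each result. This is a minimal sketch using a two-column version of the players data; the variable names one.bracket, two.bracket, col.vector and col.frame are hypothetical, chosen for this example only.

```r
# A small, self-contained version of the players data frame
players <- data.frame(
  players.name   = c("KD", "Curry", "Klay", "Green"),
  players.number = c(35, 30, 11, 23),
  stringsAsFactors = FALSE
)

one.bracket <- players[2]     # still a data frame, with one column
two.bracket <- players[[2]]   # the column itself, a numeric vector

print(class(one.bracket))
print(class(two.bracket))

# With a row,column index, a single column is simplified to a vector
# unless drop = FALSE is given
col.vector <- players[, 2]
col.frame  <- players[, 2, drop = FALSE]

print(is.data.frame(col.vector))
print(is.data.frame(col.frame))
```

drop = FALSE is useful when later code expects a data frame regardless of how many columns were selected.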
use column names to extract elements
players["players.name"]
typeof(players["players.name"])
players[["players.name"]]
typeof(players[["players.name"]])
as with index numbers, the single bracket returns a data frame (typeof() reports "list")
and the double bracket returns the column itself as a vector
players[c("players.name","players.gender")]
use dollar sign “$” to extract elements
players$players.name
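The dollar sign and the double bracket with a column name return the same thing. One practical difference, sketched below, is that the double bracket also accepts a column name stored in a variable. The variable names by.dollar, by.double and wanted are made up for this illustration.

```r
# Self-contained copy of part of the players data
players <- data.frame(
  players.name = c("KD", "Curry", "Klay", "Green"),
  players.2K   = c(87, 96, 91, 85),
  stringsAsFactors = FALSE
)

by.dollar <- players$players.name
by.double <- players[["players.name"]]

# Both forms give the same character vector
print(identical(by.dollar, by.double))

# [[ ]] looks up the value of the variable; $ would look for a
# column literally called "wanted"
wanted <- "players.2K"
print(players[[wanted]])
```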
use logical value to extract elements
# one logical value per column: TRUE keeps the column, FALSE drops it
players[c(TRUE,FALSE,FALSE,FALSE)]
get elements matching specific conditions
players[players.2K>90,]
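# wrong: a single "=" does not compare values
# players[players.gender="male",]
players[players.gender=="male",]
be sure to use "==" instead of "=" for comparison, and keep the comma after the condition so that it selects rows rather than columns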
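A condition inside the brackets is just a logical vector with one value per row. The sketch below makes that vector visible, combines two conditions with &, and shows subset() as an alternative spelling. It refers to the columns through players$ rather than the standalone vectors, which is an editorial choice here so that the filter keeps working if the original vectors are removed. The result names (high, top, top.low.number, top2) are hypothetical.

```r
# Self-contained copy of part of the players data
players <- data.frame(
  players.name   = c("KD", "Curry", "Klay", "Green"),
  players.number = c(35, 30, 11, 23),
  players.2K     = c(87, 96, 91, 85),
  stringsAsFactors = FALSE
)

# The comparison yields one TRUE/FALSE per row
high <- players$players.2K > 90
print(high)

# Rows where the rating is above 90
top <- players[high, ]

# Two conditions combined with & (element-wise AND)
top.low.number <- players[players$players.2K > 90 &
                          players$players.number < 20, ]

# subset() expresses the first filter without repeating "players$"
top2 <- subset(players, players.2K > 90)

print(top)
print(top.low.number)
```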
vectors, lists and data frames are the most common data structures in data analysis (DA)
matrix
- homogeneous data structure: all elements have the same type
- 2D data arrangement
- usually used to store numerical elements for computation
student.English.score=c(100L,98L,99L,96L)
student.math.score=c(10L,11L,9L,15L)
2 ways to construct a matrix
use cbind() or rbind() to create a matrix
- rbind() means row bind
- row names are taken from the vector names; the columns are left unnamed and print as [,1], [,2], ...
student.score1=rbind(student.math.score,student.English.score)
student.score1
str(student.score1)
column names can be given afterwards by colnames()
colnames(student.score1)=c("KD","Curry","Klay","Green")
student.score1
- cbind() means column bind
- column names are taken from the vector names; row names can be given afterwards by rownames()
code:
student.score2=cbind(student.math.score,student.English.score)
student.score2
str(student.score2)
rownames(student.score2)=c("KD","Curry","Klay","Green")
student.score2
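The two binding functions produce the same numbers in transposed layouts. A quick way to confirm this, sketched below with the hypothetical names by.row and by.col, is to compare dimensions and transpose one result with t().

```r
# Same score vectors as above, repeated so the block runs on its own
student.English.score <- c(100L, 98L, 99L, 96L)
student.math.score    <- c(10L, 11L, 9L, 15L)

by.row <- rbind(student.math.score, student.English.score)  # 2 rows, 4 columns
by.col <- cbind(student.math.score, student.English.score)  # 4 rows, 2 columns

print(dim(by.row))
print(dim(by.col))

# Transposing the rbind() result gives the cbind() layout
print(all.equal(t(by.row), by.col))
```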
use matrix() to construct a matrix
student.score3=matrix(c(10L,11L,9L,15L,100L,98L,99L,96L),ncol=2,nrow=4)
student.score3
- generate the matrix by listing the elements and giving the numbers of rows and columns
- by default, the matrix is filled column by column
- the matrix can be filled row by row instead by adding byrow=TRUE (upper case)
code:
student.score4=matrix(c(10L,11L,9L,15L,100L,98L,99L,96L),ncol=4,nrow=2,byrow=TRUE)
student.score4
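To see the fill order directly, the sketch below builds both layouts from the same vector and reads back the first column of one and the first row of the other. It also passes dimnames at construction time, which is an alternative to calling rownames() and colnames() afterwards. The names scores, by.column, by.rows and named, and the labels "math" and "English", are illustrative assumptions.

```r
scores <- c(10L, 11L, 9L, 15L, 100L, 98L, 99L, 96L)

# Default: column 1 is filled first, then column 2
by.column <- matrix(scores, nrow = 4, ncol = 2)

# byrow = TRUE: row 1 is filled first, then row 2
by.rows <- matrix(scores, nrow = 2, ncol = 4, byrow = TRUE)

# Both hold the first four values, in a column or in a row
print(by.column[, 1])
print(by.rows[1, ])

# Row and column names supplied while constructing the matrix
named <- matrix(scores, nrow = 4, ncol = 2,
                dimnames = list(c("KD", "Curry", "Klay", "Green"),
                                c("math", "English")))
print(named)
```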
common operations on matrices
student.score2
subsetting: extract elements
- subsetting a matrix is similar to subsetting a data frame
- a matrix uses index numbers or logical values to extract elements
code:
student.score2[4,2]
student.score2[1,]
student.score2[c(T,T,F,F),]
student.score2[student.math.score>10,]
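Once a matrix has row and column names, those names can be used as indices too. The sketch below also shows that selecting a single row drops the result to a plain vector unless drop = FALSE is given. The names one.row, one.row.matrix and passed are hypothetical.

```r
# Rebuild student.score2 so the block is self-contained
student.math.score    <- c(10L, 11L, 9L, 15L)
student.English.score <- c(100L, 98L, 99L, 96L)
student.score2 <- cbind(student.math.score, student.English.score)
rownames(student.score2) <- c("KD", "Curry", "Klay", "Green")

# Index by row name and column name
print(student.score2["Curry", "student.math.score"])

# A single row is simplified to a named vector ...
one.row <- student.score2["Curry", ]
print(dim(one.row))          # NULL: no longer a matrix

# ... unless drop = FALSE keeps it as a 1 x 2 matrix
one.row.matrix <- student.score2["Curry", , drop = FALSE]
print(dim(one.row.matrix))

# Condition built from a column of the matrix itself
passed <- student.score2[student.score2[, "student.math.score"] > 10, ]
print(passed)
```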
summary operations on matrices in data analysis
student.score2
get the row-wise sum via rowSums(), not rowsum()
rowSums(student.score2)
also get the column-wise sum via colSums(), not colsum()
colSums(student.score2)
get the column-wise mean via colMeans(), not colmean()
colMeans(student.score2)
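rowSums(), colSums() and colMeans() cover the most common summaries. For other functions, apply() offers a general form, where the second argument is 1 for rows and 2 for columns. The sketch below is one possible illustration; the result names are made up.

```r
# Rebuild student.score2 so the block is self-contained
student.score2 <- cbind(student.math.score    = c(10L, 11L, 9L, 15L),
                        student.English.score = c(100L, 98L, 99L, 96L))
rownames(student.score2) <- c("KD", "Curry", "Klay", "Green")

row.totals <- rowSums(student.score2)
col.means  <- colMeans(student.score2)

# apply() with margin 1 works row by row: same totals as rowSums()
row.totals2 <- apply(student.score2, 1, sum)

# apply() with margin 2 works column by column: here, the best score per subject
col.max <- apply(student.score2, 2, max)

print(row.totals)
print(col.means)
print(col.max)
```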
array
- homogeneous data structure
- n-dimensional
- an array can be regarded as multiple sheets stacked on top of each other
- a matrix is like a single sheet, as discussed above
#create first matrix
class1.student.math=c(100,99,87,96)
class1.student.English=c(90,98,90,87)
class1.student.marks=cbind(class1.student.math,class1.student.English)
class1.student.marks
#create the second matrix
class2.student.math=c(91,93,91,92)
class2.student.English=c(98,92,80,78)
class2.student.marks=cbind(class2.student.math,class2.student.English)
class2.student.marks
use array() to create an array
student.marks=array(c(class1.student.marks,class2.student.marks),dim=c(4,2,2))
student.marks
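Printed arrays are easier to read when each dimension is labelled. The sketch below builds the same kind of 4 x 2 x 2 array and passes dimnames to array(). The labels "math", "English", "class1" and "class2", and the names class1, class2 and marks, are assumptions made for this illustration.

```r
# Two 4 x 2 matrices, one per class (values as in the text above)
class1 <- cbind(math = c(100, 99, 87, 96), English = c(90, 98, 90, 87))
class2 <- cbind(math = c(91, 93, 91, 92),  English = c(98, 92, 80, 78))

# dim = c(rows, columns, sheets); dimnames has one entry per dimension,
# NULL leaves the student rows unnamed
marks <- array(c(class1, class2), dim = c(4, 2, 2),
               dimnames = list(NULL,
                               c("math", "English"),
                               c("class1", "class2")))

print(dim(marks))

# A whole sheet can then be selected by its label
print(marks[, , "class2"])
```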
subsetting
use index number to extract elements
student.marks[1,,1]
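An array index has one position per dimension, and leaving a position empty selects everything along that dimension. The following sketch, with hypothetical result names, shows a few more index patterns and a per-class mean computed with apply() over the third dimension.

```r
# 4 students x 2 subjects x 2 classes, built directly from the values
marks <- array(c(100, 99, 87, 96,   # class 1, math
                 90, 98, 90, 87,    # class 1, English
                 91, 93, 91, 92,    # class 2, math
                 98, 92, 80, 78),   # class 2, English
               dim = c(4, 2, 2))

first   <- marks[1, , 1]   # student 1, both subjects, class 1
english <- marks[, 2, ]    # subject 2 for all students, one column per class
single  <- marks[3, 1, 2]  # one value: student 3, subject 1, class 2

print(first)
print(english)
print(single)

# Margin 3 means "one result per sheet": the mean mark of each class
class.means <- apply(marks, 3, mean)
print(class.means)
```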