
R - Describing Data (Step 3)

[I] Description

1. Overall description

summary()

> summary(data)        # min, lower quartile, median, mean, upper quartile, max

sapply()

> sapply(x, FUN, options)        # apply FUN to every column of x: mean, standard deviation, skewness, kurtosis, etc.
# FUN can be mean, sd, var, min, max, median, length, range, quantile, fivenum, or a user-defined function; options are extra arguments passed to FUN
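
A minimal sketch (assuming the built-in mtcars data; mystats is a hypothetical helper defined here) in which a user-written function passed to sapply() returns all four statistics at once:

mystats <- function(x) {
  m <- mean(x); s <- sd(x); n <- length(x)
  skew <- sum((x - m)^3 / s^3) / n          # skewness
  kurt <- sum((x - m)^4 / s^4) / n - 3      # excess kurtosis
  c(mean = m, sd = s, skewness = skew, kurtosis = kurt)
}
sapply(mtcars[c("mpg", "hp", "wt")], mystats)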

describe() of Hmisc

> describe(data)        # number of variables and observations, number of missing and unique values,
                        # mean, quantiles, and the five smallest and five largest values

stat.desc() of pastecs

> stat.desc(data)
# - basic=TRUE (default)
# number of values, null values, missing values, min, max, range, sum
# - desc=TRUE (default)
# median, mean, standard error of the mean, 95% confidence interval of the mean, variance, standard deviation, coefficient of variation
# - norm=TRUE
# normal-distribution statistics, including skewness and kurtosis (with their statistical significance) and the Shapiro-Wilk test of normality

describe() of psych

> describe(data)        # number of non-missing values, mean, standard deviation, median, trimmed mean,
                        # median absolute deviation, min, max, range, skewness, kurtosis, standard error of the mean
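
A minimal sketch comparing the three packages on the built-in mtcars data (Hmisc and psych both export describe(), so the calls are qualified with ::):

library(Hmisc); library(pastecs); library(psych)
vars <- mtcars[c("mpg", "hp", "wt")]
Hmisc::describe(vars)               # counts, missing/unique values, mean, quantiles, extremes
stat.desc(vars, norm = TRUE)        # pastecs: basic, descriptive, and normality statistics
psych::describe(vars)               # n, mean, sd, trimmed mean, mad, skew, kurtosis, se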

2. Description by group

aggregate()

> aggregate(data, by = list(INDICES), FUN)        # returns a single statistic per group (FUN must return one value)
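
A minimal sketch with the built-in mtcars data, grouping by number of cylinders:

aggregate(mtcars[c("mpg", "hp")], by = list(cyl = mtcars$cyl), FUN = mean)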

by()

> by(data, INDICES, FUN)        # FUN may return several statistics per group
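
A minimal sketch on the same data; summary() returns several statistics for each group:

by(mtcars[c("mpg", "hp")], INDICES = list(cyl = mtcars$cyl), FUN = summary)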

summaryBy() of doBy

> summaryBy(formula, data = dataframe, FUN = function)        # grouping by one or more grouping variables
# formula = var1 + var2 + var3 + ... + varN ~ groupvar1 + groupvar2 + ... + groupvarM
# (varN are numerical variables, groupvarM are grouping variables)
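
A minimal sketch, assuming the doBy package is installed:

library(doBy)
summaryBy(mpg + hp ~ cyl, data = mtcars, FUN = mean)   # group means of mpg and hp by cyl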

describeBy() of psych

> describeBy(data, list(INDICES))        # grouping variables are supplied together as a list; the summary statistics cannot be customized
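
A minimal sketch with psych, grouping mtcars by transmission type:

library(psych)
describeBy(mtcars[c("mpg", "hp")], group = list(am = mtcars$am))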

3. Contingency tables

Traditional functions

Function                          Description
table(var1, var2, ..., varN)      creates an N-dimensional contingency table from N categorical variables
xtabs(~ formula, data)            creates an N-dimensional table from a formula and a matrix or data frame
prop.table(table, margins)        converts frequencies to proportions
margin.table(table, margins)      sums the entries over the given margins
addmargins(table, margins)        adds marginal sums to a table
ftable(table)                     prints a compact, flat contingency table
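
A minimal sketch of these functions on the built-in warpbreaks data (wool and tension are factors):

mytable <- xtabs(~ wool + tension, data = warpbreaks)   # two-way frequency table
prop.table(mytable, 1)      # row proportions
margin.table(mytable, 1)    # row marginal counts
addmargins(mytable)         # table with row and column totals
ftable(mytable)             # flat, compact printout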

CrossTable() of gmodels

> CrossTable(data1, data2)        # detailed two-way cross-tabulation in the style of SAS PROC FREQ / SPSS CROSSTABS

[II] Tests

1. Classical tests

- Independence

Chi-square test
> chisq.test(mytable)        # small p (e.g. p < 0.05): the variables are related; large p: no evidence of a relationship
Fisher's exact test
> fisher.test(mytable)        # exact test for a two-way table; not limited to 2×2 tables
Cochran-Mantel-Haenszel test
> mantelhaen.test(mytable)        # conditional independence of two variables within each level of a third, assuming no three-way interaction
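
A minimal sketch of the three tests; the two-way table reuses warpbreaks, and the built-in three-way table UCBAdmissions illustrates the Cochran-Mantel-Haenszel test:

mytable <- xtabs(~ wool + tension, data = warpbreaks)
chisq.test(mytable)              # Pearson chi-square test of independence
fisher.test(mytable)             # Fisher's exact test
mantelhaen.test(UCBAdmissions)   # CMH test on a 2 x 2 x 6 table (Admit x Gender x Dept)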

- Correlation

Categorical variables

(1) Phi coefficient / contingency coefficient / Cramér's V

> assocstats(mytable)        # from the vcd package

(2) Pearson / Spearman / Kendall

> cor(x, use = , method = )        # defaults: use = "everything", method = "pearson"
> cov(data)        # covariance matrix
> cor.test(x, y, alternative = , method = )        # tests one correlation at a time
> corr.test(x, use = , method = )        # psych package: tests many correlations at once

use:

  • all.obs: assumes there are no missing data; raises an error if any are encountered
  • everything: correlations involving missing data are set to missing
  • complete.obs: listwise deletion
  • pairwise.complete.obs: pairwise deletion

method:

  • pearson: linear correlation between two quantitative variables
  • spearman: rank-order correlation between ordinal (graded) variables
  • kendall: another nonparametric measure of rank correlation
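
A minimal sketch using the built-in state.x77 data set, combining cor(), cor.test(), and corr.test() with the options above:

states <- state.x77[, c("Population", "Income", "Illiteracy", "Life Exp", "Murder")]
cor(states)                                  # Pearson correlation matrix
cor(states, method = "spearman")             # rank-based alternative
cor.test(states[, "Illiteracy"], states[, "Murder"])   # one pair at a time
library(psych)
corr.test(states, use = "complete")          # all pairs, with p-values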

(3) Partial correlation

> library(ggm)
> pcor(u, S)        # u: numeric vector (the first two entries are the variables to correlate, the remaining entries are the conditioning variables); S: covariance matrix
> pcor.test(r, q, n)        # r: partial correlation coefficient; q: number of conditioning variables; n: sample size
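
A minimal sketch with ggm and state.x77: the partial correlation of Population (column 1) and Murder (column 5), controlling for Income, Illiteracy, and HS Grad (columns 2, 3, 6):

library(ggm)
states <- state.x77[, 1:6]
pc <- pcor(c(1, 5, 2, 3, 6), cov(states))   # first two entries: variables to correlate; rest: conditioning variables
pcor.test(pc, 3, nrow(states))              # 3 conditioning variables, n observations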

Continuous variables

(1) Parametric tests

1) Independent samples

> t.test(y ~ x, data)        # or t.test(y1, y2)

2) Dependent (paired) samples

> t.test(y1,y2,paired=TRUE)           
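
A minimal sketch on the built-in sleep data (two drugs measured on the same ten subjects):

t.test(extra ~ group, data = sleep)        # independent-samples form
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))   # paired form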

3) More than two groups: ANOVA

  • one-way ANOVA (y ~ A); see the sketch after this list
> aov(formula, data = dataframe)
> TukeyHSD(fit)        # pairwise comparisons
  • one-way ANCOVA (y ~ x + A)
  • two-way ANOVA (y ~ A * B)
  • repeated-measures ANOVA (y ~ B * W + Error(Subject/W))
  • multivariate ANOVA (MANOVA)
> fit <- manova(y ~ A)
> summary.aov(fit)
> Wilks.test(y, shelf, method = "mcd")        # robust MANOVA from the rrcov package
  • regression
> fit.lm <- lm(y ~ A, data)
> summary(fit.lm)
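
The sketch referenced above, assuming the built-in PlantGrowth data: a one-way ANOVA with Tukey follow-up, plus the same model fitted as a regression:

fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)              # overall F test
TukeyHSD(fit)             # pairwise comparisons among the three groups
fit.lm <- lm(weight ~ group, data = PlantGrowth)
summary(fit.lm)           # same model expressed as a regression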

(2) Nonparametric tests

  • two groups
> wilcox.test(y ~ x, data)        # or wilcox.test(y1, y2)
  • more than two groups; see the sketch after this list
# independent groups
> kruskal.test(y ~ A, data)
# dependent groups
> friedman.test(y ~ A | B, data)
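
The sketch referenced above, on the same built-in data sets:

wilcox.test(extra ~ group, data = sleep)            # two groups
kruskal.test(weight ~ group, data = PlantGrowth)    # more than two independent groups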

2. Permutation (randomization) tests

Function (coin package)       Description
oneway_test(y ~ A)            two-sample or K-sample permutation test
oneway_test(y ~ A | C)        two-sample or K-sample test with a stratification (blocking) factor
wilcox_test(y ~ A)            Wilcoxon-Mann-Whitney test
kruskal_test(y ~ A)           Kruskal-Wallis test
chisq_test(A ~ B | C)         Pearson chi-square test
cmh_test(A ~ B | C)           Cochran-Mantel-Haenszel test
lbl_test(D ~ E)               linear-by-linear association test for ordered categories
spearman_test(y ~ x)          Spearman test
friedman_test(y ~ A | C)      Friedman test
wilcoxsign_test(y1 ~ y2)      Wilcoxon signed-rank test
  • usage: function_name(formula, data, distribution = )
  • formula: describes the relationship among the variables
  • data: a data frame
  • distribution = "exact" / "asymptotic" / "approximate"

Function (lmPerm package)                  Description
lmp(A ~ B, data = , perm = )               simple linear regression
lmp(A ~ B + I(B^2), data = , perm = )      polynomial regression
lmp(A ~ B + C + D + E, data = , perm = )   multiple regression
aovp(A ~ B, data = , perm = )              one-way ANOVA
aovp(A ~ B + C, data = , perm = )          one-way ANCOVA
aovp(A ~ B * C, data = , perm = )          two-way ANOVA
  • perm = "Exact" / "Prob" / "SPR"
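
A minimal sketch, assuming the coin and lmPerm packages are installed:

library(coin)
oneway_test(weight ~ group, data = PlantGrowth)                   # K-sample permutation test (asymptotic by default)
wilcox_test(extra ~ group, data = sleep, distribution = "exact")  # exact Wilcoxon-Mann-Whitney test
library(lmPerm)
summary(aovp(weight ~ group, data = PlantGrowth, perm = "Prob"))  # permutation one-way ANOVA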

[III] Power analysis

Function (pwr package)                                         Description
pwr.2p.test(h=, n=, sig.level=, power=)                        two proportions (equal n)
pwr.2p2n.test(h=, n1=, n2=, sig.level=, power=)                two proportions (unequal n)
pwr.anova.test(k=, n=, f=, sig.level=, power=)                 balanced one-way ANOVA
pwr.chisq.test(w=, N=, df=, sig.level=, power=)                chi-square test
pwr.f2.test(u=, v=, f2=, sig.level=, power=)                   general linear model
pwr.p.test(h=, n=, sig.level=, power=)                         proportion (one sample)
pwr.r.test(n=, r=, sig.level=, power=, alternative=)           correlation coefficient
pwr.t.test(n=, d=, sig.level=, power=, type=, alternative=)    t tests (one sample, two samples, or paired)
pwr.t2n.test(n1=, n2=, d=, sig.level=, power=, alternative=)   t test (two samples with unequal n)
  • h = ES.h(p1, p2), the effect size for proportions
  • n = sample size
  • $\mu$ = mean
  • $\sigma^2$ = error variance
  • sig.level = significance level (default = 0.05)
  • power = desired power level
  • k = number of groups
  • f = $\sqrt{\frac{\sum_{i=1}^{k}{p_i (\mu_i - \mu)^2}}{\sigma^2}}$, where $p_i = \frac{n_i}{N}$
  • w = $\sqrt{\sum_{i=1}^{m}{\frac{(p0_i - p1_i)^2}{p0_i}}}$, where $p0_i$ is the cell probability under $H_0$ and $p1_i$ the cell probability under $H_1$
  • N = total sample size
  • df = degrees of freedom
  • u = numerator degrees of freedom (the number of predictors being tested)
  • v = denominator degrees of freedom = N - k - 1 (k = number of predictors)
  • f2 = $\frac{R^2}{1-R^2}$ ($R^2$ = squared multiple correlation);

    f2 = $\frac{{R_{AB}}^2-{R_A}^2}{1-{R_{AB}}^2}$ (${R_{A}}^2$ = proportion of the variance explained by A alone, ${R_{AB}}^2$ = proportion of the variance explained by A and B together)

  • r = linear correlation coefficient (effect size)
  • alternative = "two.sided" (default) / "less" / "greater"
  • d = $\frac{\mu_1-\mu_2}{\sigma}$, the standardized mean difference
  • type = "two.sample" (default) / "one.sample" / "paired"
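
A minimal sketch with the pwr package: leaving one quantity unspecified makes the function solve for it.

library(pwr)
# sample size per group for a two-sample t test with d = 0.8, alpha = 0.05, power = 0.90
pwr.t.test(d = 0.8, sig.level = 0.05, power = 0.90, type = "two.sample")
# power of a balanced one-way ANOVA with k = 5 groups of n = 25 and effect size f = 0.25
pwr.anova.test(k = 5, n = 25, f = 0.25, sig.level = 0.05)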

    END!
