cluster with k-means

Goal: select one trait per cluster to simplify the GWAS results.

1. Calculate the Pearson correlation coefficients in R and plot them as a heatmap

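library(pheatmap)
# Read the precomputed correlation matrix (first row and first column hold the trait names)
data<-read.table("pro_alti_rm.cor",header=TRUE,row.names = 1)
# Draw the clustered heatmap to a 7 x 7 inch PDF, without row and column labels
pdf(file="pro_soil.cor.pdf",height = 7,width = 7)
pheatmap(data,show_rownames = FALSE,show_colnames = FALSE)
dev.off()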
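The snippet above reads an already computed matrix from pro_alti_rm.cor; how that file was produced is not shown. A minimal sketch of one way to build such a matrix, assuming a phenotype table with samples in rows and traits in columns. The `traits` table, its column names, and the temporary output file are hypothetical, and the values are simulated toy data:

```r
# Toy stand-in for the real phenotype table: 50 samples x 5 traits (hypothetical names).
set.seed(1)
traits <- data.frame(
  total_C_16.6mm = rnorm(50),
  total_N_16.6mm = rnorm(50),
  pH             = rnorm(50),
  clay           = rnorm(50)
)
# A second depth of the same property, made to correlate with the first one.
traits$total_C_28.9mm <- traits$total_C_16.6mm + rnorm(50, sd = 0.2)

# Pairwise Pearson correlations; pairwise.complete.obs tolerates missing values.
cor_mat <- cor(traits, method = "pearson", use = "pairwise.complete.obs")

# Write in a layout that read.table(..., header = TRUE, row.names = 1) can read back.
out_file <- tempfile(fileext = ".cor")
write.table(cor_mat, out_file, quote = FALSE, sep = "\t", col.names = NA)
reread <- as.matrix(read.table(out_file, header = TRUE, row.names = 1))
```

The resulting matrix is square and symmetric, with the trait names as both row and column names, which is the shape the heatmap and the clustering step expect.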


2. Choose a suitable k

library("ClusterR")
# Altitude and the soil properties without enough data were removed beforehand, which gives k = 11 rather than 14
corm<-as.matrix(read.table("pro_alti_rm.cor",header=TRUE,row.names = 1))
# Choose k from the variance-explained curve; do not choose k by minimizing AIC (that gives k=4)
Optimal_Clusters_KMeans(corm,max_clusters=25,criterion="variance_explained")
k<-11
bob<-KMeans_rcpp(corm,clusters=k)
clusters<-data.frame(trait=colnames(corm),cluster=bob$clusters)
# Randomly select one trait from each cluster
for (i in seq_len(k))
{
print(sample(clusters$trait[clusters$cluster==i],1))
}
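To make the "variance_explained" idea more concrete: as far as I understand it, the criterion looks at how much of the total sum of squares remains inside the clusters as k grows. The sketch below computes that ratio with base R's `kmeans` as a stand-in (it is not the ClusterR implementation), on simulated toy data rather than the real correlation matrix:

```r
set.seed(1)
# Toy data: 30 rows in three well-separated groups (not the real correlation matrix).
toy <- rbind(
  matrix(rnorm(10 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(10 * 5, mean = 5),  ncol = 5),
  matrix(rnorm(10 * 5, mean = 10), ncol = 5)
)

# Fraction of the total sum of squares left within clusters, for k = 1..6.
ks <- 1:6
wss_fraction <- sapply(ks, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 20)
  fit$tot.withinss / fit$totss
})

# Look for the "elbow": the k after which the curve flattens.
plot(ks, wss_fraction, type = "b",
     xlab = "number of clusters k",
     ylab = "within-cluster SS / total SS")
```

On the toy data the curve drops sharply up to k = 3 and is nearly flat afterwards; on real trait data the bend is usually less obvious, so the chosen k remains a judgment call.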


3. Draw Manhattan plots for the 11 selected traits
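No plotting code is given for this step. A minimal base-R sketch of a Manhattan plot for one trait, assuming a GWAS result table with chromosome, position, and p-value columns. The column names `chr`, `pos`, `p` and the simulated values are hypothetical, and the threshold line is only one common choice:

```r
# Toy GWAS results (simulated): 3 chromosomes with 200 SNPs each.
set.seed(1)
gwas <- data.frame(
  chr = rep(1:3, each = 200),
  pos = rep(seq(1e4, 2e6, length.out = 200), times = 3),
  p   = runif(600)
)
gwas <- gwas[order(gwas$chr, gwas$pos), ]

# Cumulative x coordinate: shift each chromosome by the length of the previous ones.
chr_len <- tapply(gwas$pos, gwas$chr, max)
offset  <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
names(offset) <- names(chr_len)
gwas$x <- gwas$pos + offset[as.character(gwas$chr)]

# One point per SNP, alternating colours by chromosome, -log10(p) on the y axis.
plot(gwas$x, -log10(gwas$p), pch = 20,
     col = c("grey30", "steelblue")[gwas$chr %% 2 + 1],
     xlab = "genomic position", ylab = "-log10(p)")
abline(h = -log10(0.05 / nrow(gwas)), lty = 2)  # Bonferroni line, one common choice
```

Running this once per selected trait, for example inside a loop that opens one PDF page per trait, would give the 11 plots.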


Question: keeping only one trait per cluster loses information such as the example below, and it also loses the QTLs listed in supplementary table 16.

Example: organic carbon at 0-0.045 m


Possible alternatives: use PCA, or:

If the same soil property measured at different depths is clustered together, pick one depth for each soil property.

For example, if total carbon (16.6 mm), total carbon (28.9 mm), total N (16.6 mm), and total N (28.9 mm) all fall in one cluster, randomly select one total carbon trait and one total N trait, instead of keeping only one trait for the whole cluster.
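The per-property selection described above can be sketched as follows. This assumes trait names end with the depth in parentheses, as in the example; the `clusters` table here is hypothetical toy data, not the real k-means output:

```r
# Hypothetical cluster assignments: trait names carry the depth in parentheses.
clusters <- data.frame(
  trait = c("total carbon (16.6mm)", "total carbon (28.9mm)",
            "total N (16.6mm)", "total N (28.9mm)", "pH (16.6mm)"),
  cluster = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Strip the trailing "(depth)" to recover the soil property name.
clusters$property <- sub("[[:space:]]*\\([^)]*\\)$", "", clusters$trait)

# Within each cluster x property group, keep one randomly chosen trait.
set.seed(1)
groups <- split(clusters$trait,
                list(clusters$cluster, clusters$property),
                drop = TRUE)
picked <- vapply(groups, function(x) x[sample.int(length(x), 1)], character(1))
print(unname(picked))
```

Indexing with `sample.int(length(x), 1)` is used so that groups containing a single trait are handled the same way as larger ones. With this rule, cluster 1 contributes one total carbon trait and one total N trait instead of a single trait overall.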

