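Cosine similarity algorithm:
Cosine similarity measures how similar two vectors are by the cosine of the angle between them. The cosine of a 0° angle is 1, the cosine of any other angle is less than 1, and its minimum is -1. The cosine of the angle between two vectors therefore tells us whether they point in roughly the same direction. When two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; when they point in exactly opposite directions, it is -1. The result does not depend on the lengths of the vectors, only on their directions. In general the value lies between -1 and 1; cosine similarity is most often used in positive space (all components non-negative), where the value lies between 0 and 1.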
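The three reference values described above (1, 0 and -1) can be checked with a small sketch. The class name CosineDemo and the method cosine are hypothetical names chosen for this illustration, and the sketch assumes both arrays have the same length and neither is a zero vector.

```java
public class CosineDemo {
    // Cosine of the angle between a and b (hypothetical helper for illustration).
    // Assumes equal lengths and non-zero vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];     // accumulate the dot product
            normA += a[i] * a[i];   // accumulate squared length of a
            normB += b[i] * b[i];   // accumulate squared length of b
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {1, 0};
        System.out.println(cosine(x, new double[]{3, 0}));   // same direction, different length: 1.0
        System.out.println(cosine(x, new double[]{0, 2}));   // perpendicular: 0.0
        System.out.println(cosine(x, new double[]{-5, 0}));  // opposite direction: -1.0
    }
}
```

Note that the first call compares vectors of length 1 and 3 and still returns 1.0, which illustrates that only direction matters.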
Coordinate-system illustration:

Formula:
$$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$$
Implementation (Java):
import java.util.List;

public class SimilarityUtil {
    /**
     * Returns the cosine similarity of va and vb. If the lists differ in
     * length, the missing trailing components of the shorter one count as 0.
     * The input lists are not modified.
     */
    public static double similarity(List<? extends Number> va, List<? extends Number> vb) {
        int size = Math.max(va.size(), vb.size());
        double num = 0;      // dot product of the two vectors
        double powaSum = 0;  // sum of squared components of va
        double powbSum = 0;  // sum of squared components of vb
        for (int i = 0; i < size; i++) {
            double a = i < va.size() ? va.get(i).doubleValue() : 0;
            double b = i < vb.size() ? vb.get(i).doubleValue() : 0;
            num += a * b;
            powaSum += a * a;
            powbSum += b * b;
        }
        double den = Math.sqrt(powaSum) * Math.sqrt(powbSum);
        if (den == 0) {
            // The similarity is undefined for a zero vector; returning 0 here
            // is a convention chosen to avoid a NaN from 0 / 0.
            return 0;
        }
        return num / den;
    }
}
Case study:
Name/Interest | Eating apples | Shopping | Watching TV dramas | Playing badminton | Eating oranges |
Xiaohong | 4.5 | 5 | 5 | 5 | |
xxx | 3.5 | 5 | 5 | 5 | |
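Form the vectors va and vb from Xiaohong's and xxx's interest ratings respectively, and apply the n-dimensional cosine similarity formula. Let A = (A1, A2, ..., An) and B = (B1, B2, ..., Bn):
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
The computed similarity is: 0.9954774432988771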
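As a hand check of this figure, the arithmetic can be worked through with the ratings used in the program, A = (4.5, 5, 5, 5, 0) for Xiaohong and B = (3.5, 5, 5, 5, 0) for xxx, where an unrated interest is taken as 0:

```latex
\begin{aligned}
A \cdot B &= 4.5 \times 3.5 + 5 \times 5 + 5 \times 5 + 5 \times 5 + 0 \times 0 = 90.75 \\
\|A\| &= \sqrt{4.5^2 + 5^2 + 5^2 + 5^2 + 0^2} = \sqrt{95.25} \\
\|B\| &= \sqrt{3.5^2 + 5^2 + 5^2 + 5^2 + 0^2} = \sqrt{87.25} \\
\cos\theta &= \frac{90.75}{\sqrt{95.25}\,\sqrt{87.25}} \approx 0.9955
\end{aligned}
```

The two people differ in only one rating, so the vectors point in almost the same direction and the value is close to 1.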
Implementation code:
import java.util.ArrayList;
import java.util.List;

public class SimilarityMain {
    /**
     * Returns the cosine similarity of va and vb. If the lists differ in
     * length, the missing trailing components of the shorter one count as 0.
     * The input lists are not modified.
     */
    public static double similarity(List<? extends Number> va, List<? extends Number> vb) {
        int size = Math.max(va.size(), vb.size());
        double num = 0;      // dot product of the two vectors
        double powaSum = 0;  // sum of squared components of va
        double powbSum = 0;  // sum of squared components of vb
        for (int i = 0; i < size; i++) {
            double a = i < va.size() ? va.get(i).doubleValue() : 0;
            double b = i < vb.size() ? vb.get(i).doubleValue() : 0;
            num += a * b;
            powaSum += a * a;
            powbSum += b * b;
        }
        double den = Math.sqrt(powaSum) * Math.sqrt(powbSum);
        if (den == 0) {
            // The similarity is undefined for a zero vector; returning 0 here
            // is a convention chosen to avoid a NaN from 0 / 0.
            return 0;
        }
        return num / den;
    }

    public static void main(String[] args) {
        String[] item = {"Eating apples", "Shopping", "Watching TV dramas", "Playing badminton", "Eating oranges"};
        double[] a = {4.5, 5, 5, 5, 0};  // Xiaohong's ratings; 0 means not rated
        double[] b = {3.5, 5, 5, 5, 0};  // xxx's ratings; 0 means not rated
        List<String> vitem = new ArrayList<>();
        List<Double> va = new ArrayList<>();
        List<Double> vb = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            vitem.add(item[i]);
            va.add(a[i]);
            vb.add(b[i]);
        }
        System.out.println("Interests: " + vitem);
        System.out.println("Xiaohong: " + va);
        System.out.println("xxx: " + vb);
        double simVal = similarity(va, vb);
        System.out.println("The sim value is:" + simVal);
    }
}
Source code on Gitee:
https://gitee.com/jockhome/