MongoDB shell: statistics-related commands

1 count

db.tbPosition.find().count(); // total number of documents in the collection

db.tbPosition.find({Acnt_Id:437}).count() // number of documents matching the condition

db.tbPosition.count({Acnt_Id:407});

2 distinct

db.tbPosition.distinct('Acnt_Id'); // returns the distinct Acnt_Id values as an array

3 group

db.tbPosition.group({'key':{'Acnt_Id':1},'$reduce':function(doc,prev)
{
if(doc._id>prev._id)
prev._id= doc._id
},
'initial':{'_id':1}
});

// Groups by Acnt_Id and keeps the largest _id seen in each group, similar to:
// select Acnt_Id, max(_id) from tbPosition group by Acnt_Id

// The same grouping issued through runCommand:
db.runCommand({'group':{'ns':'tbPosition','key':{'Acnt_Id':1},'$reduce':function(doc,prev)
{
if(doc._id>prev._id)
prev._id= doc._id
},
'initial':{'_id':1}
}
});

db.tbPosition.group({'key':{'Acnt_Id':1},'$reduce':function(doc,prev)
{
if(doc._id>prev._id)
prev._id= doc._id
},
'initial':{'_id':1},
'condition':{'Acnt_Id':{$lt:100000,$gt:500}}
});

// The condition acts like a WHERE clause, similar to:
// select Acnt_Id, max(_id) from tbPosition where Acnt_Id<100000 and Acnt_Id>500 group by Acnt_Id

db.tbPosition.group({'key':{'Acnt_Id':1},'$reduce':function(doc,prev)
{
if(doc._id>prev._id)
prev._id= doc._id
},
'initial':{'_id':1},
'condition':{'Acnt_Id':{$lt:100000,$gt:100}},
'finalize':function(prev)
{
prev._id=''+parseInt(prev._id)
}
});

// finalize: can adjust each output document before it is returned
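To make the roles of key, initial, $reduce and finalize more concrete, here is a minimal sketch in plain JavaScript that imitates them over an in-memory array. It is only a model, not how the server implements group; the helper name groupBy and the sample documents are made up for illustration.

```javascript
// Plain-JavaScript model of group(): one accumulator per distinct key value.
// groupBy and the sample data are hypothetical, for illustration only.
function groupBy(docs, keyField, initial, reduce, finalize) {
  const groups = new Map();
  for (const doc of docs) {
    const k = doc[keyField];
    if (!groups.has(k)) {
      // Each group starts from its own copy of `initial`, plus the key field.
      groups.set(k, Object.assign({ [keyField]: k }, initial));
    }
    reduce(doc, groups.get(k)); // reduce mutates the accumulator in place
  }
  const out = Array.from(groups.values());
  if (finalize) out.forEach(finalize); // finalize touches each result once
  return out;
}

const positions = [
  { _id: 3, Acnt_Id: 437 },
  { _id: 9, Acnt_Id: 437 },
  { _id: 5, Acnt_Id: 407 },
];

const grouped = groupBy(
  positions,
  "Acnt_Id",
  { _id: 1 },
  (doc, prev) => { if (doc._id > prev._id) prev._id = doc._id; },
  (prev) => { prev._id = "" + parseInt(prev._id); }
);
console.log(grouped);
```

The reduce function never returns anything; it only updates prev, which is why initial has to provide a starting value for every field the reducer compares against.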

4 mapReduce

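db.tbPosition.mapReduce(
// map: emit one {count:1} per document, keyed by Acnt_Id
function(){var key=this.Acnt_Id;emit(key,{count:1})},
// reduce: add up the counts emitted for one key
function(key,emits)
{
var total=0;
for(var i in emits)
{
total+=emits[i].count;
}
return {"count":total}
},
// options: out and query belong in a single options document
{out:'mr',query:{'Acnt_Id':{$lt:100000,$gt:100}}}
);

// The result is written to the mr collection and can be inspected there: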

db.mr.find();
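The flow of map, emit and reduce can be imitated in plain JavaScript as a rough sketch. This is only a model under simplifying assumptions: in the shell, emit is a global and reduce may be called several times per key, while here emit is passed in as an argument and mapReduceModel is a made-up helper name.

```javascript
// Plain-JavaScript model of mapReduce; mapReduceModel and the data are hypothetical.
function mapReduceModel(docs, map, reduce) {
  const emitted = new Map();
  // emit() collects values under their key between the map and reduce phases.
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  // A key with a single emitted value is passed through without calling reduce.
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const sample = [{ Acnt_Id: 437 }, { Acnt_Id: 437 }, { Acnt_Id: 407 }];
const counts = mapReduceModel(
  sample,
  function (emit) { emit(this.Acnt_Id, { count: 1 }); },
  (key, emits) => {
    let total = 0;
    for (const e of emits) total += e.count;
    return { count: total };
  }
);
console.log(counts);
```

Because a single emitted value skips reduce, the value emitted by map and the value returned by reduce need the same shape, here {count: n}.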

5 aggregate

db.tbJobPosition.aggregate([

{$match:{'Acnt_Id':{$lt:100000,$gt:100}}},

{$group:{_id:"$Acnt_Id",sum:{'$sum':"$_id"}}}

]);

$project:

Selects specific fields from the documents. Projection is mainly used to rename, add, and remove fields, for example:

db.article.aggregate(

{ $project : {

title : 1 ,

author : 1 ,

}}

);

The result then contains only the three fields _id, title, and author. The _id field is included by default; to leave it out, exclude it explicitly:

db.article.aggregate(

{ $project : {

_id : 0 ,

title : 1 ,

author : 1

}});

Arithmetic expression operators can also be used inside $project, for example:

db.article.aggregate(

{ $project : {

title : 1,

doctoredPageViews : { $add:["$pageViews", 10] }

}});

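$add adds 10 to the value of the pageViews field and assigns the result to a new field, doctoredPageViews.

Note: the operands of $add must be given inside square brackets (as an array).

In addition, $project can rename fields, including fields of subdocuments: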

db.article.aggregate(

{ $project : {

title : 1 ,

page_views : "$pageViews" ,

bar : "$other.foo"

}});

A subdocument can also be added:

db.article.aggregate(

{ $project : {

title : 1 ,

stats : {

pv : "$pageViews",

foo : "$other.foo",

dpv : { $add:["$pageViews", 10] }

}

}});

This produces a subdocument stats containing the three fields pv, foo, and dpv.
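As a rough picture of what this last projection does to each document, the same reshaping can be written as a plain JavaScript map over an array. The sample document is invented for illustration, and this is only a model of the stage, not driver or shell code.

```javascript
// Plain-JavaScript model of the $project example with the stats subdocument.
// The sample document is hypothetical.
const articles = [
  { _id: 1, title: "A book", author: "Jone", pageViews: 5, other: { foo: "x" } },
];

const projected = articles.map((doc) => ({
  _id: doc._id,              // _id is kept unless explicitly excluded
  title: doc.title,          // title : 1
  stats: {
    pv: doc.pageViews,       // "$pageViews"
    foo: doc.other.foo,      // "$other.foo"
    dpv: doc.pageViews + 10, // { $add: ["$pageViews", 10] }
  },
}));
console.log(JSON.stringify(projected));
```

Fields that are not listed, such as author here, do not appear in the output.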

$match:

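This is a filtering operation, so it reduces the number of documents passed as input to the next stage. It is the equivalent of WHERE in SQL.

The syntax of $match is the same as that of a query expression (db.collection.find()).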

db.articles.aggregate( [

{ $match : { score : { $gt : 70, $lte : 90 } } },

{ $group: { _id: null, count: { $sum: 1 } } }

] );

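Here $match selects the documents whose score is greater than 70 and less than or equal to 90, then passes them to the next stage, $group, for processing.

Notes: 1. The $where expression operator cannot be used inside $match.

2. Place $match as early in the pipeline as possible, so that documents are filtered out early and the aggregation runs faster.

3. If $match is the very first stage, it can use indexes to speed up the query.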

$group:

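In the tbJobPosition example above, _id is required and "$Acnt_Id" is the field to group by; sum is an arbitrary output field name, $sum is the summing operator, and "$_id" is the field being summed.

A $group stage must always specify an _id field, and it can also contain arithmetic expression operators: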

db.article.aggregate(

{ $group : {

_id : "$author",

docsPerAuthor : { $sum : 1 },

viewsPerAuthor : { $sum : "$pageViews" }

}});

Notes: 1. The output of $group is unordered.

2. At the time of writing, $group runs in memory, so it cannot be used to group a very large number of documents.
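The per-author grouping above can be pictured with a small plain JavaScript sketch: one accumulator object per distinct _id value, updated once per input document. The sample data is invented, and this only models the result, not the server's execution.

```javascript
// Plain-JavaScript model of $group with two $sum accumulators.
// The sample documents are hypothetical.
const posts = [
  { author: "Jone", pageViews: 5 },
  { author: "Jone", pageViews: 7 },
  { author: "Amy", pageViews: 3 },
];

const byAuthor = new Map();
for (const doc of posts) {
  if (!byAuthor.has(doc.author)) {
    byAuthor.set(doc.author, { _id: doc.author, docsPerAuthor: 0, viewsPerAuthor: 0 });
  }
  const acc = byAuthor.get(doc.author);
  acc.docsPerAuthor += 1;               // { $sum : 1 }
  acc.viewsPerAuthor += doc.pageViews;  // { $sum : "$pageViews" }
}
const perAuthor = Array.from(byAuthor.values());
console.log(perAuthor);
```

The Map holds one entry per group for the whole run, which is the reason memory use grows with the number of distinct _id values.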

$sort:

db.users.aggregate( { $sort : { age : -1, posts: 1 } });

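Sorts by age in descending order, then by posts in ascending order.

Notes: 1. If $sort is placed at the front of the pipeline, it can use an index, which improves efficiency.

2. MongoDB 2.4 optimized memory use: when $sort appears directly before $limit in the pipeline, $sort only keeps the top $limit documents in memory as it works, which can save a great deal of memory.

3. $sort is performed in memory; if it uses more than 10% of physical memory, an error is raised.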

$skip:

Skips over a given number of documents in the pipeline and passes on the rest.

The argument to $skip must be a positive integer.

db.article.aggregate(

{ $skip : 5 });

After the $skip stage, the first five documents have been "filtered" out.

$limit:

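Limits the number of documents passed on to the given count, starting from the current position.

The argument to $limit must be a positive integer.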

db.article.aggregate(

{ $limit : 5 });

After the $limit stage, only the first 5 documents remain in the pipeline.

$unwind:

Unwinds an array field of the input documents. Storing data in an array is a kind of pre-join; this operation undoes it and produces one document per array element. As a result, this stage increases the number of documents passed to the next stage.

For example, an article document has an array field named tags:

> db.article.find()
{ "_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
"author" : "Jone", "title" : "A book",
"tags" : [ "good", "fun", "good" ] }

After applying $unwind:

> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$tags"})
{
"result" : [
{
"_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
"author" : "Jone",
"title" : "A book",
"tags" : "good"
},
{
"_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
"author" : "Jone",
"title" : "A book",
"tags" : "fun"
},
{
"_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
"author" : "Jone",
"title" : "A book",
"tags" : "good"
}
],
"ok" : 1
}

Notes: a. In {$unwind:"$tags"}, do not forget the $ sign.

b. If the target field of $unwind does not exist, the document is ignored and filtered out, for example:

> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$tag"})

{ "result" : [ ], "ok" : 1 }

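Because $tags was changed to $tag and no such field exists, the document is ignored and the result is empty.

c. If the target field of $unwind is not an array, an error is raised, for example: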

> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$title"})

Error: Printing Stack Trace

at printStackTrace (src/mongo/shell/utils.js:37:15)

at DBCollection.aggregate (src/mongo/shell/collection.js:897:9)

at (shell):1:12

Sat Nov 16 19:16:54.488 JavaScript execution failed: aggregate failed: {

"errmsg" : "exception: $unwind: value at end of field path must be an array",

"code" : 15978,

"ok" : 0

} at src/mongo/shell/collection.js:898

d. If the target array of $unwind is empty, the document is also ignored.
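Notes b, c and d can be summarized in a short plain JavaScript sketch that models $unwind as described in this section. The function name unwind and the sample document are made up; the error text mirrors the message shown above, and the model only covers top-level fields.

```javascript
// Plain-JavaScript model of $unwind as described in the notes above.
// unwind and the sample data are hypothetical, for illustration only.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // note b: missing field, document dropped
    if (!Array.isArray(value)) {
      // note c: a non-array target is an error
      throw new Error("$unwind: value at end of field path must be an array");
    }
    for (const item of value) {        // note d: an empty array yields nothing
      out.push(Object.assign({}, doc, { [field]: item }));
    }
  }
  return out;
}

const sampleArticles = [
  { _id: 1, author: "Jone", title: "A book", tags: ["good", "fun", "good"] },
];
const unwound = unwind(sampleArticles, "tags");
console.log(unwound);
```

Each output document is a copy of the input with the array replaced by a single element, so duplicates in the array ("good" twice here) produce duplicate documents.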

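$geoNear

$geoNear returns documents sorted by their distance from a specified point, from nearest to farthest.

The parameters are listed in the table below: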

Field	Type	Description
near	GeoJSON point or legacy coordinate pairs	The point for which to find the closest documents.
distanceField	string	The output field that contains the calculated distance. To specify a field within a subdocument, use dot notation.
limit	number	Optional. The maximum number of documents to return. The default value is 100. See also the num option.
num	number	Optional. The num option provides the same function as the limit option. Both define the maximum number of documents to return. If both options are included, the num value overrides the limit value.
maxDistance	number	Optional. A distance from the center point. Specify the distance in radians. MongoDB limits the results to those documents that fall within the specified distance from the center point.
query	document	Optional. Limits the results to the documents that match the query. The query syntax is the usual MongoDB read operation query syntax.
spherical	Boolean	Optional. If true, MongoDB references points using a spherical surface. The default value is false.
distanceMultiplier	number	Optional. The factor to multiply all distances returned by the query. For example, use the distanceMultiplier to convert radians, as returned by a spherical query, to kilometers by multiplying by the radius of the Earth.
includeLocs	string	Optional. This specifies the output field that identifies the location used to calculate the distance. This option is useful when a location field contains multiple locations. To specify a field within a subdocument, use dot notation.
uniqueDocs	Boolean	Optional. If this value is true, the query returns a matching document once, even if more than one of the document’s location fields match the query. If this value is false, the query returns a document multiple times if the document has multiple matching location fields. See $uniqueDocs for more information.

For example:

db.places.aggregate([
{$geoNear: {
near: [40.724, -73.997],
distanceField: "dist.calculated",
maxDistance: 0.008,
query: { type: "public" },
includeLocs: "dist.location",
uniqueDocs: true,
num: 5
}}
])

The result is:

{
"result" : [
{ "_id" : 7,
"name" : "Washington Square",
"type" : "public",
"location" : [
[ 40.731, -73.999 ],
[ 40.732, -73.998 ],
[ 40.730, -73.995 ],
[ 40.729, -73.996 ]
],
"dist" : {
"calculated" : 0.0050990195135962296,
"location" : [ 40.729, -73.996 ]
}
},
{ "_id" : 8,
"name" : "Sara D. Roosevelt Park",
"type" : "public",
"location" : [
[ 40.723, -73.991 ],
[ 40.723, -73.990 ],
[ 40.715, -73.994 ],
[ 40.715, -73.994 ]
],
"dist" : {
"calculated" : 0.006082762530298062,
"location" : [ 40.723, -73.991 ]
}
}
],
"ok" : 1
}

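Here dist.calculated holds the computed distance, and dist.location holds the coordinates that were actually used to compute it.

Notes: 1. $geoNear can only be used as the first stage of the pipeline.

2. distanceField must be specified; it names the output field that receives the distance.

3. $geoNear and the geoNear command are similar, but there are some differences: distanceField is required in $geoNear and optional in geoNear; includeLocs is a string in $geoNear and a boolean in geoNear.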
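The numbers in the sample output are consistent with a plain Euclidean distance over the raw coordinate pairs (spherical left at its default of false). Under that assumption, the following plain JavaScript sketch reproduces dist.calculated and dist.location for the first result; the helper names are made up for illustration.

```javascript
// Plain-JavaScript sketch of how dist.calculated and dist.location relate,
// assuming a flat (non-spherical) distance. Helper names are hypothetical.
function planarDistance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Returns the closest of a document's locations and its distance to `near`.
function closestLocation(near, locations) {
  let best = null;
  for (const loc of locations) {
    const d = planarDistance(near, loc);
    if (best === null || d < best.calculated) {
      best = { calculated: d, location: loc };
    }
  }
  return best;
}

const near = [40.724, -73.997];
const washingtonSquare = [
  [40.731, -73.999],
  [40.732, -73.998],
  [40.730, -73.995],
  [40.729, -73.996],
];
const dist = closestLocation(near, washingtonSquare);
console.log(dist);
```

The closest of the four locations is [40.729, -73.996], at a distance of roughly 0.0051, which is within the maxDistance of 0.008 used in the query.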

============

SQL Terms	MongoDB Aggregation Operators
WHERE	$match
GROUP BY	$group
HAVING	$match
SELECT	$project
ORDER BY	$sort
LIMIT	$limit
SUM()	$sum
COUNT()	$sum

Examples:

SQL Example	MongoDB Example
SELECT COUNT(*) AS count FROM mycol	db.mycol.aggregate( [{ $group: { _id: null,count: { $sum: 1 } } }] )
SELECT SUM(price) AS total FROM mycol	db.mycol.aggregate( [{ $group: { _id: null,total: { $sum: "$price" } } }] )
SELECT cust_id, SUM(price) AS total FROM mycol GROUP BY cust_id	db.mycol.aggregate( [{ $group: { _id: "$cust_id",total: { $sum: "$price" } } }] ) // _id is required, but it may be null
SELECT cust_id,SUM(price) AS total FROM mycol GROUP BY cust_id ORDER BY total	db.mycol.aggregate( [{ $group: { _id: "$cust_id",total: { $sum: "$price" } } },{ $sort: { total: 1 } }] )
SELECT cust_id,ord_date, SUM(price) AS total FROM mycol GROUP BY cust_id, ord_date	db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" },total: { $sum: "$price" } } }] )
SELECT cust_id, COUNT(*) FROM mycol GROUP BY cust_id HAVING COUNT(*) > 1	db.mycol.aggregate( [{ $group: { _id: "$cust_id",count: { $sum: 1 } } },{ $match: { count: { $gt: 1 } } }] )
SELECT cust_id,ord_date,SUM(price) AS total FROM mycol GROUP BY cust_id, ord_date HAVING total > 250	db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" },total: { $sum: "$price" } } },{ $match: { total: { $gt: 250 } } }] )
SELECT cust_id, SUM(price) as total FROM mycol WHERE status = 'A' GROUP BY cust_id	db.mycol.aggregate( [{ $match: { status: 'A' } },{ $group: { _id: "$cust_id",total: { $sum: "$price" } } }] )
SELECT cust_id,SUM(price) as total FROM mycol WHERE status = 'A' GROUP BY cust_id HAVING total > 250	db.mycol.aggregate( [{ $match: { status: 'A' } },{ $group: { _id: "$cust_id",total: { $sum: "$price" } } },{ $match: { total: { $gt: 250 } } }] )
SELECT cust_id,SUM(li.qty) as qty FROM mycol o, order_lineitem li WHERE li.order_id = o.id GROUP BY cust_id	db.mycol.aggregate( [{ $unwind: "$items" },{ $group: { _id: "$cust_id",qty: { $sum: "$items.qty" } } }] )
SELECT COUNT(*) FROM (SELECT cust_id, ord_date FROM mycol GROUP BY cust_id, ord_date) as DerivedTable	db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" } } },{ $group: { _id: null, count: { $sum: 1 } } }])

$sum: Sums the given value over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])

$avg: Computes the average of the given value over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])

$min: Gets the smallest of the given values over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])

$max: Gets the largest of the given values over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])

$push: Appends the value to an array in the resulting document. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])

$addToSet: Appends the value to an array in the resulting document, without creating duplicates. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])

$first: Gets the value from the first source document in each group. This usually only makes sense after a preceding $sort stage. db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])

$last: Gets the value from the last source document in each group. This usually only makes sense after a preceding $sort stage. db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
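To compare these accumulators side by side, here is a plain JavaScript sketch that computes each of them for a single group. The three documents are invented and assumed to belong to the same by_user group, in the order the stage would receive them.

```javascript
// Plain-JavaScript model of the $group accumulators for one group.
// The sample documents are hypothetical.
const groupDocs = [
  { by_user: "a", url: "u1", likes: 10 },
  { by_user: "a", url: "u1", likes: 30 },
  { by_user: "a", url: "u2", likes: 20 },
];

const likes = groupDocs.map((d) => d.likes);
const urls = groupDocs.map((d) => d.url);
const total = likes.reduce((s, x) => s + x, 0);

const acc = {
  sum: total,                          // $sum
  avg: total / likes.length,           // $avg
  min: Math.min(...likes),             // $min
  max: Math.max(...likes),             // $max
  push: urls,                          // $push keeps duplicates
  addToSet: Array.from(new Set(urls)), // $addToSet drops duplicates
  first: urls[0],                      // $first depends on input order
  last: urls[urls.length - 1],         // $last depends on input order
};
console.log(acc);
```

first and last change whenever the input order changes, which is why they are normally paired with an earlier $sort. The order of the array built by $addToSet should not be relied on in MongoDB, even though this model happens to keep insertion order.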

Aggregation operators

Comparison aggregation operators

Name	Description
$cmp	Compares two values and returns the result of the comparison as an integer.
$eq	Takes two values and returns true if the values are equivalent.
$gt	Takes two values and returns true if the first is larger than the second.
$gte	Takes two values and returns true if the first is larger than or equal to the second.
$lt	Takes two values and returns true if the second value is larger than the first.
$lte	Takes two values and returns true if the second value is larger than or equal to the first.
$ne	Takes two values and returns true if the values are not equivalent.

Arithmetic aggregation operators

Name	Description
$add	Computes the sum of an array of numbers.
$divide	Takes two numbers and divides the first number by the second.
$mod	Takes two numbers and calculates the modulo of the first number divided by the second.
$multiply	Computes the product of an array of numbers.
$subtract	Takes two numbers and subtracts the second number from the first.

String aggregation operators

Name	Description
$concat	Concatenates two strings.
$strcasecmp	Compares two strings and returns an integer that reflects the comparison.
$substr	Takes a string and returns portion of that string.
$toLower	Converts a string to lowercase.
$toUpper	Converts a string to uppercase.

Date aggregation operators

Name	Description
$dayOfYear	Converts a date to a number between 1 and 366.
$dayOfMonth	Converts a date to a number between 1 and 31.
$dayOfWeek	Converts a date to a number between 1 and 7.
$year	Converts a date to the full year.
$month	Converts a date into a number between 1 and 12.
$week	Converts a date into a number between 0 and 53.
$hour	Converts a date into a number between 0 and 23.
$minute	Converts a date into a number between 0 and 59.
$second	Converts a date into a number between 0 and 59. May be 60 to account for leap seconds.
$millisecond	Returns the millisecond portion of a date as an integer between 0 and 999.

Conditional aggregation operators

Name	Description
$cond	A ternary operator that evaluates one expression, and depending on the result returns the value of one following expressions.
$ifNull	Evaluates an expression and returns a value.

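Note: all of the operators above must be used inside the expression of a pipeline stage. For details on how to use each expression operator, see: http://docs.mongodb.org/manual/reference/operator/aggregation-group/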

Aggregation pipeline optimization

1. $sort + $skip + $limit order optimization

If $sort, $skip, and $limit appear in that order in a pipeline, for example:

{ $sort: { age : -1 } },

{ $skip: 10 },

{ $limit: 5 }

then the order actually executed is:

{ $sort: { age : -1 } },

{ $limit: 15 },

{ $skip: 10 }

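$limit is moved ahead of $skip.

The new $limit value = the original $skip value + the original $limit value.

This has two benefits: 1. After the $limit stage, the number of documents in the pipeline shrinks earlier, which saves memory and uses it more efficiently. 2. With $limit moved forward, $sort sits directly before $limit, so the sort only needs to keep the first $limit documents and can discard the rest as it goes.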
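That the reordered pipeline returns the same documents can be checked with a small plain JavaScript sketch, where skip and limit are modelled as array slices. The sample data is generated for illustration only.

```javascript
// Plain-JavaScript check that sort/skip(10)/limit(5) equals sort/limit(15)/skip(10).
// 40 hypothetical users with distinct ages 0..39 in scrambled order.
const users = Array.from({ length: 40 }, (_, i) => ({ age: (i * 7) % 40 }));
const byAgeDesc = (a, b) => b.age - a.age;

// As written: $sort, $skip 10, $limit 5.
const asWritten = users.slice().sort(byAgeDesc).slice(10).slice(0, 5);

// As executed: $sort, $limit 15, $skip 10.
const asExecuted = users.slice().sort(byAgeDesc).slice(0, 15).slice(10);

console.log(asWritten.map((u) => u.age));
console.log(asExecuted.map((u) => u.age));
```

Both variants yield the users ranked 11th to 15th by age, but the second one never needs more than 15 documents after the sort.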

2.$limit + $skip + $limit + $skip Sequence Optimization

If the following sequence of stages appears in a pipeline:

{ $limit: 100 },

{ $skip: 5 },

{ $limit: 10},

{ $skip: 2 }

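First a local optimization is applied: as described above, the second $limit is moved ahead of the first $skip (its value becomes 5 + 10 = 15):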

{ $limit: 100 },

{ $limit: 15},

{ $skip: 5 },

{ $skip: 2 }

Further optimization: the two $limit stages collapse to the smaller value, and the two $skip stages are added together:

{ $limit: 15 },

{ $skip: 7 }
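The coalescing above can be verified with a short plain JavaScript sketch that treats $limit and $skip as array slices. The helper names and the input are invented for illustration.

```javascript
// Plain-JavaScript check that the four-stage sequence equals { $limit: 15 }, { $skip: 7 }.
// skip, limit and run are hypothetical helpers modelling pipeline stages.
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);
const run = (stages, docs) => stages.reduce((acc, stage) => stage(acc), docs);

const input = Array.from({ length: 200 }, (_, i) => i);

const original = run([limit(100), skip(5), limit(10), skip(2)], input);
const coalesced = run([limit(15), skip(7)], input);

console.log(original);
console.log(coalesced);
```

Both pipelines keep the documents at positions 7 through 14 of the input, eight documents in total.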

3.Projection Optimization

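Using $project early to keep only the fields that are needed and drop the rest can greatly reduce memory use. Likewise, $match, $limit, and $skip should be used as early as possible: they reduce the number of documents in the pipeline early, which lowers memory use and improves aggregation efficiency. In addition, $match should be placed in the first stage of the aggregation whenever possible; there it behaves like a conditional query and can use indexes to speed up the lookup.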

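Aggregation pipeline limits

1. Type restrictions

The pipeline cannot operate on data of the types Symbol, MinKey, MaxKey, DBRef, Code, or CodeWScope (version 2.4 lifted the restriction on binary data).

2. Result size limit

The output of a pipeline cannot exceed the BSON document size limit (16 MB); if it does, an error is raised.

3. Memory limit

If a pipeline stage uses more than 10% of system memory while it runs, an error is raised. When $sort and $group run, the whole input is loaded into memory; if it takes up more than 5% of system memory, a warning is written to the log file, and, as above, an error is raised once it exceeds 10%.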

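Using the aggregation pipeline on shards

The aggregation pipeline supports aggregation over sharded collections. When aggregating over a sharded collection, the pipeline is split into two parts, which run on the mongod instances and on mongos respectively.

Using the aggregation pipeline

First download the test data from http://media.mongodb.org/zips.json and import it into the database.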

1. Query the population of each state

var connectionString = ConfigurationManager.AppSettings["MongodbConnection"];

var client = new MongoClient(connectionString);

var DatabaseName = ConfigurationManager.AppSettings["DatabaseName"];

string collName = ConfigurationManager.AppSettings["collName"];

MongoServer mongoDBConn = client.GetServer();

MongoDatabase db = mongoDBConn.GetDatabase(DatabaseName);

MongoCollection<BsonDocument> table = db[collName];

// $group stage: one output document per state, with pop summed into totalPop
var group = new BsonDocument
{
{"$group", new BsonDocument
{
{ "_id","$state" },
{ "totalPop", new BsonDocument
{
{ "$sum","$pop" }
}
}
}
}
};

var sort = new BsonDocument

{

{"$sort", new BsonDocument{ { "_id",1 }}}

};

var pipeline = new[] { group, sort };

var result = table.Aggregate(pipeline);

var matchingExamples = result.ResultDocuments.Select(x => x.ToDynamic()).ToList();

foreach (var example in matchingExamples)

{

var message = string.Format("{0}- {1}", example["_id"], example["totalPop"]);

Console.WriteLine(message);

}

2. Compute the average city population for each state

> db.zipcode.aggregate({$group:{_id:{state:"$state",city:"$city"},pop:{$sum:"$pop"}}},

{$group:{_id:"$_id.state",avCityPop:{$avg:"$pop"}}},

{$sort:{_id:1}})

// First $group stage: total population per (state, city)
var group1 = new BsonDocument
{
{"$group", new BsonDocument
{
{ "_id",new BsonDocument
{
{"state","$state"},
{"city","$city"}
}
},
{ "pop", new BsonDocument
{
{ "$sum","$pop" }
}
}
}
}
};

// Second $group stage: average of the city totals per state
var group2 = new BsonDocument
{
{"$group", new BsonDocument
{
{ "_id","$_id.state" },
{ "avCityPop", new BsonDocument
{
{ "$avg","$pop" }
}
}
}
}
};

var pipeline1 = new[] { group1,group2, sort };

var result1 = table.Aggregate(pipeline1);

var matchingExamples1 = result1.ResultDocuments.Select(x => x.ToDynamic()).ToList();

foreach (var example in matchingExamples1)

{

var message = string.Format("{0}- {1}", example["_id"], example["avCityPop"]);

Console.WriteLine(message);

}
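The two $group stages of this example can be followed step by step in a plain JavaScript sketch over a few invented zip records. It only models what the pipeline computes and is not driver code.

```javascript
// Plain-JavaScript model of the pipeline: $group by (state, city), $group by state, $sort.
// The zip records are hypothetical.
const zips = [
  { city: "A", state: "NY", pop: 100 },
  { city: "A", state: "NY", pop: 50 },
  { city: "B", state: "NY", pop: 50 },
  { city: "C", state: "CA", pop: 30 },
];

// Stage 1: sum pop per (state, city) pair.
const cityPop = new Map();
for (const z of zips) {
  const key = z.state + "\u0000" + z.city;
  const entry = cityPop.get(key) || { state: z.state, pop: 0 };
  entry.pop += z.pop;
  cityPop.set(key, entry);
}

// Stage 2: average the city totals per state.
const perState = new Map();
for (const { state, pop } of cityPop.values()) {
  const s = perState.get(state) || { sum: 0, n: 0 };
  s.sum += pop;
  s.n += 1;
  perState.set(state, s);
}

// Stage 3: sort by state (_id ascending).
const avgCityPop = Array.from(perState, ([state, s]) => ({
  _id: state,
  avCityPop: s.sum / s.n,
})).sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
console.log(avgCityPop);
```

City A spans two zip codes, so its population has to be summed first; averaging the raw zip records per state would give a different number.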

3. Find the names of the most and least populous cities in each state

>db.zipcode.aggregate({$group:{_id:{state:"$state",city:"$city"},pop:{$sum:"$pop"}}},

{$sort:{pop:1}},

{$group:{_id:"$_id.state",biggestCity:{$last:"$_id.city"},biggestPop:{$last:"$pop"},smallestCity:{$first:"$_id.city"},smallestPop:{$first:"$pop"}}},

{$project:{_id:0,state:"$_id",biggestCity:{name:"$biggestCity",pop:"$biggestPop"},smallestCity:{name:"$smallestCity",pop:"$smallestPop"}}})

var sort1 = new BsonDocument

{

{"$sort", new BsonDocument{ { "pop",1 }}}

};

// After sorting by pop ascending, $first is the smallest city and $last the largest
var group3 = new BsonDocument
{
{"$group", new BsonDocument
{
{ "_id","$_id.state" },
{ "biggestCity",new BsonDocument
{
{"$last","$_id.city"}
}
},
{ "biggestPop",new BsonDocument
{
{"$last","$pop"}
}
},
{ "smallestCity",new BsonDocument
{
{"$first","$_id.city"}
}
},
{ "smallestPop",new BsonDocument
{
{"$first","$pop"}
}
}
}
}
};

// $project stage: reshape the output into state, biggestCity and smallestCity
var project = new BsonDocument
{
{"$project", new BsonDocument
{
{"_id",0},
{"state","$_id"},
{"biggestCity",new BsonDocument
{
{"name","$biggestCity"},
{"pop","$biggestPop"}
}},
{"smallestCity",new BsonDocument
{
{"name","$smallestCity"},
{"pop","$smallestPop"}
}}
}
}
};

var pipeline2 = new[] { group1,sort1 ,group3, project };

var result2 = table.Aggregate(pipeline2);

var matchingExamples2 = result2.ResultDocuments.Select(x => x.ToDynamic()).ToList();

foreach (var example in matchingExamples2)

{

Console.WriteLine(example.ToString());

//var message = string.Format("{0}- {1}", example["_id"], example["avCityPop"]);

//Console.WriteLine(message);

}

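Summary

For most aggregation operations, the aggregation pipeline offers good performance and a consistent interface, and it is fairly simple to use. Like MapReduce, it can also work on sharded collections, but its output can only be kept in a single document, subject to the BSON document size limit (currently 16 MB).

The pipeline places some restrictions on data types and on result size. It is a good fit for simple, fixed aggregation operations, while complex aggregation tasks over large data sets are still better served by MapReduce.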

=========

Some content is excerpted from other blogs and forums.
