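Memory and disk usage – Relational databases usually reside on a hard disk or on network storage. SQL queries and stored procedures pull data sets into memory. Some (but not all) NoSQL databases can operate directly on disk, and can also use memory to speed things up.
Document-oriented, collection-oriented, column-oriented,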
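In an email exchange, Charlie Caro told me: “If Facebook needs to manage the profiles of 100,000,000 users, a distributed, context-free key-value store is the best fit. Lookups across that many users are not a problem, but a single user's update could be enough to overload a traditional database. One user updating data while many users read it requires concurrency control. In most cases, what draws users to NoSQL solutions is that they are easy to install and use. SQL databases demand more up-front work (schemas and so on), but those same schemas are what give parallel relational database systems their high performance. The ease-of-use benefit shows up mostly during development. Many of today's programmers prefer scripting languages over statically type-checked compiled languages that offer the same functionality with more safety. Scripting languages are simply more forgiving and easier to pick up, and some tools can compile scripts to .NET/Java bytecode to improve runtime performance.” He and I agree that all of this is about having better tools for the job, and it always has been. Who drives a screw with a hammer when a screwdriver is at hand?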
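The point about one user updating a profile while many others read it can be illustrated with a toy example. This is only a minimal sketch, assuming a single-process, in-memory store guarded by one lock; `ProfileStore` and its methods are hypothetical names and not the API of any product discussed in this article.

```python
import threading


class ProfileStore:
    """Toy in-process key-value store: user id -> profile dict (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # guards every access to _data

    def get(self, user_id):
        # Return a copy so a reader never observes a half-applied update.
        with self._lock:
            profile = self._data.get(user_id)
            return dict(profile) if profile is not None else None

    def put(self, user_id, profile):
        with self._lock:
            self._data[user_id] = dict(profile)

    def update(self, user_id, **changes):
        # The read-modify-write runs under the lock, so it is atomic
        # with respect to concurrent readers and writers.
        with self._lock:
            profile = self._data.setdefault(user_id, {})
            profile.update(changes)


store = ProfileStore()
store.put("u1", {"name": "Alice", "city": "Paris"})
store.update("u1", city="Berlin")
print(store.get("u1"))
```

A real distributed store has to solve the same problem across machines, which is much harder than taking a local lock; the sketch only shows why reads and a concurrent update need coordinating at all.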
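You may be surprised at how many open source and closed source NoSQL database products exist today, with new ones appearing every day. If my list leaves out your favorite NoSQL database, please tell me in a comment. Below are NoSQL database products of many different types: document-oriented, collection-oriented, column-oriented, object-oriented, graph-oriented, ordered-set-oriented, row-oriented, and so on.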
Company/organization:
Franz Inc.
Type:
Graph
Description:
Modern, high performance, persistent graph database.
Storage:
Disk based, meta-data and data triples.
API(s):
SPARQL, Prolog
Oracle
Key/value
C language embeddable library for enterprise-grade, concurrent, transactional storage services. Thread safe to avoid data corruption or loss.
B-tree, hash table, persistent queue
C, C++ and Java
Notes:
Sparse, distributed, persistent multidimensional sorted map.
Distributed storage system for structured data. Data model provides dynamic control over data layout and format. Data can live in memory or on disk.
Data is stored as an uninterpreted array of bytes. Client applications can create structured and semi-structured data inside the byte arrays.
Python, GQL, Sawzall API, REST, various.
Apache
dimensional hash table
family data model.
Clusters of multiple keyspaces. The keyspace is a name space for column families. Columns consist of a name, value and timestamp.
Java, Ruby, Perl, Python, C#, Thrift framework.
Document
Distributed database with incremental replication, bi-directional conflict detection and management.
Ad-hoc and schema-free with a flat address space.
RESTful JSON API. JavaScript query language.
Versant
Object
Java and .NET dual license (commercial and open source) object database.
Data objects are stored in the way they are defined in the application.
Java, .NET languages.
Millstone Creative Works
JSON-based
Schemaless database similar to Amazon’s SimpleDB. Open source, standalone Java application server.
JSON data format, “bags” (similar to tables).
HTTP and JavaScript APIs
Cliff Moon
Distributed key/value store, pluggable storage engines.
Thrift API
IBM
In-memory grid/cache
Distributed cache processes, partitions, replicates and manages data across servers.
Data and database cache, “near cache” for local subset of data. Java persistent cache. MapReduce support.
Java APIs, REST data service
FIS
Hierarchical, multi-dimensional sparse arrays, content associative memory
Small footprint, multi-dimensional array with full support for ACID transactions, optimistic concurrency and software transactional memory.
Unstructured array of bytes. Can be key/value, document oriented, schema-less, dictionary or any other data model.
MUMPS, C/C++, SQL
Christoph Rupp
Embedded storage library
Lightweight embedded database engine. Supports on-disk and in-memory databases.
B+tree with variable length keys.
C++, Python, .NET and Java
Open source, distributed, column-oriented, “Bigtable-like” store
Data row has a sortable row key and an arbitrary number of columns, each containing arrays of bytes.
Java API, Thrift API, RESTful API
Zvents Inc.
High performance distributed data storage system designed to run on distributed filesystems (but can run on local filesystems). Modeled after Google Bigtable.
Row key (primary key), column family, column qualifier, time stamp.
C++, Thrift API, HQL
JBoss Community
Grid/cache
Scalable, highly available, peer to peer, data grid platform.
Key/value pair with optional expiration lifespan.
Java, PHP, Python, Ruby, C
Internet graph database made up of nodes and edges. Supports in-memory and persistent storage alternatives including RDBMS, file system, file grid, and custom storage.
Nodes (MeshObjects) and edges (relationships). MeshObjects can have entity types and properties, and can participate in relationships. MeshObjects raise events.
RESTful web services.
Scalien
Distributed (master/slave) key-value data store delivering strong consistency, fault-tolerance and high availability.
Uses BerkeleyDB library for local storage. Key/value pairs and their state are replicated to multiple servers.
C/C++, Python, PHP, HTTP
High performance, high reliability persistent storage engine for key/value object storage.
Uses BerkeleyDB as storage library/backend.
Memcache protocol, C, Python, Java, Perl
Ericsson
Multiuser distributed database including support for replication and dynamic reconfiguration.
Organized as a set of tables made up of Erlang records. Tables also have properties including type, location, persistence, etc.
Erlang
10gen
Scalable, high-performance, open source, schema-free, document-oriented database
JSON-like data schemas, dynamic queries, indexing, replication, MapReduce
C, C++, Java, JavaScript, Perl, PHP, Python, Ruby, C#, Erlang, Go, Groovy, Haskell, Scala, F#
Neo Technology
Embedded, small footprint, disk based, transactional graph database written in Java. Dual license – free and commercial.
Graph-oriented data model with nodes, relationships and properties.
Java, Python, Ruby, Scala, Groovy, PHP, RESTful API.
Key/value store with the dataset kept in memory and saved to disk asynchronously. “Not just another key-value DB”
Values can be strings, lists, sets and sorted sets.
Python, Ruby, PHP, Erlang, Lua, C, C#, Java, Scala, Perl
Amazon
Item/attribute/value
Scalable web service providing data storage, query and indexing in Amazon’s cloud.
Items (like rows of data), attributes (like column headers), and values (can be multiple values)
SOAP, REST
<a href="http://1978th.net/" target="_blank">Mikio Hirabayashi</a>
Library (written in C) of functions for managing files of key/value pairs. Multi-thread support.
Keys and values can have variable byte length. Binary data and strings can be used as a key and a value.
C, Perl, Ruby, Java, Lua.
Hash table
“It is basically just a big, distributed, persistent, fault-tolerant hash table.” High performance and availability.
Each key is unique to a store. Each key can have at most one value. Supported types: JSON, string, identity, protobuf, java-serialization.
Java, C++, custom clients
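Several entries in the list describe a column-family layout: keyspaces that hold column families, whose columns carry a name, a value and a timestamp. The sketch below is one way to picture that nesting in plain Python. It is an illustration only: the `Keyspace` class is a hypothetical name, and the rule that the newest timestamp wins is an assumption made for the example, not something the list above states.

```python
import time
from collections import defaultdict


class Keyspace:
    """Toy model: column family -> row key -> column name -> (value, timestamp)."""

    def __init__(self, name):
        self.name = name
        self._families = defaultdict(lambda: defaultdict(dict))

    def insert(self, family, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self._families[family][row_key]
        current = row.get(column)
        # Assumption for this sketch: the write with the newest timestamp wins.
        if current is None or ts >= current[1]:
            row[column] = (value, ts)

    def get(self, family, row_key, column):
        # Plain .get() chain so that lookups never create empty rows.
        cell = self._families.get(family, {}).get(row_key, {}).get(column)
        return None if cell is None else cell[0]


ks = Keyspace("app")
ks.insert("users", "u1", "email", "old@example.com", timestamp=1)
ks.insert("users", "u1", "email", "new@example.com", timestamp=2)
ks.insert("users", "u1", "email", "stale@example.com", timestamp=1)  # older, ignored
print(ks.get("users", "u1", "email"))
```

Rows in the same column family do not have to share the same set of columns, which is the main practical difference from a relational table.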
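Having so many non-relational databases to choose from is a good thing. Some NoSQL knowledge and hands-on experience helps managers, architects, and developers compare the strengths and weaknesses of the relational databases they know with those of NoSQL databases. Relational databases and the SQL query language remain central to the design, development, and management of most database applications. But when we need to start using cloud database architectures, the knowledge and material we have gathered will let us migrate quickly. Whether to stay with existing relational database technology or replace it with NoSQL is a decision that depends entirely on user and business requirements.
If you want more information about NoSQL and non-relational databases, see the following sites, blogs, and articles:
CodeMash, January 14, 2010.
Below are a few upcoming and recent conferences on NoSQL where architects and developers can find valuable information. The list is only a partial one: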
Emil Eifrem (Neo4j) commented: “You talk about scaling to size and
handling Facebook’s 100M user profiles. That’s an important use case and
one that for example a key-value store handles brilliantly. But it
turns out most companies aren’t Facebook. You can categorize the four
emerging categories of NoSQL databases (key-value stores, column family
stores, document DBs and graph databases) along the axes of scaling to
excels at representing complex and rapidly evolving domain models and
then traversing them with high performance.”
MongoDB developer commented: “We have seen the most common use case
to date being use of NoSQL solutions as operational data store of web
infrastructure projects. By operational, I mean, problems with real time
writes and reads (contrast with data warehousing with bulk occasional
loading). For these sort of problems these solutions work well and also
fit well with agile development methods where the somewhat ‘schemaless’
(or more accurately, columnless) nature of some of the solutions, and
the dynamically typed nature of the storage, really helps.”
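To make the “schemaless” remark concrete, here is a minimal sketch, assuming documents are plain Python dicts kept in a list. The `insert` and `find` helpers are hypothetical and are not the query API of MongoDB or any other product; they only show that records in one collection may carry different fields without a schema change.

```python
collection = []  # a "collection" here is just a list of dicts


def insert(doc):
    """Store a copy of the document; no schema is checked."""
    collection.append(dict(doc))


def find(**criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


# Three documents with three different shapes in the same collection.
insert({"type": "page_view", "url": "/home", "ms": 12})
insert({"type": "signup", "email": "a@example.com"})
insert({"type": "page_view", "url": "/pricing", "ms": 30, "referrer": "ad"})

print(find(type="page_view"))
print(find(referrer="ad"))
```

Adding the `referrer` field needed no migration, which is the property that suits fast-changing operational data; the price is that nothing stops a misspelled field name from being stored.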
Peter R commented: “I have already seen, in the domain I work in,
the movement away from straight up SQL databases. XML databases are one
technology that will be stealing a lot of SQL’s thunder (if they haven’t
already). Do I think SQL will ever die? No. But the key is that there
will be/are more options that need to be thought about when designing a
system now.”
Anonymous commented: “I agree object databases have a purpose. They
are great for large datasets that need to be replicated and called by a
key. However SQL provides a very important capability and that is to
be able to query data across a number of datasets very efficiently, this
will be very hard to duplicate in a simple key value database.”
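The difference this comment describes can be seen side by side. The following is a small sketch, assuming made-up `customers` and `orders` data: the SQL half uses Python's built-in `sqlite3` module, while the key-value half is a plain dict with a hypothetical `kind:id` key scheme, where the join has to be written by hand.

```python
import sqlite3

# Relational side: one declarative query joins and aggregates both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")
sql_totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Key-value side: the store only maps keys to opaque values, so the
# application scans the keys and resolves each reference itself.
kv = {
    "customer:1": {"name": "Ann"},
    "customer:2": {"name": "Bob"},
    "order:10": {"customer": 1, "total": 25.0},
    "order:11": {"customer": 1, "total": 15.0},
    "order:12": {"customer": 2, "total": 40.0},
}
kv_totals = {}
for key, value in kv.items():
    if key.startswith("order:"):
        name = kv[f"customer:{value['customer']}"]["name"]
        kv_totals[name] = kv_totals.get(name, 0.0) + value["total"]

print(sql_totals)
print(kv_totals)
```

Both halves produce the same totals, but the key-value version walks every key, and in a distributed store that scan would touch every node.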
Johannes Ernst commented: “One of the difficulties for “normal”
developers with many of the NoSQL technologies that you’ve described so
far has been the learning curve and the additional work required: e.g.
it’s easy and everybody knows how to put “every customer can place one
or more orders” into a relational database, but what if the only thing
you have is keys and opaque values? Compared to many other NoSQL
alternatives, graph databases provide a high level of abstraction,
freeing developers to concentrate on their application, while still
involved in, you can define “customer” and “order” and their
relationship, and the InfoGrid graph database takes care of storing and
retrieving data and enforcing the relationship. In our experience, that
makes graph databases much more approachable to developers than many
other NoSQL technologies.”
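The “customer places orders” case from this comment can be sketched as a tiny in-memory graph. This is a hedged illustration only: `Node`, `Graph` and the `RULES` set are hypothetical names invented for the example, and the code does not use or imitate the InfoGrid API.

```python
class Node:
    def __init__(self, kind, **props):
        self.kind = kind      # e.g. "customer" or "order"
        self.props = props    # free-form properties
        self.edges = []       # outgoing (label, target) pairs


class Graph:
    # Allowed relationships as (source kind, label, target kind).
    RULES = {("customer", "placed", "order")}

    def __init__(self):
        self.nodes = []

    def add(self, kind, **props):
        node = Node(kind, **props)
        self.nodes.append(node)
        return node

    def relate(self, source, label, target):
        # The graph layer, not the application, enforces the relationship.
        if (source.kind, label, target.kind) not in self.RULES:
            raise ValueError(f"{source.kind} -{label}-> {target.kind} is not allowed")
        source.edges.append((label, target))

    def neighbors(self, source, label):
        return [target for (edge_label, target) in source.edges if edge_label == label]


g = Graph()
ann = g.add("customer", name="Ann")
o1 = g.add("order", total=25.0)
o2 = g.add("order", total=15.0)
g.relate(ann, "placed", o1)
g.relate(ann, "placed", o2)
order_totals = [order.props["total"] for order in g.neighbors(ann, "placed")]
print(order_totals)
```

Following a relationship is a direct walk over a node's edges rather than a key lookup plus manual bookkeeping, which is the convenience the comment is pointing at.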
database-ed commented: “The problem is that when folks think about
storing information that they need to retrieve, they are so ingrained to
SQL that they fail to think of other means. The Facebook example is a
case in point. Who is ever going to ask for an accurate report of every
user in Facebook? If you miss something the first time you go looking,
you can always present it later. The end user doesn’t know you lost it,
they assume it didn’t exist at the time and now it does. Yet you still
need to store the data for easy retrieval. One problem with SQL is that
it ties you into the relationships. Facebook is about letting people
build the relationships based on the fields they want to build them on,
not the ones you might think of. I know, it can be done within the
confines of SQL, but it is a lot harder to do when the size gets large.”
“Some tasks that are poorly serviced by SQL may get switched over to a
new method, but other implementations that are perfectly suited to SQL
will continue using it. As they quoted Eric Evans in the article, “the
whole point of seeking alternatives is that you need to solve a problem
that relational databases are a bad fit for.”
migration away from SQL and the like any time soon, I think more web
developers will start experimenting with data stores and other data
solutions as we move further into the cloud.”
“And as companies turn to ask their SQL DBAs what they think of this,
they’ll say “lets stick with SQL.” Honestly, there are so many people
that support SQL right now that will not switch any time soon this
article is just bogus. You can’t make a switch like that until people
can support it properly.”
sort of analytics and data mining. Great for workflow and such.”
significance of the NoSQL movement is that it adds new tools that offer
better solutions to specific problems. The future probably belongs to
NoSQL in the sense of ‘not-only SQL’, rather than ‘no SQL’. Don’t
imagine that NoSQL solutions offer a free lunch though. I had an
educational experience when I changed a view definition in a CouchDB
data store and my first trivial query took an hour to come back. CouchDB
can be pleasingly fast when all its indexes are built, but if you have
to rebuild those indexes from scratch … well, let’s just say that’s
not something you want to do on a live client-facing site.”
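The cost described in this comment, a slow first query after a view definition changes, follows from how derived indexes work in general. The sketch below is a simplified model under stated assumptions, not CouchDB's implementation: `ViewStore` is a hypothetical class, the index is a plain dict, and `put` assumes every document id is new.

```python
class ViewStore:
    """Documents plus one derived index built by a user-supplied map function."""

    def __init__(self, map_fn):
        self.docs = {}
        self.map_fn = map_fn
        self.index = {}      # emitted key -> list of document ids
        self.stale = False   # True when the whole index must be rebuilt
        self.rebuilds = 0    # counts full rebuilds, for illustration

    def put(self, doc_id, doc):
        # Assumes doc_id has not been stored before.
        self.docs[doc_id] = doc
        if not self.stale:
            self._index_doc(doc_id, doc)  # cheap incremental update

    def set_view(self, map_fn):
        # A new definition invalidates everything the old one emitted.
        self.map_fn = map_fn
        self.stale = True

    def query(self, key):
        if self.stale:
            self._rebuild()  # cost grows with the number of stored documents
        return self.index.get(key, [])

    def _index_doc(self, doc_id, doc):
        for key in self.map_fn(doc):
            self.index.setdefault(key, []).append(doc_id)

    def _rebuild(self):
        self.index = {}
        for doc_id, doc in self.docs.items():
            self._index_doc(doc_id, doc)
        self.stale = False
        self.rebuilds += 1


views = ViewStore(lambda doc: [doc["city"]])
views.put("a", {"city": "Oslo", "lang": "no"})
views.put("b", {"city": "Rome", "lang": "it"})
by_city = views.query("Oslo")               # served from the existing index
views.set_view(lambda doc: [doc["lang"]])   # change the view definition
by_lang = views.query("it")                 # first query pays for a full rebuild
print(by_city, by_lang, views.rebuilds)
```

With two documents the rebuild is instant; with millions, the first query after `set_view` has to map every document before it can answer, which matches the experience the commenter reports.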
Cassandra, a distributed data store in the vein of which the article is
“SQL will be around for awhile. It’s good at doing what it was designed
to do. However, there are many times when people use SQL simply because
there is nothing better out there. As data complexity rises, a new
method for accessing and persisting that data will have to be
investigated. Part of the problem with many of the alternate solutions
is that few people know how to use them.”
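Years from now, I expect most of us will still depend on relational databases and SQL. I certainly hope to keep looking for better ways to abstract and encapsulate data access. As always, any engineering decision has to fit user and business requirements. For future software projects, I believe we will find a suitable non-relational data store. Are you already using a non-relational database? Have you given up SQL and relational databases? Are you moving your data to a public or private cloud database? Please leave a comment.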
<b>Originally published: 2012-02-21</b>
<b>This article comes from “Linux中国” (Linux China), a partner of the Yunqi Community</b>