
The End of SQL Databases?

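Memory and disk usage of data: Relational databases usually reside on a hard disk or on network storage. SQL queries and stored procedures pull data sets into memory. Some (but not all) NoSQL databases can operate directly on disk, and can also use memory to speed things up.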

Document-oriented, collection-oriented, column-oriented,

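In an email exchange, Charlie Caro told me: “If Facebook needs to manage the profiles of 100,000,000 users, a distributed, context-free, key-value store is the best fit. Lookups across that many users are not a problem, but a single user's update could overload a traditional database and bring it down. One user updating data while many users read it requires concurrency control. In most cases, what attracts users to NoSQL solutions is that they are easy to install and use. SQL databases need more setup (schemas and so on), but it is precisely those schemas that give parallel relational database systems their high performance. The ease-of-use advantage shows up mostly during development. Many programmers today prefer scripting languages over statically type-checked compiled languages that offer the same functionality with more safety. Scripting languages are simply more forgiving and easier to pick up, and some software can compile these scripts to .NET/Java bytecode to improve runtime performance.” He and I both agree that all of this is about having better tools for the job, and it has always been that way! Who would use a hammer to drive a screw when a screwdriver is at hand?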
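To make the concurrency point concrete, here is a minimal Python sketch of a key-value profile store in which many readers fetch a profile while one writer updates it. The `ProfileStore` class, its method names, and the sample profile are all hypothetical and are not taken from any product mentioned in this article; a single in-process lock stands in for whatever concurrency control a real distributed store would use.

```python
import threading


class ProfileStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, user_id, profile):
        # Store a copy so later changes by the caller do not leak in.
        with self._lock:
            self._data[user_id] = dict(profile)

    def get(self, user_id):
        # Return a snapshot copy, or None if the key is unknown.
        with self._lock:
            profile = self._data.get(user_id)
            return dict(profile) if profile is not None else None

    def update(self, user_id, **changes):
        # Read-modify-write under the lock, so readers never observe
        # a half-applied update.
        with self._lock:
            profile = dict(self._data.get(user_id, {}))
            profile.update(changes)
            self._data[user_id] = profile


store = ProfileStore()
store.put("u1", {"name": "Alice", "city": "Oslo"})

results = []


def reader():
    for _ in range(1000):
        results.append(store.get("u1"))


def writer():
    store.update("u1", city="Bergen", country="Norway")


threads = [threading.Thread(target=reader) for _ in range(4)]
threads.append(threading.Thread(target=writer))
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every snapshot a reader saw is either entirely "before" or entirely "after".
consistent = all(
    (r["city"], r.get("country")) in {("Oslo", None), ("Bergen", "Norway")}
    for r in results
)
```

The lock makes the two-field update atomic from the readers' point of view. A distributed store has to provide a similar guarantee across machines, which is where the real cost of concurrency control comes from.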

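You would not believe how many open source and proprietary NoSQL database products exist today, and new ones appear every day. If my list is missing your favorite NoSQL database, please leave a comment and let me know. Below are NoSQL database products of many different types: document-oriented, collection-oriented, column-oriented, object-oriented, graph-oriented, sorted-set-oriented, row-oriented, and so on.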


Company/Organization:

franz inc.

Type:

graph

Description:

modern, high performance, persistent graph database.

Storage model:

disk based, meta-data and data triples.

API(s):

sparql, prolog

oracle

key/value

c language embeddable library for enterprise-grade, concurrent, transactional storage services. thread safe to avoid data corruption or loss.

b-tree, hash table, persistent queue

c, c++ and java

Notes:

google

sparse, distributed, persistent multidimensional sorted map.

distributed storage system for structured data. data model provides dynamic control over data layout and format. data can live in memory or on disk.

data is stored as an uninterpreted array of bytes. client applications can create structured and semi-structured data inside the byte arrays.

python, gql, sawzall api, rest, various.

apache

dimensional hash table

family data model.

clusters of multiple keyspaces. the keyspace is a name space for column families. columns are comprised of a name, value and timestamp.

java, ruby, perl, python, c#, thrift framework.

document

distributed database with incremental replication, bi-directional conflict detection and management.

ad-hoc and schema-free with a flat address space.

restful json api. javascript query language.

versant

object

java and .net dual license (commercial and open source) object database.

data objects are stored in the way they are defined in the application.

java, .net languages.

millstone creative works

json-based

schemaless database similar to amazon’s simpledb. open source, standalone java application server.

json data format, “bags” (similar to tables).

http and javascript apis

cliff moon

distributed key/value store, pluggable storage engines.

thrift api

ibm

in-memory grid/cache

distributed cache processes, partitions, replicates and manages data across servers.

data and database cache, “near cache” for local subset of data. java persistent cache. map reduce support.

java apis, rest data service

fis

hierarchical, multi-dimensional sparse arrays, content associative memory

small footprint, multi-dimensional array with full support for acid transactions, optimistic concurrency and software transactional memory.

unstructured array of bytes. can be key/value, document oriented, schema-less, dictionary or any other data model.

mumps, c/c++, sql

christoph rupp

embedded storage library

lightweight embedded database engine. supports on disk and in memory databases.

b+tree with variable length keys.

c++, python, .net and java

open source, distributed, column-oriented, “bigtable like” store

data row has a sortable row key and an arbitrary number of columns, each containing arrays of bytes.

java api, thrift api, restful api

zvents inc.

high performance distributed data storage system designed to run on distributed filesystems (but can run on local filesystems). modeled after google bigtable.

row key (primary key), column family, column qualifier, time stamp.

c++, thrift api, hql

jboss community

grid/cache

scalable, highly available, peer to peer, data grid platform.

key/value pair with optional expiration lifespan.

java, php, python, ruby, c

internet graph database made up of nodes and edges. supports in-memory and persistent storage alternatives including rdbms, file system, file grid, and custom storage.

nodes (meshobjects) and edges (relationships). meshobjects can have entity types, properties and participate in relationships. meshobjects raise events.

restful web services.

scalien

distributed (master/slave) key-value data store delivering strong consistency, fault-tolerance and high availability.

uses berkeleydb library for local storage. key/value pairs and their state are replicated to multiple servers.

c/c++, python, php, http

high performance, high reliability persistent storage engine for key/value object storage.

uses berkeleydb as storage library/backend.

memcache protocol, c, python, java, perl

ericsson

multiuser distributed database including support for replication and dynamic reconfiguration.

organized as a set of tables made up of erlang records. tables also have properties including type, location, persistence, etc.

erlang

10gen

scalable, high-performance, open source, schema-free, document-oriented database

json-like data schemas, dynamic queries, indexing, replication, mapreduce

c,c++, java, javascript, perl, php, python, ruby, c#, erlang, go, groovy, haskell, scala, f#

neo technology

embedded, small footprint, disk based, transactional graph database written in java. dual license – free and commercial.

graph-oriented data model with nodes, relationships and properties.

java, python, ruby, scala, groovy, php, restful api.

key/value store with the dataset kept in memory and saved to disk asynchronously. “not just another key-value db”

values can be strings, lists, sets and sorted sets.

python, ruby, php, erlang, lua, c, c#, java, scala, perl

amazon

item/attribute/value

scalable web service providing data storage, query and indexing in amazon’s cloud.

items (like rows of data), attributes (like column headers), and values (can be multiple values)

soap, rest

<a href="http://1978th.net/" target="_blank">mikio hirabayashi</a>

library (written in c) of functions for managing files of key/value pairs. multi-thread support.

keys and values can have variable byte length. binary data and strings can be used as a key and a value.

c, perl, ruby, java, lua.

linkedin

hash table

“it is basically just a big, distributed, persistent, fault-tolerant hash table.” high performance and availability.

each key is unique to a store. each key can have at most one value. supported types: json, string, identity, protobuf, java-serialization.

java, c++, custom clients

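It is a good thing to have so many non-relational databases to choose from. Building up some NoSQL knowledge and hands-on experience helps managers, architects, and developers compare the strengths and weaknesses of the relational databases they know with those of NoSQL databases. Relational databases and the SQL query language remain the main elements and the backbone of the design, development, and management of all kinds of database applications. But when we need to start using cloud database architectures, everything we have learned and gathered will let us migrate quickly. It is user and business requirements that decide whether we keep existing relational database technology or replace it with NoSQL.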

If you want to gather more information about NoSQL and non-relational databases, see the following websites, blogs, and articles:

CodeMash, January 14, 2010.

Below are several upcoming and recently held NoSQL conferences where architects and developers can pick up valuable information. The ones listed here are only a sample:

Emil Eifrem (Neo4j) commented: “You talk about scaling to size and handling Facebook’s 100M user profiles. That’s an important use case and one that for example a key-value store handles brilliantly. But it turns out most companies aren’t Facebook. You can categorize the four emerging categories of NoSQL databases (key-value stores, column family stores, document DBs and graph databases) along the axes of scaling to [...] excels at representing complex and rapidly evolving domain models and then traversing them with high performance.”

MongoDB developer commented: “We have seen the most common use case to date being use of NoSQL solutions as operational data store of web infrastructure projects. By operational, I mean, problems with real time writes and reads (contrast with data warehousing with bulk occasional loading). For these sort of problems these solutions work well and also fit well with agile development methods where the somewhat ‘schemaless’ (or more accurately, columnless) nature of some of the solutions, and the dynamically typed nature of the storage, really helps.”

Peter R commented: “I have already seen, in the domain I work in, the movement away from straight up SQL databases. XML databases are one technology that will be stealing a lot of SQL’s thunder (if they haven’t already). Do I think SQL will ever die? No. But the key is that there will be/are more options that need to be thought about when designing a system now.”

Anonymous commented: “I agree object databases have a purpose. They are great for large datasets that need to be replicated and called by a key. However, SQL provides a very important capability, and that is to be able to query data across a number of datasets very efficiently. This will be very hard to duplicate in a simple key-value database.”
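The commenter's point about querying across datasets can be illustrated with a small Python sketch. It uses the standard-library `sqlite3` module for the relational side and a plain dict as a stand-in for a key-value store. The table names, key naming scheme, and sample rows are assumptions made up for this example.

```python
import sqlite3

# Relational side: two tables and one declarative join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 17.5), (12, 2, 40.0);
""")

sql_totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Key-value side: the same data, but the "join" is written by hand
# in application code by scanning keys and following references.
kv = {
    "customer:1": {"name": "Alice"},
    "customer:2": {"name": "Bob"},
    "order:10": {"customer": 1, "total": 25.0},
    "order:11": {"customer": 1, "total": 17.5},
    "order:12": {"customer": 2, "total": 40.0},
}

totals = {}
for key, value in kv.items():
    if key.startswith("order:"):
        name = kv[f"customer:{value['customer']}"]["name"]
        totals[name] = totals.get(name, 0.0) + value["total"]
kv_totals = sorted(totals.items())
```

Both sides produce the same per-customer totals, but the key-value version scans every key and hard-codes the relationship in the application, which is the work the SQL engine does for you.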

Johannes Ernst commented: “One of the difficulties for ‘normal’ developers with many of the NoSQL technologies that you’ve described so far has been the learning curve and the additional work required: e.g. it’s easy and everybody knows how to put ‘every customer can place one or more orders’ into a relational database, but what if the only thing you have is keys and opaque values? Compared to many other NoSQL alternatives, graph databases provide a high level of abstraction, freeing developers to concentrate on their application, while still [...] involved in, you can define ‘customer’ and ‘order’ and their relationship, and the InfoGrid graph database takes care of storing and retrieving data and enforcing the relationship. In our experience, that makes graph databases much more approachable to developers than many other NoSQL technologies.”
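As a rough illustration of the extra work the commenter describes, the Python sketch below models “every customer can place one or more orders” using nothing but keys and opaque byte values. The key naming scheme, the helper functions, and the sample data are all hypothetical. The store itself is just a dict, and the relationship has to be maintained and enforced by application code.

```python
import json

# Stand-in for a key-value database: string keys map to opaque bytes.
store = {}


def put(key, obj):
    store[key] = json.dumps(obj).encode("utf-8")


def get(key, default=None):
    raw = store.get(key)
    return default if raw is None else json.loads(raw.decode("utf-8"))


def place_order(customer_id, order_id, items):
    # The store knows nothing about customers and orders, so the
    # application must enforce the relationship itself.
    if get(f"customer:{customer_id}") is None:
        raise KeyError(f"unknown customer {customer_id}")
    put(f"order:{order_id}", {"customer": customer_id, "items": items})
    # Hand-maintained index from a customer to that customer's orders.
    index_key = f"customer:{customer_id}:orders"
    order_ids = get(index_key, [])
    order_ids.append(order_id)
    put(index_key, order_ids)


def orders_for(customer_id):
    order_ids = get(f"customer:{customer_id}:orders", [])
    return [get(f"order:{order_id}") for order_id in order_ids]


put("customer:1", {"name": "Alice"})
place_order(1, "a-100", ["book"])
place_order(1, "a-101", ["pen", "ink"])
```

Calling `orders_for(1)` returns both orders, while `place_order(2, ...)` raises `KeyError` because no such customer exists. A relational schema or a graph database with typed relationships would provide the index and the integrity check without this hand-written code.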

Database-ed commented: “The problem is that when folks think about storing information that they need to retrieve, they are so ingrained to SQL that they fail to think of other means. The Facebook example is a case in point. Who is ever going to ask for an accurate report of every user in Facebook? If you miss something the first time you go looking, you can always present it later. The end user doesn’t know you lost it, they assume it didn’t exist at the time and now it does. Yet you still need to store the data for easy retrieval. One problem with SQL is that it ties you into the relationships. Facebook is about letting people build the relationships based on the fields they want to build them on, not the ones you might think of. I know, it can be done within the confines of SQL, but it is a lot harder to do when the size gets large.”

“Some tasks that are poorly serviced by SQL may get switched over to a new method, but other implementations that are perfectly suited to SQL will continue using it. As they quoted Eric Evans in the article, ‘The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.’”

“[...] migration away from SQL and the like any time soon, I think more web developers will start experimenting with data stores and other data solutions as we move further into the cloud.”

“And as companies turn to ask their SQL DBAs what they think of this, they’ll say ‘Let’s stick with SQL.’ Honestly, there are so many people that support SQL right now that will not switch any time soon; this article is just bogus. You can’t make a switch like that until people can support it properly.”

“[...] sort of analytics and data mining. Great for workflow and such.”

“[...] significance of the NoSQL movement is that it adds new tools that offer better solutions to specific problems. The future probably belongs to NoSQL in the sense of ‘not-only SQL’, rather than ‘no SQL’. Don’t imagine that NoSQL solutions offer a free lunch though. I had an educational experience when I changed a view definition in a CouchDB data store and my first trivial query took an hour to come back. CouchDB can be pleasingly fast when all its indexes are built, but if you have to rebuild those indexes from scratch … well, let’s just say that’s not something you want to do on a live client-facing site.”

“[...] Cassandra, a distributed data store in the vein of which the article is [...]

“SQL will be around for a while. It’s good at doing what it was designed to do. However, there are many times when people use SQL simply because there is nothing better out there. As data complexity rises, a new method for accessing and persisting that data will have to be investigated. Part of the problem with many of the alternate solutions is that few people know how to use them.”

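Years from now, I expect most of us will still depend on relational databases and SQL. Of course I have hopes of my own, and I will keep looking for better ways to abstract and encapsulate data access. As always, every engineering decision has to fit user and business requirements. For future software projects, I believe we will surely find a suitable non-relational data storage product. Are you already using a non-relational database? Have you given up on SQL and relational databases? Are you moving your data to a public or private cloud database? Please leave a comment.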

<b>Original publication date: 2012-02-21</b>

<b>This article comes from “linux中国”, a partner of the 云栖社区 (Yunqi Community)</b>