Introduction to SQL Server Parallel Data Warehouse (PDW)

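Big data has been a very hot topic lately. Every vendor regards it as an important direction for the future of IT, so every vendor wants to make its mark in this field. A few days ago I attended an IBM big data seminar, where IBM presented its big data solutions: three appliances (PureSystem). In addition, IBM has released DB2 v10, in which PureScale, designed specifically to compete with Oracle RAC, has formally become part of the main DB2 release.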

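On the MPP architecture side, Microsoft used to be criticized for lacking a product that could meet the challenges of big data. I later searched online and found that, since SQL Server 2008 R2, Microsoft has also released an MPP data warehouse architecture, and that it will launch its own appliance this year.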

Microsoft has had clustering capabilities in SQL Server for a while, but scalability was lacking. This is where PDW comes in. Scalability in PDW means handling tens of terabytes of data and then growing to hundreds of terabytes (up to 600 TB). At about 50 to 60 terabytes of data, clustering is needed; beyond that, clustering starts to approach its limits, and that is when you need to move to PDW. Clustering brings concurrency to the system and reduces load, but it cannot shorten the time a single query takes, even when that query never waits on resources. Breaking this barrier requires parallelism, so that pieces of the same request execute simultaneously, and this is exactly what PDW brings to the table.

PDW partitions large tables across multiple physical nodes, each with its own dedicated CPU, memory, and storage, and each running its own instance of SQL Server in a parallel shared-nothing design. Tables can be either replicated, where a full copy is kept on each node (usually for dimension tables), or distributed, where portions of a table are spread uniformly across all nodes (usually for fact tables).
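The replicated versus distributed table layout can be illustrated with a small simulation. The following is only a rough sketch in Python, not PDW code: the node count, the CRC32 hash, the table contents, and all function names (`node_for`, `distribute`, `replicate`, `local_totals`, `parallel_totals`) are hypothetical and chosen for illustration. It shows a fact table being hash-distributed across nodes, a dimension table being copied to every node, and a query being answered by letting each node join and aggregate its own slice before the partial results are merged.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4  # hypothetical number of compute nodes


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution key to a node (illustrative hash, not PDW's own)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key, num_nodes=NUM_NODES):
    """Distributed table: each row is stored on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated table: every node keeps a full copy."""
    return [list(rows) for _ in range(num_nodes)]


def local_totals(fact_slice, dim_copy):
    """Work done on one node: join its fact slice to its local dimension copy."""
    names = {d["product_id"]: d["name"] for d in dim_copy}
    totals = defaultdict(int)
    for row in fact_slice:
        totals[names[row["product_id"]]] += row["amount"]
    return dict(totals)


def parallel_totals(fact_nodes, dim_nodes):
    """Run the per-node work concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(fact_nodes)) as pool:
        partials = list(pool.map(local_totals, fact_nodes, dim_nodes))
    merged = defaultdict(int)
    for partial in partials:
        for name, total in partial.items():
            merged[name] += total
    return dict(merged)


# Made-up sample data: a small dimension table and a small fact table.
products = [
    {"product_id": 1, "name": "widget"},
    {"product_id": 2, "name": "gadget"},
]
sales = [
    {"sale_id": i, "product_id": 1 + i % 2, "amount": 10 * (i + 1)}
    for i in range(8)
]

fact_nodes = distribute(sales, key="sale_id")  # fact table: distributed
dim_nodes = replicate(products)                # dimension table: replicated
totals = parallel_totals(fact_nodes, dim_nodes)
print(totals)
```

Because every node holds the whole dimension table, the join needs no data movement between nodes; only the small per-node totals are combined at the end. Python threads here only mimic the structure of the computation and do not provide the hardware-level parallelism of separate physical nodes.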

PDW also incorporates a parallel database copy technology that enables rapid data movement and keeps data consistent between PDW and the data marts used by SSAS.

In short, PDW is ideal for large data warehouses and BI, but not for OLTP systems. Write one check, and you get a complete soup-to-nuts data warehouse storage engine that includes everything: servers, SAN, configuration, and training.

This article is reposted from lzf328's 51CTO blog. Original link:

http://blog.51cto.com/lzf328/1110581