天天看點

HINT:主存中間隔的層次索引(CS)

索引間隔是一個基本問題,找到了廣泛的應用。最近關于管理主存中大量時間間隔集合的工作主要集中在重疊連接配接和時間聚合問題上。在本文中,我們提出了新的和高效的記憶體中間隔索引技術,重點關注間隔範圍查詢,這是許多搜尋和分析任務的基本組成部分。首先,我們提出了單層(平面)域分區方法的優化版本,由于過度複制,該方法可能有較大的空間需求。在此基礎上,提出了一種分層分區方法,将每個區間劃分為每層最多兩個分區,并控制了空間需求。我們的技術的新元素包括将每個分區上的間隔根據它們是開始于分區邊界内部還是在分區邊界之前劃分為組,将存儲在每個分區上的資訊減少到絕對必要的程度,以及有效地處理資料稀疏性和傾斜。對不同特征的真實和合成區間集的實驗結果表明,我們的方法通常比最先進的方法快一個數量級。

原文題目:HINT: A Hierarchical Index for Intervals in Main Memory

原文:Indexing intervals is a fundamental problem, finding a wide range of applications. Recent work on managing large collections of intervals in main memory focused on overlap joins and temporal aggregation problems. In this paper, we propose novel and efficient in-memory indexing techniques for intervals, with a focus on interval range queries, which are a basic component of many search and analysis tasks. First, we propose an optimized version of a single-level (flat) domain-partitioning approach, which may have large space requirements due to excessive replication. Then, we propose a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. Novel elements of our techniques include the division of the intervals at each partition into groups based on whether they begin inside or before the partition boundaries, reducing the information stored at each partition to the absolutely necessary, and the effective handling of data sparsity and skew. Experimental results on real and synthetic interval sets of different characteristics show that our approaches are typically one order of magnitude faster than the state-of-the-art.

HINT 主存中間隔的層次索引.pdf