
"Information Security" on Big Data Security Analytics

The April 2014 issue of "Information Security" carried an interview on big data security analytics. An excerpt is reproduced here:

Key takeaway:

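Big data security analytics is not as rosy as people imagine. It is the right direction, but much work remains, and for now it should be approached with caution.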

Marcus Ranum: Anton, today I thought we could talk about big data, and one of the first questions I should ask is: Is it still just marketing hype? What do you think big data is?

Anton Chuvakin: As I mentioned in a recent blog post, if you fertilize the field of big data with enough marketing hype, something will grow. Well, keep waiting for it. Use of big data analytics approaches for security seems like the most 'BS-rich' area of the entire InfoSec realm. However, there are definitely end-user organizations doing it for real.

You and I both come from backgrounds involving a lot of trolling through data, specifically system logs, so I tend to see big data as a sort of 'buy a great big backhoe, because you can do anything with a big enough backhoe' approach to data exploration, rather than data analysis.

Once you've figured out what fields you want to analyze and what you want to do with them, precomputing the data as it comes into your input stream makes more sense. It seems to me that big data is predicated on your not knowing what you're going to do with your data, so you should just throw lots of storage and CPUs at it. Does that sound right?
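As a rough illustration of the precompute-on-ingest approach Ranum describes, here is a minimal Python sketch. It assumes a hypothetical whitespace-separated log format (`<timestamp> <host> <user> <action>`) and assumes that counts per (host, action) pair are the question decided on up front; neither detail comes from the interview.

```python
from collections import Counter
from typing import Iterable


def ingest(lines: Iterable[str]) -> Counter:
    """Update per-(host, action) counts as each line arrives.

    Hypothetical format (an assumption): "<timestamp> <host> <user> <action>".
    Only the counts are kept; the raw lines are not stored.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines; a real pipeline would quarantine them
        _timestamp, host, _user, action = parts
        counts[(host, action)] += 1
    return counts


# Works on any iterable, so a generator reading a live stream fits equally well.
stream = [
    "2014-04-01T10:00:00 web01 alice login_failed",
    "2014-04-01T10:00:05 web01 alice login_failed",
    "2014-04-01T10:00:09 web01 alice login_ok",
    "garbled line",
]
counts = ingest(stream)
print(counts[("web01", "login_failed")])  # 2
```

The trade-off is visible in the code: only the counts survive, so a question that was not anticipated when choosing the (host, action) key would need the raw lines, which this pipeline never keeps.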

Chuvakin: Big data is predicated on you not knowing what to do with it in advance, but that is actually a good thing. The magic here comes from so-called late schema binding. If you have nasty, messy data and you do know what you want to do with it, you can come up with a schema based on that knowledge, normalize the data to that schema and then toss it into an RDBMS. On the other hand, nasty, messy data that you want to explore somehow may not be easy to normalize, at least not all at once. Thus, big data does often mean exploration and flexibility.

[Don't do big data for big data's sake. Otherwise, you will fall into yet another black hole.]
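Late schema binding (often called schema-on-read) can be sketched as follows: raw lines are stored untouched, and a parser, which plays the role of the schema, is chosen only when a question is asked. The sample syslog-style lines, the regular expressions and the function names are illustrative assumptions, not any particular product's behavior.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical raw events, stored exactly as received: no schema at write time.
RAW_STORE = [
    "Apr  1 10:00:00 web01 sshd[311]: Failed password for alice from 10.0.0.5 port 52100 ssh2",
    "Apr  1 10:00:09 web01 sshd[311]: Accepted password for alice from 10.0.0.5 port 52100 ssh2",
    "Apr  1 10:01:12 db01 kernel: eth0 link up",
]

Parser = Callable[[str], Optional[dict]]


def query(raw: Iterable[str], parser: Parser) -> Iterator[dict]:
    """Bind a schema at read time: parse each raw line and keep those that fit."""
    for line in raw:
        record = parser(line)
        if record is not None:
            yield record


# Schema for one question: SSH authentication outcomes.
SSH_AUTH = re.compile(
    r"(?P<host>\S+) sshd\[\d+\]: (?P<result>Failed|Accepted) password "
    r"for (?P<user>\S+) from (?P<src>\S+)"
)


def ssh_auth_schema(line: str) -> Optional[dict]:
    m = SSH_AUTH.search(line)
    return m.groupdict() if m else None


# A later, different question over the same raw data: interface state changes.
LINK = re.compile(r"(?P<host>\S+) kernel: (?P<iface>\S+) link (?P<state>up|down)")


def link_schema(line: str) -> Optional[dict]:
    m = LINK.search(line)
    return m.groupdict() if m else None


for rec in query(RAW_STORE, ssh_auth_schema):
    print(rec["result"], rec["user"], rec["src"])
# Failed alice 10.0.0.5
# Accepted alice 10.0.0.5
```

Both queries run over the same raw store, and answering the second question did not require re-ingesting or re-normalizing anything. The cost is that every query repeats the parsing work, which is the flip side of the flexibility Chuvakin describes.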

Is big data only going to appeal to large businesses that are retrofitting new analysis atop old data dumps? It seems to me that it's something an organization can avoid, if IT departments actually think about what data they're collecting, what it means and then preprocessing it accordingly.

Chuvakin: At this point, building your own big data platform is not just for the large, mature Type A organizations. At Gartner, we say that big data analytics for security is for the 'Type A of Type A.' [Only the most advanced of the advanced enterprises should consider BDSA (big data security analytics).]

Our research shows that big data use for security will continue to be populated by the most advanced, mature, Type A organizations for the near future. Security may well be becoming a big data problem, but riding that big data wave will stay difficult and expensive for most organizations, at least for the next one to two years.

To add to this, several factors will make any semblance of mass adoption of big data technology for security unlikely in the near term.

More informally: you think Oracle/SQL is hard and scary? Don't even come within a mile radius of Hadoop. [Hadoop is actually more complex to use than a traditional relational database. Are you up to it?]

1) Load your data into Hadoop; 2) !?!?; and 3) Profit! Ultimately, it seems like big data isn't going to solve the age-old problem: If you don't know what you're looking for, you won't know how to look for it.

We've both been bumping up against this issue for a very long time in our system log analysis efforts. Do you see anything coming down the pike that's promising?

Chuvakin: Well, if you phrase it like that, it starts to sound pessimistic. However, if I insert 'data exploration' as step 2, that changes things, doesn't it? Big data approaches often do follow that flow: collect -> explore -> profit. And big data tools make this possible, even if it's not easy.

Exploring unstructured big data piles, however, is much harder than running SIEM reports and may involve text analytics, hard-core statistical methods and other esoteric disciplines that are far removed from traditional security skill sets. It is not all about the keyword search.
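To make "not all about the keyword search" a little more concrete, here is one simple statistical method among many: flagging hosts whose event counts sit far from the median, using the modified z-score (0.6745 × deviation from the median / median absolute deviation, with the commonly used 3.5 cut-off). The choice of this method is an assumption, since the interview names no specific technique, and the per-host counts are made-up sample numbers.

```python
from statistics import median


def robust_outliers(counts: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return keys whose modified z-score exceeds `threshold` in absolute value.

    The modified z-score uses the median and the median absolute deviation (MAD),
    so a single extreme value cannot inflate the spread and hide itself, as it
    can with a mean/standard-deviation z-score on a small sample.
    """
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Most values equal the median, so the score is undefined;
        # a real analysis would need a fallback measure here.
        return []
    return [key for key, v in counts.items() if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical failed-login counts per host for one day (sample data, not real).
failed_logins = {
    "web01": 12, "web02": 9, "web03": 11,
    "db01": 10, "db02": 8, "mail01": 13, "vpn01": 240,
}
print(robust_outliers(failed_logins))  # ['vpn01']
```

A flagged host is a lead for further exploration, not a verdict; the point is only that the question "what looks unusual?" is answered by a statistic rather than by a search string.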

Apart from exploration, more goal-driven approaches have also been found to work for big data: start with clear goals, then test them on the data. Some organizations report success from using this model on security data as well as other big data. [Exploring the data is the key to big data analytics, but it is far from easy, and nothing like as simple as a full-text search. Analytical methods, algorithms and models all matter.]
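The goal-driven model can be sketched as writing one goal down as a testable rule and running it over the data. The goal used here ("a source with at least five failed logins followed by a success within ten minutes may have guessed a password"), its thresholds, the `AuthEvent` record and the sample events are all hypothetical illustrations, not taken from the interview.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AuthEvent:  # hypothetical record type for this sketch
    time: datetime
    src: str
    user: str
    success: bool


def failures_then_success(events, min_failures=5, window=timedelta(minutes=10)):
    """Return (src, user, time) for every successful login that follows at least
    `min_failures` failed logins from the same source within `window`."""
    hits = []
    recent_failures: dict[str, list[datetime]] = {}
    for ev in sorted(events, key=lambda e: e.time):
        # Keep only this source's failures that are still inside the window.
        recent = [t for t in recent_failures.get(ev.src, []) if ev.time - t <= window]
        if ev.success:
            if len(recent) >= min_failures:
                hits.append((ev.src, ev.user, ev.time))
            recent_failures[ev.src] = []  # start counting afresh after a success
        else:
            recent.append(ev.time)
            recent_failures[ev.src] = recent
    return hits


# Hypothetical sample events: six quick failures then a success from 10.0.0.5,
# and two failures then a success from 10.0.0.7.
base = datetime(2014, 4, 1, 10, 0)
events = [AuthEvent(base + timedelta(seconds=30 * i), "10.0.0.5", "alice", False) for i in range(6)]
events.append(AuthEvent(base + timedelta(minutes=4), "10.0.0.5", "alice", True))
events += [
    AuthEvent(base + timedelta(seconds=70), "10.0.0.7", "bob", False),
    AuthEvent(base + timedelta(seconds=100), "10.0.0.7", "bob", False),
    AuthEvent(base + timedelta(seconds=200), "10.0.0.7", "bob", True),
]

for src, user, when in failures_then_success(events):
    print(src, user, when.isoformat())  # 10.0.0.5 alice 2014-04-01T10:04:00
```

Running a rule like this over historical data and reviewing what it flags is one way to test a goal before relying on it.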
