
Storm multi-language support

Using non-JVM languages with Storm

<a href="https://github.com/nathanmarz/storm/wiki/using-non-jvm-languages-with-storm">https://github.com/nathanmarz/storm/wiki/using-non-jvm-languages-with-storm</a>

Multilang protocol

<a href="https://github.com/nathanmarz/storm/wiki/multilang-protocol">https://github.com/nathanmarz/storm/wiki/multilang-protocol</a>

For JVM languages this is simple: provide a DSL that wraps the Java API.

Non-JVM languages are a bit more involved. Storm splits the problem into two parts: topologies and components (bolts and spouts).

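Writing a topology in another language is easy: Nimbus is a Thrift server, so whatever the language, the topology ends up as a Thrift structure. Topology logic is also fairly simple, so writing it in Java works fine, and there is rarely a strong need to use another language.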

For components, Storm takes the same approach as Hadoop: each component runs as a shell process, and Storm communicates with it through stdin and stdout (JSON messages over stdin/stdout).
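To make "JSON messages over stdin/stdout" concrete, here is a minimal sketch of how one message could be framed on a line-oriented pipe. It assumes each JSON message is followed by a line containing only `end`, which is how I recall the multilang protocol page describing it; check the linked page before relying on it. The function names `read_message` and `send_message` are hypothetical, not taken from Storm's adapters.

```python
import io
import json


def read_message(stream):
    """Collect lines until a line that is exactly "end", then decode the JSON."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before the 'end' marker")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def send_message(stream, message):
    """Write one JSON message followed by the assumed "end" marker line."""
    stream.write(json.dumps(message))
    stream.write("\nend\n")
    stream.flush()


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of a real subprocess pipe.
    pipe = io.StringIO()
    send_message(pipe, {"command": "log", "msg": "hello"})
    pipe.seek(0)
    print(read_message(pipe))
```

In a real component the two streams would be `sys.stdin` and `sys.stdout`; the in-memory buffer only keeps the sketch self-contained.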

Storm currently ships Python, Ruby, and Fancy implementations. Supporting another language should be easy: implement the protocol yourself.

Multi-language support matters more for components, because many analysis or statistics modules are not written in Java, and porting them is more work than porting a topology.

Two pieces: creating topologies, and implementing spouts and bolts in other languages.

Creating topologies in another language is easy since topologies are just Thrift structures (link to storm.thrift).

Implementing spouts and bolts in another language is called "multilang components" or "shelling".

The Thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt).

In Java, you extend ShellBolt or ShellSpout to create multilang components.

Note that output field declarations happen in the Thrift structure, so in Java you declare the fields in Java and put the processing code in the other language by naming it in the ShellBolt constructor.

Multilang uses JSON messages over stdin/stdout to communicate with the subprocess.

Storm comes with Ruby, Python, and Fancy adapters that implement the protocol. Python is the example used here.

The Python adapter supports emitting, anchoring, acking, and logging.

The "storm shell" command makes constructing the jar and uploading it to Nimbus easy:

It makes the jar and uploads it.

It calls your program with the host/port of Nimbus and the jarfile id.
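The notes above say the Python adapter supports emitting, anchoring, acking, and logging. As a rough illustration, each of those operations can be seen as a small JSON command sent back to Storm. The field names below (`command`, `tuple`, `anchors`, `stream`, `id`, `msg`) follow my recollection of the multilang protocol page and should be treated as assumptions; the helper function names are hypothetical.

```python
import json


def emit_command(values, anchors=None, stream=None):
    """Build an emit command; anchors tie the new tuple to its input tuples."""
    message = {"command": "emit", "tuple": list(values)}
    if anchors:
        message["anchors"] = list(anchors)
    if stream is not None:
        message["stream"] = stream
    return message


def ack_command(tuple_id):
    """Build an ack command for a fully processed input tuple."""
    return {"command": "ack", "id": tuple_id}


def log_command(text):
    """Build a log command so the text shows up in the worker log."""
    return {"command": "log", "msg": text}


if __name__ == "__main__":
    for message in (
        emit_command(["word"], anchors=["42"]),
        ack_command("42"),
        log_command("processed tuple 42"),
    ):
        print(json.dumps(message))
```

Anchoring is the part that makes acking meaningful: an emitted tuple that lists its input tuple as an anchor lets Storm track the input until everything derived from it has been acked.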

A bolt can be defined in any language. Bolts written in other languages run as subprocesses, and Storm communicates with them using JSON messages over stdin/stdout.

The communication protocol is implemented by a library of only about 100 lines, and the Storm team has written versions of it for Ruby, Python, and Fancy.

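Take a bolt implemented in Python. On the Java side, the difference from a pure-Java bolt is that the class extends ShellBolt, which means its input and output go through the shell process's stdin/stdout. Its constructor names python splitsentence.py as the command to run as a subprocess.

On the Python side, splitsentence.py first does import storm. That module wraps the communication protocol; it is only about 100 lines and worth reading.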
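To show what a split-sentence bolt does on the Python side, here is a self-contained sketch that inlines the protocol instead of using the bundled `storm` module. It is not the original splitsentence.py. It skips the initial handshake, and it assumes `end`-terminated JSON messages plus the field names `id`, `tuple`, `command`, and `anchors`; the function names `process` and `run` are hypothetical.

```python
import io
import json


def process(tup):
    """Turn one incoming sentence tuple into emit commands plus a final ack."""
    words = tup["tuple"][0].split()
    commands = [
        {"command": "emit", "anchors": [tup["id"]], "tuple": [word]}
        for word in words
    ]
    commands.append({"command": "ack", "id": tup["id"]})
    return commands


def run(stdin, stdout):
    """Read "end"-terminated JSON messages and answer each with commands."""
    buffer = []
    for line in stdin:
        line = line.rstrip("\n")
        if line != "end":
            buffer.append(line)
            continue
        tup = json.loads("\n".join(buffer))
        buffer = []
        for command in process(tup):
            stdout.write(json.dumps(command) + "\nend\n")
        stdout.flush()


if __name__ == "__main__":
    # Feed one fake tuple through the loop using in-memory streams.
    fake_in = io.StringIO('{"id": "1", "tuple": ["the cow jumped"]}\nend\n')
    fake_out = io.StringIO()
    run(fake_in, fake_out)
    print(fake_out.getvalue())
```

With the real adapter, the framing and the loop live inside the imported module, so the script itself only has to supply the per-tuple logic.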

<a href="https://github.com/nathanmarz/storm/wiki/dsls-and-multilang-adapters">https://github.com/nathanmarz/storm/wiki/dsls-and-multilang-adapters</a>

<a href="https://github.com/velvia/scalastorm">scala dsl</a>

<a href="https://github.com/colinsurprenant/redstorm">jruby dsl</a>

<a href="https://github.com/nathanmarz/storm/wiki/clojure-dsl">clojure dsl</a>

As noted earlier, JVM languages only need a DSL that wraps the Java API. The links above are the DSLs listed on the official wiki.

Clojure serves as a brief example.

Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The Clojure DSL has access to everything the Java API exposes, so if you're a Clojure user you can code Storm topologies without touching Java at all.

<a href="https://github.com/nathanmarz/storm/wiki/clojure-dsl">https://github.com/nathanmarz/storm/wiki/clojure-dsl</a>

<a href="https://github.com/nathanmarz/storm/wiki/defining-a-non-jvm-language-dsl-for-storm">https://github.com/nathanmarz/storm/wiki/defining-a-non-jvm-language-dsl-for-storm</a>

For non-JVM languages, something similar to a DSL can be built with the storm shell command.

There's a "storm shell" command that will help with submitting a topology. Its usage is like this:

storm shell resources/ python topology.py arg1 arg2

storm shell will then package resources/ into a jar, upload the jar to Nimbus, and call your topology.py script like this:

python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}

Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:

void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);
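A topology script launched by storm shell has to pick the Nimbus details out of its own command line before it can call submitTopology. The sketch below assumes the Nimbus host, the Nimbus port, and the uploaded jar location arrive as the last three arguments, after the script's own arguments. The names `NimbusTarget` and `parse_storm_shell_args`, and the host, port, and path in the demo, are hypothetical. The Thrift call itself is left out because it needs a client generated from storm.thrift.

```python
from typing import NamedTuple


class NimbusTarget(NamedTuple):
    """Where to submit the topology and which uploaded jar to reference."""
    host: str
    port: int
    uploaded_jar_location: str


def parse_storm_shell_args(argv):
    """Split argv into the script's own arguments and the three appended values."""
    if len(argv) < 4:
        raise ValueError("expected host, port, and jar location at the end of argv")
    # argv[0] is the script name; the last three entries are assumed to come
    # from storm shell, and everything in between belongs to the script.
    *user_args, host, port, jar_location = argv[1:]
    return user_args, NimbusTarget(host, int(port), jar_location)


if __name__ == "__main__":
    # Hypothetical command line, as storm shell might produce it.
    demo_argv = ["topology.py", "arg1", "arg2", "nimbus.local", "6627", "/tmp/stormjar.jar"]
    user_args, target = parse_storm_shell_args(demo_argv)
    print(user_args, target)
```

The `uploaded_jar_location` value is what would be passed as the second argument of submitTopology once a Thrift connection to the parsed host and port is open.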

Excerpted from Cnblogs (部落格園). Originally published: 2013-05-10