The following program is a complete, working example of a streaming window word count application that counts the words coming from a web socket in 5-second windows.


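The code structure of a Flink application is as follows:
Flink DataStream programs look like regular Java programs with a <code>main()</code> method. Each program consists of the same basic parts:
Obtaining a <code>StreamExecutionEnvironment</code>,
Connecting to data stream sources,
Specifying transformations on the data streams,
Specifying output for the processed data,
Executing the program.
Using the example above as an illustration:
First, a socketTextStream is created, which reads a text stream from a socket.
Next comes a flatMap. The difference from map is that map is 1->1 while flatMap is 1->n. Here the Splitter splits the text on spaces (" ") and emits each word as a tuple.
Then keyBy produces a keyed tuple stream, here keyed by the word.
Over a 5-second timeWindow, the count field that follows the word is summed.
Finally, the output is print.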
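The walkthrough above can be mimicked without Flink to make the data flow concrete. The following is a minimal plain-Java sketch, not the Flink program itself: the class name, method names and the use of explicit arrival timestamps are assumptions made for illustration, and timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of: socket text -> flatMap(split) -> keyBy(word) -> 5s window -> sum.
class WindowedWordCountSketch {

    /** flatMap step: one input line becomes n words (1 -> n). */
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /**
     * keyBy + tumbling window + sum: counts words per window.
     * timestampsMillis[i] is the (assumed) arrival time of lines[i].
     * Result maps window start -> (word -> count).
     */
    static Map<Long, Map<String, Integer>> count(
            long[] timestampsMillis, String[] lines, long windowMillis) {
        Map<Long, Map<String, Integer>> windows = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            // Every element falls into exactly one tumbling window.
            long windowStart = timestampsMillis[i] - (timestampsMillis[i] % windowMillis);
            Map<String, Integer> counts =
                    windows.computeIfAbsent(windowStart, k -> new LinkedHashMap<>());
            for (String word : split(lines[i])) {
                counts.merge(word, 1, Integer::sum); // emit (word, 1) and sum per key
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] times = {1000L, 4000L, 6000L};
        String[] lines = {"to be or", "not to be", "to be"};
        // The "print" sink of the example.
        System.out.println(count(times, lines, 5000L));
    }
}
```

The first two lines land in the window starting at 0 and the third in the window starting at 5000, so the counts of the two windows are kept apart.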
The most commonly used transformations are not listed here.
==============================================================================================
Reduce
KeyedStream → DataStream
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.


Fold
A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
A fold function that, when applied on the sequence (1,2,3,4,5), emits the sequence "start-1", "start-1-2", "start-1-2-3", ...


The difference between fold and reduce: fold can take an initial value, and a FoldFunction can fold one type into another,
whereas a ReduceFunction works on a single type only.
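To make the difference tangible, here is a small plain-Java sketch of the two rolling operations. It is not Flink's ReduceFunction or FoldFunction; the class and method names are made up for illustration, and the per-key grouping is left out so that only the rolling behaviour is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class RollingReduceFoldSketch {

    /** Rolling reduce: input and output share one type T; every step emits a value. */
    static <T> List<T> rollingReduce(List<T> input, BinaryOperator<T> reducer) {
        List<T> emitted = new ArrayList<>();
        boolean first = true;
        T last = null;
        for (T element : input) {
            // The first element is emitted as-is; later ones are combined with the last result.
            last = first ? element : reducer.apply(last, element);
            first = false;
            emitted.add(last);
        }
        return emitted;
    }

    /** Rolling fold: starts from an initial value and may change the type (T -> R). */
    static <T, R> List<R> rollingFold(R initial, List<T> input, BiFunction<R, T, R> folder) {
        List<R> emitted = new ArrayList<>();
        R accumulator = initial;
        for (T element : input) {
            accumulator = folder.apply(accumulator, element);
            emitted.add(accumulator);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        // Integer in, Integer out.
        System.out.println(rollingReduce(numbers, Integer::sum));
        // Integer in, String out, seeded with "start".
        System.out.println(rollingFold("start", numbers, (acc, value) -> acc + "-" + value));
    }
}
```

The fold call reproduces the "start-1", "start-1-2", ... sequence quoted above, while the reduce call can only ever produce integers from integers.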
Aggregations
Rolling aggregations on a keyed data stream.
The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).


These can be seen as special cases of reduce.
Without "By", only the value is returned.
With "By", the whole element is returned.
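The distinction is easy to see on a tiny data set. The sketch below is plain Java, not the Flink aggregation API; the Reading type and the method names are hypothetical, and it follows the description given in these notes (min yields the value, minBy yields the element).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MinVersusMinBySketch {

    /** Hypothetical element type: a sensor name plus the field being aggregated. */
    static final class Reading {
        final String sensor;
        final int value;

        Reading(String sensor, int value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    /** Rolling min: emits only the smallest value seen so far. */
    static List<Integer> rollingMin(List<Reading> input) {
        List<Integer> emitted = new ArrayList<>();
        int min = Integer.MAX_VALUE;
        for (Reading reading : input) {
            min = Math.min(min, reading.value);
            emitted.add(min);
        }
        return emitted;
    }

    /** Rolling minBy: emits the whole element holding the smallest value so far. */
    static List<Reading> rollingMinBy(List<Reading> input) {
        List<Reading> emitted = new ArrayList<>();
        Reading best = null;
        for (Reading reading : input) {
            if (best == null || reading.value < best.value) {
                best = reading; // ties keep the earlier element
            }
            emitted.add(best);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading("a", 7), new Reading("b", 3), new Reading("c", 5));
        System.out.println(rollingMin(readings));
        for (Reading reading : rollingMinBy(readings)) {
            System.out.println(reading.sensor + " -> " + reading.value);
        }
    }
}
```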
=============================================================================================
Union
DataStream* → DataStream
Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: if you union a data stream with itself you will get each element twice in the resulting stream.
Connect
DataStream,DataStream → ConnectedStreams
"Connects" two data streams retaining their types. Connect allows for shared state between the two streams.
In other words, connect lets two streams of different types share a single stream, so that data from both streams is available together.
CoMap, CoFlatMap
ConnectedStreams → DataStream
Similar to map and flatMap on a connected data stream.


Split
DataStream → SplitStream
Split the stream into two or more streams according to some criterion.


Select
SplitStream → DataStream
Select one or more streams from a split stream.
====================================================================================
Project
DataStream → DataStream
Selects a subset of fields from the tuples.
===========================================================================================
Window
KeyedStream → WindowedStream
A window defined on a KeyedStream.
WindowAll
DataStream → AllWindowedStream
WARNING: This is in many cases a non-parallel transformation. All records will be gathered in one task for the windowAll operator.
The main point: because there is no key, a transformation over all elements cannot be parallelized and has to run in a single task.
Window Apply
WindowedStream → DataStream
AllWindowedStream → DataStream
Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window.
Note: if you are using a windowAll transformation, you need to use an AllWindowFunction instead.
Window Reduce
WindowedStream → DataStream
Applies a functional reduce function to the window and returns the reduced value.
Aggregations on windows
Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
Window Join
DataStream,DataStream → DataStream
Join two data streams on a given key and a common window.
Similar to the grouping strategies in Storm, the partitioning can be configured explicitly.
Hash partitioning, equivalent to grouping by field
Identical to keyBy but returns a DataStream instead of a KeyedStream.
Custom partitioning
Uses a user-defined Partitioner to select the target task for each element.
Random partitioning, equivalent to shuffle
Partitions elements randomly according to a uniform distribution.
Rebalancing (round-robin partitioning)
Partitions elements round-robin, creating equal load per partition. Useful for performance optimization in the presence of data skew.
This guarantees that the data is not skewed; round-robin means the elements are handed out one at a time to each partition in turn.
Broadcasting, comparable to all grouping in Storm
Broadcasts elements to every partition.
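The partitioning strategies listed above differ only in how a target partition is picked for each element. The following plain-Java sketch shows three of them side by side; it does not use Flink's partitioner classes, the names are hypothetical, and the round-robin variant starts at partition 0 purely to keep the example deterministic.

```java
import java.util.Arrays;

class PartitioningSketch {

    /** Hash partitioning: the same key always goes to the same partition. */
    static int hashPartition(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Rebalancing: element i goes to partition i mod n, one at a time in turn. */
    static int[] roundRobinTargets(int elementCount, int partitions) {
        int[] targets = new int[elementCount];
        for (int i = 0; i < elementCount; i++) {
            targets[i] = i % partitions;
        }
        return targets;
    }

    /** Broadcasting: a single element is sent to every partition. */
    static int[] broadcastTargets(int partitions) {
        int[] targets = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            targets[p] = p;
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(hashPartition("flink", 4));
        System.out.println(Arrays.toString(roundRobinTargets(5, 3)));
        System.out.println(Arrays.toString(broadcastTargets(3)));
    }
}
```

Round-robin gives each partition the same number of elements (up to a difference of one) regardless of the keys, which is why it helps against skew, whereas hash partitioning inherits whatever skew the key distribution has.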
Chaining two subsequent transformations means co-locating them within the same thread for better performance.
Flink by default chains operators if this is possible (e.g., two subsequent map transformations).
The API gives fine-grained control over chaining if desired:
Start new chain
Begin a new chain, starting with this operator. The two mappers will be chained, and filter will not be chained to the first mapper.
Note that startNewChain applies to the operator on its left, so in the example above the new chain starts from the first map.
Disable chaining
Do not chain the map operator.
Start a new resource group
Start a new resource group containing the map and the subsequent operators.
Isolate resources
Isolate the operator in its own slot.
That is, the operator gets a slot of its own.
Only the following two settings differ from the batch configuration:
Parameters in the <code>ExecutionConfig</code> that pertain specifically to the DataStream API are:
<code>enableTimestamps()</code> / <code>disableTimestamps()</code>: Attach a timestamp to each event emitted from a source. <code>areTimestampsEnabled()</code> returns the current value.
<code>setAutoWatermarkInterval(long milliseconds)</code>: Set the interval for automatic watermark emission. You can get the current value with <code>long getAutoWatermarkInterval()</code>.
A LocalEnvironment is created and used as follows:
Collection data sources can be used as follows:


Flink also provides a sink to collect DataStream results for testing and debugging purposes. It can be used as follows:
Working with Time
There are three notions of time:
Processing time: the time at which an element is actually processed
Event time: the time at which the event actually occurred
Ingestion time: the time at which the data enters Flink, at the data source
Processing time is used by default.
To use event time, you need to follow these steps:
Set <code>env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)</code>
Use <code>DataStream.assignTimestamps(...)</code> in order to tell Flink how timestamps relate to events (e.g., which record field is the timestamp)
Set <code>enableTimestamps()</code>, as well as the interval for watermark emission (<code>setAutoWatermarkInterval(long milliseconds)</code>) in <code>ExecutionConfig</code>.
For example, assume that we have a data stream of tuples, in which the first field is the timestamp (assigned by the system that generates these data streams), and we know that the lag between the current processing time and the timestamp of an event is never more than 1 second:
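Under that bounded-lag assumption, a watermark can be derived from the processing-time clock. The sketch below is plain Java rather than Flink's timestamp-assignment interface; the record is modelled as a long[] whose first slot is the timestamp, and all names are hypothetical.

```java
class BoundedLagWatermarkSketch {

    /** Assumption from the text: events are never more than 1 second late. */
    static final long MAX_LAG_MILLIS = 1000L;

    /** The first field of the record carries the event timestamp. */
    static long extractTimestamp(long[] record) {
        return record[0];
    }

    /**
     * With a bounded lag, no event older than (now - lag) can still arrive,
     * so that value is a safe watermark.
     */
    static long currentWatermark(long processingTimeMillis) {
        return processingTimeMillis - MAX_LAG_MILLIS;
    }

    /** An event-time window is complete once the watermark has passed its end. */
    static boolean windowComplete(long windowEndMillis, long watermark) {
        return watermark > windowEndMillis;
    }

    public static void main(String[] args) {
        long now = 6500L;
        long watermark = currentWatermark(now);
        System.out.println("watermark = " + watermark);
        System.out.println("window ending at 5000 complete: " + windowComplete(5000L, watermark));
        System.out.println("window ending at 10000 complete: " + windowComplete(10000L, watermark));
    }
}
```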


Basic window constructs
Tumbling time window (non-sliding)
Defines a window of 5 seconds, that "tumbles".
Sliding time window (sliding)
Defines a window of 5 seconds, that "slides" by 1 second.
Tumbling count window
Sliding count window
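What "tumbles" and "slides" mean for a single element can be worked out with a little arithmetic. The following plain-Java sketch computes which time windows an element with a given timestamp belongs to; it is an illustration with hypothetical names, not Flink's window assigner code, and it does not clamp window starts at zero.

```java
import java.util.ArrayList;
import java.util.List;

class WindowAssignmentSketch {

    /** Tumbling: every timestamp belongs to exactly one window [start, start + size). */
    static long tumblingWindowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    /**
     * Sliding: a timestamp belongs to every window [start, start + size)
     * whose start is a multiple of the slide and lies in (timestamp - size, timestamp].
     */
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 5 second tumbling window.
        System.out.println(tumblingWindowStart(7300L, 5000L));
        // 5 second window sliding by 1 second: the element is in size / slide = 5 windows.
        System.out.println(slidingWindowStarts(7300L, 5000L, 1000L));
    }
}
```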
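Advanced window constructs
The general recipe for building a custom window is to specify (1) a <code>WindowAssigner</code>, (2) a <code>Trigger</code> (optionally), and (3) an <code>Evictor</code> (optionally).
Constructs such as timeWindow above are pre-packaged; building a window the advanced way takes three steps:
1. First comes the <code>WindowAssigner</code>, of which there are two main kinds, sliding and non-sliding. It mainly answers the question of where an element goes.
Global window
All incoming elements of a given key are assigned to the same window. The window does not contain a default trigger, hence it will never be triggered if a trigger is not explicitly specified.
Used for count windows.
Tumbling time windows
The default trigger:
First, understand what a watermark means: receiving a watermark tells me that I can no longer receive data whose event time is smaller than that watermark.
So once the watermark I receive is greater than the end time of my window, the window's data is complete and the trigger can fire.
Sliding time windows
The default trigger is the same as above.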
2. The second step is to define the trigger, i.e. when to fire. It answers the question of when.
The <code>Trigger</code> specifies when the function that comes after the window clause (e.g., <code>sum</code>, <code>count</code>) is evaluated (“fires”) for each window.
If a trigger is not specified, a default trigger for each window type is used (that is part of the definition of the <code>WindowAssigner</code>).
Processing time trigger
A window is fired when the current processing time exceeds its end-value. The elements on the triggered window are henceforth discarded.
Watermark trigger
A window is fired when a watermark with value that exceeds the window's end-value has been received. The elements on the triggered window are henceforth discarded.
Continuous processing time trigger
A window is periodically considered for being fired (every 5 seconds in the example). The window is actually fired only when the current processing time exceeds its end-value. The elements on the triggered window are retained.
Continuous watermark time trigger
A window is periodically considered for being fired (every 5 seconds in the example). A window is actually fired when a watermark with value that exceeds the window's end-value has been received. The elements on the triggered window are retained.
This differs from the triggers above in that the window is not discarded after firing; it is retained and fires repeatedly at a fixed interval.
Count trigger
A window is fired when it has more than a certain number of elements (1000 below). The elements of the triggered window are retained.
Triggered by count; the window is retained.
Purging trigger
Takes any trigger as an argument and forces the triggered window elements to be "purged" (discarded) after triggering.
Some of the triggers above retain the data. What if you want it discarded? Use PurgingTrigger.
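The practical effect of retaining versus purging shows up in what the window function sees on the second firing. The sketch below models a count-based trigger in plain Java with a flag that plays the role of the purging wrapper. The class is hypothetical and simplified: it fires once the threshold number of new elements has arrived (an assumption made for this sketch) and applies a sum as the window function.

```java
import java.util.ArrayList;
import java.util.List;

class CountTriggerSketch {

    private final int threshold;
    private final boolean purgeAfterFire;
    private final List<Integer> windowContents = new ArrayList<>();
    private int sinceLastFire = 0;

    CountTriggerSketch(int threshold, boolean purgeAfterFire) {
        this.threshold = threshold;
        this.purgeAfterFire = purgeAfterFire;
    }

    /**
     * Adds an element. Returns the sum over the window contents if the
     * trigger fires on this element, or null if it does not fire.
     */
    Integer add(int value) {
        windowContents.add(value);
        sinceLastFire++;
        if (sinceLastFire < threshold) {
            return null;
        }
        sinceLastFire = 0;
        int sum = 0;
        for (int element : windowContents) {
            sum += element;
        }
        if (purgeAfterFire) {
            windowContents.clear(); // purging: the next firing starts from an empty window
        }
        return sum;
    }

    public static void main(String[] args) {
        CountTriggerSketch retaining = new CountTriggerSketch(2, false);
        CountTriggerSketch purging = new CountTriggerSketch(2, true);
        for (int value = 1; value <= 4; value++) {
            System.out.println("retaining: " + retaining.add(value)
                    + ", purging: " + purging.add(value));
        }
    }
}
```

With elements 1 to 4 and a threshold of 2, the retaining variant sums all elements seen so far on its second firing, while the purging variant only sums the elements that arrived since the first firing.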
Delta trigger
A window is periodically considered for being fired (every 5000 milliseconds in the example). A window is actually fired when the value of the last added element exceeds the value of the first element inserted in the window according to a <code>DeltaFunction</code>.
With the delta trigger, each new value is compared with the old value through getDelta, and the window fires when the delta exceeds the defined threshold.
3. Finally, specify the <code>Evictor</code>
After the trigger fires, and before the function (e.g., <code>sum</code>, <code>count</code>) is applied to the window contents, an optional <code>Evictor</code> removes some elements from the beginning of the window before the remaining elements are passed on to the function.
Put simply, when a window is triggered we can choose to process only part of its data.
An evictor removes part of the data and keeps the part you want.
Time evictor
Evict all elements from the beginning of the window, so that elements from end-value - 1 second until end-value are retained (the resulting window size is 1 second).
Count evictor
Retain 1000 elements from the end of the window backwards, evicting all others.
The logic is to retain rather than to remove: for example, CountEvictor.of(1000) keeps the last 1000 elements, which is a little counterintuitive.
Delta evictor
Starting from the beginning of the window, evict elements until an element with value lower than the value of the last element is found (by a threshold and a DeltaFunction).
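Because the evictor parameters describe what is kept rather than what is dropped, a small example helps. This is a plain-Java sketch of the count and time variants with hypothetical method names; it does not use Flink's evictor classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EvictorSketch {

    /** Count evictor: keep the last n elements, evict everything before them. */
    static <T> List<T> keepLast(List<T> window, int n) {
        int from = Math.max(0, window.size() - n);
        return new ArrayList<>(window.subList(from, window.size()));
    }

    /** Time evictor: keep elements whose timestamp lies within keepMillis of the window end. */
    static List<Long> keepRecent(List<Long> timestamps, long windowEnd, long keepMillis) {
        List<Long> kept = new ArrayList<>();
        for (long timestamp : timestamps) {
            if (timestamp >= windowEnd - keepMillis) {
                kept.add(timestamp);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Keep the last 2 of 5 elements.
        System.out.println(keepLast(Arrays.asList(1, 2, 3, 4, 5), 2));
        // Window ends at 5000 ms; keep only the final second.
        System.out.println(keepRecent(Arrays.asList(1000L, 4200L, 4900L), 5000L, 1000L));
    }
}
```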
Recipes for building windows
Below are some examples of window definitions to help with understanding; the examples given are very simple.
Windows on unkeyed data streams
Windows can also be used on unkeyed data streams.
The difference is that "All" is appended after window.
Tumbling time window all
DataStream → WindowedStream
Defines a window of 5 seconds, that "tumbles". This means that elements are grouped according to their timestamp in groups of 5 second duration, and every element belongs to exactly one window. The notion of time used is controlled by the StreamExecutionEnvironment.
Sliding time window all
Defines a window of 5 seconds, that "slides" by 1 second. This means that elements are grouped according to their timestamp in groups of 5 second duration, and elements can belong to more than one window (since windows overlap by at least 4 seconds). The notion of time used is controlled by the StreamExecutionEnvironment.
Tumbling count window all
Defines a window of 1000 elements, that "tumbles". This means that elements are grouped according to their arrival time (equivalent to processing time) in groups of 1000 elements, and every element belongs to exactly one window.
Sliding count window all
Defines a window of 1000 elements, that "slides" every 100 elements. This means that elements are grouped according to their arrival time (equivalent to processing time) in groups of 1000 elements, and every element can belong to more than one window (as windows overlap by at least 900 elements).
All transformations in Flink may look like functions (in the functional processing terminology), but are in fact stateful operators.
You can make every transformation (<code>map</code>, <code>filter</code>, etc) stateful by declaring local variables or using Flink’s state interface.
You can register any local variable as managed state by implementing an interface.
In this case, and also in the case of using Flink’s native state interface, Flink will automatically take consistent snapshots of your state periodically, and restore its value in the case of a failure.
The end effect is that updates to any form of state are the same under failure-free execution and execution under failures.
First, we look at how to make local variables consistent under failures, and then we look at Flink’s state interface.
By default state checkpoints will be stored in-memory at the JobManager. For proper persistence of large state, Flink supports storing the checkpoints on file systems (HDFS, S3, or any mounted POSIX file system), which can be configured in the <code>flink-conf.yaml</code> or via <code>StreamExecutionEnvironment.setStateBackend(…)</code>.
This is the core value of Flink's stream processing: local state can be checkpointed conveniently. There are several ways to do this, described in detail below.
By default these checkpoints are stored in the JobManager's memory, though they can also be configured to go to a file system.
Checkpointing local variables
This one is fairly easy to understand.
Local variables can be checkpointed by using the <code>Checkpointed</code> interface.
When the user-defined function implements the <code>Checkpointed</code> interface, the <code>snapshotState(…)</code> and <code>restoreState(…)</code> methods will be executed to draw and restore function state.


As shown above, implementing just snapshotState and restoreState is enough to checkpoint the local variable counter; this is easy to understand.
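The snapshot-and-restore cycle for a local counter can be illustrated without a running cluster. In the plain-Java sketch below, the Snapshotable interface is a hypothetical stand-in that only mirrors the two method names mentioned above; it is not Flink's interface and its signatures are simplified. The "failure" is simulated by throwing the function instance away.

```java
class CheckpointedCounterSketch {

    /** Hypothetical stand-in for a checkpointing interface; signatures are simplified. */
    interface Snapshotable<S> {
        S snapshotState();

        void restoreState(S state);
    }

    /** A user function with a local variable that should survive failures. */
    static final class Counter implements Snapshotable<Long> {
        private long counter = 0L;

        long process(String element) {
            counter++;
            return counter;
        }

        @Override
        public Long snapshotState() {
            return counter; // what gets written into the checkpoint
        }

        @Override
        public void restoreState(Long state) {
            counter = state; // what a restarted function starts from
        }
    }

    public static void main(String[] args) {
        Counter running = new Counter();
        running.process("a");
        running.process("b");
        running.process("c");
        Long checkpoint = running.snapshotState(); // checkpoint taken at counter = 3

        running.process("d"); // progress made after the checkpoint

        // Simulated failure: the old instance is gone, a fresh one is restored.
        Counter recovered = new Counter();
        recovered.restoreState(checkpoint);
        System.out.println(recovered.process("d")); // continues from the checkpointed value
    }
}
```

Work done after the last checkpoint is not part of the restored state, which is why the source has to replay the corresponding input after recovery.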
In addition to that, user functions can also implement the <code>CheckpointNotifier</code> interface to receive notifications on completed checkpoints via the <code>notifyCheckpointComplete(long checkpointId)</code> method. Note that there is no guarantee for the user function to receive a notification if a failure happens between checkpoint completion and notification. The notifications should hence be treated in a way that notifications from later checkpoints can subsume missing notifications.
Beyond that, <code>CheckpointNotifier</code> can also be implemented, so that <code>notifyCheckpointComplete</code> is called when a checkpoint completes, although the call is not guaranteed to happen.
Using the Key/Value State Interface
This approach calls the state interface explicitly.
The state interface gives access to key/value states, which are a collection of key/value pairs.
Because the state is partitioned by the keys (distributed across workers), it can only be used on the <code>KeyedStream</code>, created via <code>stream.keyBy(…)</code> (which means also that it is usable in all types of functions on keyed windows).
The handle to the state can be obtained from the function’s <code>RuntimeContext</code>.
The state handle will then give access to the value mapped under the key of the current record or window - each key consequently has its own value.
The following code sample shows how to use the key/value state inside a reduce function.
When creating the state handle, one needs to supply a name for that state (a function can have multiple states of different types), the type of the state (used to create efficient serializers), and the default value (returned as a value for keys that do not yet have a value associated).
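The idea of a state handle whose value depends on the key of the current record can be modelled with a map. The sketch below is plain Java with hypothetical names; it is not Flink's state interface, and the state name and serializer type mentioned above are omitted, so only the default value and the implicit per-key scoping are shown.

```java
import java.util.HashMap;
import java.util.Map;

class KeyedStateSketch<K, V> {

    private final Map<K, V> valuesByKey = new HashMap<>();
    private final V defaultValue;
    private K currentKey;

    KeyedStateSketch(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** In a real runtime the key would be set implicitly from the record being processed. */
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    /** Value for the current key, or the default if the key has no value yet. */
    V value() {
        return valuesByKey.getOrDefault(currentKey, defaultValue);
    }

    /** Updates the value for the current key only. */
    void update(V newValue) {
        valuesByKey.put(currentKey, newValue);
    }

    public static void main(String[] args) {
        KeyedStateSketch<String, Integer> count = new KeyedStateSketch<>(0);
        for (String word : new String[] {"a", "b", "a"}) {
            count.setCurrentKey(word);
            count.update(count.value() + 1);
            System.out.println(word + " -> " + count.value());
        }
    }
}
```

The user code never names the key when reading or updating, which is the property that lets the runtime move keys and their state between workers.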


State updated by this is usually kept locally inside the Flink process (unless one configures explicitly an external state backend). This means that lookups and updates are process local and thus very fast.
The important implication of having the keys set implicitly is that it forces programs to group the stream by key (via the <code>keyBy()</code> function), making the key partitioning transparent to Flink. That allows the system to efficiently restore and redistribute keys and state.
The Scala API has shortcuts for stateful <code>map()</code> or <code>flatMap()</code> functions on <code>KeyedStream</code>, which give the state of the current key as an Option directly into the function, and return the result with a state update:


State Checkpoints in Iterative Jobs
Flink currently only provides processing guarantees for jobs without iterations. Enabling checkpointing on an iterative job causes an exception. In order to force checkpointing on an iterative program the user needs to set a special flag when enabling checkpointing: <code>env.enableCheckpointing(interval, force = true)</code>.
Please note that records in flight in the loop edges (and the state changes associated with them) will be lost during failure.
For iterative jobs, i.e. the case with cycles, checkpointing is more complicated, and intermediate progress is lost after recovery: for example, if an n-step iteration fails at step n-1, it has to start again from 1.
For example, here is a program that continuously subtracts 1 from a series of integers until they reach zero:


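Look at the example directly.
First, someIntegers is a DataStream of the numbers from 0 to 1000.
For each tuple, a map function is executed iteratively; here it keeps subtracting one.
When does it end?
That is decided by iteration.closeWith. closeWith is given a filter: if the filter returns true, the tuple keeps iterating; if it returns false, the iteration is closed for that tuple.
Finally, lessThanZero is the output DataStream produced once someIntegers has gone through the iteration.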
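The feedback loop described above can be imitated with a queue: elements that pass the feedback filter are put back in, the rest leave the loop. This is a plain-Java sketch, not the Flink iteration API; the variable names follow the walkthrough, while the class and method names and the exact filter conditions (feed back while the value is greater than zero) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class IterationSketch {

    /** Repeatedly subtracts 1 from each input until the value is no longer positive. */
    static List<Integer> iterate(List<Integer> someIntegers) {
        Deque<Integer> loop = new ArrayDeque<>(someIntegers); // the iteration head
        List<Integer> lessThanZero = new ArrayList<>();       // the final output
        while (!loop.isEmpty()) {
            int minusOne = loop.poll() - 1;                   // the map step
            if (minusOne > 0) {
                loop.add(minusOne);                           // feedback edge: iterate again
            } else {
                lessThanZero.add(minusOne);                   // leaves the loop
            }
        }
        return lessThanZero;
    }

    public static void main(String[] args) {
        System.out.println(iterate(Arrays.asList(0, 1, 3)));
    }
}
```

Anything still sitting in the queue when a failure happens corresponds to the in-flight records on the loop edge that, as noted above, are lost.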
Connectors provide code for interfacing with various third-party systems.
Currently these systems are supported:
Here we only look at Kafka.
Then, import the connector in your Maven project:


How is fault tolerance achieved?
With Flink’s checkpointing enabled, the Flink Kafka consumer will consume records from a topic and periodically checkpoint all its Kafka offsets, together with the state of other operations, in a consistent manner. In case of a job failure, Flink will restore the streaming program to the state of the latest checkpoint and re-consume the records from Kafka, starting from the offsets that were stored in the checkpoint.
The principle is that the offsets of all Kafka partitions are checkpointed together with the other state, so that on recovery reading can resume from those offsets.
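The offset bookkeeping behind this can be sketched in a few lines. The following plain Java is a model of the idea only: it does not use the Kafka client or the Flink Kafka consumer, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class OffsetCheckpointSketch {

    /** Next offset to read, per partition. */
    private final Map<Integer, Long> nextOffset = new HashMap<>();

    /** Called after a record has been consumed from a partition. */
    void recordConsumed(int partition, long offset) {
        nextOffset.put(partition, offset + 1);
    }

    /** Checkpoint: a copy of all partition offsets, taken together with other state. */
    Map<Integer, Long> snapshot() {
        return new HashMap<>(nextOffset);
    }

    /** Recovery: forget current progress and go back to the checkpointed offsets. */
    void restore(Map<Integer, Long> checkpoint) {
        nextOffset.clear();
        nextOffset.putAll(checkpoint);
    }

    /** Where reading resumes for a partition (0 if nothing was consumed yet). */
    long resumeFrom(int partition) {
        return nextOffset.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetCheckpointSketch consumer = new OffsetCheckpointSketch();
        consumer.recordConsumed(0, 41L);
        consumer.recordConsumed(1, 9L);
        Map<Integer, Long> checkpoint = consumer.snapshot();

        consumer.recordConsumed(0, 42L); // consumed after the checkpoint

        consumer.restore(checkpoint);    // job failure and restart
        System.out.println(consumer.resumeFrom(0)); // offset 42 is read again
        System.out.println(consumer.resumeFrom(1));
    }
}
```

Records consumed after the checkpoint are re-read after recovery, which is consistent with the other operator state having been rolled back to the same checkpoint.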
To use fault tolerant Kafka consumers, checkpointing of the topology needs to be enabled at the execution environment:
If checkpointing is not enabled, the Kafka consumer will periodically commit the offsets to ZooKeeper.
Because the simple consumer is used, offsets have to be recorded even when checkpointing is off; here the usual approach of recording the Kafka offsets in ZooKeeper is used.
Data can also be written to Kafka, using <code>FlinkKafkaProducer</code>.
The <code>FlinkKafkaProducer</code> writes data to a Kafka topic. The producer can specify a custom partitioner that assigns records to partitions.